Performance Tips

Tips on writing fast, efficient jobs. Feel free to add to this page if you have more good ideas!

General Advice

These are not Condor-specific; they apply to any kind of high-performance computing job. Most of them were originally suggested by Brian High in a post to the UW techsupport list.

  • Keep I/O to a minimum - try to read and write each piece of data only once.
  • Don't load more data into RAM than you need. Process it in small chunks, keeping only the minimum amount of information (statistics, state, etc.) in memory.
  • Identify what processing can be done independently. Parallelize that. Divide the work up as evenly as possible.
  • Precompile regular expressions. This can be a huge speedup in Perl and Python, especially if you're doing regexp matching in a loop. (Brian posted one simple code snippet that ran 25% faster with precompiled regexps.)
  • If your task requires doing data lookups, an SQL database will probably be faster than a flat text file. (We have an SQL server available in our cluster; contact linghelp@u for details on getting a database set up.)
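
As a rough illustration of the chunking and regular expression tips above, here is a minimal Python sketch. The function name count_words and the word pattern are hypothetical, chosen only for the example. The pattern is compiled once outside the loop, and the file is read one line at a time so that only the running counts stay in memory. Note that Python's re module caches recently used patterns, so the gain from precompiling varies; measure it on your own job.

```python
import re
from collections import Counter

# Compiled once, outside the loop. The pattern is a hypothetical example.
WORD_RE = re.compile(r"[A-Za-z']+")


def count_words(path):
    """Stream a text file and return (line_count, Counter of lowercased words)."""
    counts = Counter()
    lines = 0
    with open(path, encoding="utf-8") as handle:
        # Iterating over the file object yields one line at a time,
        # so the whole file is never held in memory.
        for line in handle:
            lines += 1
            counts.update(word.lower() for word in WORD_RE.findall(line))
    return lines, counts
```

Each line is read exactly once, and memory use grows with the number of distinct words rather than with the size of the file.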

Condor-specific Advice

  • If you can't avoid loading lots of data into RAM, make sure you tell Condor what your job's memory requirements are so that it can match the job with a machine that has enough memory. A job running on a machine that's too small will run very inefficiently and may not finish at all.
  • Each new job that's queued takes about 30 seconds to be matched with a machine. Jobs that complete in less than a minute are likely to spend nearly as much time in the queue as they spend actually running; consider refactoring them to do more work in each job.
  • For the same amount of data, writing one large file is faster than writing many small ones. The bookkeeping required to create each new file carries a performance penalty.
  • See the PerformanceProblems page for specific things to avoid.
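
To make the queueing advice above concrete, here is a small Python sketch, assuming the roughly 30-second matching time quoted above. Both function names and the default values are hypothetical. The first function estimates what fraction of a job's total time is spent waiting to be matched; the second groups short tasks into batches so that each job runs long enough for that wait to matter less.

```python
import math


def queue_overhead_fraction(run_seconds, match_seconds=30.0):
    """Fraction of (matching + running) time spent waiting for a match.

    match_seconds defaults to the ~30 s figure from the text (an estimate).
    """
    return match_seconds / (match_seconds + run_seconds)


def batch_tasks(tasks, seconds_per_task, target_job_seconds=600):
    """Split tasks into lists that each take about target_job_seconds to run.

    seconds_per_task is your own estimate of how long one task takes.
    """
    per_job = max(1, math.floor(target_job_seconds / seconds_per_task))
    return [tasks[i:i + per_job] for i in range(0, len(tasks), per_job)]
```

Under these assumptions, a job that runs for 30 seconds spends half of its total time waiting to be matched, while a job that runs for 570 seconds spends only 5%. Each batch can then be submitted as a single job that writes its results to one output file.
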
Topic revision: r1 - 2010-06-29 - 17:11:00 - brodbd
