Running Condor jobs with large memory requirements

By default, Condor assigns 1 GB of RAM to each process you launch. If your job grows larger than that, one of two things can happen.

  • Condor may evict it, returning it to the queue in the idle ("I") state; you can check why a job is idle with condor_q, as shown below. Eviction is more likely when many other jobs are queued. The job may eventually be relaunched on another machine, but all of the progress it made up to that point is lost.
  • If the combined memory use of the jobs on a machine exceeds the available RAM and swap, the kernel's out-of-memory (OOM) killer will kill processes until enough memory is freed.
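
If a job keeps getting evicted or sits idle, condor_q's analyze option can usually tell you why it isn't running. A minimal example, where 1234.0 is just a placeholder for your own cluster and process IDs:

condor_q -analyze 1234.0

The log file named in your submit file also records periodic "Image size of job updated" events, which give a rough picture of how much memory the job was actually using.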

Running jobs larger than 1 GB

If you have a job with processes that consume more than 1 GB of memory, you can tell Condor how much RAM they require by adding the request_memory keyword to your submit file. The value is specified in megabytes.

Here's an example submit script for an executable called hugejob, which requires at least 7 GB of memory to run:

universe = vanilla
executable = hugejob
getenv = true
input = hugejob.in
output = hugejob.out
error = hugejob.err
log = hugejob.log
request_memory = 7*1024
queue
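
Assuming the submit file above is saved as hugejob.cmd (the filename is arbitrary), you submit and monitor the job as usual:

condor_submit hugejob.cmd
condor_q

Here 7*1024 is simply 7168 MB; you could write the number directly instead.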