GenomeComb
cg command ?joboptions? ...
Several commands understand these job options
Several genomecomb commands (indicated by "supports job options" in their help) consist of subtasks (jobs) that can be run in parallel. If such a command is interrupted, it can continue from the already processed data (completed subtasks) when the same command is run again. The --distribute or -d option can be used to select alternative processing methods.
Without a -d option (or with -d direct) the command runs its jobs sequentially, stopping when an error occurs in a job. It skips any job whose targets were already completed (and whose input files were not changed after this completion). While running, the command maintains a logfile containing one line of information per job (such as job name, status, starttime, endtime, duration, target files), which can be useful to check when an error occurs. After a successful run this logfile is removed, unless you explicitly specified -d direct or -d 1.
An analysis can be run faster by running multiple jobs at the same time, using multiple cores/threads on one machine (by giving the number of cores/threads that may be used to the -d option) or nodes on a cluster managed by Grid Engine (-d sge) or slurm (-d slurm).
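As an illustration, the same analysis can be distributed in different ways just by changing the -d option (the subcommand and sample directory below are hypothetical examples; any command that supports job options works the same way):

```shell
# run jobs sequentially (default, equivalent to -d direct)
cg process_sample sampledir
# run up to 8 jobs in parallel on the local machine
cg process_sample -d 8 sampledir
# submit all jobs to a Grid Engine or slurm cluster
cg process_sample -d sge sampledir
cg process_sample -d slurm sampledir
```
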
Some jobs depend on the results of previous jobs and have to wait for these to finish first, so the available capacity is not always fully used. If an error occurs in a job, the program will still continue with other jobs. Subsequent jobs depending on the results of the failed job will also fail, but others may run successfully. (In this way -d 1 is different from -d direct.)
The command used to start the analysis will produce a logfile (with the extension .submitting) while submitting the jobs. Each job gets a line in the logfile, and also produces a number of small files (runscript, joblogfile, error output) in directories called log_jobs. (The first column in the logfile refers to their location.) These small files are normally removed after a successful analysis (but not when there were errors, as they may be useful for debugging). When the command is finished submitting, the logfile is updated and gets the extension .running. When the run is completed, the logfile is again updated and gets the extension .finished if no jobs had errors, or .error if some did. You can update the logfile while jobs are running using the command:
cg job_update logfile
Using -d number starts up to number jobs in separate processes managed by the submitting command. The submitting command must keep running until all jobs are finished. It will output some progress information, but you can get more detail by updating and checking the logfile.
Using "-d sge" the submitting command will submit all jobs to the Grid Engine with the proper dependencies between the jobs. The program itself will stop when all jobs are submitted. You can check progress/errors by updating and checking the logfile.
Some tools request multiple cores/slots for one job on a Grid Engine cluster using the parallel environment (pe) named local (using -pe local <nr of slots> -R y). This will only work if the pe local is properly defined (it often is, by convention, but not by default). If no pe local is defined, multiple slots are not requested and nodes may become overloaded. If the local pe is defined, but not in a way that allows for this, jobs may not run. The following settings (create with "qconf -ap local" or modify with "qconf -mp local") work on our cluster:

    pe_name            local
    slots              9999999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    $pe_slots
    control_slaves     TRUE
    job_is_first_task  TRUE
    urgency_slots      min
    accounting_summary FALSE
Using "-d slurm" the submitting command will submit all jobs to slurm with the proper dependencies between the jobs. The program itself will stop when all jobs are submitted. You can check progress/errors by updating and checking the logfile.
List of all options supported by job-enabled commands:
When run on a cluster, it can be important to request enough memory/time. Genomecomb will by default request more memory or a larger timeslot for larger jobs. It is however very difficult to get the right amount for all occasions (as it can depend strongly on the data). Many genomecomb tools have the options -mem and -time to increase the requested amounts (versus the default). Not all do, however, so the following two generic job options can change these requested amounts for any job:
-dmem '* 2G map_bwa 10G' -dtime '* 2:00:00 map_bwa 10:00:00'
will set the default requested memory to 2G and the default requested time to 2 hours, but for mapping with bwa, 10G of memory and 10 hours will be requested.
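Put together, a hypothetical cluster run could combine these generic options with -d (process_sample and sampledir are illustrative names; the -dmem/-dtime values are the ones from the example above):

```shell
# submit to Grid Engine: 2G/2h by default, 10G/10h for the bwa mapping jobs
cg process_sample -d sge \
    -dmem '* 2G map_bwa 10G' \
    -dtime '* 2:00:00 map_bwa 10:00:00' \
    sampledir
```
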
Format Conversion