Process_multicompar
Format
cg process_multicompar ?options? projectdir ?dbdir?
Summary
Process a sequencing project directory. This expects a genomecomb
project directory in which the samples have already been processed,
and makes annotated multicompar data.
Description
This command runs the multicomparison step of cg process_project. As input, the
command expects a basic genomecomb project directory with sequencing
data (projectdir) for which the samples have already been processed.
Some of the steps/commands it uses can be used separately as well:
Arguments
  - projectdir
 
  - project directory with Illumina data for different samples,
  each sample in a subdirectory. The command will search for fastq
  files in dir/samplename/fastq/
 
  - dbdir
 
  - directory containing reference data (genome sequence,
  annotation, ...). dbdir can also be given in a projectinfo.tsv file
  in the project directory. process_project called with the dbdir
  parameter will create the projectinfo.tsv file.
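
The layout described for projectdir can be checked before running the command. The following is only a sketch in Python, not part of genomecomb: the helper name `find_fastq_files` is hypothetical, and it assumes nothing beyond the dir/samplename/fastq/ layout stated above.

```python
from pathlib import Path


def find_fastq_files(projectdir):
    """Map each sample name to the files found in <projectdir>/<sample>/fastq/.

    Hypothetical helper: it only mirrors the directory layout described
    in the text and does not call cg or inspect file contents.
    """
    found = {}
    for sampledir in sorted(Path(projectdir).iterdir()):
        fastqdir = sampledir / "fastq"
        # Entries without a fastq subdirectory are skipped
        if fastqdir.is_dir():
            found[sampledir.name] = sorted(
                p.name for p in fastqdir.iterdir() if p.is_file()
            )
    return found
```

Calling `find_fastq_files("/path/to/projectdir")` (path is a placeholder) returns a dictionary of sample names and their fastq file names, which makes a missing or misplaced fastq directory easy to spot.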
 
Options
  - -split 1/0
 
  - split multiple alternative genotypes over different lines
 
  - -dbdir dbdir
 
  - dbdir can also be given as an option (instead of
  second parameter)
 
  - -dbfile file
 
  - Use file for extra annotation (the files in dbdir are
  already used)
 
  - -targetvarsfile file
 
  - Use this option to easily check certain target
  positions/variants in the multicompar. The variants in file
  will always be added to the final multicompar file, even if none
  of the samples has a variant (or is even sequenced) at that position.
 
  - -m maxopenfiles (-maxopenfiles)
 
  - The number of files that a program can keep open at the same
  time is limited. pmulticompar distributes the subtasks in such a way
  that the number of files open at the same time stays below this
  limit. With this option, the maximum number of open files can be
  set manually (e.g. if the program does not deduce the proper limit,
  or if you want to affect the distribution).
 
  - -varfiles files
 
  - With this option you can limit the variant files to be added.
  (default is to use all found in the project dir). They should be
  given as a list of files in this option, so enclose in quotes. You
  can still use * as a wildcard, as cg will resolve the wildcards
  itself.
 
  - -svfiles files
 
  - With this option you can limit the structural variant files
  to be added (default is to use all found in the project dir). They
  should be given as a list of files in this option, so enclose in
  quotes. You can still use * as a wildcard, as cg will resolve the
  wildcards itself.
 
  - -keepfields fieldlist
 
  - Besides the obligatory fields, include only the fields in
  fieldlist (space separated) in the output multicompar. Default is
  to use all fields present in the file (*)
 
  - -limitreg regionfile
 
  - limit the variant and region multicompar files to the
  regions given in regionfile. (Other results, such as
  structural variants, are not limited (yet).)
 
  - -distrreg
 
  - annotation will be distributed in regions for parallel
  processing. Possible options are:
    - 0: no distribution (also empty)
    - 1: default distribution
    - schr or schromosome: each chromosome processed separately
    - chr or chromosome: each chromosome processed separately,
    except the unsorted, etc. (with a _ in the name), which will be
    combined
    - a number: distribution into regions of this size
    - a number preceded by an s: distribution into regions
    targeting the given size, but breaks can only occur in
    unsequenced regions of the genome (N stretches)
    - a number preceded by an r: distribution into regions
    targeting the given size, but breaks can only occur in large
    (>=100000 bases) repeat regions
    - a number preceded by a g: distribution into regions
    targeting the given size, but breaks can only occur in large
    (>=200000 bases) regions without known genes
    - g: distribution into regions as given in the
    <refdir>/extra/reg_*_distrg.tsv file; if this does not exist,
    g5000000 is used
    - a file name: the regions in the file will be used for
    distribution
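
The possible -distrreg values can be read as a small decision table. The Python sketch below restates that table for illustration; the function `describe_distrreg` is hypothetical and is not how cg parses the option. In particular, treating every value that matches none of the listed forms as a file name is an assumption.

```python
import re

# Where breaks may occur for each size prefix, as described in the option text
BREAK_RULES = {
    "": "anywhere",
    "s": "only in unsequenced regions (N stretches)",
    "r": "only in large (>=100000 bases) repeat regions",
    "g": "only in large (>=200000 bases) regions without known genes",
}


def describe_distrreg(value):
    """Return a short description of a -distrreg value (illustration only)."""
    if value in ("", "0"):
        return "no distribution"
    if value == "1":
        return "default distribution"
    if value in ("schr", "schromosome"):
        return "each chromosome separately"
    if value in ("chr", "chromosome"):
        return "each chromosome separately, names containing _ combined"
    if value == "g":
        return "regions from <refdir>/extra/reg_*_distrg.tsv, else as g5000000"
    match = re.fullmatch(r"([srg]?)(\d+)", value)
    if match:
        prefix, size = match.group(1), int(match.group(2))
        return f"regions of about {size} bases, breaks {BREAK_RULES[prefix]}"
    # Assumption: anything else is taken to be a file with regions
    return "regions taken from a file"
```

For example, `describe_distrreg("s50000000")` describes regions targeting 50000000 bases that are only broken in N stretches, while `describe_distrreg("regions.tsv")` (a placeholder file name) falls through to the file case.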
 
This command can be distributed on a cluster or using multiple
cores with job options (more info with cg
help joboptions)
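
When the command is started from a script, the Format line and the list-valued options (-varfiles, -keepfields) translate into an argument list in which each list is passed as one argument. The Python sketch below only assembles and prints such a command line; the helper `build_command` and all paths and file patterns are hypothetical, and no job options are included.

```python
import shlex


def build_command(projectdir, dbdir=None, split=None, varfiles=None, keepfields=None):
    """Assemble 'cg process_multicompar ?options? projectdir ?dbdir?' as a list.

    Hypothetical helper for illustration; it does not run cg.
    """
    cmd = ["cg", "process_multicompar"]
    if split is not None:
        cmd += ["-split", "1" if split else "0"]
    if varfiles:
        # One argument holding the whole list; wildcards are left for cg to resolve
        cmd += ["-varfiles", " ".join(varfiles)]
    if keepfields:
        # Space separated field list, again passed as a single argument
        cmd += ["-keepfields", " ".join(keepfields)]
    cmd.append(str(projectdir))
    if dbdir is not None:
        cmd.append(str(dbdir))
    return cmd


if __name__ == "__main__":
    # Placeholder paths and pattern, shown quoted as they would be typed in a shell
    print(shlex.join(build_command("/data/proj", "/refseq/hg38",
                                   split=True, varfiles=["var-*.tsv"])))
```

Passing the list to `subprocess.run` instead of printing it would start the command without a shell, so the wildcard reaches cg unexpanded, which matches the description of -varfiles.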
Category
Process