Process_multicompar
Format
cg process_multicompar ?options? projectdir ?dbdir?
Summary
process a sequencing project directory. This expects a genomecomb
directory with samples already processed and makes annotated
multicompar data
Description
This command runs the multicomparison step of cg process_project. As input, the
command expects a basic genomecomb project directory with sequencing
data (projectdir) for which the samples have already been processed.
Some of the steps/commands it uses can be used separately as well:
Arguments
- projectdir
- project directory with illumina data for different samples,
each sample in a sub directory. The proc will search for fastq
files in dir/samplename/fastq/
- dbdir
- directory containing reference data (genome sequence,
annotation, ...). dbdir can also be given in a projectinfo.tsv file
in the project directory. process_project called with the dbdir
parameter will create the projectinfo.tsv file.
Options
- -split 1/0
- split multiple alternative genotypes over different line
- -dbdir dbdir
- dbdir can also be given as an option (instead of
second parameter)
- -dbfile file
- Use file for extra (files in dbdir are already
used) annotation
- -targetvarsfile file
- Use this option to easily check certain target
positions/variants in the multicompar. The variants in file
will allways be added in the final multicompar file, even if none
of the samples is variant (or even sequenced) in it.
- -m maxopenfiles (-maxopenfiles)
- The number of files that a program can keep open at the same
time is limited. pmulticompar will distribute the subtasks thus,
that the number of files open at the same time stays below this
number. With this option, the maximum number of open files can be
set manually (if the program e.g. does not deduce the proper limit,
or you want to affect the distribution).
- -varfiles files
- With this option you can limit the variant files to be added.
(default is to use all found in the project dir). They should be
given as a list of files in this option, so enclose in quotes. You
can still use * as a wildcard, as cg will resolve the wildcards
itself.
- -svfiles files
- With this option you can limit the structural variant files
to be added (default is to use all found in the project dir). They
should be given as a list of files in this option, so enclose in
quotes. You can still use * as a wildcard, as cg will resolve the
wildcards itself.
- -keepfields fieldlist
- Besides the obligatory fields, include only the fields in
fieldlist (space separated) in the output multicompar. Default is
to use all fields present in the file (*)
- -limitreg regionfile
- limit the variants and region multicompar files to the
regions given in regionfile. (Other results, such as
structural variants are not limited (yet))
- -distrreg
- annotation will be distributed in regions for parallel
processing. Possible options are 0: no distribution (also
empty) 1: default distribution schr or schromosome: each
chromosome processed separately chr or chromosome: each
chromosome processed separately, except the unsorted, etc. with a _
in the name that will be combined), a number: distribution into
regions of this size a number preceded by an s: distribution
into regions targeting the given size, but breaks can only occur in
unsequenced regions of the genome (N stretches) a number
preceded by an r: distribution into regions targeting the given
size, but breaks can only occur in large (>=100000 bases) repeat
regions a number preceded by an g: distribution into regions
targeting the given size, but breaks can only occur in large
(>=200000 bases) regions without known genes a file name:
the regions in the file will be used for distribution
This command can be distributed on a cluster or using multiple
with job options (more info with cg
help joboptions)
Category
Process