GenomeComb

Process_multicompar

Format

cg process_multicompar ?options? projectdir ?dbdir?

Summary

Processes a sequencing project directory. The command expects a genomecomb project directory in which the samples have already been processed, and produces annotated multicompar data.

Description

This command runs the multicomparison step of cg process_project. As input, the command expects a basic genomecomb project directory with sequencing data (projectdir) for which the samples have already been processed.

Some of the steps/commands it uses can be used separately as well:

Arguments

projectdir
project directory with Illumina data for different samples, each sample in a subdirectory. The command will search for fastq files in dir/samplename/fastq/
dbdir
directory containing reference data (genome sequence, annotation, ...). dbdir can also be given in a projectinfo.tsv file in the project directory. process_project called with the dbdir parameter will create the projectinfo.tsv file.
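
The two ways of supplying dbdir described above can be sketched as follows. This is a minimal illustration: both paths are hypothetical placeholders, and the snippet only prints the commands unless cg is installed and the project directory exists.

```shell
# Hypothetical locations; substitute your own project and reference directories.
projectdir="/data/projects/myproject"
dbdir="/data/refseq/hg38"

# Form 1: dbdir given explicitly as the second parameter.
cmd_explicit="cg process_multicompar $projectdir $dbdir"

# Form 2: dbdir omitted, relying on projectinfo.tsv in the project directory.
cmd_implicit="cg process_multicompar $projectdir"

if command -v cg >/dev/null 2>&1 && [ -d "$projectdir" ]; then
    # Run the explicit form only when cg and the project directory are available.
    cg process_multicompar "$projectdir" "$dbdir"
else
    echo "would run: $cmd_explicit"
    echo "or, with dbdir taken from projectinfo.tsv: $cmd_implicit"
fi
```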

Options

-split 1/0
split multiple alternative genotypes over different lines
-dbdir dbdir
dbdir can also be given as an option (instead of second parameter)
-dbfile file
Use file for extra annotation (the files in dbdir are already used)
-targetvarsfile file
Use this option to easily check certain target positions/variants in the multicompar. The variants in file will always be added to the final multicompar file, even if none of the samples is variant (or even sequenced) at that position.
-m maxopenfiles (-maxopenfiles)
The number of files that a program can keep open at the same time is limited. pmulticompar distributes the subtasks so that the number of files open at the same time stays below this limit. With this option, the maximum number of open files can be set manually (e.g. if the program does not deduce the proper limit, or if you want to affect the distribution).
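
If you want to set the limit manually, the current per-process limit can be read from the shell first. The following is a sketch under stated assumptions: the reserve of 16 descriptors and the fallback value of 1000 are illustrative choices, not genomecomb recommendations, and the project path is hypothetical. The command is only printed, not run.

```shell
# Soft limit on open file descriptors for the current shell.
limit=$(ulimit -n)

# Illustrative policy (an assumption): keep 16 descriptors in reserve,
# and use 1000 if the limit is not numeric (e.g. "unlimited").
case "$limit" in
    ''|*[!0-9]*) maxopen=1000 ;;
    *) maxopen=$((limit - 16)) ;;
esac
if [ "$maxopen" -lt 1 ]; then
    maxopen=1
fi

# projectdir is a hypothetical path.
projectdir="/data/projects/myproject"
echo "cg process_multicompar -m $maxopen $projectdir"
```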
-varfiles files
With this option you can limit the variant files to be added (default is to use all found in the project dir). The files must be given as a single list argument, so enclose the list in quotes. You can still use * as a wildcard, as cg will resolve the wildcards itself.
-svfiles files
With this option you can limit the structural variant files to be added (default is to use all found in the project dir). The files must be given as a single list argument, so enclose the list in quotes. You can still use * as a wildcard, as cg will resolve the wildcards itself.
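
The reason for the quotes in -varfiles and -svfiles can be shown with plain shell behaviour. The sketch below does not call cg; it uses a throwaway directory and illustrative file names to show that a quoted pattern arrives as one argument (left for the program to resolve), while an unquoted one is expanded by the shell into several.

```shell
# Throwaway directory with two dummy files (names are illustrative only).
tmpdir=$(mktemp -d)
touch "$tmpdir/var-a.tsv" "$tmpdir/var-b.tsv"

# Helper that reports how many arguments it received.
count_args() { echo $#; }

# Quoted: the pattern is passed through unchanged as a single argument.
quoted=$(count_args "$tmpdir/var-*.tsv")

# Unquoted (deliberately): the shell expands the pattern, one argument per file.
unquoted=$(count_args $tmpdir/var-*.tsv)

echo "quoted: $quoted argument(s), unquoted: $unquoted argument(s)"
rm -rf "$tmpdir"
```

With an unquoted pattern, only the first expanded file would be taken as the option value and the remaining files would be read as further positional arguments.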
-keepfields fieldlist
Besides the obligatory fields, include only the fields in fieldlist (space separated) in the output multicompar. Default is to use all fields present in the file (*)
-limitreg regionfile
limit the variants and region multicompar files to the regions given in regionfile. Other results, such as structural variants, are not limited (yet).
-distrreg
annotation will be distributed in regions for parallel processing. Possible options are:
0: no distribution (also empty)
1: default distribution
schr or schromosome: each chromosome processed separately
chr or chromosome: each chromosome processed separately, except the unsorted, etc. (with a _ in the name), which will be combined
a number: distribution into regions of this size
a number preceded by an s: distribution into regions targeting the given size, but breaks can only occur in unsequenced regions of the genome (N stretches)
a number preceded by an r: distribution into regions targeting the given size, but breaks can only occur in large (>=100000 bases) repeat regions
a number preceded by a g: distribution into regions targeting the given size, but breaks can only occur in large (>=200000 bases) regions without known genes
a file name: the regions in the file will be used for distribution
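
To make the value syntax concrete, the following sketch classifies a few example -distrreg values according to the description above. The function describe_distrreg is hypothetical and written only for illustration; it is not genomecomb's own parsing code, and the example sizes and file name are arbitrary.

```shell
# Illustrative classifier for -distrreg values (not part of genomecomb).
describe_distrreg() {
    case "$1" in
        ''|0)             echo "no distribution" ;;
        1)                echo "default distribution" ;;
        schr|schromosome) echo "each chromosome separately" ;;
        chr|chromosome)   echo "each chromosome separately, names with _ combined" ;;
        s[0-9]*)          echo "target size ${1#s}, breaks in N stretches" ;;
        r[0-9]*)          echo "target size ${1#r}, breaks in large repeat regions" ;;
        g[0-9]*)          echo "target size ${1#g}, breaks in large regions without known genes" ;;
        *[!0-9]*)         echo "regions taken from file $1" ;;
        *)                echo "regions of size $1" ;;
    esac
}

# Print one line per example value.
for value in 0 chr 5000000 s50000000 g5000000 regions.tsv; do
    printf '%s -> %s\n' "-distrreg $value" "$(describe_distrreg "$value")"
done
```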

This command can be distributed on a cluster or over multiple cores using job options (more info with cg help joboptions).

Category

Process