GenomeComb

Var

Format

cg var ?options? bamfile refseq ?resultfile?

Summary

generic command to call variants based on the alignment in bamfile.

Description

The command can be used to call variants using different methods. The methods are implemented in separate commands such as cg var_gatk; The method indicated using the -method option will be actually used. By default, the result will consist of several files with names derived from the bam file name: If the bamfile has a name of the form map-alignmethod-sample.bam, the result files will consist of file with names like these (depending on the varcaller used): varall-varcaller-alignmethod-sample.tsv.zst var-varcaller-alignmethod-sample.tsv.zst sreg-varcaller-alignmethod-sample.tsv.zst The optional argument resultfile can be given to use a a result filename. The names of other results files are based on this given filename.

cg var can distribute variant calling by analysing separate regions in parallel and combining the results (-distrreg option). This command can be distributed on a cluster or using multiple cores with job options (more info with cg help joboptions)

Arguments

bamfile
alignment on which to call variants
refseq
genomic reference sequence (must be the one used for the alignment)
resultfile
resulting variant file instead of default based on bam file name

Options

-method varcaller
use the given method (varcaller) to call variants. Default varcaller is gatkh for the GATK haplotyped genotype caller. Other possible values are: gatk (GATK unified genoptype caller), strelka, sam, bcf, freebayes, longshot, medaka
-regionfile regionfile
only call variants in the region(s) given in regionfile if the method used does not support -regionfile (e.g. longshot, medaka), this option will be ignored: the entire genome is called, and results will not be limited to the given regionfile
-regmincoverage mincoverage
If no regionfile is given, variants will be called only in regions with a coverage of at least mincoverage (default 3)
-distrreg
distribute regions for parallel processing (default s50000000). Possible options are 0: no distribution (also empty) 1: default distribution schr or schromosome: each chromosome processed separately chr or chromosome: each chromosome processed separately, except the unsorted, etc. with a _ in the name that will be combined), a number: distribution into regions of this size a number preceded by an s: distribution into regions targeting the given size, but breaks can only occur in unsequenced regions of the genome (N stretches) a number preceded by an r: distribution into regions targeting the given size, but breaks can only occur in large (>=100000 bases) repeat regions a number preceded by an g: distribution into regions targeting the given size, but breaks can only occur in large (>=200000 bases) regions without known genes a file name: the regions in the file will be used for distribution
-datatype datatype
Some variant callers (strelka) need to know the type of data (genome, exome or amplicons) for analysis. You can specify it using this option. If not given, it is deduced from acompanying region files (reg_*_amplicons.tsv for ampicons or reg_*_amplicons.tsv for exome)
-split 1/0
indicate if in the datafiles, multiple alternative genotypes are split over different lines (default 1)
-hap_bam 0/1
if the methods supports it (currently only longshot), also create a haplotyped bam
-t number (-threads)
number of threads used by each job (if the method supports threads)
-cleanup 0/1
Use 0 to no cleanup some of the temporary files (vcf, delvar, ...)
-opt options
These options are passed through to the underlying variant caller (default is empty)

Category

Variants