Var
Format
cg var ?options? bamfile refseq ?resultfile?
Summary
generic command to call variants based on the alignment in
bamfile.
Description
The command can be used to call variants using different methods.
The methods are implemented in separate commands such as cg var_gatk;
The method indicated using the -method option will be actually used.
By default, the result will consist of several files with names
derived from the bam file name: If the bamfile has a name of the form
map-alignmethod-sample.bam, the result files will consist of file
with names like these (depending on the varcaller used):
varall-varcaller-alignmethod-sample.tsv.zst
var-varcaller-alignmethod-sample.tsv.zst
sreg-varcaller-alignmethod-sample.tsv.zst The optional argument
resultfile can be given to use a a result filename. The names of
other results files are based on this given filename.
cg var can distribute variant calling by analysing separate
regions in parallel and combining the results (-distrreg option).
This command can be distributed on a cluster or using multiple cores
with job options (more info with cg
help joboptions)
Arguments
- bamfile
- alignment on which to call variants
- refseq
- genomic reference sequence (must be the one used for the
alignment)
- resultfile
- resulting variant file instead of default based on bam file
name
Options
- -method varcaller
- use the given method (varcaller) to call variants.
Default varcaller is gatkh for the GATK haplotyped genotype caller.
Other possible values are: gatk (GATK unified genoptype caller),
strelka, sam, bcf, freebayes, longshot, medaka
- -regionfile regionfile
- only call variants in the region(s) given in regionfile if
the method used does not support -regionfile (e.g. longshot,
medaka), this option will be ignored: the entire genome is called,
and results will not be limited to the given regionfile
- -regmincoverage mincoverage
- If no regionfile is given, variants will be called only in
regions with a coverage of at least mincoverage (default 3)
- -distrreg
- distribute regions for parallel processing (default
s50000000). Possible options are 0: no distribution (also
empty) 1: default distribution schr or schromosome: each
chromosome processed separately chr or chromosome: each
chromosome processed separately, except the unsorted, etc. with a _
in the name that will be combined), a number: distribution into
regions of this size a number preceded by an s: distribution
into regions targeting the given size, but breaks can only occur in
unsequenced regions of the genome (N stretches) a number
preceded by an r: distribution into regions targeting the given
size, but breaks can only occur in large (>=100000 bases) repeat
regions a number preceded by an g: distribution into regions
targeting the given size, but breaks can only occur in large
(>=200000 bases) regions without known genes a file name:
the regions in the file will be used for distribution
- -datatype datatype
- Some variant callers (strelka) need to know the type of data
(genome, exome or amplicons) for analysis. You can specify it using
this option. If not given, it is deduced from acompanying region
files (reg_*_amplicons.tsv for ampicons or reg_*_amplicons.tsv for
exome)
- -split 1/0
- indicate if in the datafiles, multiple alternative genotypes
are split over different lines (default 1)
- -hap_bam 0/1
- if the methods supports it (currently only longshot), also
create a haplotyped bam
- -t number (-threads)
- number of threads used by each job (if the method supports
threads)
- -cleanup 0/1
- Use 0 to no cleanup some of the temporary files (vcf, delvar,
...)
- -opt options
- These options are passed through to the underlying variant
caller (default is empty)
Category
Variants