GenomeComb

Multicompar

Format

cg multicompar ?options? multicomparfile sampledir/varfile ...

Summary

Compare multiple variant files

Description

This command combines multiple variant files into one multicompar file: a tab-separated file containing a wide table used to compare variants between different samples. Each line in the table contains general variant info (position, type, annotation, etc.) and columns with variant info specific to each sample (genotype, coverage, etc.). The sample-specific columns have a heading of the form field-<sample>.
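
As an illustration of the field-<sample> heading convention, the following Python sketch separates the general columns of a header from the sample-specific ones. It is not part of GenomeComb; the function name and the example column names are hypothetical, and it assumes that neither general column names nor field names contain a dash, so that everything after the first dash is the sample name.

```python
def split_sample_columns(header):
    """Group header columns of the form field-<sample> by sample.

    Assumption: general columns and field names contain no dash, so the
    first dash separates the field from the <sample> name.
    """
    general = []
    per_sample = {}
    for column in header:
        field, dash, sample = column.partition("-")
        if dash:
            per_sample.setdefault(sample, []).append(field)
        else:
            general.append(column)
    return general, per_sample


# Hypothetical header, for illustration only
header = [
    "chromosome", "begin", "end", "type", "ref", "alt",
    "genotype-gatk-sample1", "coverage-gatk-sample1",
    "genotype-sam-sample1", "coverage-sam-sample1",
]
general, per_sample = split_sample_columns(header)
print(general)
print(per_sample)
```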

The <sample> name is extracted from the filename of each variant file by removing the file extension and the start of the filename up to and including the first -. (Variant files are expected to use the following convention: var-<sample>.tsv.) <sample> can simply be the sample name, but it may also include information about e.g. the sequencing or analysis method, in the form method-samplename. For example, var-gatk-sample1.tsv and var-sam-sample1.tsv indicate variants called using gatk and samtools respectively for the same sample (sample1), which allows comparison of different methods.
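
The naming rule can be sketched in Python as follows. This is an illustration of the rule as described here, not GenomeComb's own code; the function name is hypothetical. It removes only the last extension, so compressed files (e.g. a name ending in .tsv.gz) are not handled, and returning the whole stem when the filename has no dash is an assumption.

```python
import os


def sample_name(path):
    """Derive <sample> from a variant filename such as var-<sample>.tsv."""
    filename = os.path.basename(path)
    stem, _extension = os.path.splitext(filename)  # drop the last extension only
    _prefix, dash, rest = stem.partition("-")      # cut at the first dash
    # Assumption: without a dash, fall back to the whole stem
    return rest if dash else stem


print(sample_name("var-sample1.tsv"))
print(sample_name("var-gatk-sample1.tsv"))
print(sample_name("/data/project/var-sam-sample1.tsv"))
```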

When some samples contain a variant that others do not, the multicompar file cannot be completed by combining variant files alone: a variant missing from one variant file may mean either that the sample is reference at that position or that the position was not sequenced. Other information on the variant that is not present in the variant file (coverage, etc.) will also be missing. multicompar indicates this missing information with a question mark in the table. If the information is available in other files, it can be added by using the -reannot option or the multicompar_reannot command (see the help of cg multicompar_reannot for more info).
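
To see how much information is still missing before reannotation, the question marks can be counted per column. The Python sketch below does this on a small made-up table; the function name, column names and values are hypothetical, and the only property it relies on is the one described above: a tab-separated table in which missing information is a cell containing a single question mark.

```python
import csv
import io


def count_missing(tsv_text):
    """Count cells equal to "?" per column in a tab-separated table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value == "?":
                counts[name] += 1
    # Report only the columns that have missing information
    return {name: n for name, n in counts.items() if n}


# Made-up example data, for illustration only
rows = [
    ["chromosome", "begin", "end", "type", "ref", "alt",
     "genotype-sample1", "genotype-sample2"],
    ["chr1", "100", "101", "snp", "A", "G", "A/G", "?"],
    ["chr1", "200", "201", "snp", "C", "T", "?", "C/T"],
    ["chr2", "300", "301", "snp", "G", "A", "G/A", "G/A"],
]
example = "\n".join("\t".join(row) for row in rows) + "\n"
missing = count_missing(example)
print(missing)
```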

Arguments

multicomparfile
result file; it will be created if it does not exist
sampledir/varfile
directory or file containing the variants of a new sample to be added. More than one can be added in one command.

Options

-reannot
Also do reannotation (see cg multicompar_reannot)
-reannotregonly
Also do reannotation, but only region data (sequenced) will be updated
-targetvarsfile variantfile
All variants in variantfile will be in the final multicompar, even if they are not present in any of the samples. A column will be added to indicate for each variant if it was in the variantfile. The name of the column will be the part of the filename after the last dash. If variantfile contains a "name" column, the content of this will be used in the targets column instead of a 1.
-split
If 1 (default), multiple alternative alleles will each be on a separate line, treated mostly as separate variants. Use 0 to put alternative alleles on the same line, separated by commas in the alt field.
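
The difference between the two -split layouts can be illustrated with a small Python sketch that turns one line with comma-separated alternative alleles into one line per allele. It only shows the relation between the layouts and is not how cg multicompar works internally; the function name and the example row are hypothetical, and sample-specific columns (which would also differ per allele) are left out.

```python
def split_alt_alleles(row):
    """Return one copy of row per alternative allele in its alt field."""
    return [dict(row, alt=allele) for allele in row["alt"].split(",")]


# Made-up variant with two alternative alleles on one line
row = {"chromosome": "chr1", "begin": "100", "end": "101",
       "type": "snp", "ref": "A", "alt": "G,T"}
split_rows = split_alt_alleles(row)
for split_row in split_rows:
    print("\t".join(split_row.values()))
```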

Category

Variants