GenomeComb

collapsealleles

Format

cg collapsealleles ?options? ?file? ?outfile?

Summary

Convert a split variant file to unsplit by collapsing alleles of the same variant into one line.

Description

In a split tsv variant file, multiple alternative alleles of the same variant are on separate lines. cg collapsealleles joins these lines to create an unsplit, multiallelic variant file. When multiple lines are merged, most fields will contain a comma-separated list of the values for that field on the original lines (one element per alternative allele). If all merged lines have the same value for a field, that single value is put in the result field instead of a list.
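The merging rule described above can be illustrated with a minimal Python sketch. This is not the cg implementation. It assumes that a variant is identified by the columns chromosome, begin, end, type and ref, that lines of the same variant are adjacent in the file, and that the alt field always lists every merged allele. The quality column and the helper names (collapse_field, collapse_rows) are hypothetical and only serve the example.

```python
from itertools import groupby

# Assumed columns that identify a variant; adjust to the actual file header.
KEY_FIELDS = ("chromosome", "begin", "end", "type", "ref")


def collapse_field(values):
    """Return the single shared value, or a comma-separated list if values differ."""
    if len(set(values)) == 1:
        return values[0]
    return ",".join(values)


def collapse_rows(rows):
    """Merge adjacent rows (dicts of strings) that describe the same variant."""
    merged = []
    for _, group in groupby(rows, key=lambda r: tuple(r[f] for f in KEY_FIELDS)):
        group = list(group)
        out = {}
        for field in group[0]:
            values = [r[field] for r in group]
            # Assumption: alt always lists one element per merged line.
            out[field] = ",".join(values) if field == "alt" else collapse_field(values)
        merged.append(out)
    return merged


# Hypothetical split input: two alternative alleles at the same position.
split_rows = [
    {"chromosome": "chr1", "begin": "100", "end": "101", "type": "snp",
     "ref": "A", "alt": "C", "quality": "30"},
    {"chromosome": "chr1", "begin": "100", "end": "101", "type": "snp",
     "ref": "A", "alt": "G", "quality": "45"},
    {"chromosome": "chr1", "begin": "200", "end": "201", "type": "snp",
     "ref": "T", "alt": "C", "quality": "50"},
]
collapsed = collapse_rows(split_rows)
```

In this sketch the two lines at position 100 become one line whose alt and quality fields hold one element per allele, while fields with identical values (such as ref) keep a single value.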

For some fields, the correct value is calculated from the merged allele lines:

sequenced
one sequenced code (v if one of the alleles is v, u if one of them is u, ...)
zyg
one zyg code (e.g. m if one of the joined alleles was homozygous)
genotypes
list of genotypes (same number as in the individual alleles)

Arguments

file
file to be converted (read from stdin if not given)
outfile
file to be written (write to stdout if not given)

Options

-duplicates method
method specifies how to handle multiple lines that contain the same allele (from the same variant):
* keep: keep all (default); the list in the alt field can contain duplicate alleles
* max field: pick the one that has the highest value for the given field
* min field: pick the one that has the lowest value for the given field
* first: pick the first one in the file
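
The four methods can be sketched in Python as follows. This is an illustration of the selection rules only, not the cg implementation. It assumes that field values are compared numerically and that ties keep the earlier line; the function name resolve_duplicates and the quality column are hypothetical.

```python
def resolve_duplicates(rows, method="keep", field=None):
    """Reduce the rows of one variant that repeat the same alt allele.

    rows: list of dicts for one variant, in file order.
    method: "keep", "first", "max" or "min"; field is required for max and min.
    """
    if method == "keep":
        return list(rows)
    if method not in ("first", "max", "min"):
        raise ValueError(f"unknown method: {method}")
    if method in ("max", "min") and field is None:
        raise ValueError(f"method {method} needs a field")
    chosen = {}  # alt allele -> selected row; dicts keep first-seen order
    for row in rows:
        alt = row["alt"]
        if alt not in chosen:
            chosen[alt] = row
        elif method == "max" and float(row[field]) > float(chosen[alt][field]):
            chosen[alt] = row
        elif method == "min" and float(row[field]) < float(chosen[alt][field]):
            chosen[alt] = row
        # method "first": the row already stored is kept
    return list(chosen.values())


# Hypothetical rows of one variant in which allele C occurs twice.
rows = [
    {"alt": "C", "quality": "30"},
    {"alt": "G", "quality": "45"},
    {"alt": "C", "quality": "60"},
]
kept = resolve_duplicates(rows)
first = resolve_duplicates(rows, "first")
highest = resolve_duplicates(rows, "max", "quality")
lowest = resolve_duplicates(rows, "min", "quality")
```

With keep, all three rows remain, so a merged alt list would read C,G,C. With max on quality, the C row with quality 60 is selected; with min or first, the C row with quality 30 is selected.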

Category

tsv