collapsealleles
Format
cg collapsealleles ?options? ?file? ?outfile?
Summary
Convert a split variant file to unsplit by collapsing alleles of
the same variant into one line.
Description
In a split tsv variant file, multiple
alternative alleles of the same variant are on different lines. cg
collapsealleles joins these lines to create an unsplit,
multiallelic variant file. When lines are merged, most
fields will contain a comma-separated list of the values for that
field on the original lines (one element per alternative
allele). If all lines have the same value, that single value is put in
the result field instead of the list.
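The generic merge rule above can be sketched in Python. This is a minimal illustration, not cg's implementation. It assumes the file is sorted so that alleles of one variant are on adjacent lines. The hypothetical KEY_FIELDS tuple decides which lines belong to "the same variant". The special fields described next (sequenced, zyg, genotypes) are not handled here.

```python
from itertools import groupby

# Hypothetical: the fields assumed to identify one variant. Lines with equal
# values for all of them are treated as alleles of the same variant.
KEY_FIELDS = ("chromosome", "begin", "end", "type")


def collapse_field(values):
    """Return the shared value if all lines agree, else a comma-separated list."""
    if all(v == values[0] for v in values):
        return values[0]
    return ",".join(values)


def collapse_alleles(rows, fields):
    """Merge adjacent rows (dicts) of the same variant into one row each."""
    result = []
    # groupby only groups *consecutive* rows, hence the sorted-input assumption
    for _, group in groupby(rows, key=lambda r: tuple(r[k] for k in KEY_FIELDS)):
        group = list(group)
        result.append({f: collapse_field([r[f] for r in group]) for f in fields})
    return result
```

For two lines at the same position with alt values C and G and an identical score, the merged row has alt `C,G` and keeps the single score value.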
For some fields, the value(s) are instead calculated from the
merged allele lines:
- sequenced
- one sequenced code (v if one of the alleles is v, u if one
of them is u, ...)
- zyg
- one zyg code (e.g. m if one of the joined alleles was
homozygous)
- genotypes
- list of genotypes (same number as in the individual alleles)
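The sequenced and zyg rules can be sketched as picking the first code found in a priority order. This is a hedged illustration. The text only states that v wins, then u. Placing r last in SEQUENCED_PRIORITY is an assumption. For zyg, only the "m if any allele is homozygous" rule is stated. The sketch returns None for every other combination instead of guessing cg's behaviour.

```python
# Assumed precedence for the combined "sequenced" code: v before u is stated
# in the text; putting r last is an assumption of this sketch.
SEQUENCED_PRIORITY = ("v", "u", "r")


def pick_by_priority(codes, priority):
    """Return the first code from `priority` that occurs in `codes`."""
    for code in priority:
        if code in codes:
            return code
    # Codes outside the priority list: fall back to the first one (assumption)
    return codes[0]


def collapse_sequenced(codes):
    return pick_by_priority(codes, SEQUENCED_PRIORITY)


def collapse_zyg(codes):
    # Only the homozygous rule is documented: m if any joined allele is m.
    if "m" in codes:
        return "m"
    # Other combinations are not decided by this sketch.
    return None
```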
Arguments
- file
- file to be converted (from stdin if not given)
- outfile
- file to be written (write to stdout if not given)
Options
- -duplicates method
- method specifies how to handle multiple lines that contain the
same allele of the same variant:
* keep: keep all lines (default); the list in the alt field can then
contain duplicate alleles
* max field: keep the line with the highest value in the given field
* min field: keep the line with the lowest value in the given field
* first: keep the first line in the file
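The -duplicates methods can be sketched for the lines of one variant. This is a minimal illustration, not cg's code. It assumes the allele is stored in a field named `alt` and that the field given to max/min holds numeric values. On ties, the earlier line is kept, which is also an assumption.

```python
def dedup_alleles(rows, method="keep", field=None):
    """Filter per-allele rows (dicts) of one variant by the -duplicates method."""
    if method not in ("keep", "first", "max", "min"):
        raise ValueError(f"unknown method: {method}")
    if method in ("max", "min") and field is None:
        raise ValueError(f"method {method} needs a field")
    if method == "keep":
        return list(rows)  # duplicates stay; alt list may repeat alleles
    chosen = {}  # alt allele -> selected row (dicts keep first-seen order)
    for row in rows:
        alt = row["alt"]
        if alt not in chosen:
            chosen[alt] = row
        elif method == "max" and float(row[field]) > float(chosen[alt][field]):
            chosen[alt] = row
        elif method == "min" and float(row[field]) < float(chosen[alt][field]):
            chosen[alt] = row
        # method == "first": the row stored earlier is kept
    return list(chosen.values())
```

With two C lines scoring 10 and 40, `max score` keeps the 40 line, while `min score` and `first` keep the 10 line.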
Category
tsv