gatk based variant file
Variant files created by genomecomb using the GATK variant caller
are in the usual genomecomb tab-separated variant file format (tsv). besides the usual fields of a variant tsv
file, they (can) also contain the following fields:
- totalcoverage
- Total Depth, counting all reads (DP in vcf INFO)
- coverage
- Read Depth, counting only filtered reads used for calling,
and only from one samples (DP in vcf FORMAT)
- genoqual
- genotype quality, encoded as a phred quality
-10log_10p(genotype call is wrong)
- haploqual
- haplotype qualities, two phred qualities comma separated
- phased
- 1 if genotype is phased, 0 otherwise
- genotypes
- list of allele numbers (0 for ref, 1 for first alt, ...) (GT
in vcf) The alleles are comma (,) separated for phased genotypes (|
in vcf), and ";" separated for unphased (/ in vcf). In
contrast to alleleSeq1 and alleleSe2, this field allows for
ploidies other than 2
- alleledepth
- Allelic depths for the ref and alt alleles in the order
listed
- PL
- Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes
where A=ref and B=alt; not applicable if site is not biallelic
- allelecount
- predicted allele count in genotypes, for each alt allele, in
the same order as listed
- frequency
- predicted allele frequency for each alt allele in the same
order as listed
- totalallelecount
- predicted total number of alleles in called genotypes
- BaseQRankSum
- Phred-scaled p-value From Wilcoxon Rank Sum Test of Alt Vs.
Ref base qualities
- DS
- Were any of the samples downsampled?
- Dels
- Fraction of Reads Containing Spanning Deletions
- FS
- Phred-scaled p-value using Fisher's exact test to detect
strand bias
- HaplotypeScore
- Consistency of the site with at most two segregating
haplotypes
- InbreedingCoeff
- Inbreeding coefficient as estimated from the genotype
likelihoods per-sample when compared against the Hardy-Weinberg
expectation
- MLEAC
- Maximum likelihood expectation (MLE) for the allele counts
(not necessarily the same as the AC), for each ALT allele, in the
same order as listed
- MLEAF
- Maximum likelihood expectation (MLE) for the allele frequency
(not necessarily the same as the AF), for each ALT allele, in the
same order as listed
- MQ
- RMS Mapping Quality
- MQ0
- Total Mapping Quality Zero Reads
- MQRankSum
- Z-score From Wilcoxon rank sum test of Alt vs. Ref read
mapping qualities
- NDA
- Number of alternate alleles discovered (but not necessarily
genotyped) at this site
- QD
- Variant Confidence/Quality by Depth
- RPA
- Number of times tandem repeat unit is repeated, for each
allele (including reference)
- RU
- Tandem repeat unit (bases)
- ReadPosRankSum
- Z-score from Wilcoxon rank sum test of Alt vs. Ref read
position bias
- STR
- Variant is a short tandem repeat
- cluster
- variant is in a region containing many closeby variant calls