Headers
This text briefly describes the fields in the example file (and in most
files coming out of genomecomb analysis). More detail on the format
of the example files can be found in format_tsv.
Some of the fields apply to the variant itself: the basic variant
fields describe the variant, and the annotation fields give information
about it (location etc.).
Sample specific fields are given separately for each sample by
appending a dash and the sample name to the generic field name. They
specify data about the variant that can differ between samples,
e.g. the genotype (a sample can have a reference, heterozygous, ...
genotype for the variant) or the sequencing quality score.
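The naming convention for sample specific fields can be illustrated with a small Python sketch. It assumes that generic field names never contain a dash themselves, so that everything after the first dash is the sample name. The function names are made up for this example and are not part of genomecomb.

```python
def split_column(name):
    """Split a column header into (field, sample).

    sample is None for fields that apply to the variant itself.
    Assumption: the generic field name contains no dash.
    """
    field, sep, sample = name.partition("-")
    return (field, sample if sep else None)


def group_by_sample(header):
    """Map each sample name to the list of generic fields present for it."""
    samples = {}
    for column in header:
        field, sample = split_column(column)
        if sample is not None:
            samples.setdefault(sample, []).append(field)
    return samples


header = [
    "chromosome",
    "begin",
    "zyg-cg-cg-testNA19238chr2122cg",
    "coverage-cg-cg-testNA19238chr2122cg",
    "zyg-gatk-rdsbwa-testNA19240chr21il",
]
print(group_by_sample(header))
```

Grouping on the sample name makes it easy to check which analyses (cg, gatk, sam) are present in a file before selecting columns.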
Basic variant fields
  - chromosome: chromosome name (number, X, Y, ...)
 
  - begin: start position of variant (zero based)
 
  - end: end position of variant (zero based)
 
  - type: type of variant: single nucleotide variant (snp), short
  insertion (ins), deletion (del) or substitution (sub), or a
  combination of two different variant types at heterozygous
  positions (e.g. del_snp).
 
  - ref: The reference allele at this position
 
  - alt: The alternative allele at this position
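As an illustration, the basic variant fields could be read from a tab-separated file with a few lines of Python. This is a minimal sketch: it assumes that any comment lines before the header start with "#", and the example row is invented purely for demonstration. The function name read_variants is hypothetical.

```python
import csv
import io

BASIC_FIELDS = ("chromosome", "begin", "end", "type", "ref", "alt")


def read_variants(handle):
    """Yield the basic variant fields of each row, with begin/end as int."""
    # Assumption: comment lines, if any, start with '#'.
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        variant = {name: row[name] for name in BASIC_FIELDS}
        variant["begin"] = int(variant["begin"])
        variant["end"] = int(variant["end"])
        yield variant


# Invented example data, not taken from the real example file.
example = (
    "chromosome\tbegin\tend\ttype\tref\talt\n"
    "21\t9411326\t9411327\tsnp\tG\tA\n"
)
for variant in read_variants(io.StringIO(example)):
    print(variant["chromosome"], variant["begin"], variant["end"], variant["type"])
```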
 
Sample specific fields for cg samples
  - sequenced-cg-cg-testNA19238chr2122cg: Position sequenced
  by CG/CG? ("u" = unsequenced, "v"=variant,
  "r"=reference)
 
  - zyg-cg-cg-testNA19238chr2122cg: zygosity according
  to CG
 
  - alleleSeq1-cg-cg-testNA19238chr2122cg: First allele
  called in CG genome.
 
  - alleleSeq2-cg-cg-testNA19238chr2122cg: Second allele
  called in CG genome.
 
  - totalScore1-cg-cg-testNA19238chr2122cg: Variant score
  at first allele.
 
  - totalScore2-cg-cg-testNA19238chr2122cg: Variant score
  at second allele.
 
  - coverage-cg-cg-testNA19238chr2122cg: Coverage depth at
  this position by CG.
 
  - refscore-cg-cg-testNA19238chr2122cg: Reference score
  at this position.
 
  - refcons-cg-cg-testNA19238chr2122cg: in a poorly called
  region (according to CG)
 
  - cluster-cg-cg-testNA19238chr2122cg: Presence within a
  cluster of SNVs by CG
 
The same fields are present for NA19239 and NA19240.
Sample specific fields for gatk analysis
  - sequenced-gatk-rdsbwa-testNA19240chr21il: sequenced by
  Illumina/GATK? ("u" = unsequenced, "v"=variant,
  "r"=reference)
 
  - zyg-gatk-rdsbwa-testNA19240chr21il: zygosity according
  to GATK (m=homozygous, t=heterozygous, c=compound, o=other,
  r=reference, u=unsequenced)
 
  - alleleSeq1-gatk-rdsbwa-testNA19240chr21il: First
  allele called in Illumina genome by GATK.
 
  - alleleSeq2-gatk-rdsbwa-testNA19240chr21il: Second
  allele called in Illumina genome by GATK.
 
  - quality-gatk-rdsbwa-testNA19240chr21il: Quality score
  for this position as called by GATK on Illumina genome.
 
  - phased-gatk-rdsbwa-testNA19240chr21il: order of
  genotypes in alleleSeq1 and alleleSeq2 is significant (phase is
  known)
 
  - genotypes-gatk-rdsbwa-testNA19240chr21il: list of
  genotypes, can contain more than 2
 
  - alleledepth_ref-gatk-rdsbwa-testNA19240chr21il: depth
  of reference allele
 
  - alleledepth-gatk-rdsbwa-testNA19240chr21il: depth of
  alternative alleles
 
  - coverage-gatk-rdsbwa-testNA19240chr21il: Coverage
  depth at this position by Illumina/GATK.
 
  - genoqual-gatk-rdsbwa-testNA19240chr21il: quality of
  the genotypes
 
  - PL-gatk-rdsbwa-testNA19240chr21il: Normalized,
  Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref and
  B=alt; not applicable if site is not biallelic
 
  - BaseQRankSum-gatk-rdsbwa-testNA19240chr21il:
  Phred-scaled p-value From Wilcoxon Rank Sum Test of Alt Vs. Ref
  base qualities
 
  - totalcoverage-gatk-rdsbwa-testNA19240chr21il: Total
  Depth, counting all reads (DP in vcf INFO)
 
  - DS-gatk-rdsbwa-testNA19240chr21il: Were any of the
  samples downsampled?
 
  - Dels-gatk-rdsbwa-testNA19240chr21il: Fraction of Reads
  Containing Spanning Deletions
 
  - FS-gatk-rdsbwa-testNA19240chr21il: Phred-scaled
  p-value using Fisher's exact test to detect strand bias
 
  - HaplotypeScore-gatk-rdsbwa-testNA19240chr21il:
  Consistency of the site with at most two segregating haplotypes
 
  - MQ-gatk-rdsbwa-testNA19240chr21il: RMS Mapping Quality
 
  - MQ0-gatk-rdsbwa-testNA19240chr21il: Total Mapping
  Quality Zero Reads
 
  - MQRankSum-gatk-rdsbwa-testNA19240chr21il: Z-score From
  Wilcoxon rank sum test of Alt vs. Ref read mapping qualities
 
  - QD-gatk-rdsbwa-testNA19240chr21il: Variant
  Confidence/Quality by Depth
 
  - ReadPosRankSum-gatk-rdsbwa-testNA19240chr21il: Z-score
  from Wilcoxon rank sum test of Alt vs. Ref read position bias
 
  - SOR-gatk-rdsbwa-testNA19240chr21il: Symmetric Odds
  Ratio of 2x2 contingency table to detect strand bias
 
  - cluster-gatk-rdsbwa-testNA19240chr21il: Presence
  within a cluster of SNVs by Illumina/GATK (if yes "cl",
  if no "")
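The single-letter codes in the sequenced and zyg fields lend themselves to a small lookup table. The sketch below decodes the zygosity and combines the sequenced and quality fields into a simple filter. The helper names and the quality threshold of 50 are hypothetical choices for illustration, not genomecomb defaults, and a row is assumed to be a dict mapping column names to strings.

```python
# Zygosity codes as listed in the field descriptions above.
ZYGOSITY = {
    "m": "homozygous",
    "t": "heterozygous",
    "c": "compound",
    "o": "other",
    "r": "reference",
    "u": "unsequenced",
}


def describe_zygosity(row, sample):
    """Return the zygosity of a sample in words ('unknown' if not decodable)."""
    return ZYGOSITY.get(row.get(f"zyg-{sample}", ""), "unknown")


def is_called_variant(row, sample, min_quality=50.0):
    """True if the sample has the variant with at least min_quality.

    min_quality is an arbitrary example threshold.
    """
    if row.get(f"sequenced-{sample}") != "v":
        return False
    try:
        quality = float(row.get(f"quality-{sample}", ""))
    except ValueError:
        # Empty or non-numeric quality: treat as not passing.
        return False
    return quality >= min_quality


sample = "gatk-rdsbwa-testNA19240chr21il"
row = {
    f"sequenced-{sample}": "v",
    f"zyg-{sample}": "t",
    f"quality-{sample}": "123.4",
}
print(describe_zygosity(row, sample), is_called_variant(row, sample))
```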
 
Sample specific fields for samtools analysis
  - sequenced-sam-rdsbwa-testNA19240chr21il: sequenced by
  Illumina/samtools? ("u" = unsequenced,
  "v"=variant, "r"=reference)
 
  - zyg-sam-rdsbwa-testNA19240chr21il: zygosity according
  to samtools (m=homozygous, t=heterozygous, c=compound, o=other,
  r=reference, u=unsequenced/unspecified)
 
  - alleleSeq1-sam-rdsbwa-testNA19240chr21il: First allele
  called in Illumina genome by samtools.
 
  - alleleSeq2-sam-rdsbwa-testNA19240chr21il: Second
  allele called in Illumina genome by samtools.
 
  - quality-sam-rdsbwa-testNA19240chr21il: Quality score
  for this position as called by samtools on Illumina genome.
 
  - phased-sam-rdsbwa-testNA19240chr21il: order of
  genotypes in alleleSeq1 and alleleSeq2 is significant (phase is
  known)
 
  - genotypes-sam-rdsbwa-testNA19240chr21il: list of
  genotypes, can contain more than 2
 
  - genoqual-sam-rdsbwa-testNA19240chr21il: genotype
  quality
 
  - loglikelihood-sam-rdsbwa-testNA19240chr21il:
 
  - coverage-sam-rdsbwa-testNA19240chr21il: Coverage depth
  at this position by Illumina/samtools.
 
  - DV-sam-rdsbwa-testNA19240chr21il: number of
  high-quality non-reference bases
 
  - SP-sam-rdsbwa-testNA19240chr21il: Phred-scaled strand
  bias P-value
 
  - PL-sam-rdsbwa-testNA19240chr21il: List of Phred-scaled
  genotype likelihoods
 
  - totalcoverage-sam-rdsbwa-testNA19240chr21il: Raw read
  depth (DP in vcf INFO); samtools does not count all mapping reads
  for totalcoverage (bad reads are filtered out)
 
  - DP4-sam-rdsbwa-testNA19240chr21il: number of
  high-quality ref-forward bases, ref-reverse, alt-forward and
  alt-reverse bases
 
  - MQ-sam-rdsbwa-testNA19240chr21il: Root-mean-square
  mapping quality of covering reads
 
  - FQ-sam-rdsbwa-testNA19240chr21il: Phred probability of
  all samples being the same
 
  - AF1-sam-rdsbwa-testNA19240chr21il: Max-likelihood
  estimate of the first ALT allele frequency (assuming HWE)
 
  - AC1-sam-rdsbwa-testNA19240chr21il: Max-likelihood
  estimate of the first ALT allele count (no HWE assumption)
 
  - IS-sam-rdsbwa-testNA19240chr21il: Maximum number of
  reads supporting an indel and fraction of indel reads
 
  - PV4-sam-rdsbwa-testNA19240chr21il: P-values for strand
  bias, baseQ bias, mapQ bias and tail distance bias
 
  - PC2-sam-rdsbwa-testNA19240chr21il: Phred probability
  of the nonRef allele frequency in group1 samples being larger
  (,smaller) than in group2.
 
  - QBD-sam-rdsbwa-testNA19240chr21il: Quality by Depth:
  QUAL/#reads
 
  - RPB-sam-rdsbwa-testNA19240chr21il: Read Position Bias
 
  - MDV-sam-rdsbwa-testNA19240chr21il: Maximum number of
  high-quality nonRef reads in samples
 
  - VDB-sam-rdsbwa-testNA19240chr21il: Variant Distance
  Bias (v2) for filtering splice-site artefacts in RNA-seq data.
  Note: this version may be broken.
 
  - cluster-sam-rdsbwa-testNA19240chr21il: Presence within
  a cluster of SNVs by Illumina/samtools (if yes "cl", if no
  "")
 
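Some of the samtools fields hold several numbers in one value. As a sketch of how such a field might be used, the code below splits a DP4 value into its four counts and derives the fraction of high-quality bases supporting the alternative allele. It assumes the four counts are stored as a comma-separated list in the order given in the DP4 description; the function names are hypothetical.

```python
DP4_KEYS = ("ref_forward", "ref_reverse", "alt_forward", "alt_reverse")


def parse_dp4(value):
    """Split a DP4 value such as '10,12,5,3' into named integer counts.

    Assumption: the value is a comma-separated list of four integers.
    """
    parts = [int(part) for part in value.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 counts in DP4, got {len(parts)}")
    return dict(zip(DP4_KEYS, parts))


def alt_fraction(dp4):
    """Fraction of high-quality bases supporting the alternative allele."""
    total = sum(dp4.values())
    if total == 0:
        return None
    return (dp4["alt_forward"] + dp4["alt_reverse"]) / total


counts = parse_dp4("10,12,5,3")
print(counts, alt_fraction(counts))
```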
Annotation fields
  - intGene_impact: impact on gene transcripts (can be a list)
  of the intGene gene set (integration of refGene, knownGene, gencode,
  ensGene)
 
  - intGene_gene: gene name
 
  - intGene_descr: description of transcript and location
  of variant in transcript in several ways (including hgvs)
 
  - lincRNA_impact: impact on long non-coding genes
 
  - lincRNA_gene: long non-coding gene name
 
  - lincRNA_descr: description of transcript and location
  of variant in long non-coding transcript
 
  - refGene_impact: impact on refGene genes
 
  - refGene_gene: name of refGene genes
 
  - refGene_descr: list of refGene transcripts and
  description of location and effect of variant on the transcript
 
  - mirbase20_impact: impact on mirbase20 miRNA
 
  - mirbase20_mir: mirbase20 miRNA variant is located in
 
  - chainSelf: Presence in self-chained region (yes =
  "label from UCSC", no = "")
 
  - cytoBand: approximate location of bands seen on
  Giemsa-stained chromosomes.
 
  - dgvMerged: Database of Genomic Variants (Structural
  Var Regions)
 
  - evofold: conserved functional RNA structures based on
  predictions made with the EvoFold program
 
  - gad: Genetic Association Database
 
  - genomicSuperDups: Presence in segmental duplication
  (yes = "label from UCSC", no = "")
 
  - gwasCatalog_name: name of single nucleotide polymorphisms
  (SNPs) identified by published GWAS
 
  - gwasCatalog_score: score for gwasCatalog SNPs
 
  - homopolymer_base: if part of a homopolymer, the
  homopolymer base is given
 
  - homopolymer_size: if part of a homopolymer, the
  homopolymer size is given
 
  - microsat: Presence in a microsatellite
 
  - oreganno: literature-curated regulatory regions,
  transcription factor binding sites, and regulatory polymorphisms
  from oreganno
 
  - phastConsElements46way_name: evolutionary conservation
  region name (phastcons raw log odds scores)
 
  - phastConsElements46way_score: evolutionary
  conservation score (phastcons)
 
  - phastConsElements46wayPlacental_name: evolutionary
  conservation name in placental mammals
 
  - phastConsElements46wayPlacental_score: evolutionary
  conservation score in placental mammals
 
  - phastConsElements46wayPrimates_name: evolutionary
  conservation name in primates
 
  - phastConsElements46wayPrimates_score: evolutionary
  conservation score in primates
 
  - rmsk: Presence in a RepeatMasker region (yes =
  "label from UCSC", no = "")
 
  - simpleRepeat: Presence in simple tandem repeat (yes =
  "label from UCSC", no = "")
 
  - targetScanS_name: Putative miRNA binding site
 
  - targetScanS_score: Putative miRNA binding site score
 
  - tfbsConsSites_name: transcription factor binding site
 
  - tfbsConsSites_score: transcription factor binding site
  score
 
  - tRNAs: transfer RNA
 
  - vistaEnhancers_name: Vista distant-acting
  transcriptional enhancer
 
  - vistaEnhancers_score: Vista distant-acting
  transcriptional enhancer score
 
  - wgEncodeCaltechRnaSeq: Encode RnaSeq score
 
  - wgEncodeH3k4me1: Encode H3k4me1 score
 
  - wgEncodeH3k4me3: Encode H3k4me3 score
 
  - wgEncodeH3k27ac: Encode H3k27ac score
 
  - wgEncodeRegDnaseClusteredV3_name: Encode Dnase region
  name
 
  - wgEncodeRegDnaseClusteredV3_score: Encode Dnase region
  score
 
  - wgEncodeRegTfbsClusteredV3_name: Encode transcription
  factor binding site name
 
  - wgEncodeRegTfbsClusteredV3_score: Encode transcription
  factor binding site score
 
  - wgRna_name: RNA gene (pre-miRNA, snoRNA or scaRNA)
  name
 
  - wgRna_score: RNA gene score
 
  - 1000g3: frequency (in percent) of the variant in the
  1000 genomes data set
 
  - cadd: deleteriousness of single nucleotide variants
  according to CADD
 
  - clinvar_acc: clinvar accession number
 
  - clinvar_disease: clinvar disease
 
  - gnomad_max_freqp: maximum population frequency (in
  percent) in the gnomAD (genomic) database
 
  - gnomad_nfe_freqp: maximum frequency in the NFE
  population in the gnomAD (genomic) database
 
  - kaviar: frequency (in percent) in the Kaviar database
 
  - snp147: variant name (in the dbSNP 147 database)
 
  - snp147Common: frequency in dbSNP 147 database (only
  for variants > 1% in a sufficient population)
 
  - evs_ea_freqp: frequency (percent) in Exome Variant
  Server (european american population)
 
  - evs_aa_freqp: frequency (percent) in Exome Variant
  Server (african american population)
 
  - evs_ea_mfreqp: homozygote frequency (percent) in Exome
  Variant Server (european american population)
 
  - evs_aa_mfreqp: homozygote frequency (percent) in Exome
  Variant Server (african american population)
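Because several of the frequency fields are given in percent rather than as fractions, a threshold of 1 means 1%. The following sketch selects rare variants on that basis. It assumes each frequency field holds a single number or is empty, and it treats an empty or non-numeric value as "not reported" (frequency 0), which is a choice made for this example only. The function names and the default field list are hypothetical.

```python
def freq_percent(row, field):
    """Return the frequency (in percent) stored in field, or 0.0 if absent."""
    try:
        return float(row.get(field, ""))
    except ValueError:
        # Assumption: empty or non-numeric means the variant is not reported.
        return 0.0


def is_rare(row, max_percent=1.0, fields=("gnomad_max_freqp", "1000g3")):
    """True if the variant is at or below max_percent in all given fields."""
    return all(freq_percent(row, field) <= max_percent for field in fields)


rows = [
    {"gnomad_max_freqp": "0.02", "1000g3": ""},
    {"gnomad_max_freqp": "3.5", "1000g3": "0.1"},
]
print([is_rare(row) for row in rows])
```

Whether a missing frequency should count as rare depends on the analysis; the fallback in freq_percent is the place to change that behaviour.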