GenomeComb
cg long ?infile? ?outfile?
Converts data in tsv format from wide format (data for each sample/analysis in separate columns) to long format (data for each sample/analysis on separate lines).
Genomecomb usually expects data in tsv files in wide format, where some fields are common to all samples (e.g. chromosome, begin, end, ... for variants) and some fields have specific values for each sample (e.g. quality, coverage, ...). In wide format these sample-specific columns are named by adding "-samplename" to the fieldname. In long format, sample-specific data is on separate lines (the common fields are repeated on each line). "cg long" converts from wide to long format; "cg wide" can be used for the reverse.
For example, starting from
var   zyg-gatk-bwa-sample1  zyg-sam-bwa-sample1  zyg-gatk-bwa-sample2
test  v                     u                    r
Using sample, this will be split to
var   sample   zyg-gatk-bwa  zyg-sam-bwa
test  sample1  v             u
test  sample2  r
Using analysis, this will be split to
var   sample            zyg
test  gatk-bwa-sample1  v
test  sam-bwa-sample1   u
test  gatk-bwa-sample2  r
Even though the analysis is used here, the column is still named sample (to allow the use of sampleinfo files).
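The difference between the two splits can be sketched in a few lines of Python. This is only an illustration of the naming convention described above, not genomecomb's own implementation. The function name to_long and its by parameter are hypothetical. It assumes that field names themselves contain no hyphen (so the analysis is everything after the first "-"), that the sample is the part after the last "-", and that values missing for a sample are written as empty strings.

```python
def to_long(header, rows, by="analysis"):
    """Convert wide-format rows to long format (illustrative sketch only).

    by="analysis": split each column name at its first hyphen (field, analysis)
    by="sample":   split each column name at its last hyphen (field, sample)
    """
    # Columns without a hyphen are treated as common fields (assumption).
    common = [i for i, name in enumerate(header) if "-" not in name]
    groups = {}  # sample or analysis name -> {field name: column index}
    fields = []  # output field names, in first-seen order
    for i, name in enumerate(header):
        if "-" not in name:
            continue
        if by == "analysis":
            field, key = name.split("-", 1)
        else:
            field, key = name.rsplit("-", 1)
        groups.setdefault(key, {})[field] = i
        if field not in fields:
            fields.append(field)

    # The key column is named "sample" in both modes, as in the text above.
    out_header = [header[i] for i in common] + ["sample"] + fields
    out_rows = []
    for row in rows:
        for key, cols in groups.items():
            # Empty string for fields this sample/analysis lacks (assumption).
            values = [row[cols[f]] if f in cols else "" for f in fields]
            out_rows.append([row[i] for i in common] + [key] + values)
    return out_header, out_rows


if __name__ == "__main__":
    header = ["var", "zyg-gatk-bwa-sample1", "zyg-sam-bwa-sample1",
              "zyg-gatk-bwa-sample2"]
    rows = [["test", "v", "u", "r"]]
    for mode in ("sample", "analysis"):
        out_header, out_rows = to_long(header, rows, by=mode)
        print(f"by {mode}:")
        for line in [out_header] + out_rows:
            print("\t".join(line))
```

Splitting at the last hyphen keeps the analysis part (gatk-bwa, sam-bwa) inside the field name, so one line per sample still carries several analysis columns; splitting at the first hyphen gives one line per analysis with a single zyg column.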
Format Conversion