GenomeComb

Multitranscript

Format

cg multitranscript ?options? multitranscriptfile transcriptfile transcriptfile ?transcriptfile? ...

Summary

Compare multiple transcript files

Description

This command combines multiple transcript files into one multitranscript file: a tab-separated file containing a wide table used to compare transcripts (e.g. their counts) between different samples. Each line in the table contains general transcript info (chromosome, begin, end, strand, exonStarts, exonEnds, ?cdsStart?, ?cdsEnd?, ?transcript?, ?gene?, ?geneid?) and columns with transcript info specific to each sample (count, etc.). The sample-specific columns have a heading of the form field-<sample>.

<sample> can simply be the sample name, but it may also include information about, for example, the sequencing or analysis method, in the form method-samplename, e.g. count_weighed-isoquant-sample1.tsv.
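
The heading convention above can be illustrated with a small Python sketch. It is not part of GenomeComb: the function name split_heading is hypothetical, and it assumes that the field name itself contains no hyphen, so that everything after the first hyphen belongs to <sample>.

```python
def split_heading(heading):
    """Split a heading of the form field-<sample> into (field, sample).

    Assumption: the field part contains no hyphen, so the first hyphen
    separates it from <sample> (which may itself be method-samplename).
    Returns None for headings without a sample part, such as "chromosome".
    """
    field, sep, sample = heading.partition("-")
    if not sep or not field or not sample:
        return None
    return field, sample


if __name__ == "__main__":
    for heading in ("chromosome", "count-sample1", "count_weighed-isoquant-sample1"):
        print(heading, "->", split_heading(heading))
```

With this convention, a sample that carries a method prefix keeps that prefix as part of the sample name.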

When a transcript was not detected in a sample, the value in the table is set to 0.0.
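
As an illustration of the wide table and the 0.0 fill, the following Python sketch merges per-sample values into rows. It is a simplified model and not the actual implementation: the function to_wide, the single "transcript" key column and the sample data are all assumptions made for the example.

```python
def to_wide(samples, field="count"):
    """Build a wide table from per-sample values.

    samples: dict mapping sample name -> dict mapping transcript key -> value.
    Returns (header, rows); a transcript missing from a sample gets 0.0.
    """
    # Union of all transcript keys seen in any sample, in sorted order.
    keys = sorted({key for values in samples.values() for key in values})
    header = ["transcript"] + [f"{field}-{name}" for name in samples]
    rows = [
        [key] + [values.get(key, 0.0) for values in samples.values()]
        for key in keys
    ]
    return header, rows


if __name__ == "__main__":
    # Hypothetical data: t2 was not detected in sample2.
    data = {"sample1": {"t1": 5.0, "t2": 2.0}, "sample2": {"t1": 3.0}}
    header, rows = to_wide(data)
    print("\t".join(header))
    for row in rows:
        print("\t".join(str(value) for value in row))
```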

By default, multitranscript uses approximate matching for novel transcripts (identified by the content of the field "category"): novel transcripts are matched if they have identical junction chains, even if their (predicted) ends differ. The resulting transcript gets the lowest start and the highest end position of these matches. The id of such a result transcript is changed to one that is unique for that transcript and consistent across different experiments (based on location and intron/exon sizes).
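
The idea of matching on junction chains can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used by cg multitranscript: the functions junction_chain and merge_novel are hypothetical, transcripts are plain dicts with exonStarts and exonEnds already parsed into lists of integers, and single-exon transcripts (which have no junctions) are simply kept separate because the text above does not say how they are handled.

```python
from collections import defaultdict


def junction_chain(exon_starts, exon_ends):
    """Return the introns as (end of exon i, start of exon i+1) pairs."""
    return tuple(zip(exon_ends[:-1], exon_starts[1:]))


def merge_novel(transcripts):
    """Group transcripts with identical junction chains and merge their ends."""
    groups = defaultdict(list)
    for t in transcripts:
        chain = junction_chain(t["exonStarts"], t["exonEnds"])
        key = (t["chromosome"], t["strand"], chain)
        if not chain:
            # Assumption: single-exon transcripts only match on identical ends.
            key += (t["begin"], t["end"])
        groups[key].append(t)
    merged = []
    for members in groups.values():
        first = members[0]
        merged.append({
            "chromosome": first["chromosome"],
            "strand": first["strand"],
            # Lowest start and highest end over all matched transcripts.
            "begin": min(m["begin"] for m in members),
            "end": max(m["end"] for m in members),
            "junctions": junction_chain(first["exonStarts"], first["exonEnds"]),
        })
    return merged


if __name__ == "__main__":
    a = {"chromosome": "chr1", "strand": "+", "begin": 100, "end": 500,
         "exonStarts": [100, 300], "exonEnds": [200, 500]}
    b = {"chromosome": "chr1", "strand": "+", "begin": 90, "end": 520,
         "exonStarts": [90, 300], "exonEnds": [200, 520]}
    for transcript in merge_novel([a, b]):
        print(transcript)
```

In this example the two transcripts share the single junction (200, 300), so they are merged into one transcript that runs from the lowest begin to the highest end.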

Arguments

multitranscriptfile
result file; it will be created if it does not exist
transcriptfile
file containing the transcripts of a new sample to be added. More than one can be added in one command.

Options

-exact 0/1
use exact matching (instead of the default approximate matching for novel transcripts)
-match pattern
Approximate matching will be used for all transcripts whose id/name matches pattern (a regular expression). This overrules the value in category, although category is still used to decide which name is given to the result (known names are preferred).
-skipempty 0/1
skip isoform files which are empty (size = 0) (default 1)

Category

RNA