GenomeComb

gene2reg

Format

cg gene2reg ?infile? ?outfile?

Summary

Extract gene elements from data in genepred-like format to a (tab separated) region file.

Description

cg gene2reg outputs a tsv file with the location and meta info about each element of a gene on a separate line. The output is a region file (fields chromosome, begin, end) that typically includes the following extra fields:

type
UTR, CDS, intron, RNA
element
exon1, intron1, exon2, intron1, ...
rna_start
start of the element on the RNA sequence
rna_end
end of the element on the RNA sequence
protein_start
start of the element on the RNA sequence relative to the protein start
protein_end
start of the element on the RNA sequence relative to the protein start
gene
gene name
transcript
transcript id

Options

-upstream nrbases
if nrbases > 0 upstream and downstream regions will be added, where nrbases gives how many bases will be assigned as upstream/downstream (default 0)
-nocds 0/1
if 1 (default = 0) indications of cdsStart and cdsEnd are not taken into account, and all genes will be handled as non coding, meaning that also for coding genes all regions will be output as type RNA instead of UTR and CDS, and exons containf start or stop site are in one (RNA) region.

Arguments

infile
file to be converted, if not given, uses stdin. File may be compressed.
outfile
write results to outfile, if not given, uses stdout

Category

Format Conversion