GenomeComb

Regextract

Format

cg regextract ?options? file1 ?file2? ...

Summary

Find regions with a minimum or maximum coverage in a bam, sam, cram, bcol or tsv file.

Description

Write a regionsfile to stdout that contains all regions in the given bam, sam, cram, bcol or tsv file(s) that have given minimum or maximum coverage. e.g. regions in file1 with a minimum coverage of 20 are extracted using

cg regextract -min 20 file1

Using the -max option will give all regions where the coverage in the file reaches at most the given maximum. WARNING: This will only check regions that have a actual given coverage in the file; Positions that are not present in the given file are not returned! For bam and cram files (only), you can use the -all option to include these regions with no given coverage (in this case unused reference sequences).

If there is no chromosome data in the input files (tsv without chromosome), the chromosome name is taken from the filename by splitting on "-", and taking the second element.

Arguments

file1
sam/bam/cram file or file containing (chromosome,) position and value columns, or bcol file.
...
other files

Options

-qfields list
list (separated by spaces) of possible fieldnames to use as values for the cutoff; The first in the list that is in the header will be used (default = "coverage uniqueSequenceCoverage"). This option is not used for bcol files, as they contain data for only one "column".
-posfields list
list (separated by spaces) of possible fieldnames to use as position; The first in the list that is in the header will be used (default = "offset pos position begin start"). This option is not used for bcol files.
-min number
extract regions with coverage >= number.
-max number
extract regions with coverage <= number.
-shift
change the position by the given amount, e.g. -1 to change from 1-based to 0-based coordinates, (default is -1 for bam, cram and sam files, for other files 0)
-q number
Minimum mapping quality for an alignment to be used (only used for bams and crams)
-Q number
Minimum base quality for a base to be considered (only used for bams and crams)
-f 0/1 (--filtered)
(only used for bams and crams) if 1, skip anomalous read pairs and low quality for calculating depth. Low quality filtering in this option is achieved by setting -q and -Q at 20, unless you specifically define other values for these options. (default 0)
-all 0/1
(only for bams and crams) Return all regions (<= max), including regions with no given coverage (unused reference sequences)
-region region
limit command to the given region in the form "chr" or "chr:begin-end" in half-open coordinates: begin and end start from 0 and end is not included

Category

Regions