Regextract
Format
cg regextract ?options? file1 ?file2? ...
Summary
Find regions with a minimum or maximum coverage in a bam, sam,
cram, bcol or tsv file.
Description
Writes a region file to stdout that contains all regions in the
given bam, sam, cram, bcol or tsv file(s) that have a given minimum
or maximum coverage. For example, regions in file1 with a minimum
coverage of 20 are extracted using
cg regextract -min 20 file1
Using the -max option will give all regions where the coverage in
the file is at most the given maximum. WARNING: This only checks
positions that have an actual coverage value in the file; positions
that are not present in the given file are not returned! For bam and
cram files (only), you can use the -all option to also include
regions with no given coverage (in this case unused reference
sequences).
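The cutoff behaviour described above can be illustrated with a small
Python sketch. This is not the genomecomb implementation: the function
name extract_regions, the (position, coverage) input format and the
half-open (begin, end) output are assumptions made for illustration.
It shows why positions missing from the input are not reported by
-max, even though their coverage is effectively 0.

```python
from typing import Iterable, List, Optional, Tuple


def extract_regions(
    records: Iterable[Tuple[int, int]],
    min_cov: Optional[int] = None,
    max_cov: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Merge consecutive passing positions into half-open (begin, end) regions.

    `records` are (position, coverage) pairs sorted by position (assumed
    input format). Positions that do not occur in `records` are never
    reported, mirroring the WARNING in the description.
    """
    regions: List[Tuple[int, int]] = []
    begin = None  # start of the region currently being built
    prev = None   # last position that passed the cutoff
    for pos, cov in records:
        passes = (min_cov is None or cov >= min_cov) and (
            max_cov is None or cov <= max_cov
        )
        if passes and begin is not None and pos == prev + 1:
            prev = pos  # adjacent passing position: extend the open region
            continue
        if begin is not None:
            regions.append((begin, prev + 1))  # close the open region
            begin = None
        if passes:
            begin = prev = pos  # start a new region
    if begin is not None:
        regions.append((begin, prev + 1))
    return regions


# Positions 3 and 4 are absent from the input (no coverage recorded).
coverage = [(0, 25), (1, 30), (2, 5), (5, 0), (6, 22)]
print(extract_regions(coverage, min_cov=20))  # [(0, 2), (6, 7)]
print(extract_regions(coverage, max_cov=10))  # [(2, 3), (5, 6)]
```

In the second call, positions 3 and 4 are not part of any returned
region because they have no entry in the input.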
If there is no chromosome data in the input files (tsv without
chromosome), the chromosome name is taken from the filename by
splitting on "-", and taking the second element.
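As a rough Python sketch of this rule (the function name and the
example filename are hypothetical, and whether the tool strips the
directory or extension first is not stated here, so using the base
name is an assumption):

```python
import os


def chromosome_from_filename(path: str) -> str:
    """Return the second "-"-separated element of the file name."""
    name = os.path.basename(path)  # assumption: directory part is ignored
    parts = name.split("-")
    if len(parts) < 2:
        raise ValueError(f"no '-' separated chromosome element in {name!r}")
    return parts[1]


# Hypothetical filename, for illustration only.
print(chromosome_from_filename("data/coverage-chr2-sample1.tsv"))  # chr2
```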
Arguments
- file1
- sam/bam/cram file or file containing (chromosome,) position
and value columns, or bcol file.
- ...
- other files
Options
- -qfields list
- list (separated by spaces) of possible fieldnames to use as
values for the cutoff; the first in the list that is in the header
will be used (default = "coverage uniqueSequenceCoverage"). This
option is not used for bcol files, as they contain data for only
one "column".
- -posfields list
- list (separated by spaces) of possible fieldnames to use as
position; the first in the list that is in the header will be used
(default = "offset pos position begin start"). This option is not
used for bcol files.
- -min number
- extract regions with coverage >= number.
- -max number
- extract regions with coverage <= number.
- -shift number
- change the position by the given amount, e.g. -1 to change
from 1-based to 0-based coordinates (default is -1 for bam, cram
and sam files, 0 for other files)
- -q number
- Minimum mapping quality for an alignment to be used (only
used for bams and crams)
- -Q number
- Minimum base quality for a base to be considered (only used
for bams and crams)
- -f 0/1 (--filtered)
- (only used for bams and crams) if 1, skip anomalous read
pairs and low-quality alignments and bases when calculating depth.
Low-quality filtering in this option is achieved by setting -q and
-Q to 20, unless you explicitly specify other values for these
options. (default 0)
- -all 0/1
- (only for bams and crams) Return all regions (<= max),
including regions with no given coverage (unused reference
sequences)
- -region region
- limit command to the given region in the form "chr" or
"chr:begin-end" in half-open coordinates: begin and end are counted
from 0, and end is not included
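Two of the option conventions above, the "first field name present in
the header" rule of -qfields and -posfields and the half-open region
format of -region, can be sketched in Python as follows. The helper
names pick_field and parse_region and the example header are
hypothetical; this is an illustration, not the code cg uses.

```python
from typing import List, Optional, Tuple


def pick_field(candidates: str, header: List[str]) -> Optional[str]:
    """Return the first space-separated candidate that occurs in header."""
    for name in candidates.split():
        if name in header:
            return name
    return None  # none of the candidates is a column in this file


def parse_region(region: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Parse "chr" or "chr:begin-end" (0-based, end not included)."""
    if ":" not in region:
        return region, None, None  # whole chromosome
    chrom, span = region.rsplit(":", 1)
    begin, end = span.split("-", 1)
    return chrom, int(begin), int(end)


# Hypothetical tsv header, for illustration only.
header = ["chromosome", "pos", "uniqueSequenceCoverage"]
print(pick_field("coverage uniqueSequenceCoverage", header))  # uniqueSequenceCoverage
print(pick_field("offset pos position begin start", header))  # pos

chrom, begin, end = parse_region("chr1:100-200")
# Half-open: positions 100..199 are covered, so the length is end - begin.
print(chrom, begin, end, end - begin)  # chr1 100 200 100
```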
Category
Regions