GenomeComb

process_reports

Format

cg process_reports ?options? sampledir ?dbdir? ?reports?

Summary

Calculates a number of statistics on a sample in the reports subdir

Description

cg process_reports commands calculated a number of statistics on a sample and stores these in the subdir reports of the sample. If a type of report cannot be made (e.g. fastqstats if there are no fastqs for the sample), it will be skipped. Most reports are functional rather than fancy: a tsv file with sample, source (i.e. program used to make them), parameter and value

Following report types can be selected:

fastqstats: stats about number, length, quality, ... of reads in the fastq files made using fastq-stats (files report_fastq_fw-source.tsv and report_fastq_rev-sample.tsv) ; fastqc: fastqc analysis with graphs etc. per fastq file (in the fastqc subdir) ; flagstat_reads: stats about bamfiles in the sampledir made using samtools flagstat (file report_flagstat_reads-source.tsv) based on primary alignments only (so counts reads) ; flagstat_alignmments: stats about bamfiles in the sampledir made using samtools flagstat (file report_flagstat_alignments-source.tsv). This counts alignments, not reads (includes secondary alignments) ; samstats: stats about bamfiles in the sampledir made using samtools stats (file report_samstats_summary-source.tsv) all kinds of summary stats based on the sam/bam/cram file (including nr reads, etc) Als creates several report_samstats_section-source.tsv file containing information such as readlength distribution (section=RL), ; histodepth: create a histogram of the sequencing depth. If a targetfile is present, histograms for on- and off-target regions will be separated. The report_* version contains coverage statistics at various depth cutoffs ; vars: number of variants, quality variants ($coverage >= 20 and $quality >= 50), etc in the various var files (file report_vars-source.tsv) ; hsmetrics: picard hsmetrics analysis of target coverage ; covered: how much bases are covered in the various region files (5x coverage, 20x coverage, gatk sequenced, ...), oper chromosome and in total (file report_covered-source.tsv) ; histo: "histogram" of coverage (file crsbwa-sample.histo) ; predictgender: predict gender of a sample

Arguments

sampledir
processed sample directory (containing variant files, bam files, etc.)
dbdir
directory containing reference data (genome sequence, annotation, ...)
reports
select wich reports to make (can also be given as an option). You can use basic (default) to add all basic reports (currently all except predictgender), or all to create all report types.

Options

-dbdir dbdir
dir containing the reference databases
-r reports (also --reports)
list of reports to make
-paired 0/1
fastq reports are split in fw and rev if paired is 1 (default), all data is put under fw if 0
-threads integer
The underlying code my use integer number of threads (default 1, not all reports can use threads)

This command can be distributed on a cluster or using multiple threads with job options (more info with cg help joboptions)

Category

Process