GenomeComb
cg process_reports ?options? sampledir ?dbdir? ?reports?
Calculates a number of statistics on a sample in the reports subdir
cg process_reports commands calculated a number of statistics on a sample and stores these in the subdir reports of the sample. If a type of report cannot be made (e.g. fastqstats if there are no fastqs for the sample), it will be skipped. Most reports are functional rather than fancy: a tsv file with sample, source (i.e. program used to make them), parameter and value
Following report types can be selected:
fastqstats: stats about number, length, quality, ... of reads in the fastq files made using fastq-stats (files report_fastq_fw-source.tsv and report_fastq_rev-sample.tsv) ; fastqc: fastqc analysis with graphs etc. per fastq file (in the fastqc subdir) ; flagstat_reads: stats about bamfiles in the sampledir made using samtools flagstat (file report_flagstat_reads-source.tsv) based on primary alignments only (so counts reads) ; flagstat_alignmments: stats about bamfiles in the sampledir made using samtools flagstat (file report_flagstat_alignments-source.tsv). This counts alignments, not reads (includes secondary alignments) ; samstats: stats about bamfiles in the sampledir made using samtools stats (file report_samstats_summary-source.tsv) all kinds of summary stats based on the sam/bam/cram file (including nr reads, etc) Als creates several report_samstats_section-source.tsv file containing information such as readlength distribution (section=RL), ; histodepth: create a histogram of the sequencing depth. If a targetfile is present, histograms for on- and off-target regions will be separated. The report_* version contains coverage statistics at various depth cutoffs ; vars: number of variants, quality variants ($coverage >= 20 and $quality >= 50), etc in the various var files (file report_vars-source.tsv) ; hsmetrics: picard hsmetrics analysis of target coverage ; covered: how much bases are covered in the various region files (5x coverage, 20x coverage, gatk sequenced, ...), oper chromosome and in total (file report_covered-source.tsv) ; histo: "histogram" of coverage (file crsbwa-sample.histo) ; predictgender: predict gender of a sample
This command can be distributed on a cluster or using multiple threads with job options (more info with cg help joboptions)
Process