Genomecomb 0.111.0 Reference

Format

cg subcommand ?options? ...

Description

This help page gives a reference-style overview of all genomecomb functions. For an introductory text on genomecomb and its formats, use

cg help intro

All genomecomb functions are called using the cg command and a subcommand. The available subcommands are listed on this page (in categories) with a short description. For more information on how to use a subcommand and its parameters, use

cg help subcommand

cg subcommand -h

Options

The following options are generic and available for all subcommands. They must, however, always precede the subcommand-specific options.

-v number (--verbose): Setting this to 1 or 2 (instead of the default 0) makes some subcommands chattier about their progress. If number is 1, logging messages are shown (warnings, start of subtasks, etc.). If number is >= 2, progress counters are also shown (for commands that support them).
--stack 0/1: When the program returns an error, by default only the error message is shown, which is normally sufficient for errors in the input format, etc. If --stack is set to 1, a full stack trace is shown on error (which may be useful to solve errors caused by bugs in the program).
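The ordering rule above (cg, then the subcommand, then generic options, then subcommand-specific options) can be made explicit when cg is called from a script. The sketch below only assembles the argument list; the helper `build_cg_command` and its parameter names are hypothetical, and `-f` in the demo is assumed to be the field-selection option of cg select (check `cg help select`).

```python
import shlex


def build_cg_command(subcommand, args, verbose=0, stack=False, specific_options=None):
    """Assemble the argument list for `cg subcommand ?options? ...`.

    Hypothetical helper: generic options (-v, --stack) are placed right after
    the subcommand, before any subcommand-specific options, as this reference
    requires. The list is only built here, not executed.
    """
    cmd = ["cg", subcommand]
    # Generic options come first; the default -v 0 is simply omitted
    if verbose:
        cmd += ["-v", str(verbose)]
    if stack:
        cmd += ["--stack", "1"]
    # Subcommand-specific options are passed through unchanged, after the generic ones
    cmd += list(specific_options or [])
    # Positional arguments (e.g. input and output files) go last
    cmd += list(args)
    return cmd


if __name__ == "__main__":
    # "-f" is assumed to be cg select's field-selection option
    cmd = build_cg_command("select", ["in.tsv", "out.tsv"], verbose=2, stack=True,
                           specific_options=["-f", "chromosome begin end"])
    print(shlex.join(cmd))
```

Keeping the command as a list (rather than one string) means it can be handed directly to `subprocess.run` without shell quoting issues; `shlex.join` is only used to print a readable form.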

Many commands will transparently compress a result file if it has an extension indicating this (.zst, .gz, ...). The following options apply in that case:

-compressionlevel number: number indicating the level of compression (1 for fast compression, higher numbers for better but slower compression) used when result files are to be compressed
-compressionthreads number: Use number threads (default 1) to compress the result file (if the result needs to be compressed and the compression method supports threads)
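Because compression is triggered by the extension of the result file, a wrapper script can decide whether the compression options are relevant at all. In this minimal sketch only .zst and .gz come from this reference; the .lz4 and .rz (razip) entries are assumptions based on the compression subcommands listed further down, and the helper names are hypothetical.

```python
import os

# Assumed mapping from result-file extension to compression method.
# Only .zst and .gz are named in the reference; .lz4 and .rz are assumptions.
EXTENSION_METHODS = {
    ".zst": "zst",
    ".gz": "gz",
    ".lz4": "lz4",
    ".rz": "razip",
}


def compression_method(path):
    """Return the compression method implied by the file extension, or None."""
    ext = os.path.splitext(path)[1].lower()
    return EXTENSION_METHODS.get(ext)


def compression_options(path, level=None, threads=1):
    """Build -compressionlevel/-compressionthreads options for a result file.

    Returns an empty list when the extension implies no compression, since
    the options only matter when the result file is compressed.
    """
    if compression_method(path) is None:
        return []
    opts = []
    if level is not None:
        opts += ["-compressionlevel", str(level)]
    opts += ["-compressionthreads", str(threads)]
    return opts


if __name__ == "__main__":
    for name in ["result.tsv.zst", "result.tsv.gz", "result.tsv"]:
        print(name, compression_method(name), compression_options(name, level=9, threads=4))
```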

-shadowdir path: alternative location (e.g. a faster disk or disk without snapshots) to store some of the intermediate files that must be accessible to multiple jobs (will be made using shadow_mkdir). Can also be specified using the environment variable SHADOWDIR.

-scratchdir path: alternative location to store (local, not potentially accessed by multiple nodes) temporary files, which may become very big. The default location is /tmp; if this is too small (on some nodes), you can specify a different location using this option. It can also be specified using the environment variable SCRATCHDIR.
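Both -shadowdir and -scratchdir can come from an option or from an environment variable (SHADOWDIR, SCRATCHDIR), and -scratchdir falls back to /tmp. The reference does not say which source wins when both are set, so the precedence in this sketch (option, then environment variable, then default) is an assumption, and `resolve_dir` is a hypothetical helper.

```python
import os


def resolve_dir(option_value, env_var, default=None, environ=None):
    """Choose the directory for a -shadowdir or -scratchdir style setting.

    Assumed precedence (not stated in the reference): an explicit
    command-line value wins, then the environment variable, then the default.
    """
    env = os.environ if environ is None else environ
    if option_value:
        return option_value
    if env.get(env_var):
        return env[env_var]
    return default


if __name__ == "__main__":
    fake_env = {"SCRATCHDIR": "/data/scratch"}  # hypothetical paths
    print(resolve_dir(None, "SCRATCHDIR", "/tmp", fake_env))         # taken from the environment
    print(resolve_dir("/fast/tmp", "SCRATCHDIR", "/tmp", fake_env))  # the option wins
    print(resolve_dir(None, "SCRATCHDIR", "/tmp", {}))               # falls back to /tmp
```

No default is passed for SHADOWDIR-style lookups because this reference does not state one.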

Available subcommands

Process

process_project: process a sequencing project directory (projectdir), generating full analysis information (variant calls, multicompar, reports, ...) starting from raw sample data from various sources.
make_project: Create a project directory from a samplesheet.
makerefdb: Create a directory with the reference sequence and annotation databases
process_multicompar: process a sequencing project directory. This expects a genomecomb directory with samples already processed and makes annotated multicompar data
process_reports: Calculates a number of statistics on a sample in the reports subdir
process_rtgsample: Convert rtg data to cg format
process_sample: Processes one sample directory (projectdir), generating full analysis information (variant calls, multicompar, reports, ...) starting from raw sample data that can come from various sources.
project_addsample: Add a sample directory to a project directory.
renamesamples: Converts the sample names in a file or entire directory to other names.

Query

select: Command for very flexible selection, sorting and conversion of tab separated files
viz: graphical visualization of tsv files (even very large ones) as a table without loading everything into memory
multiselect: Command for using cg select on multiple files, and combine the results
groupby: Group lines in a tsv file, based on 1 or more identical fields
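To illustrate what grouping on identical fields means, the sketch below counts the lines of a tsv per combination of values in the chosen fields. Counting is just one possible aggregate and the helper names are hypothetical; treating lines that start with # as comments before the header is an assumption. The actual options of cg groupby are described by `cg help groupby`.

```python
from collections import Counter


def read_tsv(text):
    """Parse tsv text into (header, rows); lines starting with '#' are treated as comments."""
    lines = [line for line in text.splitlines() if line and not line.startswith("#")]
    header = lines[0].split("\t")
    rows = [dict(zip(header, line.split("\t"))) for line in lines[1:]]
    return header, rows


def group_count(text, fields):
    """Count lines per combination of values in the given fields."""
    _, rows = read_tsv(text)
    return dict(Counter(tuple(row[f] for f in fields) for row in rows))


if __name__ == "__main__":
    data = ("chromosome\tbegin\tend\ttype\n"
            "chr1\t10\t11\tsnp\n"
            "chr1\t20\t22\tdel\n"
            "chr2\t5\t6\tsnp\n")
    print(group_count(data, ["type"]))
    print(group_count(data, ["chromosome", "type"]))
```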

Analysis

exportplink: make a plink "Transposed fileset" from the genome data
homwes: finds regions of homozygosity based on a variant file
homwes_compare: finds regions of homozygosity based on a variant file

Regions

multireg: Compare multiple regions files
regselect: select all regions or variants in file that overlap with regions in region_select_file
covered: Find number of bases covered by regions in a region file
regcollapse: Collapses overlapping regions in one (sorted) or more region files
regcommon: list regions common between region_file1 and region_file2
regextract: Find regions with a minimum or maximum coverage in a bam, sam, cram, bcol or tsv file.
regjoin: join regions in 1 or 2 regionfiles
regsubtract: subtract regions in region_file2 from region_file1
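The idea behind regcollapse and covered can be shown on plain (chromosome, begin, end) tuples: sort, merge overlapping regions, then sum the lengths so overlapping bases are counted once. This is a minimal sketch, not genomecomb's implementation; it assumes half-open coordinates [begin, end), sorts chromosome names lexically rather than in natural order, and also merges regions that merely touch, which cg regcollapse may or may not do.

```python
def collapse(regions):
    """Merge overlapping (or touching) regions given as (chromosome, begin, end).

    Coordinates are treated as half-open intervals [begin, end).
    """
    merged = []
    for chrom, begin, end in sorted(regions):
        if merged and merged[-1][0] == chrom and begin <= merged[-1][2]:
            # Overlaps or touches the previous region: extend it
            _, prev_begin, prev_end = merged[-1]
            merged[-1] = (chrom, prev_begin, max(prev_end, end))
        else:
            merged.append((chrom, begin, end))
    return merged


def covered(regions):
    """Number of bases covered by the regions, counting overlaps only once."""
    return sum(end - begin for _, begin, end in collapse(regions))


if __name__ == "__main__":
    regions = [("chr1", 15, 30), ("chr1", 10, 20), ("chr1", 40, 50), ("chr2", 0, 5)]
    print(collapse(regions))
    print(covered(regions))
```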

Annotation

annotate: Annotate a variant file with region, gene or variant data
download_biograph: Download gene rank data from biograph.
download_mart: Download data from biomart
geneannot2reg: Creates a (region) annotation file based on a genelist with annotations.

Validation

genome_seq: Returns sequences of regions in the genome (fasta file), optionally masked for snps/repeats
makeprimers: Make sequencing primers for Sanger validation experiments
makesequenom: Make input files for designing sequenom validation experiments
primercheck: check primer sets for multiple amplicons and snps
validatesv

Variants

multicompar: Compare multiple variant files
multicompar_reannot: Reannotate multicompar file
multicount: Compare multiple count files that all have the same ids (deprecated)
pmulticompar: Compare multiple variant files
process_sv: Do all steps on a Complete Genomics sample to generate structural variant calls
sv: generic command to call structural variants based on the alignment in bamfile.
svmulticompar: Compare multiple structural variant files
var: generic command to call variants based on the alignment in bamfile.

tsv

cat: Concatenate tab separated files and print on the standard output (stripping headers of files other than the first).
checktsv: Checks the given tsv file for errors
collapsealleles: Convert a split variant file to unsplit by collapsing alleles of the same variant into one line.
fixtsv: fix errors in a tsv file
graph: Visualization of large tsv files as a scatter plot
index: make indices for a tsv file
less: view a (possibly compressed) file using the pager less.
paste: merge lines of tab separated files.
split: split a tab separated file in multiple tab separated files based on the content of a (usually chromosome) field.
tsvjoin: join two tsv files based on common fields
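The header handling described for cg cat (only the first file's header is kept) can be sketched on in-memory text. The helper names are hypothetical; treating leading # lines as comments and refusing to concatenate files whose header differs from the first one are assumptions of this sketch, not documented cg cat behaviour.

```python
def split_header(text):
    """Split tsv text into (comment lines, header line, data lines).

    Assumed: comments are the leading lines starting with '#'.
    """
    lines = text.splitlines()
    i = 0
    while i < len(lines) and lines[i].startswith("#"):
        i += 1
    if i == len(lines):
        raise ValueError("no header line found")
    return lines[:i], lines[i], lines[i + 1:]


def cat_tsv(texts):
    """Concatenate tsv contents, keeping comments and header of the first file only."""
    out = []
    first_header = None
    for text in texts:
        comments, header, body = split_header(text)
        if first_header is None:
            first_header = header
            out += comments + [header]
        elif header != first_header:
            # Assumption: mismatching headers are treated as an error
            raise ValueError("header differs from that of the first file")
        out += body
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    a = "# sample a\nchromosome\tbegin\nchr1\t1\n"
    b = "# sample b\nchromosome\tbegin\nchr2\t5\n"
    print(cat_tsv([a, b]), end="")
```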

Conversion

sam_clipamplicons: Clip primers from aligned sequences in a sam file (by changing bases to N and quality to 0) given a set of target amplicons.
liftover: Use liftover to convert a variant file from one build to another
liftregion: Use liftover to convert a region file from one build to another
liftsample: Use liftover to convert the files from a sample (variant, region, ...) from one build to another
liftchain2tsv: Converts data in UCSC chain file format (for liftOver) to a tsv (tab separated) format to be used in e.g. cg liftregion.
correctvariants: Complete or correct a variants file
bam_index: index a bam/cram/sam file.
bam_sort: sort a bam file.
bamreorder: Changes the order of the contigs/chromosomes in a bam file
fastq_split: split a (large) fastq file in multiple smaller ones
samcat: concatenates sam files outputting to stdout.

Format Conversion

10x2tsv: Converts data in 10x format to the tab-separated format (tsv).
bam2cram: Convert a bam file to a cramfile
bam2fastq: Extracts reads from a bam file into fastq. The actual extraction is based on picard's SamToFastq or samtools bam2fq, but the data is prepared first (among other things, sorted by name to avoid the problems caused by position-sorted fastq files)
bam2reg: Extract regions with a minimum coverage from a bam file.
bam2sam: convert bam (or cram) to sam, sorting chromosomes into natural sort order.
bams2crams: Convert bam files to cram files
bcol: several commands for managing bcol files
bcol_get: Get a list of values from a bcol file
bcol_make: Creates a bcol file
bcol_table: Outputs the data in a bcol file as tab-separated
bcol_update: updates bcol file(s) in the old format to the new
bed2genetsv: Converts data in bed12 format to the tsv gene format
bed2tsv: Converts data in bed format to tsv format
cg2sreg: Extracts sequenced region data from a Complete Genomics format file in tsv (tab separated, simple feature table) format. The command will also sort the tsv appropriately.
cg2tsv: Converts data in Complete Genomics format to tsv (tab separated, simple feature table) format. The command will also sort the tsv appropriately. The cggenefile is optional. It contains the CG gene annotations, which are best left out, as these annotations (and more) can be done in genomecomb.
clc2tsv: Converts output from the clc bio assembly cell snp caller to tsv format
colvalue: Converts data in tsv format from a long key-value format to a wide column-value format
convcgcnv: Convert the Complete Genomics CNV files to the cg format
convcgsv: Convert the Complete Genomics SV files to the cg format
csv2tsv: Converts comma-separated value (csv) data to tab-separated (tsv) data
fasta2tsv: Converts data in fasta format to tab-separated format (tsv).
fastq2tsv: Converts data in fastq format to tab-separated format (tsv).
gb2fasta: Converts a genbank file to a fasta file. The file may contain multiple sequences. The name of each sequence (in the fasta file) is taken from the first of the following fields (in this order) that is not empty or .: LOCUS, VERSION, ACCESSION, KEYWORDS, DEFINITION
gene2reg: Extract gene elements from data in genepred-like format to a (tab separated) region file.
genepred2tsv: Converts data in genepred format to tab-separated format (tsv).
genepredtsv2fasta: Extracts transcript sequences based on a transcript tsv file (format_transcript) and stores them in a fasta file.
gff2tsv: Converts data in gff format to gene tsv format
gtf2tsv: Converts data in gtf format to gene tsv (tab separated) format
keyvalue: Converts data in tsv format from wide format (data for each sample in separate columns) to keyvalue format.
long: Converts data in tsv format from wide format (data for each sample/analysis in separate columns) to long format (data for each sample/analysis in separate lines)
rtg2tsv: Converts data in rtg format to tsv format
sam2tsv: Converts data in sam format to tsv format. The original sam header is included in the comments. Extra fields in the sam file are kept as field:type:value, but put together as a list in a column named other. Optionally the values for some extra fields can be split into different columns using the -fields option.
tsv210x: Converts data in tab-separated (tsv) format to the 10x format.
tsv2bed: Converts data in tab-separated (tsv) format to bed format. By default it will create a minimal bed file by extracting chromosome, begin and end from the input using the default fields.
tsv2csv: Converts tab-separated (tsv) data to comma-separated value (csv) data
tsv2fasta: Converts data in tab-separated (tsv) format to fasta format.
tsv2fastq: Converts data in tab-separated (tsv) format to fastq format.
tsv2gff: Converts data in tab-separated (tsv) format to gff format.
tsv2gtf: Converts data in tab-separated (tsv) format to gtf format.
tsv2sam: Converts data in tab-separated (tsv) format to sam, bam or cram format.
tsv2vcf: Converts data in genomecomb tab-separated variant format (tsv) to vcf.
vcf2tsv: Converts data in vcf format to genomecomb tab-separated variant file (tsv). The command will also sort the tsv appropriately.
wide: Converts data in tsv format from long format (data for each sample in separate lines) to wide format (data for each sample in separate columns)
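The wide and long layouts used by cg long and cg wide can be illustrated in a few lines. This sketch assumes that sample-specific columns are named field-sample (split on the first "-", so analysis names such as gatk-rdsbwa-sample1 stay intact) and that every other column is shared by all samples; `wide_to_long` is a hypothetical helper, not the cg long implementation.

```python
def wide_to_long(header, rows):
    """Convert wide rows (field-sample columns) into long rows (one line per sample).

    Assumed naming: sample-specific columns are "field-sample"; columns
    without "-" are common to all samples. Rows are dicts keyed by column name.
    """
    common = [col for col in header if "-" not in col]
    fields, samples = [], []
    for col in header:
        if "-" in col:
            field, sample = col.split("-", 1)
            if field not in fields:
                fields.append(field)
            if sample not in samples:
                samples.append(sample)
    long_header = common + ["sample"] + fields
    long_rows = []
    for row in rows:
        for sample in samples:
            values = [row[col] for col in common] + [sample]
            # Missing field-sample combinations become empty values
            values += [row.get(f"{field}-{sample}", "") for field in fields]
            long_rows.append(values)
    return long_header, long_rows


if __name__ == "__main__":
    header = ["chromosome", "begin", "zyg-s1", "zyg-s2", "coverage-s1", "coverage-s2"]
    rows = [{"chromosome": "chr1", "begin": "10", "zyg-s1": "t", "zyg-s2": "r",
             "coverage-s1": "20", "coverage-s2": "15"}]
    long_header, long_rows = wide_to_long(header, rows)
    print("\t".join(long_header))
    for values in long_rows:
        print("\t".join(values))
```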

Report

bam_histo: create a report with coverage data of given regions in a bam file
coverage_report: generates a coverage report for a region of interest. A histo file is generated at the location of the bam file.
depth_histo: makes a histogram of the depth in the given bam file, subdivided into on- and off-target regions
hsmetrics: Creates a hsmetrics file using picard CalculateHsMetrics
predictgender: predicts gender for a sample

Info

error_report: display help
help: display help
version: return version information
versions: return version information as a tsv

RNA

iso_isoquant: Call and count isoforms and genes using isoquant.
multigene: Compare multiple gene files
multitranscript: Compare multiple transcript files

Singlecell

sc_demultiplex: demultiplexes ont single cell results into separate samples based on a demultiplexing file
sc_pseudobulk: make pseudobulk files of sc_gene and sc_transcript files based on an sc_group file

Compression

bgzip: Compresses files using bgzip
compress: Compresses files using a choice of methods
job_update: updates the logfile of a command using joboptions
lz4: Compresses files using lz4
lz4ra: decompresses part of a lz4 compressed file
razip: Compresses files using razip
unzip: Decompresses files
zcat: pipe contents of one or more (potentially compressed) files to stdout.
zst: Compresses files using zstd
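The behaviour described for zcat (piping the contents of possibly compressed files, one after the other) can be sketched with the Python standard library. Only gzip (.gz) is handled because the standard library has no zstd or lz4 reader; those methods would need third-party packages. The helper names are hypothetical and the demo writes two throwaway files to a temporary directory.

```python
import gzip
import os
import tempfile


def open_maybe_compressed(path):
    """Open path as text, decompressing gzip (.gz) files transparently.

    Other extensions (e.g. .zst, .lz4) are not handled in this sketch and
    are read as plain text.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def zcat(paths):
    """Yield the lines of each file in turn, as if piping them to stdout."""
    for path in paths:
        with open_maybe_compressed(path) as fh:
            yield from fh


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    plain = os.path.join(tmp, "a.tsv")
    packed = os.path.join(tmp, "b.tsv.gz")
    with open(plain, "w") as fh:
        fh.write("chromosome\tbegin\nchr1\t1\n")
    with gzip.open(packed, "wt") as fh:
        fh.write("chr2\t5\n")
    print("".join(zcat([plain, packed])), end="")
```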

Mapping

map: align reads in fastq files to a reference genome using a variety of methods

Convenience

install: easy install of optional dependencies (external software used by genomecomb for some analysis) and some reference databases.

Dev

qjobs: returns the running jobs on the cluster in tsv format.
qsub: submit a command to the cluster (grid engine).
sh: runs a shell in which genomecomb commands can be input and executed interactively.
source: runs a script (Tcl) with all commands/extensions from genomecomb available.
tsvdiff: compare tsv files

Tools

cplinked: create a copy of a directory (or directories) where each file in it is a softlink to the src.
hardsync: creates a hardlinked copy of a directory in another location
mklink: Create softlinks (improved)
shadow_mkdir: create a shadowed dir

Visualization

viz_transcripts: Creates a visual presentation of isoform usage
