Genomecomb 0.110.0 Reference
Format
cg subcommand ?options? ....
Description
This help page gives a (reference style) overview of all
genomecomb functions. For an introductory text to genomecomb and its
formats, use
cg help intro
All genomecomb functions are called using the cg command and a
subcommand. The available subcommands are listed on this page (in
categories) with a short description. To get further info on how to
use the subcommands and their parameters, use
cg help subcommand
or
cg subcommand -h
Options
The following options are generic and available for all
subcommands. They must however always precede the subcommand specific
options.
- -v number (--verbose)
- Setting this to 1 or 2 (instead of the default 0) makes some
subcommands chattier about their progress. If the given number is
1, logging messages are shown (warnings, start of subtask, etc.). If
the number is >= 2, progress counters are also shown (for commands
that support them).
- --stack 0/1
- When the program returns an error, by default only the error
message is shown, which is normally sufficient for errors in input
format, etc. If --stack is set to 1, a full stack trace is shown on
error (which may be useful to solve errors caused by bugs in the
program).
Many commands will transparently compress a result file if it
has an extension indicating this (.zst, .gz, ...). The following options
are for this case:
- -compressionlevel number
- number indicating the level of compression (1 for fast
compression, higher numbers for better but slower compression) used
when resultfiles are to be compressed
- -compressionthreads number
- Use number threads (default 1) to compress the result file
(if the result needs to be compressed, and the compression method
supports threads)
- -shadowdir path
- alternative location (e.g. a faster disk or disk without
snapshots) to store some of the intermediate files that must be
accessible to multiple jobs (will be made using shadow_mkdir). Can also be specified
using the environment variable SHADOWDIR.
- -scratchdir path
- alternative location to store (local, not potentially
accessed by multiple nodes) temporary files which may become very
big. The default location is /tmp; if this is too small (on some
nodes) you can specify a different location using this option. Can
also be specified using the environment variable SCRATCHDIR.
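The ordering rule above (generic options before subcommand specific ones) can be sketched in Python as a small helper that only assembles an argument list and does not run cg. The helper name build_cg_command, the file names, the scratch path and the -q query are illustrative assumptions and not part of this page; check cg select -h for the real subcommand options.

```python
import os
import shlex


def build_cg_command(subcommand, generic=None, specific=None, args=()):
    """Assemble a cg argument list with the generic options placed first.

    Hypothetical helper for illustration; it is not part of genomecomb.
    """
    cmd = ["cg", subcommand]
    # Generic options (-v, --stack, -compressionlevel, ...) go before
    # the subcommand specific options, as this page requires.
    for opts in (generic or {}, specific or {}):
        for name, value in opts.items():
            cmd += [name, str(value)]
    cmd += list(args)
    return cmd


cmd = build_cg_command(
    "select",
    generic={"-v": 1, "--stack": 1, "-compressionlevel": 9},
    specific={"-q": "$chromosome == 1"},  # assumed option and query syntax
    args=["variants.tsv", "result.tsv.zst"],  # hypothetical file names
)

# SCRATCHDIR can be set in the environment instead of passing -scratchdir.
env = {**os.environ, "SCRATCHDIR": "/data/scratch"}  # hypothetical path

print(shlex.join(cmd))
```

The resulting list and env could be passed to subprocess.run(cmd, env=env) on a machine where cg is installed.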
Available subcommands
Process
- process_project
- process a sequencing project directory (projectdir), generating full analysis
information (variant calls, multicompar, reports, ...) starting
from raw sample data from various sources.
- make_project
- Create a project directory from a samplesheet.
- makerefdb
- Create a reference sequence and annotation database
directory
- process_multicompar
- process a sequencing project directory. This expects a
genomecomb directory with samples already processed and makes
annotated multicompar data
- process_reports
- Calculates a number of statistics on a sample in the reports
subdir
- process_rtgsample
- Convert rtg data to cg format
- process_sample
- Processes one sample directory (projectdir), generating full analysis
information (variant calls, multicompar, reports, ...) starting
from raw sample data that can come from various sources.
- project_addsample
- Add a sample directory to a project directory.
- renamesamples
- Converts the sample names in a file or entire directory to
other names.
Query
- select
- Command for very flexible selection, sorting and conversion
of tab separated files
- viz
- graphical visualization of tsv files (even very large ones) as
table without loading everything into memory
- multiselect
- Command for using cg select on multiple files, and combine
the results
- groupby
- Group lines in a tsv file, based on 1 or more identical
fields
Analysis
- exportplink
- make a plink "Transposed fileset" from the genome
data
- homwes
- finds regions of homozygosity based on a variant file
- homwes_compare
- finds regions of homozygosity based on a variant file
Regions
- multireg
- Compare multiple regions files
- regselect
- select all regions or variants in file that overlap with
regions in region_select_file
- covered
- Find number of bases covered by regions in a region file
- regcollapse
- Collapses overlapping regions in one (sorted) or more region
files
- regcommon
- list regions common between region_file1 and region_file2
- regextract
- Find regions with a minimum or maximum coverage in a bam,
sam, cram, bcol or tsv file.
- regjoin
- join regions in 1 or 2 regionfiles
- regsubtract
- subtract regions in region_file2 from region_file1
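To make the idea behind region operations such as regcollapse and covered concrete, here is a minimal Python sketch that merges overlapping regions and counts the bases they cover. It is not genomecomb's implementation. It assumes regions are (chromosome, begin, end) tuples with half-open coordinates, merges touching regions as well (which the real commands may or may not do), and sorts chromosome names as plain strings rather than in natural order.

```python
def collapse_regions(regions):
    """Merge overlapping or touching (chromosome, begin, end) regions."""
    merged = []
    # Sorting brings regions on the same chromosome together, ordered by begin.
    for chrom, begin, end in sorted(regions):
        if merged and merged[-1][0] == chrom and begin <= merged[-1][2]:
            # Overlaps (or touches) the previous region: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, begin, end])
    return [tuple(region) for region in merged]


def covered_bases(regions):
    """Count bases covered at least once (half-open coordinates assumed)."""
    return sum(end - begin for _, begin, end in collapse_regions(regions))


regions = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 400, 500),
    ("chr2", 0, 50),
]
collapsed = collapse_regions(regions)
total = covered_bases(regions)
print(collapsed)
print(total)
```

Collapsing first avoids counting a base twice when two regions overlap.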
Annotation
- annotate
- Annotate a variant file with region, gene or variant data
- download_biograph
- Download gene rank data from biograph.
- download_mart
- Download data from biomart
- geneannot2reg
- Creates a (region) annotation file based on a genelist with
annotations.
Validation
- genome_seq
- Returns sequences of regions in the genome (fasta file),
optionally masked for snps/repeats
- makeprimers
- Make sequencing primers for Sanger validation experiments
- makesequenom
- Make input files for designing sequenom validation
experiments
- primercheck
- check primer sets for multiple amplicons and snps
- validatesv
Variants
- multicompar
- Compare multiple variant files
- multicompar_reannot
- Reannotate multicompar file
- multicount
- Compare multiple count files that all have the same ids
(deprecated)
- pmulticompar
- Compare multiple variant files
- process_sv
- Do all steps on a Complete Genomics sample to generate
structural variant calls
- sv
- generic command to call structural variants based on the
alignment in bamfile.
- svmulticompar
- Compare multiple structural variant files
- var
- generic command to call variants based on the alignment in
bamfile.
tsv
- cat
- Concatenate tab separated files and print on the standard
output (stripping headers of files other than the first).
- checktsv
- Checks the given tsv file for errors
- collapsealleles
- Convert a split variant file to unsplit by collapsing alleles
of the same variant into one line.
- fixtsv
- fix errors in a tsv file
- graph
- Visualization of large tsv files as a scatter plot
- index
- make indices for a tsv file
- less
- view a (possibly compressed) file using the pager less.
- paste
- merge lines of tab separated files.
- split
- split a tab separated file in multiple tab separated files
based on the content of a (usually chromosome) field.
- tsvjoin
- join two tsv files based on common fields
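The behaviour described for cat (keep the header of the first file, strip it from the others) can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the actual implementation: it treats the first line of every file as the header, ignores any comment lines a tsv file may carry before the header, and does not check that the headers match.

```python
import io


def cat_tsv(files):
    """Yield the lines of all files, keeping only the first file's header."""
    header_written = False
    for handle in files:
        header = handle.readline()
        if not header_written:
            # Only the header of the first file is passed through.
            yield header
            header_written = True
        for line in handle:
            yield line


# In-memory stand-ins for two tsv files with the same columns.
first = io.StringIO("chromosome\tbegin\tend\nchr1\t100\t200\n")
second = io.StringIO("chromosome\tbegin\tend\nchr2\t0\t50\n")

merged = "".join(cat_tsv([first, second]))
print(merged, end="")
```

Because cat_tsv is a generator, lines are streamed one at a time and large files do not need to fit in memory.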
Conversion
- sam_clipamplicons
- Clip primers from aligned sequences in a sam file (by
changing bases to N and quality to 0) given a set of target
amplicons.
- liftover
- Use liftover to convert a variant file from one build to
another
- liftregion
- Use liftover to convert a region file from one build to
another
- liftsample
- Use liftover to convert the files from a sample (variant,
region, ...) from one build to another
- liftchain2tsv
- Converts data in UCSC chain file format (for liftOver) to a
tsv (tab separated) format to be used in e.g. cg liftregion.
- correctvariants
- Complete or correct a variants file
- bam_sort
- sort a bam file.
- bamreorder
- Changes the order of the contigs/chromosomes in a bam file
- fastq_split
- split a (large) fastq file in multiple smaller ones
Format Conversion
- 10x2tsv
- Converts data in 10x format to the tab-separated format (tsv).
- bam2cram
- Convert a bam file to a cramfile
- bam2fastq
- Extracts reads from a bamfile into fastq. The actual
extraction is based on picard's SamToFastq or samtools bam2fq, but
the data is prepared first (among other things, sorted by name to avoid the
problems caused by position sorted fq files)
- bam2reg
- Extract regions with a minimum coverage from a bam file.
- bam2sam
- convert bam (or cram) to sam, sorting chromosomes into
natural sort order.
- bams2crams
- Convert bam files to cram files
- bcol
- several commands for managing bcol files
- bcol_get
- Get a list of values from a bcol file
- bcol_make
- Creates a bcol file
- bcol_table
- Outputs the data in a bcol file as
tab-separated
- bcol_update
- updates bcol file(s) in the old format to the new
- bed2genetsv
- Converts data in bed12 format to the tsv gene format
- bed2tsv
- Converts data in bed format to tsv
format
- cg2sreg
- Extracts sequenced region data from a Complete Genomics
format file in tsv (tab separated, simple feature table) format.
The command will also sort the tsv appropriately.
- cg2tsv
- Converts data in Complete Genomics format to tsv (tab
separated, simple feature table) format. The command will also sort
the tsv appropriately. The cggenefile is optional. It contains the
CG gene annotations, which can be (best) left out as these
annotations (and more) can be done in genomecomb.
- clc2tsv
- Converts output from the clc bio assembly cell snp caller to
tsv format
- colvalue
- Converts data in tsv format from a long key-value format to a
wide column-value format
- convcgcnv
- Convert the Complete Genomics CNV files to the cg format
- convcgsv
- Convert the Complete Genomics SV files to the cg format
- csv2tsv
- Converts comma-separated value (csv) data to tab-separated (tsv)
- fasta2tsv
- Converts data in fasta format to tab-separated format (tsv).
- fastq2tsv
- Converts data in fastq format to tab-separated format (tsv).
- gb2fasta
- Converts a genbank file to a fasta file. The file may contain
multiple sequences. The name of the sequence (in the fasta file) is
taken from one of the following fields (in order, if not empty or
.): LOCUS, VERSION, ACCESSION, KEYWORDS, DEFINITION
- gene2reg
- Extract gene elements from data in genepred-like format to a
(tab separated) region file.
- genepred2tsv
- Converts data in genepred format to tab-separated format (tsv).
- genepredtsv2fasta
- Extracts transcript sequences based on a transcript tsv file
(format_transcript) and stores
them in a fasta file.
- gff2tsv
- Converts data in gff format to gene tsv format
- gtf2tsv
- Converts data in gtf format to gene tsv (tab separated)
format
- keyvalue
- Converts data in tsv format from wide format (data for each
sample in separate columns) to keyvalue format.
- long
- Converts data in tsv format from wide format (data for each
sample/analysis in separate columns) to long format (data for each
sample/analysis in separate lines)
- rtg2tsv
- Converts data in rtg format to tsv format
- sam2tsv
- Converts data in sam format to tsv
format. The original sam header is included in the comments. Extra
fields in the sam file are kept as field:type:value, but put
together as a list in a column named other. Optionally the values
for some extra fields can be split into different columns using the
-fields option.
- tsv210x
- Converts data in tab-separated format (tsv) to the 10x format.
- tsv2bed
- Converts data in tab-separated format (tsv) to bed format. By default it will
create a minimal bed file by extracting chromosome, begin and end
from the input using the default fields.
- tsv2csv
- Converts tab-separated (tsv) data to
comma-separated value (csv)
- tsv2fasta
- Converts data in tab-separated format (tsv) to fasta format.
- tsv2fastq
- Converts data in tab-separated format (tsv) to fastq format.
- tsv2gff
- Converts data in tab-separated format (tsv) to gff format.
- tsv2gtf
- Converts data in tab-separated format (tsv) to gtf format.
- tsv2sam
- Converts data in tab-separated format (tsv) to sam, bam or cram format.
- tsv2vcf
- Converts data in genomecomb tab-separated variant format (tsv) to vcf.
- vcf2tsv
- Converts data in vcf format to genomecomb tab-separated
variant file (tsv). The command will also
sort the tsv appropriately.
- wide
- Converts data in tsv format from long format (data for each
sample in separate lines) to wide format (data for each sample in
separate columns)
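The difference between the wide and long layouts mentioned for the long and wide subcommands can be illustrated with a small Python sketch. It assumes per-sample columns are named field-sample (for example zyg-s1) and that the long layout carries a sample column; the exact column names and ordering produced by cg long are not specified on this page, so treat the output layout here as an assumption.

```python
def wide_to_long(header, rows, shared):
    """Turn 'field-sample' columns into one output row per sample.

    Every column not listed in `shared` is expected to contain a '-'.
    """
    samples = {}  # sample name -> {field name: column index}
    for idx, col in enumerate(header):
        if col in shared:
            continue
        field, sample = col.split("-", 1)
        samples.setdefault(sample, {})[field] = idx

    # Collect the per-sample field names in first-seen order.
    fields = []
    for cols in samples.values():
        for field in cols:
            if field not in fields:
                fields.append(field)

    shared_idx = [header.index(col) for col in shared]
    long_header = ["sample"] + list(shared) + fields
    long_rows = []
    for row in rows:
        for sample, cols in samples.items():
            values = [row[cols[f]] if f in cols else "" for f in fields]
            long_rows.append([sample] + [row[i] for i in shared_idx] + values)
    return long_header, long_rows


header = ["chromosome", "begin", "end", "zyg-s1", "zyg-s2"]
rows = [["chr1", "100", "101", "m", "r"]]
long_header, long_rows = wide_to_long(header, rows, ["chromosome", "begin", "end"])
print(long_header)
print(long_rows)
```

Each wide row yields one long row per sample, so the row count grows while the number of columns shrinks.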
Report
- bam_histo
- create a report with coverage data of given regions in a bam
file
- coverage_report
- generates coverage report for region of interest. A histo file
is generated at the location of the bam file
- depth_histo
- makes a histogram of the depth in the given bamfile,
subdivided in on- and off-target regions
- hsmetrics
- Creates a hsmetrics file using picard CalculateHsMetrics
- predictgender
- predicts gender for a sample
Info
- error_report
- display help
- help
- display help
- version
- return version information
- versions
- return version information as a tsv
RNA
- iso_isoquant
- Call and count isoforms and genes using isoquant.
- multigene
- Compare multiple gene files
- multitranscript
- Compare multiple transcript files
- sc_pseudobulk
- make pseudobulk files of sc_gene and sc_transcript files
based on an sc_group file
Compression
- bgzip
- Compresses files using bgzip
- compress
- Compresses files using a choice of methods
- job_update
- updates the logfile of a command using joboptions
- lz4
- Compresses files using lz4
- lz4ra
- decompresses part of a lz4 compressed file
- razip
- Compresses files using razip
- unzip
- Decompresses
- zcat
- pipe contents of one or more (potentially compressed) files
to stdout.
- zst
- Compresses files using zstd
Mapping
- map
- align reads in fastq files to a reference genome using a
variety of methods
Convenience
- install
- easy install of optional dependencies (external software used
by genomecomb for some analysis) and some reference databases.
- samcat
- concatenates sam files outputting to stdout.
Dev
- qjobs
- returns the running jobs on the cluster in tsv format.
- qsub
- submit a command to the cluster (grid engine).
- sh
- runs a shell in which genomecomb commands can be input and
executed interactively.
- source
- runs a script (Tcl) with all commands/extensions from
genomecomb available.
- tsvdiff
- compare tsv files
Tools
- cplinked
- create a copy of a directory (or directories) where each file
in it is a softlink to the src.
- hardsync
- creates a hardlinked copy of a directory in another location
- mklink
- Create softlinks (improved)
- shadow_mkdir
- create a shadowed dir
Visualization
- viz_transcripts
- Creates a visual presentation of isoform usage