GenomeComb

Map

Format

cg map ?options? result refseq sample fastqfile ?fastqfile2? ...

Summary

align reads in fastq files to a reference genome using a variety of methods

Description

This is a generic command to map reads to a reference genome using different methods.

The workload can be distributed on a cluster or using multiple cores with job options (more info with cg help joboptions) by mapping the separate fastq files in parallel and efficiently combining the results. Having the data in enough separate fastq files is required for good parallelisation.

Arguments

result
resulting aligignment file, the file extension (.bam, .cram, .sam) determines the format of the file
refseq
reference sequence (in fasta format, the reference directory name may be given instead)
sample
sample name (will be used to fill in in the readgroup info)
fastqfile
fastq file with the sequences to be aligned

Options

This command can be distributed on a cluster or using multiple with job options (more info with cg help joboptions)

-method method
gives the alignment method/software to be used, e.g.: bwa, minimap2, ngmlr, bowtie2
-paired 1/0 (-p)
fastqs are paired/unpaired; if paired the matching fastqs should be given consecutively, e.g. set1_fw.fastq.gz set1_rev.fastq.gz set2_fw.fastq.gz set2_rev.fastq.gz ...
-preset preset (-p,-x)
some of the mapping software (e.g. minimap2) supports presets, these can be given here, e.g. splice, sr e.g. all presets supported by minimap2 (map-ont, sr, splice, ..) and ngmlr (ont, pacbio) can be used with their respective methods.
-readgroupdata readgroupdata
space separated key value list that will be put into the readgroup info (The given sample will be added for some fields)
-fixmate
use samtools fixmate to fills in mate coordinates and insert size fields
-sort coordinate/nosort/name/c/1
How to sort the result: "1" and "c" are synonyms for coordinate sort
-mergesort 1/0
Instead of concatenating and then sorting all sam files, mergesort will merge the already sorted sam files. This is faster unless a lot of intermediate files have to be made because of the limit to the number of files that can be opened at the same time. (default 0)
-maxopenfiles number
the number of files that can be opened at the same time by the system. (default is the limit given by the system where the command is run in /proc/self/limits or using ulimit -n)
-keepsams 1/0
(default 0)
-threads number (-t)
(default 2)
-mem number
how much memory should be asked/reserved for the mapping (default: decided by method)
-time number
how much time should be asked/reserved for the mapping (default: decided by method)
-joinfastqs 1/0
If you have (too) many very small fastq files the fact that fastqs are mapped seperately can become a performance problem (e.g. more time spent on loading indexes than actual mapping). use joinfastqs to combine fastq files into one (or 2 in case of -paired 1) before processing. (default 0)
-compressionlevel number
how much will the result file be compressed (what levels mean depends on the compression method) (default: reasonable level for given method)

Dependencies

Some of the mapping programs supported are distributed with genomecomb (bwa and minimap2), but not all (e.g. ngmlr) These should be installed separately, and should be runnable from the path.

Category

Mapping