Map
Format
cg map ?options? result refseq sample fastqfile ?fastqfile2? ...
Summary
align reads in fastq files to a reference genome using a variety
of methods
Description
This is a generic command to map reads to a reference genome using
different methods.
The workload can be distributed on a cluster or using multiple
cores with job options (more info with
cg help joboptions) by mapping the separate fastq files in parallel
and efficiently combining the results. Having the data in enough
separate fastq files is required for good parallelisation.
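Since parallelisation depends on having several fastq files, one large file may need to be split first. The following is a minimal sketch of one way to do that with standard tools, not part of genomecomb itself. It assumes an uncompressed fastq with exactly 4 lines per read; the function name split_fastq and all file names are hypothetical.

```shell
#!/bin/sh
# Split a fastq file into chunks holding a fixed number of reads.
# Assumes 4 lines per fastq record, so each chunk gets reads*4 lines.
split_fastq() {
    input=$1     # uncompressed fastq file
    reads=$2     # number of reads per chunk
    outdir=$3    # directory that will receive the chunks (should start empty)
    mkdir -p "$outdir"
    split -l $((reads * 4)) "$input" "$outdir/chunk_"
    # give the chunks a .fastq extension
    for f in "$outdir"/chunk_*; do
        mv "$f" "$f.fastq"
    done
}

# Demo on a tiny generated fastq file with 5 reads
tmp=$(mktemp -d)
i=1
while [ "$i" -le 5 ]; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
    i=$((i + 1))
done > "$tmp/reads.fastq"

split_fastq "$tmp/reads.fastq" 2 "$tmp/chunks"
nchunks=$(ls "$tmp/chunks" | wc -l | tr -d ' ')
echo "$nchunks chunks written to $tmp/chunks"
```

For unpaired data, the resulting chunk files could then be given as the fastqfile arguments of cg map.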
Arguments
- result
- resulting alignment file; the file extension (.bam, .cram,
.sam) determines the format of the file
- refseq
- reference sequence (in fasta format, the reference directory
name may be given instead)
- sample
- sample name (will be used to fill in the readgroup info)
- fastqfile
- fastq file with the sequences to be aligned
Options
This command can be distributed on a cluster or using multiple
cores with job options (more info with cg
help joboptions)
- -method method
- gives the alignment method/software to be used, e.g.: bwa,
minimap2, ngmlr, bowtie2
- -paired 1/0 (-p)
- fastqs are paired/unpaired; if paired the matching fastqs
should be given consecutively, e.g. set1_fw.fastq.gz
set1_rev.fastq.gz set2_fw.fastq.gz set2_rev.fastq.gz ...
- -preset preset (-p,-x)
- some of the mapping software (e.g. minimap2) supports
presets; these can be given here. All presets supported by
minimap2 (map-ont, sr, splice, ..) and ngmlr (ont, pacbio) can be
used with their respective methods.
- -readgroupdata readgroupdata
- space separated key value list that will be put into the
readgroup info (The given sample will be added for some fields)
- -fixmate
- use samtools fixmate to fill in mate coordinates and insert
size fields
- -sort coordinate/nosort/name/c/1
- How to sort the result: "1" and "c" are
synonyms for coordinate sort
- -mergesort 1/0
- Instead of concatenating and then sorting all sam files,
mergesort will merge the already sorted sam files. This is faster
unless a lot of intermediate files have to be made because of the
limit to the number of files that can be opened at the same time.
(default 0)
- -maxopenfiles number
- the number of files that can be opened at the same time by
the system. (default: the limit of the system where the command is
run, as given in /proc/self/limits or by ulimit -n)
- -keepsams 1/0
- (default 0)
- -threads number (-t)
- (default 2)
- -mem number
- how much memory should be asked/reserved for the mapping
(default: decided by method)
- -time number
- how much time should be asked/reserved for the mapping
(default: decided by method)
- -joinfastqs 1/0
- If you have (too) many very small fastq files, the fact that
fastqs are mapped separately can become a performance problem (e.g.
more time spent on loading indexes than on actual mapping). Use
-joinfastqs to combine the fastq files into one (or 2 in case of
-paired 1) before processing. (default 0)
- -compressionlevel number
- how much will the result file be compressed (what levels mean
depends on the compression method) (default: reasonable level for
given method)
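As a rough illustration of how the arguments and options above combine, the following sketch only assembles and prints two command lines rather than running them. All file and sample names are hypothetical placeholders; the option values (bwa, minimap2, map-ont, coordinate) are taken from the descriptions above.

```shell
#!/bin/sh
# Dry run: build cg map command lines as strings and print them.
# All file and sample names below are hypothetical placeholders.
refseq=genome.fa

# Paired-end short reads: matching fastqs are listed consecutively
# (forward, reverse, forward, reverse, ...).
paired_fastqs="set1_fw.fastq.gz set1_rev.fastq.gz set2_fw.fastq.gz set2_rev.fastq.gz"
paired_cmd="cg map -method bwa -paired 1 -sort coordinate -threads 4 sample1.bam $refseq sample1 $paired_fastqs"

# Unpaired long reads with a minimap2 preset; the .cram extension
# of the result selects CRAM output.
long_cmd="cg map -method minimap2 -preset map-ont -paired 0 sample2.cram $refseq sample2 reads1.fastq.gz reads2.fastq.gz"

echo "$paired_cmd"
echo "$long_cmd"
```

Options come before the positional arguments, which follow the order result, refseq, sample, fastq files given under Format.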
Dependencies
Some of the supported mapping programs are distributed with
genomecomb (bwa and minimap2), but not all (e.g. ngmlr). Those that
are not must be installed separately and be runnable from the path.
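Whether a separately installed mapper is runnable from the path can be checked before starting a run. The sketch below is one possible check using the standard shell builtin command -v; the helper name check_tools is hypothetical, and it assumes the ngmlr executable is called ngmlr.

```shell
#!/bin/sh
# Report which of the given programs can be found on the PATH.
# The exit status is the number of programs that were not found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: not found on PATH"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# ngmlr is named above as not bundled; extend the list as needed.
check_tools ngmlr || echo "install the missing program(s) before using that method"
```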
Category
Mapping