fastq_split

Format

cg fastq_split ?options? infile outtemplate

Summary

split a (large) fastq file in multiple smaller ones

Description

cg fastq_split will split a (large) fastq file in several smaller fastq files. You can either specify the target number of sequences per output file (-numseq option), or the target number of files you want to generate (-parts option). outtemplate is a filename (which may contain a path) that determines the names of the files the parts of the fastq file will be written to. The file part will be prepended with p<part>_., e.g. if outtemplate is results/test.fastq.gz, the results files will be named: results/p1_test.fastq.gz results/p2_test.fastq.gz ...

The command supports parallel processing using the joboptions, mainly for the second part of the command, namely compressing the resulting fastq files. The first part (splitting up the file) is not parallellized, but can be run as a single job on a cluster when using the -parts option. Using -numseq, the first part is always run directly (not as a job), because the command cannot know up front how many parts will be generated, and thus not submit the follow up compression jobs

Arguments

infile ...: fastq file to split
outtemplate ...: template for filenames that will be generated

Options

-parts number: number of output files to generate (this option has preference)
-numseq number: number of sequences to store per output file
-maxparts number: maximum number of parts that will be generated (if numseq is small, last part will contain all remaining sequenced)
-threads number: number of threads to use to compress the output files (bgzip -@)

Home

Contact

Installation