GenomeComb

Process

Format

cg make_project ?options? projectdir samplename samplesheet

Summary

Create a project directory from a samplesheet.

Description

make_project is a convenience function to create a project directory based on a samplesheet. (You can do the same manually by creating the directories, adding links or copying raw data files). If the project directory does not exist yet, it will be created. The command creates sample directories under projectdir/samples, and adds the given raw data under projectdir/samples/samplename (by default using softlinks).

The samplesheet is a tsv file with at least the fields "sample" and "seqfiles" (The command will also recognize fieldnames "fastq" or "fastqs" instead of "seqfiles")

For each in the samplesheet line the sample (name given by the field "sample") will be created in projectdir/samples seqfiles gives the location of sequencing data files in fastq or ubam format that will be added to the sample. The value can be the path to a specific file, a directory (containing the sequencing files), or a (glob) pattern matching one or more sequencing files (e.g. data/sample1_*.fastq.gz) The data files are (by default) soflinked in the directory projectdir/samples/samplename/fastq for fastq files (extension .fastq, .fq, .fastq.gz, or .fq.gz), and in projectdir/samples/samplename/ubam for unaligned bam files (extension .bam)

You can have more than one line for the same sample, possibly merging sequencing data from different sources into one sample. (You can not mix fastq and ubam sources this way; if both a fastq and ubam directory are present, only the ubam will be used for analysis)

The extra fields in the samplesheet (if not empty) are added to the projects options.tsv file, which allows you to set specific analysis options for each sample, e.g. the samplesheet

sample  seqfiles    preset
sample1 sample1/*.fastq.gz  srs
sample2 sample2/*.fastq.gz  ont

will setup a projectdir where sample1 will be analysed using the srs preset (for short read sequencing), whereas sample2 is an ont sample to be analysed using the ont preset.

Arguments

projectdir
project directory name.
samplename
sample name
rawdata
directory containing source/raw data (for only one sample), or one or more raw data files.

Options

-tranfer soft/rel/hard/copy
determines how raw data files are "transferred" to the sampledir: soft(link absolute path), rel(ative softlink), hard(link) or copy
-force 0/1
Normally an error is given if the target file already exists. use -force 1 to overwrite existing sample data
-amplicons ampliconfile
Add amplicons file. This option turns on amplicon sequencing analysis (see cg process_sample) using the amplicons defained in ampliconfile
-targetfile targetfile
Add targetfile. if provided, coverage statistics will be calculated for this region

Category

Process