GenomeComb

Bcol_make

Format

cg bcol make ?options? bcolfile column ?srcfile?

Summary

Creates a bcol file

Description

The bcol_make command creates a bcol file from tab-separated data. The input data is read from srcfile or, if srcfile is not given, from stdin. The values in one of the columns (given by the column argument) are converted to the binary bcol format.
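The input step (choosing srcfile or stdin, then picking one column by name, or by 0-based number when there is no header) can be sketched as below. This is only an illustration of the behaviour described here, not GenomeComb's implementation; the helper names open_input and read_column are hypothetical, and the sketch ignores any format details beyond plain tab-separated lines.

```python
import csv
import io
import sys
from typing import Iterator, Optional, TextIO, Union


def open_input(srcfile: Optional[str]) -> TextIO:
    """Return srcfile opened for reading (caller closes it), or stdin if no srcfile is given."""
    return open(srcfile, newline="") if srcfile else sys.stdin


def read_column(stream: TextIO, column: Union[str, int], header: bool = True) -> Iterator[str]:
    """Yield the values of one column of tab-separated input.

    header=True: `column` is a field name looked up in the first line.
    header=False: `column` is a 0-based field number (as with -h 0).
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    if header:
        index = next(reader).index(column)
    else:
        index = int(column)
    for row in reader:
        if row:  # skip empty lines
            yield row[index]


if __name__ == "__main__":
    demo = io.StringIO("pos\tscore\n10\t0.5\n11\t0.7\n")
    print(list(read_column(demo, "score")))  # ['0.5', '0.7']
```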

Values in the input file can be assigned to specific positions using the poscol parameter, which indicates the field that contains the position. Positions skipped in the input file get a default value in the bcol file. If the endcol parameter is also given, all positions between the start position and the end position are set to the value in the value column. The optional chromosomecol parameter indicates which field is used to assign values to a specific chromosome. If the input file contains fields with names recognized as chromosome, begin or end, these are automatically used for these parameters. If such fields are present but you do not want them to be used, explicitly set the corresponding parameters to an empty value.
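The position logic above (one value per position, defaults for skipped positions, ranges filled when an end column is given, grouping by chromosome) can be modelled in memory as follows. This is a hypothetical sketch, not the bcol on-disk format: it assumes integer positions sorted per chromosome, treats a row with an end column as covering the half-open range [begin, end), and uses "0" as an arbitrary default. The real end convention and default should be checked against GenomeComb.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Row = Dict[str, str]


def expand_positions(
    rows: Iterable[Row],
    valuecol: str,
    poscol: str,
    endcol: Optional[str] = None,
    chromosomecol: Optional[str] = None,
    chromosomename: str = "",
    default: str = "0",
) -> Dict[str, Tuple[int, List[str]]]:
    """Return {chromosome: (first_position, values)} with one value per position.

    Assumptions (illustrative only, not taken from GenomeComb):
    - rows are sorted by position within each chromosome;
    - with endcol, a row covers the half-open range [pos, end);
    - gaps between covered positions are filled with `default`.
    """
    result: Dict[str, Tuple[int, List[str]]] = {}
    for row in rows:
        # without a chromosome column, every row goes to chromosomename (cf. -n)
        chrom = row[chromosomecol] if chromosomecol else chromosomename
        begin = int(row[poscol])
        end = int(row[endcol]) if endcol else begin + 1
        if chrom not in result:
            result[chrom] = (begin, [])
        first, values = result[chrom]
        next_pos = first + len(values)
        if begin < next_pos:
            raise ValueError(f"unsorted or overlapping input at {chrom}:{begin}")
        # pad positions skipped in the input with the default value
        values.extend([default] * (begin - next_pos))
        # assign the row's value to every position it covers
        values.extend([row[valuecol]] * (end - begin))
    return result
```

For example, two region rows on chromosome 1 covering 10 to 12 (score 5) and 14 to 15 (score 7) yield `{"1": (10, ["5", "5", "0", "0", "7"])}`: positions 12 and 13 were skipped and receive the default.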

Multiple columns (of the same type) can be stored in one bcol file using the -l and -m options, e.g. for variant annotation, where there is a different score for each potential alternative allele.

Arguments

bcolfile
name of bcolfile to be created
column
Name of the column containing the data
srcfile
Name of the file used for providing input data (if not given, default is from stdin)

Options

-t type
type of data; must be one of the types described in the description of the bcol format
--precision precision
default precision used when displaying the data in the bcol file
-d defaultvalue (--default)
value that will be given for positions not in the input file
-h header (--header)
set to 0 (default is 1) if the input data does not have a header (in this case chromosomecol and poscol must be given by number, starting from 0)
-c chromosomecol (--chromosomecol)
name of the column that contains the chromosome
-n chromosomename (--chromosomename)
if no chromosomecol is present (no -c), this name will be given as chromosome (default is empty)
-p poscol
name of the column that contains the positions
-e endcol
name of the column that contains the end positions (in case of regions); cannot be used together with -m
-co compressionlevel (--compress)
compression level used (default 8): 1 is very fast; 8 is slower but compresses better; 0 disables compression
-l multilist (--multilist)
a comma-separated list with the names of all columns, e.g. A,C,G,T for variant annotation. This option must be given together with the next option (-m)
-m multicol (--multicol)
name of the column containing, for each row, the list of names the values are assigned to. Names in multilist that are not present in this list for a row get the default value.
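How -l and -m combine per row can be illustrated with the sketch below. The assumption that the value column holds a comma-separated list of values matching the names in the multicol column is ours and not stated above; the function name multi_values and the column names alt and score are hypothetical. Each row yields one value per multilist name, with the default for names the row does not mention.

```python
from typing import Dict, Iterable, List


def multi_values(
    rows: Iterable[Dict[str, str]],
    multilist: str,
    multicol: str,
    valuecol: str,
    default: str = "0",
) -> List[List[str]]:
    """Return, per input row, one value for each name in multilist.

    Assumption: multicol holds a comma-separated list of names (e.g. "C,T")
    and valuecol a matching comma-separated list of values (e.g. "0.5,0.9").
    Names in multilist that the row does not mention get `default`.
    """
    names = multilist.split(",")
    out = []
    for row in rows:
        # pair each listed name with its value for this row
        assigned = dict(zip(row[multicol].split(","), row[valuecol].split(",")))
        out.append([assigned.get(name, default) for name in names])
    return out
```

With `-l A,C,G,T`, a row with alt `C,T` and score `0.5,0.9` would give `["0", "0.5", "0", "0.9"]` in this model.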

Category

Format Conversion