GenomeComb
The bcol format stores one (or more) column(s) of data in an efficient binary format. All data must be of the same type (list of supported types below). bcol files can e.g. used to store a score for each position of the genome and use this for annotation.
The bcol format consists of two files, one in a text format describing the bcol data (bcolfile.bcol) and one containing the actual binary data (bcolfile.bcol.bin.zst). There may be a (third) index file (bcolfile.bcol.bin.zsti) for faster random access to the compressed binary data. bcol files can be created based on a tab-separated file (tsv) file using the command cg bcol make.
There was an older version of the bcol format (which did not support multiple chromosomes or multiple alleles) that is not described here. Files (there were separate bcol files per chromsome) in this older format can be converted to the current using
cg bcol_update newbcolfile oldbcolfile ?oldbcolfile2? ...
bcolfile.bcol is a tab-separated file (tsv) with the following format
# binary column # type iu # default 0 # precision 3 chromosome begin end 1 0 50 2 10 51
The fields "chromosome begin end" must be present, and indicate for which regions data is present in the binary file. They folow the same zero-based half-open indexing scheme as all genomecomb tsv files.
The comments before the table must contain the following information:
A bcol file can contain multiple values for each position. This is indicated by the extra field multi in the bcol comments. e.g. "# multi A,C,T,G" is used in a var bcol to indicate that there are 4 values for each position for the bases A, C, T and G respectively.
bcolfile.bcol.bin.zst is the (compressed) binary file containing the actual data, a lz3 compressed continuous stream of values in the given binary type. The uncompressed binary file (bcolfile.bcol.bin) is also recognized, but by default the binaries are compressed. A zst index file bcolfile.bcol.bin.zst.zsti may also exists to speed up random access to the compressed binary.