GenomeComb

Installation

System Requirements

Binaries for genomecomb are available for 64-bit Linux systems (Linux-x86_64). Genomecomb itself is distributed as a portable application directory: a self-contained directory containing the genomecomb executable (cg) and all needed dependencies, compiled so that they should work on all but very old Linux systems. The R application directory (dirR), used e.g. by the scywalker tools in genomecomb, may give errors when run on very minimal systems that do not have "which" installed (in /usr/bin) or fonts configured. If you want to use these commands on such a system, you should install which and fontconfig, e.g. on a CentOS system:

yum install -y which
yum install -y fontconfig
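
To see whether a minimal system already has these two prerequisites, a quick check along the following lines may help. This is only a sketch: check_cmd is a hypothetical helper name, and fc-list (a tool shipped with fontconfig) is used here as a stand-in for "fontconfig is installed".

```shell
# Hypothetical helper: report whether a command can be found on the PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# "which" is expected in /usr/bin by the R application directory.
[ -x /usr/bin/which ] || echo "/usr/bin/which not present; consider installing which"

# fc-list is part of fontconfig; its absence suggests fontconfig is not installed.
check_cmd fc-list || echo "consider installing fontconfig"
```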

Install program

Installation of the genomecomb package itself is as simple as downloading the binary archive from one of the following links and unpacking the downloaded archive somewhere in the PATH.

If the directory is not in the PATH, you can always call the executable (cg) directly from the directory, add the directory to the PATH, or put a soft link to the executable somewhere in the PATH. (The executable itself must stay in the application directory to work.) For example, using a soft link to make it available in a bin directory:

ln -s /opt/genomecomb-0.110.0-linux-x86_64/cg /usr/local/bin/cg
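
The other option mentioned above, adding the application directory to the PATH, could look like the sketch below. The variable name GENOMECOMB_DIR is an assumption made for illustration, and the directory is the same example location used for the soft link; adjust it to wherever the archive was actually unpacked.

```shell
# Example location of the unpacked application directory (adjust as needed).
GENOMECOMB_DIR=/opt/genomecomb-0.110.0-linux-x86_64

# Prepend it to the PATH for the current shell session.
export PATH="$GENOMECOMB_DIR:$PATH"
```

To make this permanent, the export line can be added to a shell startup file such as ~/.bashrc.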

Download reference data

For many analyses, such as annotation, you need a reference genome and genome annotation data. These have to be available in a directory that you pass as a parameter (refseq). You can download these for several species/builds (some of these, especially human, are very large: 55G) and unpack them in a directory or, more easily, install them directly using the cg install command. Some larger files in the reference databases (CADD annotation, minimap2 and STAR indexes) are provided separately (downloads below). You can install these downloads by unpacking them directly in the relevant reference database directory (the one you unpacked before).
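
Given the size of some reference downloads, it may be worth checking free disk space before starting. The following is a minimal sketch, not part of genomecomb: has_free_space and REFDIR are hypothetical names, and the 60 GiB threshold is an assumption loosely based on the 55G figure above (the archive and its unpacked contents may temporarily need more).

```shell
# Hypothetical helper: succeed if the filesystem holding directory $1
# has at least $2 GiB available.
has_free_space() {
    # df -Pk prints one POSIX-format line per filesystem, in 1024-byte blocks;
    # the fourth column is the available space.
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Directory that will hold the reference data (defaults to the current directory).
REFDIR=${REFDIR:-.}

if has_free_space "$REFDIR" 60; then
    echo "enough space in $REFDIR"
else
    echo "less than 60 GiB available in $REFDIR"
fi
```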

Special reference files are needed for the cg liftover command (e.g. to lift variants from one genome build to another). These are provided (for lifting between the different human builds) in the following download.

Build reference data (optional)

For genomes that are not available for download (but that are available on the UCSC genome browser), you can create a (basic) reference directory using the cg makerefdb command.

Some annotation files are omitted from the downloads because of their size or because they may not be redistributed (dbnsfp). The genomecomb bundle also contains the scripts used to make the downloadable reference data (in the directory makedbs). You can use these to download the source data and build the (more complete) databases yourself. However, this process takes a lot of time and disk space.

External software

Some of the externally developed software used by genomecomb is included in the distribution (e.g. samtools, tabix). Most external software needs to be installed on the system (if you want to use it) in a way that genomecomb can call it, usually just by having the correct executable(s) in the PATH (for others, like gatk or picard, genomecomb looks for a directory of that name in the PATH). For more convenient installation, the following downloads contain portable application directories for most of the supported software (in versions that were tested with genomecomb). These can be made available to genomecomb only (without disturbing the rest of the system) by unpacking them in the directory extra in the genomecomb application directory. They can also be made available generally on the system by unpacking them in a directory in the PATH. You can download and install them (by default for genomecomb only) using cg install. This way you can also install all programs needed for a given preset in one go, e.g. to download the tools needed for ONT sequence analysis:

cg install ont
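
For the manual route described above (unpacking a downloaded portable application directory into the extra directory), a sketch could look as follows. The helper name install_extra and the archive name in the usage comment are hypothetical; the actual archive names depend on the downloads offered.

```shell
# Hypothetical helper: unpack a downloaded tool archive ($2) into the
# "extra" directory of a genomecomb application directory ($1).
install_extra() {
    appdir=$1
    archive=$2
    mkdir -p "$appdir/extra"
    # -C makes tar change to the target directory before extracting.
    tar xzf "$archive" -C "$appdir/extra"
}

# Usage (archive name is a placeholder, not a real download name):
# install_extra /opt/genomecomb/genomecomb-0.110.0-Linux-x86_64 sometool.tar.gz
```

Tools installed this way are only visible to genomecomb, so they do not interfere with other versions installed on the system.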

Example files

To be able to demonstrate selected queries, we have made our annotations of the comparison of chr 21 and 22 of the NA19238, NA19239 and NA19240 individuals (NA19240 sequenced with different technologies) available: dataset_howto_query.tar.gz. For the process_project howto, a small dataset, dataset_mixed_yri_mx2.tar.gz, is available for download.

Example commandline install

The following commands will install genomecomb and the hg38 reference databases in the /opt/genomecomb directory.

mkdir /opt/genomecomb
cd /opt/genomecomb

wget https://genomecomb.bioinf.be/download/genomecomb-0.110.0-Linux-x86_64.tar.gz
tar xvzf genomecomb-0.110.0-Linux-x86_64.tar.gz
ln -s /opt/genomecomb/genomecomb-0.110.0-Linux-x86_64/cg /usr/local/bin/cg

# Install the hg38 reference databases and the software needed to run the ont preset.
cg install hg38 hg38-cadd hg38-minimap2 ont

# The above could also be done manually using
#wget https://genomecomb.bioinf.be/download/refdb_hg38-0.109.0.tar.gz
#tar xvzf refdb_hg38-0.109.0.tar.gz
#wget https://genomecomb.bioinf.be/download/refdb_hg38-cadd-0.109.0.tar.gz
#cd /opt/genomecomb/hg38
#tar xvzf ../refdb_hg38-cadd-0.109.0.tar.gz
#cd /opt/genomecomb

# download example data
wget https://genomecomb.bioinf.be/download/dataset_howto_query.tar.gz
tar xvzf dataset_howto_query.tar.gz

# delete downloaded files
#rm genomecomb-0.110.0-Linux-x86_64.tar.gz
#rm refdb_hg38-0.109.0.tar.gz refdb_hg38-cadd-0.109.0.tar.gz
rm dataset_howto_query.tar.gz
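
After an install like the one above, it can be useful to confirm that the soft link points at an existing executable, since cg must stay inside its application directory. The check below is a sketch; check_link is a hypothetical helper name, and readlink -f is assumed to be available (it is on typical Linux systems).

```shell
# Hypothetical helper: succeed if $1 is a symbolic link whose target
# exists and is executable, and print where it resolves to.
check_link() {
    if [ -L "$1" ] && [ -x "$1" ]; then
        echo "$1 -> $(readlink -f "$1")"
    else
        echo "$1 is missing, not a link, or dangling" >&2
        return 1
    fi
}

# Check the link created in the example install.
check_link /usr/local/bin/cg || echo "cg link needs attention"
```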