GenomeComb

tsvjoin

Format

cg tsvjoin ?options? file1 file2 ?outfile?

Summary

join two tsv files based on common fields

Description

tsvjoin creates a new tsv file joining the two given input tsv files. The new tsv adds extra fields from the second tsv to the first where the idfields are the same between the two. Both files must be sorted on their respective id fields.

Arguments

file1
input file 1
file2
input file 2
outfile
write results to outfile, if not given, uses stdout

Options

-idfields list
which fields identify the object in file1, and should be match the id fields in file2 (default is all fields with the same name between the 2 files)
-idfields2 list
which fields identify the object in file2, and should be match the id fields in file1 (default is same fields as for file1)
-pre1 string
prefix all fieldnames (except the idfields) coming from file1 with string
-pre2 string
prefix all fieldnames (except the idfields) coming from file2 with string
-sorted 0/1
tsvjoin will by default () sort the input files properly to do the join, you can use -sorted 1 if the files are already properly sorted (avoiding the extra sort)
-type inner/full/left/right
determines the type of join: In an inner join only lines are present where the id was found in both files. A left join shows all lines of file1; if the id values where not found in file2, the file2 based fields are empty. A right join contains all lines in file2 and a full join (default) has data for all lines in both files, putting empty values where needed.
-comments 0/1/2
by default comments (start of tsv file) from file1 will be included in the result (1), select 2 to include comments from file2, or 0 to not include comments

Category

tsv