GenomeComb
cg groupby ?options? fields ?infile? ?outfile?
Group lines in a tsv file, based on 1 or more identical fields
"cg select -g" provides most of the groupby functionality (and a lot more). However, "cg groupby" can run using little memory on sorted files, whereas the results of cg select must always fit in memory. cg groupby groups lines in a tab-separated file in which the values in the given fields are the same. By default, entries in data fields (fields other than the ones on which the groups are made) are concatenated using commas. Only consecutive lines with the same values in the given fields are grouped! Sort the file on these fields first if you are not sure that it already is. If the expected result is small enough to fit in memory, you can use the "-sorted 0" option to avoid sorting first: lines with the same values are then grouped regardless of their order, and the results are kept in memory and written once the whole file has been processed.
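To illustrate the two grouping modes described above, here is a minimal Python sketch. It is not the cg groupby implementation. The function name group_tsv, its sorted_input parameter (meant to mirror "-sorted 0"), and the output column order (grouping fields first, then data fields) are assumptions made only for this example.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple


def group_tsv(lines: Iterable[str], fields: List[str],
              sorted_input: bool = True) -> Iterator[str]:
    """Hypothetical sketch: group TSV lines on identical values in `fields`.

    Data fields (all other columns) are concatenated with commas.
    sorted_input=True:  only consecutive lines are grouped (streaming).
    sorted_input=False: all lines with the same key are grouped regardless
                        of order; every group is kept in memory until the end.
    """
    it = iter(lines)
    header = next(it).rstrip("\n").split("\t")
    keypos = [header.index(f) for f in fields]
    datapos = [i for i in range(len(header)) if i not in keypos]
    # Assumed output layout: grouping fields first, then data fields.
    yield "\t".join(header[i] for i in keypos + datapos)

    # Split remaining non-empty lines into columns lazily.
    rows = (line.rstrip("\n").split("\t") for line in it if line.strip())

    def key(row: List[str]) -> Tuple[str, ...]:
        return tuple(row[i] for i in keypos)

    def merge(k: Tuple[str, ...], group: List[List[str]]) -> str:
        # Concatenate each data column of the group with commas.
        return "\t".join(list(k) + [",".join(r[i] for r in group) for i in datapos])

    if sorted_input:
        # itertools.groupby only merges *consecutive* rows with equal keys,
        # so at most one group is held in memory at a time.
        for k, group in groupby(rows, key=key):
            yield merge(k, list(group))
    else:
        # Collect all groups first (dicts keep first-seen order), then write.
        groups: Dict[Tuple[str, ...], List[List[str]]] = {}
        for row in rows:
            groups.setdefault(key(row), []).append(row)
        for k, group in groups.items():
            yield merge(k, group)
```

For input rows chr1/A, chr1/B, chr2/C, chr1/D grouped on a chrom column, the default mode produces three groups (chr1 appears twice because its lines are not consecutive), while sorted_input=False merges all three chr1 lines into one. The streaming branch holds only one group at a time, which is why grouping a sorted file needs little memory.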
Query