Simple command line data manipulation

Recently I found a nice little tool called datamash. It allows for simple data wrangling, or should one say mashing, directly on the command line. I came across it because I wanted an easy way of transposing tab-separated files from the command line. One example would be a file that looks something like this:

gene1    gene2    gene3    gene4
2    4    6    8

Now suppose we want this transposed instead. The transpose operation in datamash makes this very easy.
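A minimal sketch of what such a command might look like. The file name genes.tsv is just an assumed name for the example file above, and the printf line only recreates it with real tab characters, since datamash splits fields on tabs by default:

```shell
# Recreate the example file with real tab separators
# (genes.tsv is a made-up name for this sketch).
printf 'gene1\tgene2\tgene3\tgene4\n2\t4\t6\t8\n' > genes.tsv

# Swap rows and columns; the result is written to standard output.
datamash transpose < genes.tsv
```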

Now our output will be transposed:

gene1    2
gene2    4
gene3    6
gene4    8

Used together with paste, this becomes very handy when combining datasets.
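As a rough sketch of that idea, assume two hypothetical files, sample_a.tsv and sample_b.tsv, that list the same genes in the same order. Both the file names and the values in the second file are invented for illustration. Each file is transposed, and paste then puts the value columns side by side:

```shell
# Two made-up single-sample files with identical gene order (assumption).
printf 'gene1\tgene2\tgene3\tgene4\n2\t4\t6\t8\n' > sample_a.tsv
printf 'gene1\tgene2\tgene3\tgene4\n1\t3\t5\t7\n' > sample_b.tsv

# Transpose the first file, keeping both the gene names and the values.
datamash transpose < sample_a.tsv > a_transposed.tsv

# Transpose the second file, but keep only its value column.
datamash transpose < sample_b.tsv | cut -f2 > b_values.tsv

# Join the two pieces line by line, giving one row per gene.
paste a_transposed.tsv b_values.tsv > combined.tsv
cat combined.tsv
```

This only works when the genes line up row for row. If the order could differ between files, the rows would need to be matched on the gene name instead.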

Datamash also allows for other things, like getting minimums, maximums, sums etc. straight from the command line. For example, you might have a file containing a lot of columns with different values; getting the sum, median or other statistics becomes quite easy with datamash. As an example we can use the following file, where we want stats on the second column:

1    2    3
2    3    5
4    5    7
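A command along these lines should do it. This is a sketch, with values.tsv as an assumed name for the file above, recreated here with tab separators. Each operation is followed by the number of the column it should act on:

```shell
# Recreate the example file with real tab separators
# (values.tsv is a made-up name for this sketch).
printf '1\t2\t3\n2\t3\t5\n4\t5\t7\n' > values.tsv

# Sum, median and mean of column 2, printed as one tab-separated line.
datamash sum 2 median 2 mean 2 < values.tsv
```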

This would tell us that the sum is 10, the median is 3 and the mean is about 3.33.

Handy! It also allows for more statistical operations, but transposing was the main functionality I was after; the other stuff is pure gravy.

datamash can be found here: https://www.gnu.org/software/datamash/