Ettema lab blog

Multithreading with BLAST

The NCBI BLAST+ suite has built-in multithreading, which is nice. However, this multithreading does not always use all the threads available to it. A nifty way around this is to split your input query into several smaller files (as many as you have threads available) and blast them separately using a single thread
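The splitting step can be sketched in a few lines of Python. This is only an illustration of the idea, not the script from the post: the function names `split_fasta` and `write_chunks` and the output file naming are made up for this example, and it assumes the query is a plain FASTA file where every record starts with a `>` header line.

```python
from pathlib import Path


def split_fasta(lines, n_chunks):
    """Distribute FASTA records round-robin over n_chunks lists of lines.

    Hypothetical helper: assumes every record starts with a '>' header.
    Lines that appear before the first header are ignored.
    """
    chunks = [[] for _ in range(n_chunks)]
    current = -1  # index of the chunk receiving the current record
    for line in lines:
        if line.startswith(">"):
            # A new record begins, so move on to the next chunk.
            current = (current + 1) % n_chunks
        if current >= 0:
            chunks[current].append(line)
    return chunks


def write_chunks(query_path, n_chunks, prefix="query_part"):
    """Write each non-empty chunk to '<prefix>_<i>.fasta' and return the paths."""
    with open(query_path) as handle:
        chunks = split_fasta(handle, n_chunks)
    paths = []
    for i, chunk in enumerate(chunks):
        if not chunk:
            continue  # fewer records than chunks
        path = Path(f"{prefix}_{i}.fasta")
        path.write_text("".join(chunk))
        paths.append(path)
    return paths
```

Each resulting file could then be given to its own single-threaded BLAST process. Round-robin assignment keeps the number of sequences per file even, although not necessarily the total sequence length.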

Continue reading and comment →

Simple command line data manipulation

Recently I found a nice little tool called datamash. It allows for simple data wrangling, or should one say mashing, directly on the command line. I came across it while looking for an easy way to transpose tab-separated files on the command line. One such example would be if you have a

Continue reading and comment →
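For readers unfamiliar with what transposing a tab-separated file means, here is a small Python sketch of the operation. It is not how datamash works internally, and the function name `transpose_tsv` is made up for this example. It assumes every row has the same number of fields; with ragged rows, `zip` would silently drop the extra fields.

```python
import csv
import io


def transpose_tsv(text):
    """Return tab-separated text with rows and columns swapped.

    Hypothetical helper: assumes all rows have the same number of fields.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # zip(*rows) pairs up the i-th field of every row, i.e. the columns.
    writer.writerows(zip(*rows))
    return out.getvalue()
```

A table with a header row and two data rows comes out as one line per original column, with the header value first on each line.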

Printing from commandline

When you have a lot of documents to print, perhaps a bunch of papers you want to read, it can be a bit annoying to print them one by one. A nice solution is to do this from the command line instead. The ‘lp’ tool is quite nice and can be modified to do a

Continue reading and comment →

Finding overlapping ranges in Perl

Finding out whether a value is in a range is something you come across quite often. For instance, perhaps you want to see if a sequence is in a particular region. Or, in my case, to find out whether two genes overlap. One could solve this in a lot of different ways, but the one

Continue reading and comment →
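One common way to test whether two ranges overlap is shown below as a Python sketch. The post itself uses Perl, and this is not necessarily the solution it describes. The function name `ranges_overlap` is made up for this example, and coordinates are assumed to be inclusive at both ends, as gene coordinates usually are.

```python
def ranges_overlap(start_a, end_a, start_b, end_b):
    """Return True if the inclusive ranges A and B share at least one position.

    Hypothetical helper: coordinates are sorted first, so a feature given
    as (end, start), for example on the reverse strand, is handled too.
    """
    a_lo, a_hi = sorted((start_a, end_a))
    b_lo, b_hi = sorted((start_b, end_b))
    # Two ranges overlap unless one ends before the other begins.
    return a_lo <= b_hi and b_lo <= a_hi
```

The same two-comparison test covers partial overlap, one range contained in the other, and ranges that only touch at a single position.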

Defensive BASH

I ran across this interesting blog post about defensive BASH scripting. There might be some things one does not agree with, but it does contain a lot of tips for good practice. The idea behind the name “defensive” is to write code that will run into as few problems as possible. I for

Continue reading and comment →

Summary of Spang et al.

Farshid Jalalvand, who runs the blog ‘Ytterst Upphöjda Observationer’, made a very nice summary of the recent EttemaLab Spang et al. paper. He has provided a good background to phylogenetics, and it is very well worth the read for all you Swedish-speaking people out there. You can find his blog post here

Continue reading and comment →

Using for loop vs Gnu parallel for BLAST

I recently ran across an issue when I had to run a lot of small blasts (1000+) of a bunch of files against a common database, and was thinking about how to do this efficiently. My first approach was to loop over all the files I wanted to blast and run them with the -num_threads

Continue reading and comment →

Extracting information from GenBank files

Sometimes you want to get a quick overview of the distribution of a group of uncultured microorganisms. In my case I had extracted all the 16S sequences classified as a certain clade, from the Silva database, and I wanted to know which environment they came from. This information can easily be found in the GenBank

Continue reading and comment →

Mounting drives on OSX over ssh

My computer setup at work consists of a workstation running Linux and a MacBook. To make things run smoothly, I wanted a way to automatically mount a drive from my Linux machine on my MacBook so I can easily share files between the computers. A really nice way of doing this is using

Continue reading and comment →

Inverse removing files from a directory

One thing I really like about bash is how certain problems can be solved in a multitude of ways. Recently I had the following problem. A directory contained a whole bunch of files like this:

In reality there were thousands of files, and also some non-HTML files, but let’s just keep it simple

Continue reading and comment →
