Multithreading with BLAST

The NCBI BLAST+ suite has built-in multithreading (the -num_threads option), which is nice. However, this multithreading does not always make use of all the threads available to it.

A nifty way around this is to split your input query into several smaller files (as many as you have threads available), BLAST each of them separately using a single thread, and concatenate the outputs afterwards. This makes much better use of the available computational resources, and in the end the whole search finishes much faster.
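As a rough illustration of the run-and-concatenate part, here is a minimal Python sketch. It is not the wrapper script mentioned in this post. It assumes the query has already been split into chunk files, that blastp is on the PATH, and that tabular output (-outfmt 6) is wanted, since that format has no header or footer and can simply be concatenated. The function names, the ".out" suffix and the runner parameter are made up for the example.

```python
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def blast_command(chunk, db, out):
    """Build a single-threaded blastp command for one chunk file."""
    return [
        "blastp",
        "-query", str(chunk),
        "-db", db,
        "-outfmt", "6",        # tabular output: safe to concatenate
        "-num_threads", "1",   # one thread per BLAST process
        "-out", str(out),
    ]


def run_chunks(chunks, db, merged, jobs, runner=subprocess.run):
    """Run one BLAST process per chunk, `jobs` at a time, then merge the outputs.

    `runner` is a hypothetical hook so the function can be exercised
    without BLAST installed; by default it is subprocess.run.
    """
    outputs = [Path(f"{chunk}.out") for chunk in chunks]
    commands = [blast_command(c, db, o) for c, o in zip(chunks, outputs)]

    # Threads are enough here: each one only waits for an external process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # list() forces evaluation, so a failed job raises its error here.
        list(pool.map(lambda cmd: runner(cmd, check=True), commands))

    # Concatenate the per-chunk results in chunk order.
    with open(merged, "wb") as dst:
        for out in outputs:
            with open(out, "rb") as src:
                shutil.copyfileobj(src, dst)
    return commands
```

With 30 chunk files and 30 threads one might call run_chunks(chunk_paths, "swissprot", "merged.tsv", jobs=30). Other output formats (XML, for example) cannot be merged by plain concatenation and would need their own merge step.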

One can do this using built-in command-line tools such as ‘split’, but one has to be a bit careful if the input FASTA file is not formatted with one sequence per line, since splitting by line count can then cut a record in two. For convenience's sake I have made a bash wrapper script that does just this, using a custom Perl script for splitting the file and GNU parallel for running all the BLAST instances in parallel.
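To show what a record-aware split looks like, here is a small Python sketch. It is not the Perl script mentioned above, only an illustration of the idea: records are cut at the ">" header lines, so a sequence wrapped over several lines stays in one piece. The function names and the "chunk_000.fasta" naming scheme are assumptions for the example.

```python
from pathlib import Path


def read_fasta(path):
    """Yield each record (header plus all of its sequence lines) as one string."""
    record = []
    with open(path) as handle:
        for line in handle:
            if not line.endswith("\n"):
                line += "\n"  # last line of the file may lack a newline
            if line.startswith(">") and record:
                yield "".join(record)
                record = []
            record.append(line)
    if record:
        yield "".join(record)


def split_fasta(path, n_chunks, prefix="chunk"):
    """Distribute the records round-robin over at most n_chunks files."""
    records = list(read_fasta(path))  # fine for modest inputs; all held in memory
    n_chunks = max(1, min(n_chunks, len(records)))
    paths = []
    for i in range(n_chunks):
        chunk_path = Path(f"{prefix}_{i:03d}.fasta")
        # Every n_chunks-th record, starting at record i.
        chunk_path.write_text("".join(records[i::n_chunks]))
        paths.append(chunk_path)
    return paths
```

Round-robin assignment tends to balance the chunks when the input is sorted by length, but it also means the merged results will not be in the original query order. If the order matters, contiguous blocks of records are the simpler choice.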

In terms of time saved: in my tests, a 1000-sequence protein query against the swissprot database took 8.4 minutes to finish using 30 threads.
With the split approach this was reduced to 3.2 minutes, a total saving of about 62%.
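For anyone who wants to check the arithmetic, the two timings above work out as follows. The variable names are just for illustration.

```python
baseline_min = 8.4  # single BLAST run with 30 threads (from the text)
split_min = 3.2     # split approach, one thread per chunk (from the text)

saving = (baseline_min - split_min) / baseline_min  # fraction of time saved
speedup = baseline_min / split_min                  # how many times faster

print(f"time saved: {saving:.0%}")  # time saved: 62%
print(f"speedup: {speedup:.1f}x")   # speedup: 2.6x
```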

There are other tools available for very large datasets, such as the DIAMOND aligner, which will do the alignments even faster. But sometimes you have to use BLAST specifically, for instance if you require a specific output format.

The script is available on request.