Using a for loop vs GNU parallel for BLAST

I recently ran into an issue when I had to run a lot of small BLAST searches (1000+ query files against a common database), and was thinking about how to do this efficiently.
My first approach was to loop over all the files I wanted to blast and run them with the -num_threads option in blastp to speed things up. The command looked like this:
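Roughly, a loop of this kind can be sketched as follows. This is a reconstruction under assumptions rather than the exact command: the `blast_loop` function name, the `queries/` and `results/` directories, the `.fasta` extension, the database name `common_db`, the default thread count, and the tabular output format are all placeholders.

```shell
#!/usr/bin/env bash
# Sketch of the sequential approach: one blastp call per query file,
# with all parallelism left to blastp's own -num_threads option.
# All paths and names below are hypothetical placeholders.

blast_loop() {
    local query_dir="$1" db="$2" out_dir="$3" threads="${4:-20}"
    local query name
    mkdir -p "$out_dir"
    for query in "$query_dir"/*.fasta; do
        [ -e "$query" ] || continue          # skip if the glob matched nothing
        name="$(basename "$query" .fasta)"   # e.g. queries/a.fasta -> a
        blastp -query "$query" -db "$db" -num_threads "$threads" \
               -outfmt 6 -out "$out_dir/$name.tsv"
    done
}

# Example call (placeholder paths):
# blast_loop queries common_db results 20
```

Each iteration waits for the previous blastp process to exit before the next one starts, so only one query file is being searched at any time.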

Testing this on a dataset of 1700 files took over 10 minutes.

Instead, I thought, why not do this using GNU parallel? So my command changed to this:
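A GNU parallel version of the same job might look something like the sketch below. Again, this is an illustration under assumptions, not the exact command: the `blast_parallel` function name, the directory and database names, the `.fasta` extension, and the output format are placeholders, and setting `-num_threads 1` per job is my assumption. The 20 concurrent jobs match the number mentioned further down.

```shell
#!/usr/bin/env bash
# Sketch of the GNU parallel approach: many single-threaded blastp
# processes running side by side instead of one multi-threaded process.
# All paths and names below are hypothetical placeholders.

blast_parallel() {
    local query_dir="$1" db="$2" out_dir="$3" jobs="${4:-20}"
    mkdir -p "$out_dir"
    # {}   is replaced by the input file path
    # {/.} is replaced by its basename without the extension
    parallel -j "$jobs" \
        blastp -query {} -db "$db" -num_threads 1 \
               -outfmt 6 -out "$out_dir"/{/.}.tsv \
        ::: "$query_dir"/*.fasta
}

# Example call (placeholder paths):
# blast_parallel queries common_db results 20
```

With `-j 20`, parallel keeps up to 20 blastp processes running and launches the next query as soon as a slot frees up.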

Using this approach, I was able to finish in a tenth of the time.

The underlying issue here is that each blast is so quick that there is not much to gain by multi-threading blastp itself. By using parallel instead, you get 20 blast jobs running in parallel, with a new one starting up as soon as one finishes.
For more info on using parallel, check out this nice little presentation by former lab member Ino de Bruijn.