Inverse removing files from a directory

One thing I really like about bash is how certain problems can be solved in a multitude of ways.

Recently I had the following problem.

A directory contained a whole bunch of files like this:
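The listing itself isn't reproduced here, so this is a small stand-in you can recreate in a scratch directory. The four file names are an assumption, based on the test case used in the rest of this post.

```bash
# Recreate a miniature version of the directory in a throwaway location.
# The file names are assumed from the test case, not taken from the real directory.
cd "$(mktemp -d)"
touch file1.html file2.html file3.html file4.html
ls -1
# file1.html
# file2.html
# file3.html
# file4.html
```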

In reality there were thousands of files, and also some non-HTML files, but let’s just keep it simple for now.

I only wanted to keep some of these, and the ones I wanted to keep were listed in a separate file, files_to_keep.txt. So how can one easily remove all the files that are not listed in this file? For testing purposes, let’s say we want to get rid of file1.html and file3.html.
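For that test case, files_to_keep.txt would presumably hold the other two names, one per line. A minimal way to create it (the contents are an assumption based on the example):

```bash
# files_to_keep.txt: one file name per line (assumed contents for the test case).
cd "$(mktemp -d)"
printf '%s\n' file2.html file4.html > files_to_keep.txt
cat files_to_keep.txt
# file2.html
# file4.html
```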

Well, we had a small chat about it in the office and came up with two solutions. One way would be to create a new file listing all the files in the directory that end with .html.
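A sketch of that first step, run against the assumed four-file test directory:

```bash
cd "$(mktemp -d)"
touch file1.html file2.html file3.html file4.html
# List every .html file, one name per line.
# ls sorts its output, which matters because comm expects sorted input.
ls *.html > files_in_dir.txt
cat files_in_dir.txt
```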

We can then compare the two files with comm, printing only the lines that appear solely in ‘files_in_dir.txt’.
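The comparison could look something like this. It assumes both files are sorted, which comm requires, and it uses the same assumed test files as before.

```bash
cd "$(mktemp -d)"
touch file1.html file2.html file3.html file4.html
printf '%s\n' file2.html file4.html > files_to_keep.txt
ls *.html > files_in_dir.txt
# -2 hides lines found only in files_to_keep.txt, -3 hides lines common to both,
# so what remains are the names that appear in files_in_dir.txt alone.
comm -23 files_in_dir.txt files_to_keep.txt
# file1.html
# file3.html
```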

This prints exactly the files we want to remove. An easy way to delete them is to feed that output to a while loop, and the whole thing can of course be done in a single step using a pipe.
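As a single pipeline, a sketch might look like the following. Here comm reads the directory listing from standard input (the "-" argument), so the intermediate files_in_dir.txt isn't needed.

```bash
cd "$(mktemp -d)"
touch file1.html file2.html file3.html file4.html
printf '%s\n' file2.html file4.html > files_to_keep.txt
# List the .html files, keep only the names missing from files_to_keep.txt,
# and remove them one by one.
ls *.html | comm -23 - files_to_keep.txt | while IFS= read -r f; do
    rm -- "$f"
done
ls
```

Swapping `rm --` for `echo` on the first run is a cheap way to check what would be deleted.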

This does solve the issue. However, there is an alternative way using find and xargs. With find we can find all the files ending in .html, and we can also exclude files using the negated name test ‘! -name’. However, each -name takes only a single pattern, and we really don’t want to type one out per file when we could be dealing with thousands of files. In our small test case we could do it by hand, and it would be something like this:
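For the assumed test case, the hand-written version would be roughly:

```bash
cd "$(mktemp -d)"
touch file1.html file2.html file3.html file4.html
# Match every .html file except the two we want to keep.
find . -name '*.html' ! -name file2.html ! -name file4.html
```

This should list ./file1.html and ./file3.html, in whatever order find happens to visit them.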

The files it finds can then be removed right away using xargs.
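A sketch of the removal step. It uses find's -print0 together with xargs -0 (supported by the GNU and BSD tools, and an addition that may not match the original command) so that unusual file names don't trip up xargs.

```bash
cd "$(mktemp -d)"
touch file1.html file2.html file3.html file4.html
# Hand the matching files to rm, NUL-separated so odd names survive intact.
find . -name '*.html' ! -name file2.html ! -name file4.html -print0 | xargs -0 rm
ls
```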

However, we want to be able to do this for thousands of files, so we want to generate the string ‘! -name file2.html ! -name file4.html’ automatically from our file, files_to_keep.txt. Here is my solution to this. We can use sed and paste to generate this string:
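One way to write it, assuming files_to_keep.txt holds one plain file name per line:

```bash
cd "$(mktemp -d)"
printf '%s\n' file2.html file4.html > files_to_keep.txt
# sed prefixes every line with "! -name ";
# paste -s then joins all the lines into one, separated by spaces.
sed 's/^/! -name /' files_to_keep.txt | paste -sd ' ' -
```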

This will generate the following output
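With the two-line files_to_keep.txt assumed for the test case, the result is the string quoted earlier. Printing each stage separately shows what sed and paste each contribute:

```bash
cd "$(mktemp -d)"
printf '%s\n' file2.html file4.html > files_to_keep.txt
# Stage 1: sed alone still prints one expression per line.
sed 's/^/! -name /' files_to_keep.txt
# ! -name file2.html
# ! -name file4.html

# Stage 2: paste -s folds those lines into the single string find needs.
sed 's/^/! -name /' files_to_keep.txt | paste -sd ' ' -
# ! -name file2.html ! -name file4.html
```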

We can now embed this command in the find invocation using backticks, so that find lists all the HTML files except the ones we want to keep, and xargs removes the unwanted ones.
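Put together, a sketch of the full command could look like this. Because the backtick output is deliberately left unquoted, the shell splits it into separate arguments for find. That only works when the names in files_to_keep.txt contain no whitespace or glob characters, which is an assumption here.

```bash
cd "$(mktemp -d)"
touch file1.html file2.html file3.html file4.html
printf '%s\n' file2.html file4.html > files_to_keep.txt
# The backticks expand to: ! -name file2.html ! -name file4.html
# and find receives each word as its own argument.
find . -name '*.html' `sed 's/^/! -name /' files_to_keep.txt | paste -sd ' ' -` | xargs rm
ls
```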

So that’s two ways of solving that issue. Anyone have some better solutions?

Edit: As Ino suggested in the comments, a less complicated way would be to use inverse grep, like so:
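The exact command from the comments isn't reproduced here, so this is a sketch of the idea. The -x and -F flags are my addition: they make grep compare whole lines as fixed strings, so a name like file2.html can't accidentally match some_file2.html or be read as a regular expression.

```bash
cd "$(mktemp -d)"
touch file1.html file2.html file3.html file4.html
printf '%s\n' file2.html file4.html > files_to_keep.txt
# -v inverts the match, -f reads the patterns from files_to_keep.txt,
# -x matches whole lines only, -F treats the patterns as literal strings.
ls *.html | grep -vxFf files_to_keep.txt | xargs rm
ls
```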

As usual with bash, there is always a better solution lurking out there 🙂