How easy is it, to write a program that removes a file or a folder? Okay, assume you have all the power a modern operating system provides you, including rm shell convenience command, and unlink system call. Do you really think the answer is still "very"?.. Then take a look at these examples.

Permission trap

Today one of my peers, Denis, was writing a program that somehow analyzed packages in Mandriva Linux distribution. Since trying to analyze the contents of a package without unpacking it looks more like gynecologist's work than that of a programmer, the software Denis wrote unpacked it first. Then it analyzed the package, and erased it with rm -rf in order to avoid disk pollution.

But sometimes it failed to erase it, which led to strange failures of the whole program.

The reason was the permission trap. Common sense tells us that if you can create a file, then you can remove it. This is wrong when it comes to uncommon permissions. Certain software that is very secure sets up specific attributes for certain files. In particular, it revoked writing permissions from certain catalogs, which prevented recursive removal of the unpacked files. You may try it yourself:

$ mkdir -p /tmp/try/this; chmod a-w /tmp/try; rm -rf /tmp/try
rm: cannot remove `/tmp/try/this': Permission denied

Oops. Even if you're the owner of the file, you won't remove it. Even --force doesn't help. To avoid this, I guess, you should chmod files recursively before removing them.

Why does Chromium remove its files twice?

By the way, I encountered a couple of engineers from PVS Studio when they visited Institute for System Programming, in which I worked; they were looking for a mentor for a Ph.D. thesis one of them was going to defend. Small world!

One more totally not interesting thing is that they are not going to support Linux because there is no enough users of their software on that platform. This is both funny and sad: on the one hand, Linux should have much more support, but on the other hand, if your static analysis tool is not decoupled enough to be somewhat platform-independent, then you do not deserve a Ph.D.

I read about this interesting issue here. Most likely, it won't hit you in the nearest years if you're using Linux, but for Windows users it might be interesting and maybe useful. Let me quote the piece of code they referenced

Why in the world would anyone try to remove a file twice determinedly? What was the programmer trying to achieve? Let's quote that article again:

A file can be definitely removed or cannot be removed at all only in textbooks and in some abstract world. In the real system it often happens that a file cannot be removed right now and can be removed an instance later. There may be many reasons for that: antivirus software, viruses, version control systems and whatever. Programmers often do not think of such cases. They believe that when you cannot remove a file you cannot remove it at all. But if you want to make everything well and avoid littering in directories, you should take these extraneous factors into account.

"Remove" vs. "unlink"

Okay, so you've managed to remove a file. The question is, have you freed some space on your disk? The answer is maybe.

Removing is called unlinking on purpose. What you do by "unlinking" is removing a reference from a directory index to a certain chunk on your hard disk. The chunk, nevertheless, may persist, especially if there are more references to it. One of the possibilities is hard links, but you explicitly created them. Another could be a badly working file system, but that was your choice.

However, there is a very harsh way to experience the difference between removing and unlinking. Assume you have a long-running program that writes something to its log file. The information it writes is not very useful, and you might want to discard it. As the program spends hours of working, the log file grows larger and larger, and you realize that you really should erase the log to prevent your disk from filling up. You rm -rf log_file, and... nothing happens. The log is still being written somewhere, and there is as little space as there was.

The thing is that you unlinked the file, but the file system's directory index wasn't the last place it was referenced from. Another place is the file handler of the program running, and it will be referenced until the file is closed. Otherwise, as far as I know, the only way to fix it is to shut the program down. So, do not turn on too much logging by default.

Uncannily removing sensitive files

Of course, you know this. It's like knowing that you should move carefully in a room full of rakes and nevertheless getting hit by one because it's just so obvious...

These are classical rm -rf $FOO/* when $FOO happens to be unset... These are when you type rm -rf /etc, and thus crush your system, instead of rm -rf etc to remove a backup copy of /etc you were so prudent to create. These are the famous bumblebee bugfix—or was it mere viral marketing?

***

Usually, you think that creating is easy, and destroying is simple. In many cases, however, it's not true. In Russia, for instance, it's much easier to open a business than to shut it. And when it gets to removing files, it's also sometimes true. So, if your program really depends on successful file removing, it should be more careful with such a seemingly simple operation.

Comments imported from the old website

Maxim Ivanov on 21 November 2011 commented:

In case of abusive logging, I found handy to run ">/path/to/logfile", which effectively frees up space.

Pavel Shved on 17 December 2011 commented:

@Maxim, this sometimes works, and sometimes does not. I'm yet to figure out the exact conditions one may rely on this under.