Recent blog updates

Shown here are only posts related to reiser4. You can view all posts here.

Do not use Btrfs!

"Btrfs" is a new, modern filesystem, which has always been a nice thing to try for any Linux aficionado. When you're bored, and if your Linux installation is too stable, you can always try one of new cool filesystems to make your machine faster, cooler, and less stable.

ReiserFS, named after its developer Hans Reiser, was the first Linux filesystem with journaling support. It also was known as the fastest file system to deal with large amounts of small files. It had none of the Btrfs' problems I describe here.

The successor of this filesystem, Reiser4 should have had even better small file support. It also was innovative, and allowed transient compression of content (which worked, by the way), but it has never been finished.

Hans, being in prison, doesn't continue the development of the new filesystem, and, it seems, that others just fail to keep the pace. reiser4 is still not included in Linux kernel. I'm pretty sure that it's not just because of non-technical "personality issues", but of architectural incompatibility with the kernel philosophy (and this) as well.

It's sad that prison seems to have disrupted Hans' mind. We probably will never have the same drive in Linux file system world.

After Hanz Reizer has been sentenced for a dozen of years in prison for killing his wife (I'm pretty sure he thought he had good reasons), we can't take reiser4 seriously anymore. I had a sore experience installing reiser4, but, unfortunately, it has never entered a stable state (and, most likely, never will).

When Hans, in addition to his wife, had killed my root filesystem through his unstable evil creation, I decided to install something cool again. My choice was Btrfs. It is a new fast filesystem, and it was claimed to be fast as hell. Moreover, it became the default filesystem for MeeGo mobile Linux OS standard, which is now nearly dead due to unrelated circumstances.

Aside from being just new, Btrfs offers a lot of unique features. One of them is that it's a copy-on-write filesystem, which allows a user to make snapshots of the whole file system on the same partition without consuming twice as much disk space. You can also roll back to a saved state immediately, without re-writing everything. However, I recently learned an already known way to make these operations fast with any filesystem, so Btrfs' advantage here is questionable. An "only" advantage left is its speed.

After I used Btrfs both at home and at work, I now advise to steer clear of it. At least, until its issues are resolved.

I'm not a kernel hacker of any sort, I haven't looked at Btrfs source code, and I never even read its specs. I just share my experience with it as a user. The post may already even be out-of date, and the state-of-the art tools and implementation may have already fixed the problems I describe.

Lack of proper tools

Btrfs still doesn't have file system checker that fixes the FS when computer was powered off or reset. Reference: its own wiki. I really don't understand how come Btrfs was used at real devices. I remember I really once had to run file-system check for my N900 mobile phone after an unfortunate battery exhaustion. Will my MeeGo device lose all my data after its battery dies?

And you know what? This "we still don't have file system recovery tool" message has been hanging there for more than a year. I guess, it will never be actually implemented.

So, it's already a good reason to never use it for sensitive data, such as /home partition.

Irrecoverable disk pollution

Okay, so, Btrfs is not very reliable to have, for instance, your /home partition on it. But we could use it for a temporary file system that doesn't store important data!.. could we?

I tried Btrfs on one of the nodes of our Linux Driver Verification cluster as a file system for a 130 Gb partition. Here's the workflow the filesystem underwent:

  1. Write a lot of small files
  2. Realize that the disk is nearly full
  3. Erase the small files written
  4. Go to step 1

On day I realised that the cluster node just was not working. I checked the logs and I realized that there was no space left on the device. I ran df -h to check how much space there really was, and it showed that there were... 13 gigabytes free!

It could mean only that Btrfs just damn secretly used more than 10% of disk for its own metadata. Using more disk space than the overall file size is normal, but here it uses it unpredictably: Btrfs lies about the real amount of space left. Some mailing list postings confirmed that, but I can't find the links now. The only suggestion available was to "balance" the btrfs with its tools. But it didn't work for me!. The file system remained in its unusable state.

Then I decided to try to remove some files. I typed rm -rf directory/, and... File removal resulted in a "no space left on device" error! So when I try to clear some space with rm it just outputs "no space left" errors. How nice.

However, it actually did remove the files, but the process merely took more time than usual (about 2 files removed per minute), and had less than 1% success rate. The recommended elsewhere echo >file/to/remove didn't work either. After 30 hours of removing, the filesystem has traversed the 130 Gb folder, and removed a couple of gigabytes successfully. The subsequent rm -rf cleared the disk in no time...

... just to enter the same crippled, unusable state, but with 27 Gb "free" (more than 20% of the filesystem) after a couple of weeks! That time, even rm -rfs couldn't help. We re-formatted the partition into ext4.

By the way, Edward Shishkin is the primary developer of reiser4 at the moment, i.e. he's a competitor of Btrfs. If it was not for my own experience, I wouldn't think that his observations and conclusions are grass-root.

I guess that my observations confirm Edward Shishkin's review of Btrfs: the filesystem is space-ineffeicient without any good reasons, and its tree balancing algorithms are apparently broken.

And it is also not fast!

Okay, but, initially, I mentioned that Btrfs was, at least, fast. Not just working with lots of small files should be faster, but the overall workflow should have had the speed increase.

It does not.

My home PC uses Btrfs as its root filesystem. A media storage, which I keep movies on, however, uses XFS. Now guess what happens when I try to play an HD movie from the XFS partition?

No, you're wrong. It doesn't play smoothly. It just stalls unless I increase the I/O priority of the movie player to the highest value. Because btrfs-transactio process wakes up each five seconds and sends my PC to a non-interruptible I/O sleep. This renders my PC unusable for less than 0.5 second, but it is enough to crush my HD movie experience: it just wouldn't play because disk throughput is not enough for a small, but significant amount of time.

In production, however, I also noticed that. A month ago I introduced a change to my BLAST project. I now realize that it was the Btrfs to blame, but back then I just saw that when a process is about to write to a file 10 times per second it sometimes stalls for a second before the write succeeds. This is intolerable for any serious file system.

Perhaps, during the common usage cycle, when files are not re-recorded several times per second, it wouldn't be an issue at all. However, the way I use file systems at work is not "common", and it should satisfy more requirements than those of users that just never write to their files.

***

Finally, the solution to Disk I/O speed seems to have come not from tricky file systems. Solid-state drives (SSDs) with their awesome read/write speeds have made filesystems much less relevant. The main challenge for file system developers would be to adapt to SSD's requirements. And it is natural! We all are pretty damn sure that sequential reads-writes should be faster than random I/O; we are used to placing our swap partitions at the beginning of the drive, to increase the speed they're accessed. We treat magnet disks requirements as granted without really noticing this.

Time to change the paradigm. As of today, if you want a faster disk I/O, the best choice would probably be an SSD, not a faster file system.

To me, as I described above, Btrfs doesn't seem a good choice both at home and in production. Its features do not overcome the fact that it's just unusable. Until it's fixed, do not use Btrfs.

Read on | Comments (0) | Make a comment >>


More posts about reiser4 >>