31 Jul 2014 etbe   » (Master)

BTRFS Status July 2014

My last BTRFS status report was in April [1], it wasn’t the most positive report with data corruption and system hangs. Hacker News has a brief discussion of BTRFS which includes the statement “Russell Coker’s reports of his experiences with BTRFS give me the screaming heebie-jeebies, no matter how up-beat and positive he stays about it” [2] (that’s one of my favorite comments about my blog).

Since April things have worked better. Linux kernel 3.14 solves the worst problems I had with 3.13 and it’s generally doing everything I want it to do. I now have cron jobs making snapshots as often as I wish (as frequently as every 15 minutes on some systems), automatically removing snapshots (removing 500+ snapshots at once doesn’t hang the system), balancing, and scrubbing. The fact that I can now expect that a filesystem balance (which is a type of defragment operation for BTRFS that frees some “chunks”) from a cron job and expect the system not to hang means that I haven’t run out of metadata chunk space. I expect that running out of metadata space can still cause filesystem deadlocks given a lack of reports on the BTRFS mailing list of fixes in that regard, but as long as balance works well we can work around that.

My main workstation now has 35 days of uptime and my home server has 90 days of uptime. Also the server that stores my email now has 93 days uptime even though it’s running Linux kernel 3.13.10. I am rather nervous about the server running 3.13.10 because in my experience every kernel before 3.14.1 had BTRFS problems that would cause system hangs. I don’t want a server that’s an hour’s drive away to hang…

The server that runs my email is using kernel 3.13.10 because when I briefly tried a 3.14 kernel it didn’t work reliably with the Xen kernel 4.1 from Debian/Wheezy and I had a choice of using the Xen kernel 4.3 from Debian/Unstable to match the Linux kernel or use an earlier Linux kernel. I have a couple of Xen servers running Debian/Unstable for test purposes which are working well so I may upgrade my mail server to the latest Xen and Linux kernels from Unstable in the near future. But for the moment I’m just not doing many snapshots and never running a filesystem scrub on that server.

Scrubbing

In kernel 3.14 scrub is working reliably for me and I have cron jobs to scrub filesystems on every system running that kernel. So far I’ve never seen it report an error on a system that matters to me but I expect that it will happen eventually.

The paper “An Analysis of Data Corruption in the Storage Stack” from the University of Wisconsin (based on NetApp data) [3] shows that “nearline” disks (IE any disks I can afford) have an incidence of checksum errors (occasions when the disk returns bad data but claims it to be good) of about 0.42%. There are 18 disks running in systems I personally care about (as opposed to systems where I am paid to care) so with a 0.42% probability of a disk experiencing data corruption per year that would give a 7.3% probability of having such corruption on one disk in any year and a greater than 50% chance that it’s already happened over the last 10 years. Of the 18 disks in question 15 are currently running BTRFS. Of the 15 running BTRFS 10 are scrubbed regularly (the other 5 are systems that don’t run 24*7 and the system running kernel 3.13.10).

Newer Kernels

The discussion on the BTRFS mailing list about kernel 3.15 is mostly about hangs. This is correlated with some changes to improve performance so I presume that it has exposed race conditions. Based on those discussions I haven’t felt inclined to run a 3.15 kernel. As the developers already have some good bug reports I don’t think that I could provide any benefit by doing more testing at this time. I think that there would be no benefit to me personally or the Linux community in testing 3.15.

I don’t have a personal interest in RAID-5 or RAID-6. The only systems I run that have more data than will fit on a RAID-1 array of cheap SATA disks are ones that I am paid to run – and they are running ZFS. So the ongoing development of RAID-5 and RAID-6 code isn’t an incentive for me to run newer kernels. Eventually I’ll test out RAID-6 code, but at the moment I don’t think they need more bug reports in this area.

I don’t have a great personal interest in filesystem performance at this time. There are some serious BTRFS performance issues. One problem is that a filesystem balance and subtree removal seem to take excessive amounts of CPU time. Another is that there isn’t much support for balancing IO to multiple devices (in RAID-1 every process has all it’s read requests sent to one device). For large-scale use of a filesystem these are significant problems. But when you have basic requirements (such as a mail server for dozens of users or a personal workstation with a quad-core CPU and fast SSD storage) it doesn’t make much difference. Currently all of my systems which use BTRFS have storage hardware that exceeds the system performance requirements by such a large margin that nothing other than installing Debian packages can slow the system down. So while there are performance improvements in newer versions of the BTRFS kernel code that isn’t an incentive for me to upgrade.

It’s just been announced that Debian/Jessie will use Linux 3.16, so I guess I’ll have to test that a bit for the benefit of Debian users. I am concerned that 3.16 won’t be stable enough for typical users at the time that Jessie is released.

Related posts:

  1. BTRFS Status March 2014 I’m currently using BTRFS on most systems that I can...
  2. BTRFS Status April 2014 Since my blog post about BTRFS in March [1] not...
  3. Starting with BTRFS Based on my investigation of RAID reliability [1] I have...

Syndicated 2014-07-31 10:45:10 from etbe - Russell Coker

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!