Notes about Linux Btrfs

Updated: 2017-11-10
Created: 2016

As of 171109 this page is still a draft and incomplete, but the parts that are written are likely to be mostly accurate.

Btrfs terms (171111)

This list of Btrfs terms in effect also describes the overall external and internal structure of a Btrfs instance.

Btrfs status (171109)

What currently works quite reliably:
  • Basic POSIX filesystem features, and copy-on-write updates, work quite well.
  • Subvolumes and snapshots and reflinking work quite well, with some scalability limitations.
  • Recoverability in case of issues is fairly good within the limits above.
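
As an illustration of the reflinking mentioned above, here is a minimal sketch of a reflink copy from Python on Linux. The helper name `reflink_or_copy` is hypothetical and not from these notes. `FICLONE` is the clone ioctl defined in `<linux/fs.h>`; the fallback to a plain byte copy is an assumption about what a caller would want on filesystems without reflink support.

```python
import fcntl
import shutil

# FICLONE ioctl request number from <linux/fs.h>: _IOW(0x94, 9, int).
FICLONE = 0x40049409


def reflink_or_copy(src_path, dst_path):
    """Clone src into dst by sharing extents; fall back to a byte copy.

    Returns True if the clone ioctl succeeded, False if a plain copy was made.
    (Hypothetical helper, for illustration only.)
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # On success dst shares src's extents until either file is modified.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            # No reflink support here, or src and dst are on different
            # filesystems: copy the bytes instead.
            shutil.copyfileobj(src, dst)
            return False
```

A reflinked copy initially uses almost no extra data space, which is why operations that rewrite extents can later increase space usage.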
What currently works with some limitations:
  • Checksumming works well but it has several surprising limitations and costs a lot of CPU time. The limitations are that it leads to damage amplification unless metadata is redundant, does not necessarily work with direct IO, and is not very useful with NFS clients.
  • The send/receive copy methods work but they can be subtle to understand and don't copy some less used metadata like inode flags.
  • Automatic compression of files. The main limitation is that there are many corner cases with surprising performance behaviours related to compression. I personally think it is not worth the complications.
  • Updating-in-place of large files works well but usually leads to greatly fragmented extents.
  • Defragmentation works but since it works by making copies of files it breaks reflinking and thus can greatly increase space used.
  • Full balance can be very slow.
  • Very little testing and use is done on platforms other than amd64.
  • As with all current local filesystems, volumes larger than 4-8TB work well but have severe scalability problems in maintenance operations.
  • Two-level allocation in chunks and nodes can lead to a situation where all chunks are allocated but there are plenty of free nodes, which can require simple but subtle workarounds.
  • For some hard-to-imagine reasons non-privileged users can create subvolumes and snapshots without limit.
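
The two-level allocation problem in the list above can be made concrete with a little arithmetic. The sketch below is only illustrative: the function names and the 5% and 25% thresholds are assumptions, not values from these notes. The command built at the end uses the real `btrfs balance start -dusage=N` filter, but whether that is the workaround these notes have in mind is also an assumption.

```python
def allocation_state(device_bytes, chunk_bytes_allocated, bytes_used_in_chunks):
    """Return (unallocated, free_inside_chunks) in bytes."""
    unallocated = device_bytes - chunk_bytes_allocated
    free_inside_chunks = chunk_bytes_allocated - bytes_used_in_chunks
    return unallocated, free_inside_chunks


def needs_rebalance(device_bytes, chunk_bytes_allocated, bytes_used_in_chunks,
                    min_unallocated_fraction=0.05,
                    min_free_inside_fraction=0.25):
    """True when nearly all space is allocated to chunks, yet much of the
    space inside those chunks is still free (thresholds are assumptions)."""
    unallocated, free_inside = allocation_state(
        device_bytes, chunk_bytes_allocated, bytes_used_in_chunks)
    return (unallocated < min_unallocated_fraction * device_bytes
            and free_inside > min_free_inside_fraction * device_bytes)


def balance_command(mount_point, usage_percent=50):
    """Build (but do not run) a filtered balance that only rewrites data
    chunks which are at most usage_percent full."""
    return ["btrfs", "balance", "start", f"-dusage={usage_percent}", mount_point]
```

For example, a 1000GiB device with 990GiB allocated to chunks but only 400GiB used inside them would be flagged, while the same device with 500GiB allocated would not.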
Things that don't quite work:
  • Multi-device volumes are fundamentally misdesigned, so even those implemented correctly behave strangely in some important situations. The single, raid0, raid1 and raid10 profiles mostly work. The raid1 and raid10 profiles in particular have unpleasant corner cases when operating degraded (a partial fix is available from kernel 4.14). There is no data loss, but the consequences can be time-consuming.
  • Quota groups have severe scalability problems and other problems.
  • Too many snapshots (more than 20-50 let's say) and other forms of reflinking have severe scalability issues.
  • The ssd space allocator behaves badly on almost every device and in almost every situation.
  • In particular be careful about the list of major issues.
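
Given the rough ceiling of 20-50 snapshots suggested above, one way to stay under it is to prune the oldest snapshots regularly. This is a minimal sketch of the selection step only; the function name, the `(name, creation_time)` tuple layout and the default of 20 are assumptions, and actually deleting the selected snapshots is left to the caller.

```python
def snapshots_to_prune(snapshots, max_keep=20):
    """Return the names of the oldest snapshots beyond max_keep.

    snapshots: iterable of (name, creation_time) tuples, where creation_time
    is any sortable value (for example a Unix timestamp).
    """
    if max_keep < 0:
        raise ValueError("max_keep must be non-negative")
    # Newest first, so everything past index max_keep is a pruning candidate.
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    return [name for name, _ in newest_first[max_keep:]]
```

With 25 snapshots and `max_keep=20`, the five oldest are returned; with fewer than 20 the result is an empty list.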

So my recommended pattern of use is:

Btrfs hints

Btrfs major issues (171109)

Another list of major issues.

Kernel version dependent hints (170621)

Another list of kernel version dependent hints.

Tools version dependent hints (170621)

Tools version dependent hints:

Kernel version independent hints (170621)

Btrfs references (171109)

Some of my notes on Btrfs (171111)

These are pointers to some of the entries in my technical blog where Btrfs is discussed: