Subvolumes

A BTRFS subvolume is a part of the filesystem with its own independent file/directory hierarchy and inode number namespace. Subvolumes can share file extents. A snapshot is also a subvolume, but one whose initial content is that of the original subvolume. The root directory of a subvolume always has inode number 256 (see Inode numbers).

Note

A BTRFS subvolume is not like an LVM logical volume: LVM snapshots work at the block level, while BTRFS subvolumes are based on file extents.

A subvolume looks like a normal directory, with some additional operations described below. Subvolumes can be renamed or moved. Nesting subvolumes is not restricted, but it has some implications for snapshotting. The numeric id (called subvolid or rootid) of the subvolume is persistent and cannot be changed.

A subvolume in BTRFS can be accessed in two ways:

  • like any other directory that is accessible to the user

  • like a separately mounted filesystem (mount options subvol or subvolid)

In the latter case the parent directory is neither visible nor accessible. This is similar to a bind mount, and in fact the subvolume mount does exactly that.
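
As a minimal sketch of the second access method, the helper below mounts a single subvolume either by path or by numeric id. The function name mount_subvol and the device and directory names in the usage line are hypothetical; only the mount options subvol and subvolid come from the text above. Root privileges are needed to run it.

```shell
# Hypothetical helper: mount one subvolume of a BTRFS filesystem.
# usage: mount_subvol <device> <subvolume-path-or-id> <mountpoint>
mount_subvol() {
    dev=$1 sub=$2 mnt=$3
    case $sub in
        ''|*[!0-9]*) opt="subvol=$sub" ;;    # contains a non-digit: a path
        *)           opt="subvolid=$sub" ;;  # digits only: a numeric id
    esac
    mount -o "$opt" "$dev" "$mnt"
}
```

For example, mount_subvol /dev/sda subv1 /mnt would run mount -o subvol=subv1 /dev/sda /mnt. A subvolume whose name consists only of digits would be treated as an id by this sketch.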

A freshly created filesystem is also a subvolume, called the top-level subvolume, which internally has id 5. This subvolume cannot be removed or replaced by another subvolume. It is also the subvolume that is mounted by default, unless the default subvolume has been changed (see btrfs subvolume set-default).

A snapshot is a subvolume like any other, with given initial content. By default, snapshots are created read-write. File modifications in a snapshot do not affect the files in the original subvolume.
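
A minimal sketch of taking a snapshot, assuming btrfs-progs is installed. The function name take_snapshot and its optional third argument are hypothetical; btrfs subvolume snapshot creates a read-write snapshot by default and a read-only one with -r.

```shell
# Hypothetical helper around "btrfs subvolume snapshot".
# usage: take_snapshot <source-subvolume> <destination> [ro]
take_snapshot() {
    if [ "${3:-}" = ro ]; then
        # -r creates the snapshot read-only
        btrfs subvolume snapshot -r "$1" "$2"
    else
        # default: read-write snapshot
        btrfs subvolume snapshot "$1" "$2"
    fi
}
```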

Subvolumes can be given capacity limits, through the qgroups/quota facility, but otherwise share the single storage pool of the whole btrfs filesystem. They may even share data between themselves (through deduplication or snapshotting).

Note

A snapshot is not a backup: snapshots work by use of BTRFS’ copy-on-write behaviour. A snapshot and the original it was taken from initially share all of the same data blocks. If that data is damaged in some way (cosmic rays, bad disk sector, accident with dd to the disk), then the snapshot and the original will both be damaged. Snapshots are useful to have local online “copies” of the filesystem that can be referred back to, or to implement a form of deduplication, or to freeze the state of a filesystem for making a full backup without anything changing underneath it. They do not in themselves make your data any safer.

Subvolume flags

The subvolume flag currently implemented is the ro property (read-only status). Read-write subvolumes have it set to false, read-only snapshots have it set to true. In addition, a plain snapshot has its last change generation equal to its creation generation.

Read-only snapshots are the building blocks of incremental send (see btrfs-send(8)), and the whole use case relies on unmodified snapshots from which the relative changes are generated. Thus, changing the subvolume flags from read-only to read-write breaks these assumptions and may lead to unexpected changes in the resulting incremental stream.

A snapshot that was created by send/receive will be read-only, with a different last change generation and with received_uuid set, which identifies the subvolume on the filesystem that produced the stream. The use case relies on matching data on both sides. Changing the subvolume to read-write after it has been received requires resetting the received_uuid. As this is a notable change and could break the incremental send use case, performing it with btrfs property set requires force, if that is really what the user wants.
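
The property handling described above can be sketched with btrfs property get and btrfs property set. The function names are hypothetical; the sketch assumes that btrfs property get prints a line of the form ro=true or ro=false, and that -f is the force option of btrfs property set (btrfs-progs 5.14.2 or newer).

```shell
# Hypothetical helpers; "-t s" selects the subvolume object type.

# Succeeds if the subvolume at <path> has the ro property set to true.
subvol_is_ro() {
    [ "$(btrfs property get -t s "$1" ro)" = "ro=true" ]
}

# Plain change of the ro property: set_subvol_ro <path> <true|false>
set_subvol_ro() {
    btrfs property set -t s "$1" ro "$2"
}

# Forced switch to read-write for a received subvolume. This resets
# received_uuid and can break later incremental send/receive.
force_writable() {
    btrfs property set -f -t s "$1" ro false
}
```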

Note

The safety checks were implemented in 5.14.2. Subvolumes that were received earlier (with a valid received_uuid) and later made read-write may still exist and could still lead to problems with send/receive. You can use btrfs subvolume show to identify them. Flipping the flags to read-only and back to read-write resets the received_uuid manually. A convenience tool may be added in the future.
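
One possible way to spot such subvolumes is to filter the output of btrfs subvolume show. This is only a sketch: the function name is hypothetical, and it assumes the output contains a "Received UUID:" line and a "Flags:" line, with - for an unset value and readonly among the flags of a read-only subvolume.

```shell
# Hypothetical filter: succeeds if the input describes a subvolume that
# has a received_uuid but is not read-only.
# usage: btrfs subvolume show <path> | has_stale_received_uuid
has_stale_received_uuid() {
    awk '
        /Received UUID:/ { received = $NF }   # last field: the UUID or "-"
        /Flags:/         { flags = $NF }      # last field: "readonly" or "-"
        END { exit !(received != "" && received != "-" && flags !~ /readonly/) }
    '
}
```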

Nested subvolumes

There are no restrictions for subvolume creation, so it’s up to the user how to organize them, whether to have a flat layout (all subvolumes are direct descendants of the toplevel one), or nested.

Note that snapshotting is not recursive: a subvolume or a snapshot is effectively a barrier, and no files in a nested subvolume appear in the snapshot. Instead there is a stub subvolume (sometimes also called an empty subvolume) with the same name as the original subvolume and with inode number 2. This can be used intentionally but could be confusing in nested layouts.

Case study: system root layouts

There are two ways to organize the system root directory and subvolume layout. The interesting use case for root is to allow rollbacks to a previous version as one atomic step. If the entire filesystem hierarchy starting at / is in one subvolume, taking a snapshot will encompass all files. This is easy for the snapshotting part but has undesirable consequences for rollback. For example, log files would get rolled back too, as would any other data stored on the root filesystem that is not meant to be rolled back (database files, VM images, …).

Here we could utilize the snapshotting barrier mentioned above: each directory that stores data to be preserved across rollbacks is its own subvolume. This could be e.g. /var. More fine-grained partitioning could also be done, e.g. adding separate subvolumes for /var/log, /var/cache etc.

Having separate subvolumes requires separate actions to take their snapshots (these are disconnected from the system root snapshots). This needs to be taken care of by system tools and installers, together with the selection of directories that are highly recommended to be separate subvolumes.

Mount options

Mount options are of two kinds: generic (handled by the VFS layer) and specific (handled by the filesystem). The following list shows the options that are applicable to individual subvolume mounts. There are more options that always affect the whole filesystem:

  • generic: noatime/relatime/…, nodev, nosuid, ro, rw, dirsync

  • fs-specific: compress, autodefrag, nodatacow, nodatasum

Examples of whole-filesystem options are space_cache, rescue, device, skip_balance, etc. The exceptions are subvol and subvolid, which are actually used for mounting a given subvolume and can be specified only once per mount.

Subvolumes belong to a single filesystem and, as implemented now, all share the same specific mount options. Changes done by remount have immediate effect. This may change in the future.

Mounting a read-write snapshot as read-only is possible and will not change the ro property and flag of the subvolume.

The name of the mounted subvolume is stored in file /proc/self/mountinfo in the 4th column:

27 21 0:19 /subv1 /mnt rw,relatime - btrfs /dev/sda rw,space_cache
           ^^^^^^
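
The 4th column can also be extracted with a short awk filter. This is a sketch; the function name is hypothetical. It relies on the mountinfo layout shown above, where the mount point is the 5th column and the filesystem type follows the "-" separator.

```shell
# Hypothetical helper: print "<mountpoint> <subvolume>" for every BTRFS
# mount listed in a mountinfo file (default: /proc/self/mountinfo).
btrfs_mounted_subvols() {
    awk '{
        # Optional fields may follow column 6, so search for the "-" separator.
        for (i = 7; i <= NF; i++)
            if ($i == "-") break
        # The field after the separator is the filesystem type.
        if ($(i + 1) == "btrfs") print $5, $4
    }' "${1:-/proc/self/mountinfo}"
}
```

For the sample line above this prints /mnt /subv1.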

Inode numbers

A directory representing a subvolume always has inode number 256 (sometimes also called the root of the subvolume):

$ ls -lis
total 0
389111 0 drwxr-xr-x 1 user users 0 Jan 20 12:13 dir
389110 0 -rw-r--r-- 1 user users 0 Jan 20 12:13 file
   256 0 drwxr-xr-x 1 user users 0 Jan 20 12:13 snap1
   256 0 drwxr-xr-x 1 user users 0 Jan 20 12:13 subv1

If a subvolume is nested and then a snapshot is taken, then the cloned directory entry representing the subvolume becomes empty and the inode has number 2. All other files and directories in the target snapshot preserve their original inode numbers.
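
The two special inode numbers can be told apart with a small check. This is a sketch: the function name is hypothetical, it assumes GNU stat (for -c %i), and the result is meaningful only for directories on a BTRFS filesystem.

```shell
# Hypothetical helper: classify a directory by its inode number.
# usage: describe_dir <directory>
describe_dir() {
    ino=$(stat -c %i -- "$1") || return 1
    case $ino in
        256) echo "subvolume root" ;;
        2)   echo "stub of a nested subvolume" ;;
        *)   echo "regular directory" ;;
    esac
}
```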

Note

The inode number is not a filesystem-wide unique identifier, although some applications assume that it is. Please use the pair subvolumeid:inodenumber for that purpose. The subvolume id can be read by btrfs inspect-internal rootid or by the ioctl BTRFS_IOC_INO_LOOKUP.
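
A sketch of building such a pair from the command line. The function name is hypothetical, and GNU stat is assumed for -c %i; btrfs inspect-internal rootid prints the id of the subvolume that contains the given path.

```shell
# Hypothetical helper: print "<subvolume id>:<inode number>" for a path.
# usage: file_identity <path>
file_identity() {
    subvolid=$(btrfs inspect-internal rootid "$1") || return 1
    ino=$(stat -c %i -- "$1") || return 1
    printf '%s:%s\n' "$subvolid" "$ino"
}
```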

Performance

Snapshot creation needs to flush dirty data that belong to the subvolume, and this step may take some time. Once there is nothing else to flush, the snapshot itself is instant: in the metadata it only creates a new copy of the tree root.

Snapshot deletion has two phases: first its directory is deleted and the subvolume is added to a list, then the list is processed one by one and the data related to the subvolume get deleted. This is usually called cleaning and can take some time depending on the amount of shared blocks (can be a lot of metadata updates), and the number of currently queued deleted subvolumes.
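
The two phases can be made visible from the command line. In this sketch the function name is hypothetical: btrfs subvolume delete performs the first phase, and btrfs subvolume sync waits until the cleaning of the queued subvolumes has finished.

```shell
# Hypothetical helper: delete subvolumes and wait for the cleaner.
# usage: delete_and_wait <filesystem-path> <subvolume-path>...
delete_and_wait() {
    fs=$1
    shift
    # Phase 1: remove the directory entries and queue the subvolumes.
    btrfs subvolume delete "$@" || return 1
    # Phase 2: block until all queued deletions have been cleaned.
    btrfs subvolume sync "$fs"
}
```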