Tree checker

The tree checker is a feature that verifies metadata blocks before they are written to, or after they are read from, the devices. B-tree nodes contain items describing the filesystem structures, and to some degree these can be verified for consistency or validity. This check is done in addition to the checksums: a checksum only verifies the overall state of the block, while the tree checker validates and cross-references the logical structure. The performance cost is small, comparable to calculating the checksum, and has no noticeable impact, while the checks catch many kinds of errors.

There are two occasions when the checks are done:

Pre-write checks

The checks are performed when metadata blocks are in memory and about to be written to permanent storage, before the checksums are calculated. This can catch random corruption of the blocks (or pages) caused by bugs, by other parts of the system, or by hardware errors (namely faulty RAM).

Once a block fails the checks, the filesystem refuses to write more data and switches to read-only mode to prevent further damage. At this point some of the recent metadata updates are held only in memory, so it’s best not to panic: try to remember which files could be affected and copy them elsewhere. Once the filesystem is unmounted, the most recent changes are unfortunately lost. The filesystem stored on the device is still consistent and should mount fine.

A message may look like:

[ 1716.823895] BTRFS critical (device vdb): corrupt leaf: root=18446744073709551607 block=38092800 slot=0, invalid key objectid: has 1 expect 6 or [256, 18446744073709551360] or 18446744073709551604
[ 1716.829499] BTRFS info (device vdb): leaf 38092800 gen 19 total ptrs 4 free space 15851 owner 18446744073709551607
[ 1716.832891] BTRFS info (device vdb): refs 3 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 1506
[ 1716.836054]  item 0 key (1 1 0) itemoff 16123 itemsize 160
[ 1716.837993]          inode generation 1 size 0 mode 100600
[ 1716.839760]  item 1 key (256 1 0) itemoff 15963 itemsize 160
[ 1716.841742]          inode generation 4 size 0 mode 40755
[ 1716.843393]  item 2 key (256 12 256) itemoff 15951 itemsize 12
[ 1716.845320]  item 3 key (18446744073709551611 48 1) itemoff 15951 itemsize 0
[ 1716.847505] BTRFS error (device vdb): block=38092800 write time tree block corruption detected

The line(s) before the "write time tree block corruption detected" message are specific to the error found.
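
When collecting such messages for a report, it can help to pull the fields out of the "corrupt leaf" line. The following is a minimal sketch that assumes the line has the same shape as the sample above; the regular expression and the names parse_corrupt_leaf and to_signed64 are illustrative and not part of any btrfs tool. It also shows that the very large numbers in the message are small negative values when read as signed 64-bit integers.

```python
import re

# Pattern derived only from the sample line above; this is an assumption,
# not an official description of the kernel message format.
CORRUPT_LEAF_RE = re.compile(
    r"BTRFS critical \(device (?P<device>[^)]+)\): corrupt leaf: "
    r"root=(?P<root>\d+) block=(?P<block>\d+) slot=(?P<slot>\d+), "
    r"(?P<reason>.*)"
)


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit number as a signed one."""
    return value - (1 << 64) if value >= (1 << 63) else value


def parse_corrupt_leaf(line: str):
    """Return the fields of a 'corrupt leaf' line, or None if it does not match."""
    match = CORRUPT_LEAF_RE.search(line)
    if match is None:
        return None
    return {
        "device": match.group("device"),
        "root": to_signed64(int(match.group("root"))),
        "block": int(match.group("block")),
        "slot": int(match.group("slot")),
        "reason": match.group("reason"),
    }


line = (
    "[ 1716.823895] BTRFS critical (device vdb): corrupt leaf: "
    "root=18446744073709551607 block=38092800 slot=0, "
    "invalid key objectid: has 1 expect 6 or [256, 18446744073709551360] "
    "or 18446744073709551604"
)
info = parse_corrupt_leaf(line)
print(info)
```

The reason field holds the part that is specific to the error found, which is the most useful piece to quote in a report.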

Post-read checks

Metadata blocks are verified right after they’re read from the devices and the checksum is found to be valid. This protects against changes to the metadata that also update the checksum, which are unlikely to happen accidentally and more likely result from intentional corruption or fuzzing.

[ 4823.612832] BTRFS critical (device vdb): corrupt leaf: root=7 block=30474240 slot=0, invalid nritems, have 0 should not be 0 for non-root leaf
[ 4823.616798] BTRFS error (device vdb): block=30474240 read time tree block corruption detected
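
The difference between a checksum and a structural check can be sketched with a toy block format. Everything here is an assumption made for illustration: the layout (a 4-byte checksum, a 4-byte item count, then a payload) is not the on-disk btrfs format, zlib.crc32 stands in for the real checksum algorithm, and the names make_block, checksum_ok and structure_ok are hypothetical. The structural rule mirrors the message above: a non-root leaf must not have zero items.

```python
import struct
import zlib


def make_block(nritems: int, payload: bytes) -> bytes:
    """Build a toy block: checksum, then item count, then payload."""
    body = struct.pack("<I", nritems) + payload
    return struct.pack("<I", zlib.crc32(body)) + body


def checksum_ok(block: bytes) -> bool:
    """Verify only that the stored checksum matches the block contents."""
    (stored,) = struct.unpack_from("<I", block, 0)
    return stored == zlib.crc32(block[4:])


def structure_ok(block: bytes, is_root: bool) -> bool:
    """Verify one logical rule: only a root leaf may be empty."""
    (nritems,) = struct.unpack_from("<I", block, 4)
    return nritems > 0 or is_root


good = make_block(2, b"items")

# Contents changed and checksum recomputed: the checksum still matches.
forged = make_block(0, b"items")

# A single flipped bit without a checksum update is caught by the checksum.
flipped = bytearray(good)
flipped[-1] ^= 0x01

print(checksum_ok(good), structure_ok(good, is_root=False))
print(checksum_ok(forged), structure_ok(forged, is_root=False))
print(checksum_ok(bytes(flipped)))
```

The forged block passes the checksum but fails the structural rule, which is the kind of case the post-read checks are meant to catch.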

The checks

As implemented right now, the metadata consistency check is limited to a single b-tree node and the items stored there, i.e. there’s no extensive or broad check against, e.g., data structures in other b-tree nodes. This still provides enough opportunities to verify the consistency of individual items, besides verifying their general validity, such as length or offset. Each b-tree item is also coupled with a key, so proper key ordering is part of the check and can reveal random bitflips in the sequence (this has been the most successful detector of faulty RAM).
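
The key ordering check can be illustrated with a short sketch. It assumes keys are compared as (objectid, type, offset) tuples and must be strictly ascending within a node; the function name first_misordered_slot is hypothetical, and the kernel implementation performs many more checks than this. The keys are taken from the sample leaf dump in the pre-write section.

```python
from typing import List, Optional, Tuple

Key = Tuple[int, int, int]  # (objectid, type, offset), as printed in the dump


def first_misordered_slot(keys: List[Key]) -> Optional[int]:
    """Return the first slot whose key is not greater than the previous one."""
    for slot in range(1, len(keys)):
        if keys[slot - 1] >= keys[slot]:
            return slot
    return None


# Keys from the sample leaf: they are in strictly ascending order.
keys = [
    (1, 1, 0),
    (256, 1, 0),
    (256, 12, 256),
    (18446744073709551611, 48, 1),
]
print(first_misordered_slot(keys))

# Simulate a single bitflip in the objectid of slot 1 (256 becomes 0).
flipped = list(keys)
flipped[1] = (256 ^ (1 << 8), 1, 0)
print(first_misordered_slot(flipped))
```

A flipped bit in a key usually moves it out of sequence relative to its neighbours, which is why such a cheap comparison is effective at exposing memory errors.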

The capabilities of the tree checker have been improved over time, and it’s possible that a filesystem created on an older kernel may trigger warnings or fail some checks on a newer one.

Reporting problems

In many cases the problem is caused by hardware and cannot be automatically fixed by btrfs check --repair, so do not try that without being advised to. Even if the error is unfixable, it’s useful to report it, both to validate the cause and to give more ideas on how to improve the tree checker. Please consider reporting it to the mailing list linux-btrfs@vger.kernel.org.