Compression
Btrfs supports transparent file compression. Three algorithms are available:
ZLIB, LZO and ZSTD (since v4.14), with various levels. Compression happens at
the level of file extents, and the algorithm is selected by a file property, a
mount option or a defrag command. A single btrfs mount point can therefore hold
files that are uncompressed, files compressed with LZO and files compressed
with ZLIB (this is supported, though you may not want it that way).
Once compression is set, all newly written data will be compressed; existing
data are left untouched. Data are split into smaller chunks (128KiB) before
compression to make random rewrites possible without a high performance hit.
Due to the increased number of extents, the metadata consumption is higher. The
chunks are compressed in parallel.
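As a rough illustration of the 128KiB splitting described above, the following
shell sketch estimates how many compression chunks cover a file of a given
size. The function name compress_chunks is hypothetical, and the calculation is
simply the size divided by 128KiB, rounded up; it does not model anything else
that influences the real extent layout.

```shell
# Hypothetical helper: number of 128KiB chunks needed to cover SIZE bytes.
# Illustrative arithmetic only, not a btrfs tool.
compress_chunks() {
    size=$1
    chunk=$((128 * 1024))
    echo $(( (size + chunk - 1) / chunk ))
}
```

It could be called as compress_chunks "$(wc -c < file)" to get a feel for how
many separately compressed pieces a large file turns into.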
The algorithms can be characterized as follows regarding the speed/ratio
trade-offs:
- ZLIB
  - slower, higher compression ratio
  - levels: 1 to 9, mapped directly, default level is 3
  - good backward compatibility
- LZO
  - faster compression and decompression than ZLIB, worse compression ratio, designed to be fast
  - no levels
  - good backward compatibility
- ZSTD
  - compression comparable to ZLIB with higher compression/decompression speeds and different ratio
  - levels: 1 to 15, mapped directly (higher levels are not available)
  - since 4.14, levels since 5.1
The differences depend on the actual data set and cannot be expressed by a
single number or recommendation. Higher levels consume more CPU time and may
not bring a significant improvement; lower levels are close to real time.
How to enable compression
Typically compression is enabled for the whole filesystem, specified as a
mount option for the mount point. Note that the compression mount options are
shared among all mounts of the same filesystem, either bind mounts or subvolume
mounts. Please refer to btrfs(5) section MOUNT OPTIONS.
$ mount -o compress=zstd /dev/sdx /mnt
This will enable the zstd algorithm on the default level (which is 3). The
level can also be specified manually, for example zstd:3. Higher levels
compress better at the cost of time, which in turn may increase write latency.
Low levels are suitable for real-time compression and on a reasonably fast CPU
don’t cause noticeable performance drops.
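Because the compression mount options are shared among all mounts of the same
filesystem, it can help to check what is actually in effect. The sketch below
extracts the compress or compress-force entry from a comma-separated mount
option string. The function name compress_opt_of is hypothetical, and the
exact way the kernel prints the option (for example compress=zstd:3) is an
assumption here.

```shell
# Hypothetical helper: print the compress / compress-force entry, if any,
# from a comma-separated mount option string. Exits non-zero if none is found.
compress_opt_of() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -E '^compress(-force)?(=.*)?$'
}
```

A possible use is compress_opt_of "$(findmnt -no OPTIONS /mnt)", which feeds
the options of the mount at /mnt into the helper.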
$ btrfs filesystem defrag -czstd file
The command above will start defragmentation of the whole file and apply
the compression, regardless of the mount option. (Note: specifying level is not
yet implemented). The compression algorithm is not persistent and applies only
to the defragmentation command; for any other writes, other compression
settings apply.
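Since existing data are not touched when compression is enabled, the defrag
command is also the way to recompress data that are already on disk. The
following is a minimal sketch of a wrapper that does this for a whole directory
tree using the -r (recursive) option of btrfs filesystem defragment. The
function name recompress_tree is hypothetical.

```shell
# Hypothetical wrapper: recompress everything below DIR with ALGO.
# Only the three algorithm names from this document are accepted.
recompress_tree() {
    algo=$1
    dir=$2
    case $algo in
        zlib|lzo|zstd) ;;
        *) echo "unknown algorithm: $algo" >&2; return 1 ;;
    esac
    btrfs filesystem defragment -r "-c$algo" "$dir"
}
```

For example, recompress_tree zstd /mnt/data would rewrite the files below
/mnt/data with zstd, while later writes still follow the mount option or the
file properties.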
Persistent settings on a per-file basis can be set in two ways:
$ chattr +c file
$ btrfs property set file compression zstd
The first command uses the legacy interface of file attributes inherited from
the ext2 filesystem and is not flexible, so by default the zlib compression is
set. The other command sets a property on the file with the given algorithm.
(Note: setting level that way is not yet implemented.)
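To confirm that a persistent setting took effect, the property can be read
back with btrfs property get. The sketch below combines both steps; the
function name set_compression is hypothetical.

```shell
# Hypothetical helper: set the compression property on FILE to ALGO,
# then read it back so the result can be checked.
set_compression() {
    file=$1
    algo=$2
    btrfs property set "$file" compression "$algo" &&
        btrfs property get "$file" compression
}
```

It would be called as set_compression file zstd.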
Compression levels
Level support for ZLIB was added in v4.14. LZO does not support levels (the
kernel implementation provides only one). Level support for ZSTD was added in
v5.1.
There are 9 levels of ZLIB supported (1 to 9), mapping 1:1 from the mount option
to the algorithm defined level. The default is level 3, which provides a
reasonably good compression ratio and is still reasonably fast. Levels 7, 8 and
9 give a comparable compression gain, but the higher levels take longer.
The ZSTD support includes levels 1 to 15, a subset of the full range of what
ZSTD provides. Levels 1-3 are real-time, 4-8 are slower with improved
compression, and 9-15 try even harder, though the resulting size may not be
significantly improved.
Level 0 always maps to the default. The compression level does not affect
compatibility.
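The level rules above can be summarized in a small validator. This is only a
sketch based on the ranges stated in this section (ZLIB 1 to 9, ZSTD 1 to 15,
no levels for LZO, level 0 meaning the default); the function name
valid_compress_spec is hypothetical, and the kernel's own option parsing may
accept or reject values differently.

```shell
# Hypothetical validator for a compression spec such as "zstd:3" or "lzo",
# using only the level ranges stated in this document.
valid_compress_spec() {
    spec=$1
    case $spec in
        lzo)            return 0 ;;      # LZO has no levels
        zlib|zstd)      return 0 ;;      # no level given: the default is used
        zlib:*|zstd:*)  ;;               # level is checked below
        *)              return 1 ;;
    esac
    algo=${spec%%:*}
    level=${spec#*:}
    case $level in
        ''|*[!0-9]*) return 1 ;;         # must be a non-negative integer
    esac
    if [ "$algo" = zlib ]; then max=9; else max=15; fi
    [ "$level" -le "$max" ]              # 0 maps to the default level
}
```

A script could run valid_compress_spec "zstd:7" before building a
compress= mount option from user input.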
Incompressible data
Files with already compressed data, or with data that won’t compress well
within the CPU and memory constraints of the kernel implementations, are
handled by a simple decision logic. If the first portion of data being
compressed is not smaller than the original, the compression of the file is
disabled -- unless the filesystem is mounted with compress-force. In that case
compression will always be attempted on the file, only for the result to be
discarded later. This is not optimal and is subject to optimizations and
further development.
If a file is identified as incompressible, a flag is set (NOCOMPRESS) and it’s
sticky. On that file compression won’t be performed unless forced. The flag
can also be set by chattr +m (since e2fsprogs 1.46.2) or by properties with
value no or none. An empty value will reset it to the default that’s currently
applicable on the mounted filesystem.
There are two ways to detect incompressible data:
- actual compression attempt - data are compressed, if the result is not smaller, it’s discarded, so this depends on the algorithm and level
- pre-compression heuristics - a quick statistical evaluation on the data is performed and based on the result either compression is performed or skipped, the NOCOMPRESS bit is not set just by the heuristic, only if the compression algorithm does not make an improvement
$ lsattr file
---------------------m file
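The lsattr output above can also be checked from a script. The sketch below
looks for the m attribute in the flag field of one lsattr output line; the
function name has_nocompress_flag is hypothetical, and it assumes the layout
shown above, with the flag string first and the file name after a space.

```shell
# Hypothetical helper: succeed if an lsattr output line carries the 'm'
# (NOCOMPRESS) attribute. Only the flag field is inspected, not the file name.
has_nocompress_flag() {
    flags=${1%% *}          # first field of the lsattr line is the flag string
    case $flags in
        *m*) return 0 ;;
        *)   return 1 ;;
    esac
}
```

For instance, has_nocompress_flag "$(lsattr file)" && echo "compression disabled"
reports files on which compression has been turned off.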
Forcing compression is not recommended: the heuristics are supposed to make
that decision, and the compression algorithms internally detect incompressible
data too.
Pre-compression heuristics
The heuristics aim to do a few quick statistical tests on the data to be
compressed in order to avoid potentially costly compression that would turn
out to be inefficient. Compression algorithms could have internal detection of
incompressible data too, but this leads to more overhead as the compression is
done in another thread and has to write the data anyway. The heuristic is
read-only and can utilize cached memory.
The tests are based on the following: data sampling, long repeated
pattern detection, byte frequency, Shannon entropy.
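To give a feel for the last two items, the following sketch computes the
Shannon entropy of a file from its byte frequencies, in bits per byte. It is
not the kernel heuristic: it reads the whole file instead of sampling, applies
no thresholds, and the function name byte_entropy is hypothetical.

```shell
# Illustrative only: Shannon entropy of a file in bits per byte (0 to 8),
# computed from byte frequencies. Not the kernel implementation.
byte_entropy() {
    od -An -v -t u1 "$1" | LC_ALL=C awk '
        { for (i = 1; i <= NF; i++) { count[$i]++; total++ } }
        END {
            h = 0
            for (b in count) {
                p = count[b] / total
                h -= p * log(p) / log(2)
            }
            printf "%.3f\n", h
        }'
}
```

Values close to 8 mean the bytes look nearly random, which is typical for data
that are already compressed or encrypted and unlikely to shrink further.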
Compatibility
Compression is done using the COW mechanism so it’s incompatible with
nodatacow. Direct IO reads work on compressed files, but direct IO writes fall
back to buffered writes and lead to no compression even if force compression
is set. Currently nodatasum and compression don’t work together.
The compression algorithms have been added over time, so version
compatibility should also be considered, together with other tools that may
access the compressed data, like bootloaders.
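A simple way to take the kernel version into account in a script is to compare
it against the versions mentioned in this document (4.14 for ZSTD, 5.1 for ZSTD
levels). The sketch below does a plain major.minor comparison; the function
name kernel_at_least is hypothetical, and it looks only at the version string,
so it knows nothing about distribution backports or about what a bootloader
supports.

```shell
# Hypothetical helper: succeed if VERSION (e.g. from uname -r) is at least
# MAJOR.MINOR. Usage: kernel_at_least VERSION MAJOR MINOR
kernel_at_least() {
    ver=${1%%-*}                    # drop a suffix such as "-generic"
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    minor=${minor%%[!0-9]*}         # drop trailing non-digits such as "+"
    [ "$major" -gt "$2" ] ||
        { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}
```

For example, kernel_at_least "$(uname -r)" 5 1 could guard the use of an
explicit ZSTD level in a mount option.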