mkfs.btrfs(8)
SYNOPSIS
mkfs.btrfs [options] <device> [<device>…]
DESCRIPTION
mkfs.btrfs is used to create the btrfs filesystem on a single or multiple
devices. The device is typically a block device but can be a file-backed image
as well. Multiple devices are grouped by UUID of the filesystem.

Before mounting such a filesystem, the kernel module must know all the devices
either via preceding execution of btrfs device scan or using the device
mount option. See section MULTIPLE DEVICES
for more details.

The default block group profiles for data and metadata depend on the number of
devices and possibly other factors. It’s recommended to use specific profiles
but the defaults should be OK and allow future conversions to other profiles.
Please see options -d and -m for further details and btrfs-balance(8) for
the profile conversion post mkfs.
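Since explicit profiles are recommended, the following minimal sketch spells out the defaults that the -d and -m options describe for recent versions (data: single; metadata: dup on one device, raid1 on several). The helper name default_profiles and the device paths /dev/sdx and /dev/sdy are hypothetical placeholders, and the command line is only printed, not executed.

```shell
#!/bin/sh
# Sketch only: default_profiles is a hypothetical helper, not part of btrfs-progs.
# It encodes the defaults stated under -d and -m for btrfs-progs 5.15 and later.
default_profiles() {
    ndevs=$1
    if [ "$ndevs" -le 1 ]; then
        echo "-d single -m dup"      # single device: metadata is duplicated
    else
        echo "-d single -m raid1"    # multiple devices: metadata is mirrored
    fi
}

# Print (do not run) an mkfs command line with the profiles made explicit.
# /dev/sdx and /dev/sdy are placeholder device names.
echo "mkfs.btrfs $(default_profiles 2) /dev/sdx /dev/sdy"
```

Writing the profiles out this way keeps the result independent of version-specific defaults.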
OPTIONS
- -b|--byte-count <size>
Specify the size of each device as seen by the filesystem. If not set, the entire device size is used. The total filesystem size will be the sum of all device sizes. For a single device filesystem the option effectively specifies the size of the filesystem.
- --csum <type>, --checksum <type>
Specify the checksum algorithm. Default is crc32c. Valid values are crc32c, xxhash, sha256 or blake2. To mount such a filesystem the kernel must support the checksum as well. See section CHECKSUM ALGORITHMS in btrfs(5).
- -d|--data <profile>
Specify the profile for the data block groups. Valid values are raid0, raid1, raid1c3, raid1c4, raid5, raid6, raid10, single or dup (case does not matter).
See section DUP PROFILES ON A SINGLE DEVICE for more details.
On multiple devices, the default was raid0 until version 5.7 and has been single since version 5.8. You can still select raid0 manually, but it was not suitable as a default.
- -m|--metadata <profile>
Specify the profile for the metadata block groups. Valid values are raid0, raid1, raid1c3, raid1c4, raid5, raid6, raid10, single or dup (case does not matter).
Default on a single device filesystem is DUP, which is recommended for metadata in general. The duplication might not be necessary in some use cases and it’s up to the user to change that at mkfs time or later. This depends on hardware that could potentially deduplicate the blocks again, but this cannot be detected at mkfs time.
Note
Up to version 5.14 there was a detection of an SSD device (more precisely, whether it’s a rotational device, determined by the contents of the file /sys/block/DEV/queue/rotational) that used to select single. This has changed in version 5.15 to be always dup.
Note that the rotational status can be arbitrarily set by the underlying block device driver and may not reflect the true status (network block device, memory-backed SCSI devices, real block device behind some additional device mapper layer, etc). It’s recommended to always set the options --data/--metadata to avoid confusion and unexpected results.
See section DUP PROFILES ON A SINGLE DEVICE for more details.
On multiple devices the default is raid1.
- -M|--mixed
Normally the data and metadata block groups are isolated. The mixed mode removes the isolation and stores both types in the same block group type. This helps to utilize the free space regardless of the purpose and is suitable for small devices. With separate allocation, space reserved for one block group type is not available to the other, which can lead to an ENOSPC state.
Mixed mode is recommended for filesystems smaller than 1GiB. The soft recommendation is to use it for filesystems smaller than 5GiB. The mixed mode may lead to degraded performance on larger filesystems, but is otherwise usable, even on multiple devices.
The nodesize and sectorsize must be equal, and the block group types must match.
Note
Versions up to 4.2.x forced the mixed mode for devices smaller than 1GiB. This has been removed in 4.3+ as it caused some usability issues.
Mixed profile cannot be used together with other profiles. It can only be set at creation time. Conversion to or from mixed profile is not implemented.
- -n|--nodesize <size>
Specify the nodesize, the tree block size in which btrfs stores metadata. The default value is 16KiB (16384) or the page size, whichever is bigger. Must be a multiple of the sectorsize and a power of 2, but not larger than 64KiB (65536). Leafsize always equals nodesize and the options are aliases.
Smaller node size increases fragmentation but leads to taller b-trees which in turn leads to lower locking contention. Higher node sizes give better packing and less fragmentation at the cost of more expensive memory operations while updating the metadata blocks.
Note
Versions up to 3.11 set the nodesize to 4KiB.
- -s|--sectorsize <size>
Specify the sectorsize, the minimum data block allocation unit.
Note
Versions prior to 6.7 set the sectorsize to match the host CPU page size. Starting in 6.7 this is 4KiB for cross-architecture compatibility. Please read more about the subpage block size support and its status.
By default, the value is 4KiB, but it can be manually set to match the system page size (e.g. using command getconf PAGE_SIZE). However, if the sector size is different from the page size, the resulting filesystem may not be mountable by the current kernel, apart from the default 4KiB. Hence, using this option is not advised unless you intend to mount it on a system with the suitable page size.
- -L|--label <string>
Specify a label for the filesystem. The string should be less than 256 bytes and must not contain newline characters.
- -K|--nodiscard
Do not perform whole device TRIM operation on devices that are capable of that. This does not affect discard/trim operation when the filesystem is mounted. Please see the mount option discard for that in btrfs(5).
- -r|--rootdir <rootdir>
Populate the toplevel subvolume with files from rootdir. This does not require root permissions to write the new files or to mount the filesystem.
Note
This option may enlarge the image or file to ensure it’s big enough to contain the files from rootdir. Since version 4.14.1 the filesystem size is not minimized. Please see option --shrink if you need that functionality.
- --shrink
Shrink the filesystem to its minimal size. This only works with the --rootdir option.
If the destination block device is a regular file, this option will also truncate the file to the minimal size. Otherwise it will reduce the filesystem available space. Extra space will not be usable unless the filesystem is mounted and resized using btrfs filesystem resize.
Note
Prior to version 4.14.1, the shrinking was done automatically.
- -O|--features <feature1>[,<feature2>…]
A list of filesystem features turned on at mkfs time. Not all features are supported by old kernels. To disable a feature, prefix it with ^.
See section FILESYSTEM FEATURES for more details. To see all available features that mkfs.btrfs supports run:
$ mkfs.btrfs -O list-all
- -f|--force
Forcibly overwrite the block devices when an existing filesystem is detected. By default, mkfs.btrfs will utilize libblkid to check for any known filesystem on the devices. Alternatively you can use the wipefs utility to clear the devices.
- -q|--quiet
Print only error or warning messages. Options --features or --help are unaffected. Resets any previous effects of --verbose.
- -U|--uuid <UUID>
Create the filesystem with the given UUID. For a single-device filesystem, you can duplicate the UUID. However, for a multi-device filesystem, the UUID must not already exist on any currently present filesystem.
- --device-uuid <UUID>
Create the filesystem with the given device-uuid UUID (also known as UUID_SUB in blkid). For a single device filesystem, you can duplicate the device-uuid. However, this option does not currently work for a multi-device filesystem.
- -v|--verbose
Increase verbosity level, default is 1.
- -V|--version
Print the mkfs.btrfs version and exit.
- --help
Print help.
- -l|--leafsize <size>
Removed in 6.0, used to be an alias for --nodesize.
- -R|--runtime-features <feature1>[,<feature2>…]
Removed in 6.3, was used to specify features not affecting on-disk format. Now all such features are merged into -O|--features option. The option -R will stay for backward compatibility.
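The constraints listed for --nodesize (a power of 2, a multiple of the sectorsize, and at most 64KiB) can be checked before calling mkfs.btrfs. The following is a minimal sketch of such a check; valid_nodesize is a hypothetical helper name and both arguments are plain byte counts.

```shell
#!/bin/sh
# Sketch only: valid_nodesize is a hypothetical helper, not part of btrfs-progs.
# Usage: valid_nodesize NODESIZE SECTORSIZE   (both in bytes)
# Returns 0 when NODESIZE satisfies the rules stated for -n|--nodesize.
valid_nodesize() {
    n=$1
    s=$2
    [ "$n" -gt 0 ] && [ "$s" -gt 0 ] || return 1
    [ $(( n & (n - 1) )) -eq 0 ] || return 1   # must be a power of 2
    [ $(( n % s )) -eq 0 ] || return 1         # must be a multiple of the sectorsize
    [ "$n" -le 65536 ] || return 1             # must not exceed 64KiB
    return 0
}

if valid_nodesize 16384 4096; then
    echo "nodesize 16384 is acceptable with sectorsize 4096"
fi
```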
SIZE UNITS
The default unit is byte. All size parameters accept suffixes in the 1024
base. The recognized suffixes are: k, m, g, t, p, e, both uppercase
and lowercase.
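To illustrate the 1024-based suffixes, here is a small sketch that converts a size argument to bytes. The function size_to_bytes is a hypothetical helper written for this example; it accepts only whole numbers with at most one suffix letter and is not claimed to match every input that mkfs.btrfs itself accepts.

```shell
#!/bin/sh
# Sketch only: size_to_bytes is a hypothetical helper, not part of btrfs-progs.
# Converts e.g. "16k" or "1G" to bytes using 1024-based suffixes.
size_to_bytes() {
    value=$1
    num=${value%[kKmMgGtTpPeE]}     # strip one trailing suffix letter, if any
    suffix=${value#"$num"}          # the letter that was stripped (may be empty)
    case $num in
        ''|*[!0-9]*) echo "invalid size: $value" >&2; return 1 ;;
    esac
    case $suffix in
        '')  shift_bits=0 ;;
        k|K) shift_bits=10 ;;
        m|M) shift_bits=20 ;;
        g|G) shift_bits=30 ;;
        t|T) shift_bits=40 ;;
        p|P) shift_bits=50 ;;
        e|E) shift_bits=60 ;;
    esac
    echo $(( num << shift_bits ))   # multiply by 1024^k via a bit shift
}

size_to_bytes 16k    # prints 16384
size_to_bytes 1G     # prints 1073741824
```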
MULTIPLE DEVICES
Before mounting a multiple device filesystem, the kernel module must know the
association of the block devices that are attached to the filesystem UUID.

There is typically no action needed from the user. On a system that utilizes a
udev-like daemon, any new block device is automatically registered. The rules
call btrfs device scan.

The same command can be used to trigger the device scanning if the btrfs kernel
module is reloaded (naturally all previous information about the device
registration is lost).

Another possibility is to use the mount option device to specify the list of
devices to scan at the time of mount.
# mount -o device=/dev/sdb,device=/dev/sdc /dev/sda /mnt
Note
This means only scanning. If the devices do not exist in the system,
mount will fail anyway. This can happen on systems without initramfs/initrd and
root partition created with RAID1/10/5/6 profiles. The mount action can happen
before all block devices are discovered. The waiting is usually done on the
initramfs/initrd systems.
Warning
RAID5/6 has known problems and should not be used in production.
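The device mount option shown earlier in this section takes one device= entry per additional device. A minimal sketch for assembling that option string from a list of devices follows; device_mount_opts is a hypothetical helper, and the mount command is only printed rather than executed.

```shell
#!/bin/sh
# Sketch only: device_mount_opts is a hypothetical helper name.
# Joins its arguments into "device=A,device=B,..." for use with mount -o.
device_mount_opts() {
    opts=
    for dev in "$@"; do
        opts="${opts:+$opts,}device=$dev"   # prepend a comma after the first entry
    done
    echo "$opts"
}

# Print (do not run) the resulting mount command.
echo "mount -o $(device_mount_opts /dev/sdb /dev/sdc) /dev/sda /mnt"
```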
FILESYSTEM FEATURES
Features that can be enabled during creation time. See also btrfs(5) section
FILESYSTEM FEATURES.
- mixed-bg
(kernel support since 2.6.37)
mixed data and metadata block groups, also set by option --mixed
- extref
(default since btrfs-progs 3.12, kernel support since 3.7)
increased hardlink limit per file in a directory to 65536, older kernels supported a varying number of hardlinks depending on the sum of all file name sizes that can be stored into one metadata block
- raid56
(kernel support since 3.9)
extended format for RAID5/6, also enabled if RAID5 or RAID6 block groups are selected
- skinny-metadata
(default since btrfs-progs 3.18, kernel support since 3.10)
reduced-size metadata for extent references, saves a few percent of metadata
- no-holes
(default since btrfs-progs 5.15, kernel support since 3.14)
improved representation of file extents where holes are not explicitly stored as an extent, saves a few percent of metadata if sparse files are used
- zoned
(kernel support since 5.12)
zoned mode, data allocation and write friendly to zoned/SMR/ZBC/ZNS devices, see ZONED MODE in btrfs(5), the mode is automatically selected when a zoned device is detected
- quota
(kernel support since 3.4)
Enable quota support (qgroups). The qgroup accounting will be consistent, can be used together with --rootdir. See also btrfs-quota(8).
- free-space-tree
(default since btrfs-progs 5.15, kernel support since 4.5)
Enable the free space tree (mount option space_cache=v2) for persisting the free space cache in a b-tree. This is built on top of the COW mechanism and has better performance than v1.
Offline conversion from filesystems that don’t have this feature enabled at mkfs time is possible, see btrfstune(8).
Online conversion can be done by mounting with space_cache=v2. Doing this once is sufficient.
- block-group-tree
(kernel support since 6.1)
Enable a dedicated b-tree for block group items. This greatly reduces mount time for large filesystems due to better data locality that avoids seeking. On rotational devices, a filesystem is considered large starting from 2-4TiB. Can be used on other types of devices (SSD, NVMe, …) as well.
Offline conversion from filesystems that don’t have this feature enabled at mkfs time is possible, see btrfstune(8). Online conversion is not possible.
- raid-stripe-tree
(kernel support since 6.7, CONFIG_BTRFS_DEBUG)
Separate tree for logical file extent mapping where the physical mapping may not match on multiple devices. This is now used in zoned mode to implement RAID0/RAID1* profiles, but can be used in non-zoned mode as well. The support for RAID56 is in development and will eventually fix the problems with the current implementation. This is a backward incompatible feature and has to be enabled at mkfs time.
Note
Due to the status of implementation it is enabled only in builds with CONFIG_BTRFS_DEBUG. Support by the kernel module can be found in the sysfs feature list.
- squota
(kernel support since 6.7)
Enable simple quota accounting (squotas). This is an alternative to qgroups with a smaller performance impact but no notion of shared vs. exclusive usage.
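Because not all features are supported by old kernels, it can help to compare a target kernel version with the "kernel support since" values listed above before choosing -O features. The sketch below hard-codes those values from this section; the helper names feature_min_kernel and kernel_supports are hypothetical, and the comparison assumes a sort implementation that supports -V (version sort), as GNU coreutils does.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and not part of btrfs-progs.
# Minimum kernel versions are copied from the feature list in this section.
feature_min_kernel() {
    case $1 in
        mixed-bg)               echo 2.6.37 ;;
        extref)                 echo 3.7 ;;
        raid56)                 echo 3.9 ;;
        skinny-metadata)        echo 3.10 ;;
        no-holes)               echo 3.14 ;;
        zoned)                  echo 5.12 ;;
        quota)                  echo 3.4 ;;
        free-space-tree)        echo 4.5 ;;
        block-group-tree)       echo 6.1 ;;
        raid-stripe-tree|squota) echo 6.7 ;;
        *) return 1 ;;
    esac
}

# Usage: kernel_supports FEATURE KERNEL_VERSION
# Returns 0 if KERNEL_VERSION is at least the listed minimum, 2 if the
# feature name is unknown. Requires sort -V (an assumption, see above).
kernel_supports() {
    min=$(feature_min_kernel "$1") || return 2
    lowest=$(printf '%s\n%s\n' "$min" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}

if kernel_supports block-group-tree 6.1; then
    echo "block-group-tree can be mounted by a 6.1 kernel"
fi
```

Note that raid-stripe-tree additionally requires a kernel built with CONFIG_BTRFS_DEBUG, which a version comparison alone cannot detect.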
BLOCK GROUPS, CHUNKS, RAID
The high-level organizational units of a filesystem are block groups of three types:
data, metadata and system.
- DATA
store data blocks and nothing else
- METADATA
store internal metadata in b-trees, can store file data if they fit into the inline limit
- SYSTEM
store structures that describe the mapping between the physical devices and the linear logical space representing the filesystem
Other terms commonly used:
- block group, chunk
a logical range of space of a given profile, stores data, metadata or both; sometimes the terms are used interchangeably
A typical size of a metadata block group is 256MiB (filesystem smaller than 50GiB) or 1GiB (larger than 50GiB), for data it’s 1GiB. The system block group size is a few megabytes.
- RAID
a block group profile type that utilizes RAID-like features on multiple devices: striping, mirroring, parity
- profile
when used in connection with block groups refers to the allocation strategy and constraints, see the section PROFILES for more details
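The typical metadata block group sizes mentioned above depend on the filesystem size. A minimal sketch of that rule follows; typical_metadata_bg_bytes is a hypothetical helper, and treating a filesystem of exactly 50GiB as "larger" is an assumption, since the text does not say which side the boundary falls on.

```shell
#!/bin/sh
# Sketch only: typical_metadata_bg_bytes is a hypothetical helper.
# Encodes the typical sizes stated above: 256MiB below 50GiB, 1GiB otherwise.
GIB=$((1024 * 1024 * 1024))

typical_metadata_bg_bytes() {
    fs_bytes=$1
    if [ "$fs_bytes" -lt $((50 * GIB)) ]; then
        echo $((256 * 1024 * 1024))   # 256MiB
    else
        echo "$GIB"                   # 1GiB (boundary handling is an assumption)
    fi
}

typical_metadata_bg_bytes $((10 * GIB))
typical_metadata_bg_bytes $((100 * GIB))
```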
PROFILES
There are the following block group types available:
Profiles | Redundancy (copies) | Redundancy (parity) | Redundancy (striping) | Space utilization | Min/max devices |
---|---|---|---|---|---|
single | 1 | | | 100% | 1/any |
DUP | 2 / 1 device | | | 50% | 1/any (see note 1) |
RAID0 | 1 | | 1 to N | 100% | 1/any (see note 5) |
RAID1 | 2 | | | 50% | 2/any |
RAID1C3 | 3 | | | 33% | 3/any |
RAID1C4 | 4 | | | 25% | 4/any |
RAID10 | 2 | | 1 to N | 50% | 2/any (see note 5) |
RAID5 | 1 | 1 | 2 to N-1 | (N-1)/N | 2/any (see note 2) |
RAID6 | 1 | 2 | 3 to N-2 | (N-2)/N | 3/any (see note 3) |
Warning
It’s not recommended to create filesystems with RAID0/1/10/5/6
profiles on partitions from the same device. Neither redundancy nor
performance will be improved.

Note 1: DUP may exist on more than 1 device if it starts on a single device and
another one is added. Since version 4.5.1, mkfs.btrfs will let you create DUP
on multiple devices without restrictions.

Note 2: It’s not recommended to use 2 devices with RAID5. In that case,
the parity stripe will contain the same data as the data stripe, making RAID5
degrade to RAID1 with more overhead.

Note 3: It’s also not recommended to use 3 devices with RAID6, unless you
want to get effectively 3 copies in a RAID1-like manner (but not exactly that).

Note 4: Since kernel 5.5 it’s possible to use RAID1C3 as a replacement for
RAID6, at a higher space cost but reliable.

Note 5: Since kernel 5.15 it’s possible to use (mount, convert profiles)
RAID0 on one device and RAID10 on two devices.
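The space utilization column of the profile table can be turned into a rough estimate of usable space. The sketch below is only an approximation under stated assumptions: all devices have equal size, raw is the summed raw capacity in any unit, and metadata or system overhead is ignored. The function usable_space is a hypothetical helper; the ratios and minimum device counts are taken from the table.

```shell
#!/bin/sh
# Sketch only: usable_space is a hypothetical helper, not part of btrfs-progs.
# Usage: usable_space PROFILE NUM_DEVICES RAW_CAPACITY
# Assumes equal-sized devices; RAW_CAPACITY is the sum over all devices.
usable_space() {
    profile=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')  # case does not matter
    n=$2
    raw=$3
    case $profile in
        single)  min=1; num=1; den=1 ;;
        dup)     min=1; num=1; den=2 ;;
        raid0)   min=1; num=1; den=1 ;;
        raid1)   min=2; num=1; den=2 ;;
        raid1c3) min=3; num=1; den=3 ;;
        raid1c4) min=4; num=1; den=4 ;;
        raid10)  min=2; num=1; den=2 ;;
        raid5)   min=2; num=$((n - 1)); den=$n ;;   # (N-1)/N
        raid6)   min=3; num=$((n - 2)); den=$n ;;   # (N-2)/N
        *) echo "unknown profile: $1" >&2; return 1 ;;
    esac
    if [ "$n" -lt "$min" ]; then
        echo "$1 needs at least $min device(s)" >&2
        return 1
    fi
    echo $(( raw * num / den ))
}

usable_space raid5 4 4000   # prints 3000
usable_space RAID1 2 1000   # prints 500
```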
PROFILE LAYOUT
For the following examples, assume devices numbered by 1, 2, 3 and 4, data or
metadata blocks A, B, C, D, with possible stripes e.g. A1, A2 that would be
logically A, etc. For parity profiles PA and QA are parity and syndrome,
associated with the given stripe. The simple layouts single or DUP are left
out. Actual physical block placement on devices depends on current state of
the free/allocated space and may appear random. All devices are assumed to be
present at the time the blocks would have been written.
RAID1
device 1 | device 2 | device 3 | device 4 |
---|---|---|---|
A | D | | |
B | | | C |
C | | | |
D | A | B | |
RAID1C3
device 1 | device 2 | device 3 | device 4 |
---|---|---|---|
A | A | D | |
B | | B | |
C | | A | C |
D | D | C | B |
RAID0
device 1 | device 2 | device 3 | device 4 |
---|---|---|---|
A2 | C3 | A3 | C2 |
B1 | A1 | D2 | B3 |
C1 | D3 | B4 | D1 |
D4 | B2 | C4 | A4 |
RAID5
device 1 | device 2 | device 3 | device 4 |
---|---|---|---|
A2 | C3 | A3 | C2 |
B1 | A1 | D2 | B3 |
C1 | D3 | PB | D1 |
PD | B2 | PC | PA |
RAID6
device 1 | device 2 | device 3 | device 4 |
---|---|---|---|
A2 | QC | QA | C2 |
B1 | A1 | D2 | QB |
C1 | QD | PB | D1 |
PD | B2 | PC | PA |
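In the RAID5 layout, each stripe set such as A1, A2, A3 is accompanied by a parity block PA. As a toy illustration, assuming the parity is the bytewise XOR of the data stripes, a lost stripe can be rebuilt from the remaining stripes and the parity. The byte values below are made up for the example; the Q syndrome used by RAID6 is a different computation and is not shown.

```shell
#!/bin/sh
# Toy sketch: one byte per stripe, values are arbitrary examples.
a1=$((0x5a))
a2=$((0x3c))
a3=$((0xf0))

# Parity of the stripe set, assuming XOR parity.
pa=$(( a1 ^ a2 ^ a3 ))

# Pretend the device holding A2 is lost and rebuild it from the rest.
rebuilt=$(( a1 ^ a3 ^ pa ))

printf 'PA=0x%02x rebuilt A2=0x%02x\n' "$pa" "$rebuilt"
```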
DUP PROFILES ON A SINGLE DEVICE
The mkfs utility will let the user create a filesystem with profiles that write
the logical blocks to 2 physical locations. Whether there are really 2
physical copies highly depends on the underlying device type.

For example, an SSD drive can remap the blocks internally to a single copy, thus
deduplicating them. This negates the purpose of increased redundancy and just
wastes filesystem space without providing the expected level of redundancy.

The duplicated data/metadata may still be useful to statistically improve the
chances on a device that might perform some internal optimizations. The actual
details are not usually disclosed by vendors. For example we could expect that
not all blocks get deduplicated. This will provide a non-zero probability of
recovery compared to a zero chance if the single profile is used. The user
should make the tradeoff decision. The deduplication in SSDs is thought to be
widely available so the reason behind the mkfs default is to not give a false
sense of redundancy.

As another example, the widely used USB flash or SD cards use a translation
layer between the logical and physical view of the device. The data lifetime
may be affected by frequent plugging. The memory cells could get damaged,
hopefully not destroying both copies of particular data in case of DUP.

The wear levelling techniques can also lead to reduced redundancy, even if the
device does not do any deduplication. The controllers may put data written in
a short timespan into the same physical storage unit (cell, block etc). In case
this unit dies, both copies are lost. BTRFS does not add any artificial delay
between metadata writes.

The traditional rotational hard drives usually fail at the sector level.

In any case, a device that starts to misbehave and repairs from the DUP copy
should be replaced! DUP is not backup.
KNOWN ISSUES
SMALL FILESYSTEMS AND LARGE NODESIZE
The combination of small filesystem size and large nodesize is not recommended
in general and can lead to various ENOSPC-related issues during mount time or runtime.

Since mixed block group creation is optional, we allow small
filesystem instances with differing values for sectorsize and nodesize
to be created and could end up in the following situation:
# mkfs.btrfs -f -n 65536 /dev/loop0
btrfs-progs v3.19-rc2-405-g976307c
See https://btrfs.readthedocs.io for more information.
Performing full device TRIM (512.00MiB) ...
Label: (null)
UUID: 49fab72e-0c8b-466b-a3ca-d1bfe56475f0
Node size: 65536
Sector size: 4096
Filesystem size: 512.00MiB
Block group profiles:
Data: single 8.00MiB
Metadata: DUP 40.00MiB
System: DUP 12.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Number of devices: 1
Devices:
ID SIZE PATH
1 512.00MiB /dev/loop0
# mount /dev/loop0 /mnt/
mount: mount /dev/loop0 on /mnt failed: No space left on device
The ENOSPC occurs during the creation of the UUID tree. This is caused
by large metadata blocks and a space reservation strategy that allocates more
than can fit into the filesystem.
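A simple pre-flight check can flag the combination shown in the transcript above before mkfs.btrfs is run. The sketch below is a heuristic, not a rule enforced by mkfs.btrfs: it borrows the 1GiB threshold from the --mixed recommendation and warns when the nodesize is larger than the sectorsize on such a small filesystem. The helper name check_small_fs is hypothetical.

```shell
#!/bin/sh
# Sketch only: check_small_fs is a hypothetical helper and the 1GiB
# threshold is an assumption borrowed from the --mixed recommendation.
GIB=$((1024 * 1024 * 1024))

# Usage: check_small_fs FS_BYTES NODESIZE SECTORSIZE
# Returns 1 (and prints a warning) for a small filesystem with a nodesize
# larger than the sectorsize, otherwise returns 0.
check_small_fs() {
    fs_bytes=$1
    nodesize=$2
    sectorsize=$3
    if [ "$fs_bytes" -lt "$GIB" ] && [ "$nodesize" -gt "$sectorsize" ]; then
        echo "warning: nodesize $nodesize on a filesystem under 1GiB may hit ENOSPC;" \
             "consider --mixed with -n $sectorsize" >&2
        return 1
    fi
    return 0
}

# The 512MiB filesystem with a 64KiB nodesize from the example above.
check_small_fs $((512 * 1024 * 1024)) 65536 4096 || echo "check failed"
```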
AVAILABILITY
btrfs is part of btrfs-progs. Please refer to the documentation at
https://btrfs.readthedocs.io.