In the last section of this series I discussed ZFS snapshots, ZFS send and other interesting features ZFS has to offer. In this post I discuss setting up an Arch system using a ZFS pool as the root filesystem.
I originally posted this on June 23rd, 2016. Since then I have learned a fair bit about setting up a system on ZFS; the following setup is my revised configuration as of September 2nd, 2017.
Part Two - Installation
Pre-Install Setup
Boot partitions
I would recommend using a dedicated boot partition whether using BIOS or UEFI. While you are supposed to be able to store your kernel and any other boot images on a ZFS dataset when using a BIOS bootloader like GRUB, in my experience I have had more luck with a separate boot partition.
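If you partition manually, for example to carve a boot partition out of the same disc on a UEFI system, a rough sketch with sgdisk might look like the following. The 512M size and partition numbers are only an example, the disc ID is one of the discs used later in this post, and you would then give the -part2 partitions to zpool create instead of the whole discs.
[root]# DISK=/dev/disk/by-id/ata-SanDisk_SDSSDXPS480G_152271401093
[root]# sgdisk -n 1:0:+512M -t 1:EF00 ${DISK}    # EFI system partition for /boot
[root]# sgdisk -n 2:0:0 -t 2:BF00 ${DISK}        # remainder for ZFS (Solaris Root)
[root]# mkfs.vfat -F 32 ${DISK}-part1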
ZFS Setup
Pool Type and Redundancy
ZFS has several different levels of redundancy based on the number of discs used and how they are configured. They are different from traditional RAID levels.
A crude description of the different levels (example pool layouts follow this list):
- Stripe
  - No redundancy, similar to RAID0.
  - Default pool creation mode.
- RAIDZ
  - Minimum of 3 discs.
  - One disc can be lost without pool failure.
  - One disc's worth of space is used for parity and does not provide storage.
- RAIDZ2
  - Minimum of 4 discs.
  - Two discs can be lost without pool failure.
  - Two discs' worth of space is used for parity and does not provide storage.
- RAIDZ3
  - Minimum of 5 discs.
  - Three discs can be lost without pool failure.
  - Three discs' worth of space is used for parity and does not provide storage.
- Mirror
  - Minimum of 2 discs.
  - Half of the discs can be lost without pool failure.
  - Half of the discs are used for redundancy and do not provide additional storage.
  - Provides the best performance and flexibility at a cost of storage space.
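For reference, creating a pool of each type looks roughly like this; 'tank' and the disc names are placeholders, and real disc IDs should be used as described below.
[root]# zpool create tank disk1 disk2                     # stripe, no redundancy
[root]# zpool create tank mirror disk1 disk2              # two-way mirror
[root]# zpool create tank raidz disk1 disk2 disk3         # RAIDZ
[root]# zpool create tank raidz2 disk1 disk2 disk3 disk4  # RAIDZ2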
Disk Setup
The ZOL project recommends using the disk ID names when creating a pool. The identification and partition number of each drive can be found with:
[root]# ls /dev/disk/by-id/
ata-SanDisk_SDSSDXPS480G_152271401093
ata-SanDisk_SDSSDXPS480G_154501401266
Note: In the rest of this post I will be using the above two SanDisk SSDs as an example.
Once the disc IDs are known, the pool can be created. While the discs can be partitioned manually in GPT or MBR, it is not necessary to do so before creating the pool: given whole drives, ZFS will partition them itself as Solaris Root (bf00) when creating a new pool.
Booting directly off ZFS can be troublesome. The simplest option is to keep the boot partition in another filesystem, formatted as it would be in a regular install.
I do not use swap and thus have no experience using it; however, the Arch wiki explains the process of setting it up if necessary.
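For completeness, the Arch wiki approach uses a ZFS volume (zvol) as swap rather than a partition. A rough, untested sketch of that, assuming an 8G volume on the 'vault' pool created below, looks something like this:
[root]# zfs create -V 8G -b $(getconf PAGESIZE) \
            -o logbias=throughput -o sync=always \
            -o primarycache=metadata -o secondarycache=none \
            -o com.sun:auto-snapshot=false vault/swap
[root]# mkswap -f /dev/zvol/vault/swap
[root]# swapon /dev/zvol/vault/swap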
Pool Creation
After deciding on a pool type and getting the disc IDs, a pool can be created with the zpool create command.
The syntax is:
zpool create [-fnd] [-o property=value] ... \
[-O file-system-property=value] ... \
[-m mountpoint] [-R root] ${POOL_NAME} ${DISK} ...
Flags:
- -f - Force.
- -n - Display what would be created, but don't create the pool (see the dry-run example after this list).
- -d - Do not enable any features unless specified.
- -o - Set a pool property.
- -O - Set a property on the root filesystem.
- -m - Mountpoint.
- -R - Set an alternate root location.
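For example, once the zfs module is loaded (below), the mirror pool created later in this post could be previewed first with -n; this only prints the would-be layout and makes no changes.
[root]# zpool create -n -o ashift=12 vault mirror \
            ata-SanDisk_SDSSDXPS480G_152271401093 \
            ata-SanDisk_SDSSDXPS480G_154501401266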
First, probe for ZFS on the system; there should be no output from modprobe.
[root]# modprobe zfs
Then the zpool create command can be used to create a new pool.
When creating a pool, ashift=12 specifies advanced format discs, forcing a 4096-byte sector size. Here I create a mirrored pool named ‘vault’ with my two SSDs.
[root]# zpool create -f -o ashift=12 vault mirror \
ata-SanDisk_SDSSDXPS480G_152271401093 \
ata-SanDisk_SDSSDXPS480G_154501401266
Test the pool was created successfully with zpool status.
[root]# zpool status
pool: vault
state: ONLINE
scan: scrub repaired 0 in 0h8m with 0 errors on Mon Jun 13 00:08:39 2016
config:
NAME STATE READ WRITE CKSUM
vault ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-SanDisk_SDSSDXPS480G_152271401093 ONLINE 0 0 0
ata-SanDisk_SDSSDXPS480G_154501401266 ONLINE 0 0 0
errors: No known data errors
Properties
There are many properties that can be set on an entire pool or on specific datasets. A property that will almost always be wanted on the entire pool is compression. ZFS’s LZ4 compression is a great compromise between performance and amount of compression.
Set compression on:
[root]# zfs set compression=on vault
By default atime=on is enabled; it can be turned off to increase performance, or changed to relatime, which is the default on many Linux filesystems. relatime is a good compromise for recording access time: it still records access time, but much more infrequently than the default atime=on setting.
[root]# zfs set atime=on vault
[root]# zfs set relatime=on vault
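The resulting values can be checked at any point with zfs get:
[root]# zfs get compression,atime,relatime vault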
Dataset Creation
Datasets are similar to partitions; however, they come with many benefits partitions do not have. In addition to being hierarchically organizable, they do not require a fixed size or quota: all of the datasets share the space in a given pool. These qualities mean datasets can be used extensively without repercussions.
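If a hard limit is wanted anyway, a quota can still be set per dataset; the dataset name here is just illustrative:
[root]# zfs set quota=100G vault/data/Pictures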
ZFS can manage the mounting of datasets itself, or a dataset can fall back to system-controlled legacy management, where it is mounted through the fstab. I have found that legacy management works best. Where legacy mounting fails, which it does in certain circumstances, I use ZFS managed mounting.
I’ll use a few variables to represent different locations in the pool for datasets.
- SYS_ROOT=vault/sys - The location of any systems on the pool. Alternatively, set this to the root dataset, e.g. vault, if you plan on only ever running a single system on the ZFS pool.
- DATA_ROOT=vault/data - Data shared between systems.
Key Datasets
In order to create a setup that may be used with boot environments, the root filesystem will be contained inside an additional ‘ROOT’ dataset. When used with boot environments this will allow the root filesystem to be cloned between environments while sharing any datasets that are not inside ‘ROOT’. It is not necessary to use boot environments, but there is no downside to creating a system that is compatible with them for the future.
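As a rough illustration of the payoff, a new boot environment could later be created by snapshotting and cloning ‘default’; the snapshot and environment names here are only examples, and tools such as beadm or zedenv automate this.
[root]# zfs snapshot ${SYS_ROOT}/${SYSTEM_NAME}/ROOT/default@pre-upgrade
[root]# zfs clone -o canmount=noauto -o mountpoint=/ \
            ${SYS_ROOT}/${SYSTEM_NAME}/ROOT/default@pre-upgrade \
            ${SYS_ROOT}/${SYSTEM_NAME}/ROOT/pre-upgrade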
At minimum the following datasets should be created inside the root pool: ${SYS_ROOT}/${SYSTEM_NAME}/ROOT and ${SYS_ROOT}/${SYSTEM_NAME}/ROOT/default. Most people will also want a separate ${SYS_ROOT}/${SYSTEM_NAME}/home dataset that is not contained within the boot environment ‘ROOT’ dataset.
- ${SYS_ROOT}/${SYSTEM_NAME}/ROOT
  - Will contain boot environments.
  - The dataset we will be using as the filesystem root, ‘default’, will reside within it.
  - Will not be mounted, using property mountpoint=none.
- ${SYS_ROOT}/${SYSTEM_NAME}/ROOT/default
  - The root dataset boot environment; it can be named anything, but ‘default’ is the convention for the initial boot environment.
  - Uses ZFS mounting with property mountpoint=/.
- ${SYS_ROOT}/${SYSTEM_NAME}/home
  - The home dataset.
  - Does not go within ${SYS_ROOT}/${SYSTEM_NAME}/ROOT so that it is shared between boot environments.
  - Uses legacy mounting with property mountpoint=legacy.
  - Will be mounted at /home.
For example, my current system’s boot environment, which will be mounted to /:
vault/sys/chin/ROOT/default
This configuration makes it easy to dual boot multiple systems off a single ZFS pool. To create a new system, just add a new dataset under vault/sys and set it up as normal. This should even work for dual booting Linux and FreeBSD.
Create the ${SYS_ROOT}/${SYSTEM_NAME}/ROOT/default and ${SYS_ROOT}/${SYSTEM_NAME}/home datasets and set their mount points.
[root]# SYS_ROOT=vault/sys; SYSTEM_NAME=chin
[root]# zfs create -o mountpoint=none -p ${SYS_ROOT}/${SYSTEM_NAME}
[root]# zfs create -o mountpoint=none ${SYS_ROOT}/${SYSTEM_NAME}/ROOT
[root]# zfs create -o mountpoint=/ ${SYS_ROOT}/${SYSTEM_NAME}/ROOT/default
[root]# zfs create -o mountpoint=legacy ${SYS_ROOT}/${SYSTEM_NAME}/home
Additional Datasets
It is not necessary to create any additional datasets; however, there is virtually no cost to using them, and doing so gives the ability to manipulate properties for each dataset individually.
I set the properties following the tuning recommendations in various places including the Arch wiki:
canmount=off Datasets
Set /var, /var/lib and /usr to canmount=off, meaning they’re not mounted and are only there to create the ZFS dataset structure. This will put their data in the boot environment dataset. Their properties will be inherited by any child datasets.
Properties for /var:
- xattr=sa - Stores extended attributes in the inodes rather than in hidden directories, which can increase performance.
Create the datasets:
[root]# zfs create -o canmount=off -o mountpoint=/var -o xattr=sa ${SYS_ROOT}/${SYSTEM_NAME}/var
[root]# zfs create -o canmount=off -o mountpoint=/var/lib ${SYS_ROOT}/${SYSTEM_NAME}/var/lib
[root]# zfs create -o canmount=off -o mountpoint=/var/lib/systemd ${SYS_ROOT}/${SYSTEM_NAME}/var/lib/systemd
[root]# zfs create -o canmount=off -o mountpoint=/usr ${SYS_ROOT}/${SYSTEM_NAME}/usr
System Datasets
The rest of the datasets will be independent of the boot environment and will not change between boot environments.
I keep some datasets, like /var/cache, separate to avoid having to snapshot and back up their data. I also keep the /var/log dataset separate so the logs are always available, as well as the datasets for my containers and VMs.
NOTE: For the following for loop to work in zsh, you need to make zsh split over whitespace with set -o shwordsplit.
[root]# SYSTEM_DATASETS='var/lib/systemd/coredump var/log var/log/journal var/lib/lxc var/lib/lxd var/lib/machines var/lib/libvirt var/cache usr/local'
[root]# SYS_ROOT=vault/sys; SYSTEM_NAME=chin;
[root]# for ds in ${SYSTEM_DATASETS}; do zfs create -o mountpoint=legacy ${SYS_ROOT}/${SYSTEM_NAME}/${ds}; done
Create systemd-journald’s /var/log/journal dataset and turn on posixacls for it.
[root]# zfs create -o mountpoint=legacy -o acltype=posixacl ${SYS_ROOT}/${SYSTEM_NAME}/var/log/journal
User Datasets
I create extensive user datasets outside the boot environment.
[root]# USER_DATASETS='john john/local john/config john/cache'
[root]# for ds in ${USER_DATASETS}; do zfs create -o mountpoint=legacy ${SYS_ROOT}/${SYSTEM_NAME}/home/${ds}; done
I create ${SYS_ROOT}/${SYSTEM_NAME}/home/john/local/share and set it to canmount=off just to create the ZFS dataset structure.
[root]# zfs create -o mountpoint=/home/john/.local/share -o canmount=off ${SYS_ROOT}/${SYSTEM_NAME}/home/john/local/share
[root]# zfs create -o mountpoint=legacy ${SYS_ROOT}/${SYSTEM_NAME}/home/john/local/share/Steam
Storage Datasets
I’ll be creating my data datasets under vault/data; they exist outside the different systems and are shared between them.
[root]# DATA_ROOT=vault/data;
[root]# zfs create -o mountpoint=none ${DATA_ROOT}
[root]# DATA_DATASETS='Books Computer Personal Pictures University Workspace Reference'
[root]# for ds in ${DATA_DATASETS}; do zfs create -o mountpoint=legacy ${DATA_ROOT}/${ds}; done
Final Structure
So my system ends up as:
[root]# zfs list | grep -E 'chin|data'
vault/data 768K 860G 96K none
vault/data/Books 96K 860G 96K legacy
vault/data/Computer 96K 860G 96K legacy
vault/data/Personal 96K 860G 96K legacy
vault/data/Pictures 96K 860G 96K legacy
vault/data/Reference 96K 860G 96K legacy
vault/data/University 96K 860G 96K legacy
vault/data/Workspace 96K 860G 96K legacy
vault/sys/chin 1.97M 860G 96K none
vault/sys/chin/ROOT 192K 860G 96K none
vault/sys/chin/ROOT/default 96K 860G 96K /
vault/sys/chin/home 672K 860G 96K legacy
vault/sys/chin/home/john 576K 860G 96K legacy
vault/sys/chin/home/john/cache 96K 860G 96K legacy
vault/sys/chin/home/john/config 96K 860G 96K legacy
vault/sys/chin/home/john/local 288K 860G 96K legacy
vault/sys/chin/home/john/local/share 192K 860G 96K /home/john/.local/share
vault/sys/chin/home/john/local/share/Steam 96K 860G 96K legacy
vault/sys/chin/usr 192K 860G 96K /usr
vault/sys/chin/usr/local 96K 860G 96K legacy
vault/sys/chin/var 864K 860G 96K /var
vault/sys/chin/var/cache 96K 860G 96K legacy
vault/sys/chin/var/lib 576K 860G 96K /var/lib
vault/sys/chin/var/lib/lxc 96K 860G 96K legacy
vault/sys/chin/var/lib/lxd 96K 860G 96K legacy
vault/sys/chin/var/lib/machines 96K 860G 96K legacy
vault/sys/chin/var/lib/libvirt 96K 860G 96K legacy
vault/sys/chin/var/lib/systemd 192K 860G 96K /var/lib/systemd
vault/sys/chin/var/lib/systemd/coredump 96K 860G 96K legacy
vault/sys/chin/var/log 96K 860G 96K legacy
User Delegation
As of zfsonlinux 0.7.0, ZFS delegation using zfs allow works on Linux. I delegate all datasets under ${SYS_ROOT}/${SYSTEM_NAME}/home/john to my user ‘john’, giving him the ability to snapshot and create datasets.
zfs allow john create,mount,mountpoint,snapshot ${SYS_ROOT}/${SYSTEM_NAME}/home/john
Checking the dataset shows john’s permissions:
zfs allow ${SYS_ROOT}/${SYSTEM_NAME}/home/john
---- Permissions on vault/sys/chin/home/john -------------------------
Local+Descendent permissions:
user john create
[root@chin ~]# zfs allow john snapshot ${SYS_ROOT}/${SYSTEM_NAME}/home/john
[root@chin ~]# zfs allow ${SYS_ROOT}/${SYSTEM_NAME}/home/john
---- Permissions on vault/sys/chin/home/john -------------------------
Local+Descendent permissions:
user john create,snapshot
For reference, the full list of permissions and properties that can be delegated:
NAME TYPE NOTES
allow subcommand Must also have the permission that is
being allowed
clone subcommand Must also have the 'create' ability and
'mount'
ability in the origin file system
create subcommand Must also have the 'mount' ability
destroy subcommand Must also have the 'mount' ability
hold subcommand Allows adding a user hold to a snapshot
mount subcommand Allows mount/umount of ZFS datasets
promote subcommand Must also have the 'mount' and 'promote'
ability in the origin file system
receive subcommand Must also have the 'mount' and 'create'
ability
release subcommand Allows releasing a user hold which
might destroy the snapshot
rename subcommand Must also have the 'mount' and 'create'
ability in the new parent
rollback subcommand
send subcommand
share subcommand Allows sharing file systems over NFS or
SMB protocols
snapshot subcommand
groupquota other Allows accessing any groupquota@...
property
groupused other Allows reading any groupused@... property
userprop other Allows changing any user property
userquota other Allows accessing any userquota@...
property
userused other Allows reading any userused@... property
aclinherit property
aclmode property
atime property
canmount property
casesensitivity property
checksum property
compression property
copies property
dedup property
devices property
exec property
logbias property
mlslabel property
mountpoint property
nbmand property
normalization property
primarycache property
quota property
readonly property
recordsize property
refquota property
refreservation property
reservation property
secondarycache property
setuid property
shareiscsi property
sharenfs property
sharesmb property
snapdir property
utf8only property
version property
volblocksize property
volsize property
vscan property
xattr property
zoned property
Prepare Pool
With the datasets created, the pool can be configured.
As a precaution and to prevent later issues, unmount the pool and all datasets.
[root]# zfs umount -a
With the pool ready, it should be exported. This is a necessary step to prevent problems with importing it later.
[root]# zpool export vault
Setup Installation
Import the pool to the location where the installation will be done, /mnt.
[root]# zpool import -d /dev/disk/by-id -R /mnt vault
The root dataset should be mounted to /mnt; check with zfs mount.
An important cache file was created with the pool. Copy it into the new system.
[root]# mkdir -p /mnt/etc/zfs
[root]# cp /etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache
If this cache file does not exist, create one and then copy it as above.
[root]# zpool set cachefile=/etc/zfs/zpool.cache vault
The datasets can now be mounted. Any non-ZFS filesystems, such as a boot partition or swap, should be mounted normally.
Create the mount points and mount the legacy datasets and the boot partition. Replace sdxY with your boot partition. Repeat for all of your datasets.
[root]# mkdir /mnt/boot
[root]# mount /dev/sdxY /mnt/boot
[root]# mkdir /mnt/home
[root]# mount -t zfs ${SYS_ROOT}/${SYSTEM_NAME}/home /mnt/home
[root]# mkdir /mnt/usr/local
[root]# mount -t zfs ${SYS_ROOT}/${SYSTEM_NAME}/usr/local /mnt/usr/local
[root]# # Repeat...
With all datasets successfully mounted, the legacy datasets can be added to the new fstab. To start with, an fstab can be generated; it will then need to be edited to remove any non-legacy datasets.
[root]# genfstab -U -p /mnt >> /mnt/etc/fstab
The fstab should contain any partitions or datasets the final system needs, including swap if used.
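For illustration, the legacy entries end up looking roughly like this; the boot partition UUID is a placeholder and the exact list depends on which datasets were created.
# /mnt/etc/fstab (excerpt)
UUID=XXXX-XXXX              /boot       vfat  defaults  0 2
vault/sys/chin/home         /home       zfs   defaults  0 0
vault/sys/chin/usr/local    /usr/local  zfs   defaults  0 0
vault/sys/chin/var/log      /var/log    zfs   defaults  0 0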
Edit the mirrorlist to prefer mirrors for your location.
[root]# nano /etc/pacman.d/mirrorlist
Install
With everything set up the installation can finally be started.
Install the base system
[root]# pacstrap -i /mnt base base-devel
Configure Ramdisk
The mkinitcpio configuration will need some different hooks.
If no separate dataset is used for /usr, the following hooks should be in mkinitcpio.conf, in this order. The fsck hook is not needed with ZFS and should only be there if ext3 or ext4 is also used.
Make sure keyboard comes before zfs so that recovery can be done using the keyboard if necessary.
[root]# nano /mnt/etc/mkinitcpio.conf
# ...
HOOKS="base udev autodetect modconf block keyboard zfs filesystems"
# ...
If a separate dataset is used for /usr, the ‘usr’ hook should be enabled. I have also found the ‘shutdown’ hook is needed to make /var unmount properly on shutdown.
# ...
HOOKS="base udev autodetect modconf block keyboard zfs usr filesystems shutdown"
# ...
Enter Chroot
The install can now be chrooted into.
[root]# arch-chroot /mnt /bin/bash
Setup ZFS Repositories
I find the archzfs repository is the easiest way to install ZFS. If preferred, ZFS can also be compiled from source using the AUR, but the archzfs repo has ZFS pre-compiled, making for a simple install.
Before proceeding with the install, the ZFS repositories need to be added.
Add the archzfs repository to /etc/pacman.conf. The archzfs repository should be listed first so that it is the preferred server. Place it above all the other repositories.
[root]# nano /etc/pacman.conf
# REPOSITORIES
[archzfs]
Server = http://archzfs.com/$repo/x86_64
# Other repositories...
Next, sign the repository key. Confirm the key is correct by checking the Arch wiki’s unofficial user repositories listing before using it.
[root]# pacman-key -r 5E1ABF240EE7A126
[root]# pacman-key --lsign-key 5E1ABF240EE7A126
Install ZFS
Now ZFS can be installed. There are a few package options in the archzfs repository:
- zfs-linux-git - Packages tracking the zfsonlinux master branch. Recompiled on each kernel release.
- zfs-linux - ZOL release packages. Correspond to specific version releases.
- zfs-linux-lts - ZOL release packages built for the Arch linux-lts kernel. For people concerned with stability.
I was originally using the git packages, but after running into a problem I switched over to the zfs-linux package, which is the ZOL release version. Unless you are very concerned with staying on the extreme bleeding edge, I would recommend using the zfs-linux package.
Update the mirrors and install ZFS.
[root]# pacman -Syyu
[root]# pacman -S zfs-linux
Install System
At this point the system can be installed as usual. Proceed until the point where the bootloader would normally be configured.
Bootloader
EFI Bootloader
My preferred bootloader, for its simplicity, is ‘gummiboot’, now called ‘systemd-boot’. When using an EFI system it is what is recommended by the Arch wiki, and what I’d recommend. It will already be installed on Arch by default as part of systemd.
Install systemd-boot to wherever the ESP is mounted, generally /boot.
[root]# bootctl --path=/boot install
Make the bootloader entry. When using ZFS the extra parameter zfs=<root dataset> must be added to the list of options. Other than that, the bootloader parameters should be the same as for a normal install.
[root]# nano /boot/loader/entries/arch.conf
title Arch Linux
linux /vmlinuz-linux
initrd /initramfs-linux.img
options zfs=vault/sys/chin/ROOT/default rw
If you decide to go with a different bootloader, the setup should be the same as normal except for adding zfs=<root dataset> to the options.
BIOS Bootloader
If you have a BIOS system, you will want to use GRUB.
After installing the grub package, run the following (replace sdx with the drive you are booting from):
[root]# grub-install --target=i386-pc /dev/sdx
Set up a custom boot entry:
# /etc/grub.d/40_custom
#!/bin/sh
exec tail -n +3 $0
set timeout=2
set default=0
# (0) Arch Linux
menuentry "Arch Linux" {
linux /vmlinuz-linux zfs=vault/sys/chin/ROOT/default rw
initrd /intel-ucode.img /initramfs-linux.img
}
After editing run
[root]# grub-mkconfig -o /boot/grub/grub.cfg
You might get the following error output:
/dev/sda
Installing for i386-pc platform.
grub-install: error: failed to get canonical path of `/dev/ata-SAMSUNG_SSD_830_Series_S0VVNEAC702110-part2'.
A workaround is to symlink the real partition to the expected by-id path:
[root]# ln -s /dev/sda2 /dev/ata-SAMSUNG_SSD_830_Series_S0VVNEAC702110-part2
Clean Up
Once everything necessary for the installation is finished, it is important to export the pool properly before restarting. Failing to do so can result in the pool not importing at boot.
Exit out of the install.
[root]# exit
Export Pool
After exiting out of the install, unmount any normal partitions, followed by the ZFS datasets. The command zfs umount -a should take care of unmounting all of the ZFS datasets; however, if the pool doesn’t want to export, they may need to be unmounted by hand.
[root]# umount /mnt/boot
[root]# zfs umount -a
Now the pool can be exported.
[root]# zpool export vault
First Tasks
The system should start up normally for the first boot; however, a few tasks are necessary to make sure the system continues to boot properly.
Set the cache file.
[root]# zpool set cachefile=/etc/zfs/zpool.cache vault
To make sure pools are imported automatically, enable zfs.target.
[root]# systemctl enable zfs.target
If your datasets refuse to automount on boot, you may have to play around with switching from legacy mounting to ZFS managed mounting, or vice versa. You may also have to enable certain units such as zfs-import-cache and zfs-mount.
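For example (whether these are needed depends on your ZOL version and setup):
[root]# systemctl enable zfs-import-cache
[root]# systemctl enable zfs-mount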
Due to problems with the machine’s host ID being unavailable to the system at boot, the initramfs image needs to be adjusted to store the host ID. The easiest way to do this is to write the host ID to /etc/hostid so that it is included in the image. The alternative is to pass the host ID to the kernel as an additional bootloader option with spl.spl_hostid=0x<hostid>.
You can write the host ID to /etc/hostid with zgenhostid:
[root]# zgenhostid $(hostid)
Now that the system will properly remember its host ID, regenerate the initramfs:
[root]# mkinitcpio -p linux
Problems
That should conclude the process of setting up ZFS on Arch Linux. Make sure the system boots properly and that all datasets are mounted at boot. If some datasets do not seem to be mounting properly, make sure their properties are set correctly.
ZFS properties can be queried with zfs get <property> <dataset>, so the home dataset can be checked with:
[root]# zfs get mountpoint vault/sys/chin/home
A property can be set with zfs set <property>=<value> <dataset>.
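For example, to set the home dataset to legacy mounting:
[root]# zfs set mountpoint=legacy vault/sys/chin/home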
Follow-up
With the installation working, there are plenty of features to play with in ZFS, which I get into in part 3 of this series, Arch Linux on ZFS - Part 3: Backups, Snapshots and Other Features. A few of these key features to take a look at are:
- snapshots - Take atomic snapshots of a system that can be used as a source of backups or saved and rolled back to in an emergency.
- rollback - Revert a dataset back to the state it was in. Can be useful for reverting system-breaking changes.
- send and receive - Systems built directly into ZFS for sending and receiving a stream of data. Can be used in combination with snapshots to send a stream of data over SSH and do incremental backups (a small sketch follows this list).
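As a rough sketch of what that looks like, where backuphost and backup/chin-home are placeholders for a remote machine and a dataset on its pool:
[root]# zfs snapshot vault/sys/chin/home@2018-06-01
[root]# zfs send vault/sys/chin/home@2018-06-01 | ssh backuphost zfs receive backup/chin-home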
All of the code used in this post is available on my GitHub. I have split the code up into three parts: the code used to set up before the chroot, the code used in the chroot, and the code used after reboot. The scripts are not runnable, but they are a good reference.
Updated on June 1st, 2018