In the last section of this series I discussed ZFS snapshots, ZFS send, and some of the other interesting features ZFS has to offer. In this post I discuss setting up an Arch system using a ZFS pool as the root filesystem.

I originally posted this on June 23rd 2016. Since then I have learned a fair bit about setting up a system on ZFS; the following setup is my revised configuration as of September 2nd 2017.

Part Two - Installation Link to heading

Pre-Install Setup Link to heading

Boot partitions Link to heading

I would recommend using a dedicated boot partition whether using BIOS or UEFI. While you are supposed to be able to store your kernel and any other boot images on a ZFS dataset when using a BIOS bootloader like GRUB, in my experience I have had more luck with a separate boot partition.
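
For reference, a minimal sketch of creating and formatting an EFI boot partition with sgdisk; the drive /dev/sdx and the 512M size are placeholders, and on a BIOS system an ext4 /boot partition serves the same role.

[root]# sgdisk -n 1:0:+512M -t 1:EF00 /dev/sdx   # partition 1, EFI system partition
[root]# mkfs.fat -F32 /dev/sdx1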

ZFS Setup Link to heading

Pool Type and Redundancy Link to heading

ZFS has several different levels of redundancy based on the number of disks that are used and how they are configured. They are different from traditional RAID levels.

A crude description of the different levels (example creation commands follow the list):

  • Stripe
    • No redundancy, similar to RAID0.
    • Default pool creation mode.
  • RAIDZ
    • Minimum of 3 disks.
    • One disk can be lost without pool failure.
    • One disk's worth of capacity is used for redundancy and does not provide storage.
  • RAIDZ2
    • Minimum of 4 disks.
    • Two disks can be lost without pool failure.
    • Two disks' worth of capacity is used for redundancy and does not provide storage.
  • RAIDZ3
    • Minimum of 5 disks.
    • Three disks can be lost without pool failure.
    • Three disks' worth of capacity is used for redundancy and does not provide storage.
  • Mirror
    • Minimum of 2 disks.
    • Half of the disks can be lost without pool failure.
    • Half of the total capacity is used for redundancy and does not provide storage.
    • Provides the best performance and flexibility at a cost of storage space.
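
As a rough illustration, the redundancy level is chosen by the keyword given to zpool create; the pool name 'tank' and the disk names below are placeholders, and actual pool creation is covered in detail later in this post.

[root]# zpool create tank disk1 disk2                       # stripe (no keyword)
[root]# zpool create tank mirror disk1 disk2                # mirror
[root]# zpool create tank raidz disk1 disk2 disk3           # RAIDZ
[root]# zpool create tank raidz2 disk1 disk2 disk3 disk4    # RAIDZ2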

Disk Setup Link to heading

It’s recommended to use the disk ID names, as suggested by the ZOL project. The ID of each drive can be found with:

[root]# ls /dev/disk/by-id/
ata-SanDisk_SDSSDXPS480G_152271401093
ata-SanDisk_SDSSDXPS480G_154501401266

Note: In the following post I will be using the above two SanDisk SSDs as an example.

Once the disk IDs are known, the pool can be created. While the disks can be partitioned manually with GPT or MBR, it is not necessary to partition the drives before creating the pool; ZFS will partition whole drives itself as Solaris Root (bf00) when creating a new pool.

Booting directly from ZFS can be problematic. The simplest option is to format the boot partition with another filesystem, as would be done with a regular install.

I do not use a swap partition and thus have no experience using one; however, the Arch wiki explains the process of setting one up if necessary.
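
For completeness, the Arch wiki's approach uses a ZFS volume (zvol) as swap. A rough sketch, assuming an 8G swap volume on the 'vault' pool created below, looks something like this:

[root]# zfs create -V 8G -b $(getconf PAGESIZE) -o compression=zle \
                -o logbias=throughput -o sync=always \
                -o primarycache=metadata -o secondarycache=none vault/swap
[root]# mkswap -f /dev/zvol/vault/swap
[root]# swapon /dev/zvol/vault/swap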

Pool Creation Link to heading

After deciding on a pool type and getting the disk IDs, a pool can be created with the zpool create command.

The syntax is:

zpool create [-fnd] [-o property=value] ... \
              [-O file-system-property=value] ... \
              [-m mountpoint] [-R root] ${POOL_NAME} ${DISK}	...

Flags:

  • -f - Force.
  • -n - Display creation but don’t create pool.
  • -d - Do not enable any features unless specified.
  • -o - Set a pool property.
  • -O - Set a property on the root filesystem.
  • -m - Mountpoint.
  • -R - Set an alternate root location.

First, load the ZFS kernel module; there should be no output from modprobe.

[root]# modprobe zfs

Then the zpool create command can be used to create a new pool.

When creating a pool, ashift=12 specifies advanced format disks, forcing a 4096-byte block size. Here I create a mirrored pool named ‘vault’ with my two SSDs.

[root]# zpool create -f -o ashift=12 vault mirror \
                ata-SanDisk_SDSSDXPS480G_152271401093 \
                ata-SanDisk_SDSSDXPS480G_154501401266

Check that the pool was created successfully with zpool status.

[root]# zpool status
pool: vault
state: ONLINE
scan: scrub repaired 0 in 0h8m with 0 errors on Mon Jun 13 00:08:39 2016
config:

NAME                                       STATE     READ WRITE CKSUM
vault                                      ONLINE       0     0     0
  mirror-0                                 ONLINE       0     0     0
    ata-SanDisk_SDSSDXPS480G_152271401093  ONLINE       0     0     0
    ata-SanDisk_SDSSDXPS480G_154501401266  ONLINE       0     0     0

errors: No known data errors

Properties Link to heading

There are many properties that can be set on an entire pool or on specific datasets. A property that will almost always be wanted on the entire pool is compression. With compression=on ZFS uses LZ4 compression, which is a great compromise between performance and compression ratio.

Set compression on

[root]# zfs set compression=on vault

By default atime=on is enabled. It can be turned off to increase performance, or relatime can be used, which is the default on many Linux filesystems.

For recording access times, relatime is a good compromise: it still records them, but much less frequently than the default atime=on setting.

[root]# zfs set atime=on vault
[root]# zfs set relatime=on vault
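
The resulting values can be checked at any time with zfs get:

[root]# zfs get compression,atime,relatime vault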

Dataset Creation Link to heading

Datasets are similar to partitions; however, they come with many benefits partitions do not have. In addition to being hierarchically organizable, they do not require a fixed size: all datasets share the space in a given pool. These qualities mean datasets can be used liberally without repercussions.

ZFS can manage mounting its datasets itself, or a dataset can be set to legacy mounting, in which case it is managed by the system through the fstab.

I have found legacy mounting works best. Where legacy mounting fails, which it does in certain circumstances, I use ZFS-managed mounting.
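
For illustration, a legacy dataset ends up as an ordinary fstab entry (using the home dataset created later in this post as an example), while a ZFS-managed dataset is mounted from its mountpoint property and stays out of the fstab:

# /etc/fstab
vault/sys/chin/home    /home    zfs    defaults    0 0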

I’ll use a few variables to represent different locations in the pool for datasets.

  • SYS_ROOT=vault/sys - The location of any systems on the pool. Alternatively, set this to the root dataset, e.g. vault, if you plan on only ever running a single system on the ZFS pool.
  • DATA_ROOT=vault/data - System shared data.

Key Datasets Link to heading

In order to create a setup that can be used with boot environments, the root filesystem will be contained inside an additional ‘ROOT’ dataset. When used with boot environments this allows the root filesystem to be cloned between environments while sharing any datasets that are not inside ‘ROOT’. It is not necessary to use boot environments, but there is no downside to creating a system that is compatible with them for the future.

At minimum the following datasets should be created inside the root pool: ${SYS_ROOT}/${SYSTEM_NAME}/ROOT and ${SYS_ROOT}/${SYSTEM_NAME}/ROOT/default. Most people will also want a separate ${SYS_ROOT}/${SYSTEM_NAME}/home dataset that is not contained within the boot environment ‘ROOT’ dataset.

  • ${SYS_ROOT}/${SYSTEM_NAME}/ROOT
    • Will contain boot environments.
    • The dataset we will use as the filesystem root, ‘default’, will reside within it.
    • Will not be mounted (property mountpoint=none).
  • ${SYS_ROOT}/${SYSTEM_NAME}/ROOT/default
    • The root dataset boot environment, can be named anything but ‘default’ is the convention for the initial boot environment.
    • Uses ZFS mounting with property mountpoint=/.
  • ${SYS_ROOT}/${SYSTEM_NAME}/home
    • The home dataset.
    • Does not go within ${SYS_ROOT}/${SYSTEM_NAME}/ROOT so that it is shared between boot environments.
    • Uses legacy mounting with property mountpoint=legacy.
    • Will be mounted at /home.

For example, my current system’s boot environment which will be mounted to /:

vault/sys/chin/ROOT/default

This configuration makes it easy to dual boot multiple systems off of a single ZFS pool. To create a new system, just add a new dataset under vault/sys and set it up as normal. This should even work for dual booting Linux and FreeBSD.
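
For example, a hypothetical second system named ‘guest’ would get its own tree alongside the first; setting canmount=noauto on its root keeps both roots from trying to mount at / at the same time:

[root]# zfs create -o mountpoint=none -p vault/sys/guest/ROOT
[root]# zfs create -o mountpoint=/ -o canmount=noauto vault/sys/guest/ROOT/default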

Create the ${SYS_ROOT}/${SYSTEM_NAME}/ROOT/default and ${SYS_ROOT}/${SYSTEM_NAME}/home datasets and set their mount points.

[root]# SYS_ROOT=vault/sys; SYSTEM_NAME=chin
[root]# zfs create -o mountpoint=none -p ${SYS_ROOT}/${SYSTEM_NAME}
[root]# zfs create -o mountpoint=none ${SYS_ROOT}/${SYSTEM_NAME}/ROOT
[root]# zfs create -o mountpoint=/ ${SYS_ROOT}/${SYSTEM_NAME}/ROOT/default
[root]# zfs create -o mountpoint=legacy ${SYS_ROOT}/${SYSTEM_NAME}/home

Additional Datasets Link to heading

It is not necessary to create any additional datasets; however, there is virtually no cost to using them, and doing so gives the ability to manipulate properties for each dataset individually.

I set the properties following the tuning recommendations in various places including the Arch wiki:

canmount=off Datasets Link to heading

Set /var, /var/lib, /var/lib/systemd and /usr to canmount=off, meaning they are not mounted themselves and only exist to build the ZFS dataset structure. Their data ends up in the boot environment dataset, and their properties are inherited by any child datasets.

Properties for /var:

  • xattr=sa - Stores extended attributes in the inodes, which can increase performance.

Create the datasets:

[root]# zfs create -o canmount=off -o mountpoint=/var -o xattr=sa ${SYS_ROOT}/${SYSTEM_NAME}/var
[root]# zfs create -o canmount=off -o mountpoint=/var/lib ${SYS_ROOT}/${SYSTEM_NAME}/var/lib
[root]# zfs create -o canmount=off -o mountpoint=/var/lib/systemd ${SYS_ROOT}/${SYSTEM_NAME}/var/lib/systemd
[root]# zfs create -o canmount=off -o mountpoint=/usr ${SYS_ROOT}/${SYSTEM_NAME}/usr

System Datasets Link to heading

The rest of the datasets will be independent from the boot environment and will not change between boot environments.

I keep some datasets, like /var/cache’s dataset, separate to avoid having to snapshot and back up their data. I also keep /var/log’s dataset separate so the logs are always available, as well as the datasets for my containers and VMs.

NOTE: For the following for loop to work in zsh, you need to make zsh split over whitespace with set -o shwordsplit.

[root]# SYSTEM_DATASETS='var/lib/systemd/coredump var/log var/lib/lxc var/lib/lxd var/lib/machines var/lib/libvirt var/cache usr/local'
[root]# SYS_ROOT=vault/sys; SYSTEM_NAME=chin;
[root]# for ds in ${SYSTEM_DATASETS}; do zfs create -o mountpoint=legacy ${SYS_ROOT}/${SYSTEM_NAME}/${ds}; done

Create and turn on posixacls for systemd-journald’s /var/log/journal dataset.

[root]# zfs create -o mountpoint=legacy -o acltype=posixacl ${SYS_ROOT}/${SYSTEM_NAME}/var/log/journal

User Datasets Link to heading

I create extensive user datasets outside the boot environment.

[root]# USER_DATASETS='john john/local john/config john/cache'
[root]# for ds in ${USER_DATASETS}; do zfs create -o mountpoint=legacy ${SYS_ROOT}/${SYSTEM_NAME}/home/${ds}; done

I create ${SYS_ROOT}/${SYSTEM_NAME}/home/john/local/share and set it to canmount=off just to create the ZFS dataset structure.

[root]# zfs create -o mountpoint=/home/john/.local/share -o canmount=off ${SYS_ROOT}/${SYSTEM_NAME}/home/john/local/share
[root]# zfs create -o mountpoint=legacy ${SYS_ROOT}/${SYSTEM_NAME}/home/john/local/share/Steam

Storage Datasets Link to heading

I create my data directories under vault/data; they exist outside the individual systems and are shared between them.

[root]# DATA_ROOT=vault/data;
[root]# zfs create -o mountpoint=none ${DATA_ROOT}
[root]# DATA_DATASETS='Books Computer Personal Pictures University Workspace Reference'
[root]# for ds in ${DATA_DATASETS}; do zfs create -o mountpoint=legacy ${DATA_ROOT}/${ds}; done

Final Structure Link to heading

So my system ends up as:

[root]# zfs list | grep -E 'chin|data'

vault/data                                   768K   860G    96K  none
vault/data/Books                              96K   860G    96K  legacy
vault/data/Computer                           96K   860G    96K  legacy
vault/data/Personal                           96K   860G    96K  legacy
vault/data/Pictures                           96K   860G    96K  legacy
vault/data/Reference                          96K   860G    96K  legacy
vault/data/University                         96K   860G    96K  legacy
vault/data/Workspace                          96K   860G    96K  legacy
vault/sys/chin                              1.97M   860G    96K  none
vault/sys/chin/ROOT                          192K   860G    96K  none
vault/sys/chin/ROOT/default                   96K   860G    96K  /
vault/sys/chin/home                          672K   860G    96K  legacy
vault/sys/chin/home/john                     576K   860G    96K  legacy
vault/sys/chin/home/john/cache                96K   860G    96K  legacy
vault/sys/chin/home/john/config               96K   860G    96K  legacy
vault/sys/chin/home/john/local               288K   860G    96K  legacy
vault/sys/chin/home/john/local/share         192K   860G    96K  /home/john/.local/share
vault/sys/chin/home/john/local/share/Steam    96K   860G    96K  legacy
vault/sys/chin/usr                           192K   860G    96K  /usr
vault/sys/chin/usr/local                      96K   860G    96K  legacy
vault/sys/chin/var                           864K   860G    96K  /var
vault/sys/chin/var/cache                      96K   860G    96K  legacy
vault/sys/chin/var/lib                       576K   860G    96K  /var/lib
vault/sys/chin/var/lib/lxc                    96K   860G    96K  legacy
vault/sys/chin/var/lib/lxd                    96K   860G    96K  legacy
vault/sys/chin/var/lib/machines               96K   860G    96K  legacy
vault/sys/chin/var/lib/libvirt                96K   860G    96K  legacy
vault/sys/chin/var/lib/systemd               192K   860G    96K  /var/lib/systemd
vault/sys/chin/var/lib/systemd/coredump       96K   860G    96K  legacy
vault/sys/chin/var/log                        96K   860G    96K  legacy

User Delegation Link to heading

As of zfsonlinux 0.7.0, ZFS delegation using zfs allow works on Linux. I delegate all datasets under ${SYS_ROOT}/${SYSTEM_NAME}/home/john to my user ‘john’, giving him the ability to snapshot and create datasets.

zfs allow john create,mount,mountpoint,snapshot ${SYS_ROOT}/${SYSTEM_NAME}/home/john

Checking permissions shows john’s permissions.

zfs allow ${SYS_ROOT}/${SYSTEM_NAME}/home/john
---- Permissions on vault/sys/chin/home/john -------------------------
Local+Descendent permissions:
        user john create
[root@chin ~]# zfs allow john snapshot ${SYS_ROOT}/${SYSTEM_NAME}/home/john
[root@chin ~]# zfs allow ${SYS_ROOT}/${SYSTEM_NAME}/home/john
---- Permissions on vault/sys/chin/home/john -------------------------
Local+Descendent permissions:
        user john create,snapshot
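
With the delegation in place, the user can work with their own datasets without root; for example (the snapshot name and ‘projects’ dataset here are just hypothetical):

[john@chin ~]$ zfs snapshot vault/sys/chin/home/john@before-cleanup
[john@chin ~]$ zfs create -o mountpoint=legacy vault/sys/chin/home/john/projects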

Available options:

NAME             TYPE           NOTES
allow            subcommand     Must also have the permission that is
                                being allowed
clone            subcommand     Must also have the 'create' ability and
                                'mount'
                                ability in the origin file system
create           subcommand     Must also have the 'mount' ability
destroy          subcommand     Must also have the 'mount' ability
hold             subcommand     Allows adding a user hold to a snapshot
mount            subcommand     Allows mount/umount of ZFS datasets
promote          subcommand     Must also have the 'mount' and 'promote'
                                ability in the origin file system
receive          subcommand     Must also have the 'mount' and 'create'
                                ability
release          subcommand     Allows releasing a user hold which
                                might destroy the snapshot
rename           subcommand     Must also have the 'mount' and 'create'
                                ability in the new parent
rollback         subcommand
send             subcommand
share            subcommand     Allows sharing file systems over NFS or
                                SMB protocols
snapshot         subcommand
groupquota       other          Allows accessing any groupquota@...
                                property
groupused        other          Allows reading any groupused@... property
userprop         other          Allows changing any user property
userquota        other          Allows accessing any userquota@...
                                property
userused         other          Allows reading any userused@... property
aclinherit       property
aclmode          property
atime            property
canmount         property
casesensitivity  property
checksum         property
compression      property
copies           property
dedup            property
devices          property
exec             property
logbias          property
mlslabel         property
mountpoint       property
nbmand           property
normalization    property
primarycache     property
quota            property
readonly         property
recordsize       property
refquota         property
refreservation   property
reservation      property
secondarycache   property
setuid           property
shareiscsi       property
sharenfs         property
sharesmb         property
snapdir          property
utf8only         property
version          property
volblocksize     property
volsize          property
vscan            property
xattr            property
zoned            property

Prepare Pool Link to heading

With the datasets created, the pool can be configured.

As a precaution and to prevent later issues, unmount the pool and all datasets.

[root]# zfs umount -a

With the pool ready, it should be exported. This is a necessary step to prevent problems with importing later.

[root]# zpool export vault

Setup Installation Link to heading

Import the pool to the location where the installation will be done, /mnt.

[root]# zpool import -d /dev/disk/by-id -R /mnt vault

The root dataset should now be mounted to /mnt; check with zfs mount.
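
The output of zfs mount should include a line similar to the following for the root dataset (other ZFS-managed datasets may also be listed):

[root]# zfs mount
vault/sys/chin/ROOT/default     /mnt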

An important cache file was created with the pool. Copy it into the new system, creating the target directory first.

[root]# mkdir -p /mnt/etc/zfs
[root]# cp /etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache

If this cache does not exist, create one.

[root]# zpool set cachefile=/etc/zfs/zpool.cache vault

The datasets can now be mounted. If there are any non-ZFS filesystems, such as a boot partition or swap, they should be mounted normally.

Create the mount points and mount the legacy datasets and the boot partition. Replace sdxY with your boot partition, and repeat for all of your datasets.

[root]# mkdir /mnt/boot
[root]# mount /dev/sdxY /mnt/boot
[root]# mkdir /mnt/home
[root]# mount -t zfs ${SYS_ROOT}/${SYSTEM_NAME}/home /mnt/home
[root]# mkdir -p /mnt/usr/local
[root]# mount -t zfs ${SYS_ROOT}/${SYSTEM_NAME}/usr/local /mnt/usr/local
[root]# # Repeat...

With all datasets successfully mounted, the legacy datasets can be added to the new fstab. To start with, an fstab can be generated; it will then need to be edited to remove any non-legacy datasets.

[root]# genfstab -U -p /mnt >> /mnt/etc/fstab

The fstab should contain any partitions or datasets the final system needs, including swap if used.
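
As a rough example of what the edited fstab might end up containing (the UUID is a placeholder for your boot partition):

# /mnt/etc/fstab
UUID=XXXX-XXXX              /boot         vfat    defaults    0 2
vault/sys/chin/home         /home         zfs     defaults    0 0
vault/sys/chin/usr/local    /usr/local    zfs     defaults    0 0
vault/sys/chin/var/log      /var/log      zfs     defaults    0 0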

Edit the mirrorlist to select mirrors for your desired location.

[root]# nano /etc/pacman.d/mirrorlist

Install Link to heading

With everything set up the installation can finally be started.

Install the base system

[root]# pacstrap -i /mnt base base-devel

Configure Ramdisk Link to heading

The mkinitcpio configuration will need some different hooks.

If no separate datasets are used, the following hooks should be in the mkinitcpio configuration, in this order. fsck is not needed with ZFS and should only be there if ext3 or ext4 is used.

Make sure keyboard comes before zfs so that recovery can be done using the keyboard if necessary.

[root]# nano /mnt/etc/mkinitcpio.conf
# ...
HOOKS="base udev autodetect modconf block keyboard zfs filesystems"
# ...

If a separate dataset is used for /usr, the ‘usr’ hook should be enabled. I have also found the ‘shutdown’ hook is needed to make /var unmount properly on shutdown.

# ...
HOOKS="base udev autodetect modconf block keyboard zfs usr filesystems shutdown"
# ...

Enter Chroot Link to heading

The install can now be chrooted into.

[root]# arch-chroot /mnt /bin/bash

Setup ZFS Repositories Link to heading

I find using the archzfs repository is the easiest way to install ZFS. If preferred, ZFS can also be compiled from source using the AUR, but the archzfs repo has ZFS pre-compiled, making it a simple install.

Before proceeding with the install, the ZFS repositories need to be added.

Add the archzfs repository to /etc/pacman.conf. The archzfs repository should be listed first so that it is the preferred server; place it above all other repositories.

[root]# nano /etc/pacman.conf
# REPOSITORIES
[archzfs]
Server = http://archzfs.com/$repo/x86_64

# Other repositories...

Next, sign the repository key. Confirm it is correct by checking the Arch unofficial user repositories listing before using it.

[root]# pacman-key -r 5E1ABF240EE7A126
[root]# pacman-key --lsign-key 5E1ABF240EE7A126

Install ZFS Link to heading

Now ZFS can be installed; there are a few package options in the archzfs repository.

I was originally using the git packages, but after running into a problem I switched over to zfs-linux, which packages the ZOL release version. Unless you are very concerned with staying on the extreme bleeding edge, I would recommend using the zfs-linux package.

Update the mirrors and install ZFS.

[root]# pacman -Syyu
[root]# pacman -S zfs-linux

Install System Link to heading

At this point the system can be installed as usual. Proceed through until the point where the bootloader would normally be configured.

Bootloader Link to heading

EFI Bootloader Link to heading

My preferred bootloader, for its simplicity, is ‘gummiboot’, now called ‘systemd-boot’. On an EFI system it is what the Arch wiki recommends, and what I’d recommend. It ships with systemd, so it is already installed on Arch by default.

Install systemd-boot to wherever the ESP is mounted, generally /boot.

[root]# bootctl --path=/boot install

Make the bootloader entry. When using ZFS the extra parameter zfs=<root dataset> must be added to the list of options. Other than that, bootloader parameters should be the same as a normal install.

[root]# nano /boot/loader/entries/arch.conf
title     Arch Linux
linux     /vmlinuz-linux
initrd    /initramfs-linux.img
options   zfs=vault/sys/chin/ROOT/default rw
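
systemd-boot also reads /boot/loader/loader.conf for the default entry and menu timeout; assuming the entry file above is named arch.conf, it could look like this:

[root]# nano /boot/loader/loader.conf
default   arch
timeout   3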

If you decide to go with a different bootloader, the setup should be the same as normal except for adding zfs=<root dataset> to the kernel options.

BIOS Bootloader Link to heading

If you have a BIOS system you will want to use GRUB.

After installing GRUB, run the following (replace sdx with the drive you are booting from):

[root]# grub-install --target=i386-pc /dev/sdx

Set up a custom boot entry.

# /etc/grub.d/40_custom

#!/bin/sh
exec tail -n +3 $0

set timeout=2
set default=0

# (0) Arch Linux
menuentry "Arch Linux" {
    linux /vmlinuz-linux zfs=vault/sys/chin/ROOT/default rw
    initrd /intel-ucode.img /initramfs-linux.img
}

After editing run

[root]# grub-mkconfig -o /boot/grub/grub.cfg

You might get the following error output

/dev/sda
Installing for i386-pc platform.
grub-install: error: failed to get canonical path of `/dev/ata-SAMSUNG_SSD_830_Series_S0VVNEAC702110-part2'.

A workaround is to create a symlink at the path GRUB expects, pointing to the actual partition

[root]# ln -s /dev/sda2 /dev/ata-SAMSUNG_SSD_830_Series_S0VVNEAC702110-part2

Clean Up Link to heading

Once everything necessary for the installation is finished, it is important to export the pool properly before restarting. Failing to do so can result in the pool not importing at boot.

Exit out of the install.

[root]# exit

Export Pool Link to heading

After exiting out of the install, unmount any normal partitions, followed by the ZFS datasets. The command zfs umount -a should take care of unmounting all of the ZFS datasets; however, if the pool doesn’t want to export, they may need to be unmounted by hand.

[root]# umount /mnt/boot
[root]# zfs umount -a

Now the pool can be exported.

[root]# zpool export vault

First Tasks Link to heading

The system should start up normally for the first boot; however, a few tasks are necessary to make sure the system continues to boot properly.

Set the cache file.

[root]# zpool set cachefile=/etc/zfs/zpool.cache vault

To make sure pools are imported automatically, enable zfs.target.

[root]# systemctl enable zfs.target

If your datasets refuse to automount on boot, you may have to switch between legacy mounting and ZFS-managed mounting. You may also have to enable certain units such as zfs-import-cache and zfs-mount, as shown below.
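
On my setup, enabling the cache-based import and mount units looks like the following; whether they are needed depends on how your datasets are mounted:

[root]# systemctl enable zfs-import-cache
[root]# systemctl enable zfs-mount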

Due to problems with the machine’s host ID being unavailable to the system at boot, the initramfs image needs to store the host ID. The easiest way to do this is to write the host ID to /etc/hostid so it is included when the image is regenerated. The alternative is to pass the host ID to the kernel as an additional bootloader option with spl.spl_hostid=0x<hostid>.

You can generate a hostid with zgenhostid

[root]# zgenhostid $(hostid)

Now that the system will properly remember its host ID, the initramfs should be regenerated.

[root]# mkinitcpio -p linux

Problems Link to heading

That should conclude the process of setting up ZFS on Arch Linux. Make sure the system boots properly and that all datasets are mounted at boot. If some datasets do not seem to be mounting properly, make sure their properties are set correctly.

ZFS properties can be queried with zfs get <property> <dataset> so the home dataset can be checked with:

[root]# zfs get mountpoint vault/sys/chin/home

A property can be set with zfs set <property>=<value> <dataset>.
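
For example, to switch the home dataset between ZFS-managed and legacy mounting:

[root]# zfs set mountpoint=/home vault/sys/chin/home
[root]# zfs set mountpoint=legacy vault/sys/chin/home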

Follow-up Link to heading

After getting an installation working, there are plenty of features to play with in ZFS, which I get into in part 3 of this series, Arch Linux on ZFS - Part 3: Backups, Snapshots and Other Features. A few of the key features to take a look at are:

  • snapshots - Take atomic snapshots of a system that can be used as a source for backups, or saved and rolled back to in an emergency.
  • rollback - Revert a dataset back to the state it was in at a snapshot. Can be useful for reverting system-breaking changes.
  • send and receive - Facilities built directly into ZFS for sending and receiving a stream of data. Can be used in combination with snapshots to send data over SSH and do incremental backups.

All of the code used in this post is available on my GitHub. I have split the code up into three parts: the code used before the chroot, the code used in the chroot, and the code used after reboot. The scripts are not runnable, but they are a good reference.

Updated on June 1st, 2018