In the last section of this series I explained how to embed ZFS on the Arch Linux install media so that an installation can be completed from the ISO. In this section I will go through the installation of Arch Linux using ZFS as the root filesystem.

Part Two - Installation

Note: The commands used in this post are available on my github.

Pre-Install Setup

Boot partitions

I would recommend using a dedicated boot partition whether using BIOS or UEFI. While you are supposed to be able to store your kernel and other boot images on a ZFS dataset when using a BIOS bootloader like GRUB, in my experience I have had more luck using a boot partition.

ZFS Setup

Pool Type and Redundancy

ZFS has several different levels of redundancy based on the number of disks used and how they are configured. They are different from traditional RAID levels.

A crude description of the different levels (a sketch of the corresponding zpool create commands follows the list):

  • Stripe
    • No redundancy, similar to RAID0.
    • Default pool creation mode.
  • RAIDZ
    • Minimum of 3 disks.
    • One disk can be lost without pool failure.
    • One disk's worth of capacity is used for parity and does not provide storage.
  • RAIDZ2
    • Minimum of 4 disks.
    • Two disks can be lost without pool failure.
    • Two disks' worth of capacity is used for parity and does not provide storage.
  • RAIDZ3
    • Minimum of 5 disks.
    • Three disks can be lost without pool failure.
    • Three disks' worth of capacity is used for parity and does not provide storage.
  • Mirror
    • Minimum of 2 disks.
    • Half of the disks can be lost without pool failure.
    • Half of the disks hold redundant copies and do not provide additional storage.
    • Provides the best performance and flexibility at a cost of storage space.
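
For reference, these layouts map to zpool create roughly as follows. This is a sketch only; 'tank' and the diskN names are placeholders for your own pool name and disk IDs.

[root]# zpool create tank disk0 disk1                     # stripe (no redundancy)
[root]# zpool create tank mirror disk0 disk1              # mirror
[root]# zpool create tank raidz disk0 disk1 disk2         # RAIDZ
[root]# zpool create tank raidz2 disk0 disk1 disk2 disk3  # RAIDZ2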

Disk Setup

It’s recommended to use the persistent disk ID names, as advised by the ZFS on Linux (ZOL) project, rather than names like /dev/sda. The ID of each drive (and of its partitions) can be found with:

[root]# ls /dev/disk/by-id/
ata-SanDisk_SDSSDXPS480G_152271401093
ata-SanDisk_SDSSDXPS480G_154501401266

Note: In the rest of this post I will be using the above two SanDisk SSDs as an example.

Once the disk IDs are known, the pool can be created. While the disks can be partitioned manually with GPT or MBR, it is not necessary to partition the drives before creating the pool; ZFS will partition whole drives itself as Solaris Root (bf00) when creating a new pool.

Booting directly from ZFS can be problematic. The simplest option is to keep a separate boot partition in another filesystem format, as would be done with a regular install.

I do not use a swap partition and thus have no experience using one; however, the Arch wiki explains the process of setting one up if necessary.
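
If a boot partition needs to be created by hand, a rough sketch with sgdisk on a UEFI system might look like the following. This assumes a hypothetical device /dev/sdx that is not one of the pool disks; on a BIOS system an ext4 /boot partition would be created instead.

[root]# sgdisk -n 1:0:+512M -t 1:ef00 /dev/sdx   # create a 512 MiB EFI system partition
[root]# mkfs.fat -F32 /dev/sdx1                  # format it as FAT32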

Pool Creation

After deciding on a pool type and getting the disk IDs, a pool can be created with the zpool create command.

The syntax is:

zpool create [-fnd] [-o property=value] ... \
              [-O file-system-property=value] ... \
              [-m mountpoint] [-R root] ${POOL_NAME} ${DISK}	...

Flags (an example combining several of them follows the list):

  • -f - Force.
  • -n - Display what would be created, but don’t create the pool.
  • -d - Do not enable any features unless specified.
  • -o - Set a pool property.
  • -O - Set a property on the root filesystem.
  • -m - Mountpoint.
  • -R - Set an alternate root location.
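
For instance, -n can be combined with the other flags to preview a pool layout without actually creating anything. The property values here are only illustrative.

[root]# zpool create -n -o ashift=12 -O compression=lz4 \
                -m none -R /mnt vault mirror \
                ata-SanDisk_SDSSDXPS480G_152271401093 \
                ata-SanDisk_SDSSDXPS480G_154501401266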

First, load the ZFS kernel module; there should be no output from modprobe.

[root]# modprobe zfs

Then the zpool create command can be used to create a new pool.

When creating a pool, ashift=12 specifies advanced format disks, forcing a minimum block size of 4096 bytes. Here I create a mirrored pool named ‘vault’ with my two SSDs.

[root]# zpool create -f -o ashift=12 vault mirror \
                ata-SanDisk_SDSSDXPS480G_152271401093 \
                ata-SanDisk_SDSSDXPS480G_154501401266

Verify the pool was created successfully with zpool status.

[root]# zpool status
pool: vault
state: ONLINE
scan: scrub repaired 0 in 0h8m with 0 errors on Mon Jun 13 00:08:39 2016
config:

NAME                                       STATE     READ WRITE CKSUM
vault                                      ONLINE       0     0     0
  mirror-0                                 ONLINE       0     0     0
    ata-SanDisk_SDSSDXPS480G_152271401093  ONLINE       0     0     0
    ata-SanDisk_SDSSDXPS480G_154501401266  ONLINE       0     0     0

errors: No known data errors

Properties

There are many properties that can be set on an entire pool or on specific datasets. A property that will almost always be wanted on the entire pool is compression. ZFS's LZ4 compression is a great compromise between performance and the amount of compression achieved.

Set compression on:

[root]# zfs set compression=on vault
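
Once data has been written, how well compression is working can be checked through the read-only compressratio property. The value shown here is just an example; a fresh pool will report 1.00x.

[root]# zfs get compressratio vault
NAME   PROPERTY       VALUE  SOURCE
vault  compressratio  1.00x  -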

By default atime=on is enabled. Updating access times can be turned off entirely to increase performance, or relaxed to relatime, which is the default on many Linux filesystems.

relatime is a good compromise: access times are still recorded, but much less frequently than with the plain atime=on behavior. Note that the relatime property only takes effect while atime is on, so both are set here.

[root]# zfs set atime=on vault
[root]# zfs set relatime=on vault

Dataset Creation

Datasets are similar to partitions; however, they come with many benefits partitions do not have. In addition to being hierarchically organizable, they do not require a fixed size: all of the datasets share the space in a given pool. These qualities mean that datasets can be used extensively at essentially no cost.

ZFS can manage the mounting of datasets itself, or a dataset can fall back to system-controlled legacy management, where it is mounted through the fstab.

I have found that legacy management works best. If legacy mounting fails, which it does in certain circumstances, I use ZFS managed mounting instead.

Key Datasets

In order to create a setup that may be used with boot environments, the root filesystem will be contained inside an additional ‘ROOT’ dataset. When used with boot environments, this allows the root filesystem to be cloned between environments while sharing any datasets that are not inside ‘ROOT’. It is not necessary to use boot environments, but there is no downside to creating a system that is compatible with them for the future.

At minimum the following datasets should be created inside the root pool: vault/ROOT and vault/ROOT/default. Most people will also want a separate vault/home dataset that is not contained within the boot environment ‘ROOT’ dataset.

  • vault/ROOT
    • Will contain boot environments.
    • The dataset we will use as the filesystem root, ‘default’, will reside within it.
    • Will not be mounted (property mountpoint=none).
  • vault/ROOT/default
    • The root dataset boot environment; it can be named anything, but ‘default’ is the convention for the initial boot environment.
    • Uses legacy mounting with property mountpoint=legacy.
    • Will be mounted at /.
  • vault/home
    • The home dataset.
    • Does not go within vault/ROOT so that it is shared between boot environments.
    • Uses ZFS mounting with property mountpoint=/home.
    • Will be mounted at /home.

Create the vault/ROOT/default and vault/home datasets and set their mount points.

[root]# zfs create -o mountpoint=none vault/ROOT
[root]# zfs create -o mountpoint=legacy vault/ROOT/default
[root]# zfs create -o mountpoint=/home vault/home
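
To confirm the layout before continuing, the datasets can be listed. The output will look roughly like the following; exact mount points may differ.

[root]# zfs list -o name,mountpoint
NAME                 MOUNTPOINT
vault                /vault
vault/ROOT           none
vault/ROOT/default   legacy
vault/home           /home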

Additional Datasets

It is not necessary to create any additional datasets; however, since there is virtually no cost to using them and doing so gives the ability to manipulate properties for each dataset individually, it can make sense to use them for certain key directories such as /tmp, /usr and /var.

I set the properties following the tuning recommendations in various places including the Arch wiki:

For /tmp

  • sync=disabled - Disabling sync will increase performance by ignoring sync requests.
  • devices=off - Prevent use of device nodes for security.
  • setuid=off - Can help prevent privilege-escalation attacks.
  • mountpoint=/tmp - Use ZFS mounting.

Create dataset for /tmp:

[root]# zfs create -o setuid=off \
                   -o devices=off \
                   -o sync=disabled \
                   -o mountpoint=/tmp vault/tmp

Since we are using a custom dataset for /tmp, mask (disable) systemd’s automatic tmpfs-backed tmp.

[root]# systemctl mask tmp.mount

For /var:

  • xattr=sa - Stores extended attributes in inodes rather than in hidden directories, which can increase performance.
  • mountpoint=legacy - Use legacy mounting.

Create dataset for /var:

[root]# zfs create -o xattr=sa -o mountpoint=legacy vault/var

For /usr:

  • mountpoint=legacy - Use legacy mounting.

Create dataset for /usr:

[root]# zfs create -o mountpoint=legacy vault/usr

Prepare Pool

With the datasets created, the pool can be configured.

As a precaution and to prevent later issues, unmount the pool and all datasets.

[root]# zfs umount -a

The legacy datasets should all be added to the fstab. vault/home and vault/tmp will be mounted by ZFS and do not need to be in the fstab.

# <file system> <dir> <type>  <options> <dump> <pass>
vault/ROOT/default       /       zfs     rw,relatime,xattr,noacl         0 0
vault/var                /var    zfs     rw,relatime,xattr,noacl         0 0
vault/usr                /usr    zfs     rw,relatime,xattr,noacl         0 0

The dataset that is going to be used as ‘root’ and booted from needs to have the bootfs property set.

[root]# zpool set bootfs=vault/ROOT/default vault
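
The property can be verified with zpool get.

[root]# zpool get bootfs vault
NAME   PROPERTY  VALUE               SOURCE
vault  bootfs    vault/ROOT/default  local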

With the pool ready, it can now be exported. This is a necessary step to prevent problems when importing it later.

[root]# zpool export vault

Setup Installation

Import the pool to the location where the installation will be done, /mnt.

[root]# zpool import -d /dev/disk/by-id -R /mnt vault

Unmount the ZFS managed datasets if they were automatically mounted.

[root]# zfs umount /mnt/tmp
[root]# zfs umount /mnt/home

Mount the root dataset

[root]# mount -t zfs vault/ROOT/default /mnt

An important cache file was created with the pool. Copy it into the new system, creating the target directory first since the base system has not been installed yet.

[root]# mkdir -p /mnt/etc/zfs
[root]# cp /etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache

If this cache does not exist, create one.

[root]# zpool set cachefile=/etc/zfs/zpool.cache vault

The datasets can now be mounted. Any non-ZFS filesystems, such as the boot partition, should be mounted normally.

Create the mount points.

[root]# mkdir /mnt/{home,boot,var,usr,tmp}

Mount the legacy (non ZFS managed) datasets and the boot partition. Replace sdxY with your boot partition.

[root]# mount /dev/sdxY /mnt/boot
[root]# mount -t zfs vault/var /mnt/var && mount -t zfs vault/usr /mnt/usr

Mount the ZFS managed datasets.

[root]# zfs mount vault/home && zfs mount vault/tmp

Check everything is successfully mounted

[root]# zfs mount
vault/ROOT/default              /mnt
vault/home                      /mnt/home
vault/tmp                       /mnt/tmp
vault/var                       /mnt/var
vault/usr                       /mnt/usr

With all datasets successfully mounted, the legacy datasets can be added to the new fstab. To start with, an fstab can be generated; it will then need to be edited to remove any non-legacy datasets.

[root]# genfstab -U -p /mnt >> /mnt/etc/fstab

The fstab will look similar to the earlier one, except it should include the boot partition and any other partitions or datasets the final system needs, including swap if used.

Since my system does not use swap, only the three legacy datasets and the boot partition were needed in my fstab:

# <file system> <dir> <type>  <options> <dump> <pass>
vault/ROOT/default       /       zfs     rw,relatime,xattr,noacl         0 0
vault/var                /var    zfs     rw,relatime,xattr,noacl         0 0
vault/usr                /usr    zfs     rw,relatime,xattr,noacl         0 0
UUID=F2F4-47DC          /boot  	vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro	0 2

Edit the mirrorlist to prefer mirrors in your desired location.

[root]# nano /etc/pacman.d/mirrorlist

Install

With everything set up the installation can finally be started.

Install the base system

[root]# pacstrap -i /mnt base base-devel

Configure Ramdisk

The mkinitcpio configuration will need some different hooks.

If no separate datasets are used, the following hooks should be present in mkinitcpio.conf in this specific order. fsck is not needed with ZFS and should only be there if ext3 or ext4 are used.

Make sure the keyboard hook comes before zfs so that recovery can be done using the keyboard if necessary.

[root]# nano /mnt/etc/mkinitcpio.conf
# ...
HOOKS="base udev autodetect modconf block keyboard zfs filesystems"
# ...

If a separate dataset is used for /usr, the ‘usr’ hook should be enabled. I have found the ‘shutdown’ hook is also needed to make /var unmount properly on shutdown.

# ...
HOOKS="base udev autodetect modconf block keyboard zfs usr filesystems shutdown"
# ...

Enter Chroot

The install can now be chrooted into.

[root]# arch-chroot /mnt /bin/bash

Setup ZFS Repositories

I find using the archzfs repository is the easiest way to install ZFS. If preferred, ZFS can also be compiled from source using the AUR, but the archzfs repo has ZFS pre-compiled, making for a simple install.

Before proceeding with the install, the ZFS repositories need to be added.

Add the archzfs repository to /etc/pacman.conf. The archzfs repository should be listed first so that it is the preferred server. Place it above all other repositories.

[root]# nano /etc/pacman.conf
# REPOSITORIES
[archzfs]
Server = http://archzfs.com/$repo/x86_64

# Other repositories...

Next, sign the repository key. Confirm the key is correct by checking the Arch unofficial user repositories listing before using it.

[root]# pacman-key -r 5E1ABF240EE7A126
[root]# pacman-key --lsign-key 5E1ABF240EE7A126

Install ZFS

Now ZFS can be installed. There are a few package options in the archzfs repository.

I was originally using the git packages, but after running into a problem I switched over to the zfs-linux package, which is built from the ZOL release version. Unless you are very concerned with staying on the extreme bleeding edge, I would recommend using zfs-linux.

Update the mirrors and install ZFS.

[root]# pacman -Syyu
[root]# pacman -S  zfs-linux

Install System

At this point the system can be installed as usual. Proceed through until the point where the bootloader would normally be configured.
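
For reference, the usual configuration steps look something like the following sketch; substitute your own timezone, locale, and hostname.

[root]# ln -sf /usr/share/zoneinfo/Region/City /etc/localtime
[root]# hwclock --systohc
[root]# echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen && locale-gen
[root]# echo "LANG=en_US.UTF-8" > /etc/locale.conf
[root]# echo "myhostname" > /etc/hostname
[root]# passwd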

Bootloader

EFI Bootloader

My preferred bootloader, for its simplicity, is ‘gummiboot’, now called ‘systemd-boot’. When using an EFI system it is what the Arch wiki recommends, and what I'd recommend. It is already installed on Arch by default as part of systemd.

Install systemd-boot to wherever the ESP is mounted, generally /boot.

[root]# bootctl --path=/boot install

Make the bootloader entry. When using ZFS the extra parameter zfs=<root dataset> must be added to the list of options. Other than that, bootloader parameters should be the same as a normal install.

[root]# nano /boot/loader/entries/arch.conf
title     Arch Linux
linux     /vmlinuz-linux
initrd    /initramfs-linux.img
options   zfs=vault/ROOT/default rw

If you decide to go with a different bootloader, the setup should be the same as normal except for adding zfs=<root dataset> to the kernel options.

BIOS Bootloader

If you have a BIOS system you will want to use GRUB.

After installing GRUB, run the following (replace sdx with the drive you're booting from):

[root]# grub-install --target=i386-pc /dev/sdx

Set up a custom boot entry:

# /etc/grub.d/40_custom

#!/bin/sh
exec tail -n +3 $0

set timeout=2
set default=0

# (0) Arch Linux
menuentry "Arch Linux" {
    linux /vmlinuz-linux zfs=vault/ROOT/default rw
    initrd /intel-ucode.img /initramfs-linux.img
}

After editing run

[root]# grub-mkconfig -o /boot/grub/grub.cfg

When running grub-install you might get the following error:

/dev/sda
Installing for i386-pc platform.
grub-install: error: failed to get canonical path of `/dev/ata-SAMSUNG_SSD_830_Series_S0VVNEAC702110-part2'.

A workaround is to create a symlink from the expected device path to the actual partition:

[root]# ln -s /dev/sda2 /dev/ata-SAMSUNG_SSD_830_Series_S0VVNEAC702110-part2

Clean Up

Once everything necessary for the installation is finished, it is important to export the pool properly before restarting. Failing to do so can result in the pool not importing at boot.

Exit out of the install.

[root]# exit

Export Pool

After exiting the install, unmount any normal partitions, followed by the ZFS datasets. The command zfs umount -a should take care of unmounting all of the ZFS datasets; however, if the pool refuses to export, they may need to be unmounted by hand.

[root]# umount /mnt/boot
[root]# zfs umount -a

Now the pool can be exported.

[root]# zpool export vault

First Tasks

The system should start up normally for the first boot; however, a few tasks are necessary to make sure the system continues to boot properly.

Set the cache file.

[root]# zpool set cachefile=/etc/zfs/zpool.cache vault

To make sure pools are imported automatically, enable zfs.target.

[root]# systemctl enable zfs.target

If your datasets refuse to automount on boot, you may have to play around with switching from legacy mounting to ZFS managed mounting, or vice versa. You may also have to enable certain units such as zfs-import-cache and zfs-mount, as shown below.
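
For example, assuming those unit names exist in your version of the ZFS package:

[root]# systemctl enable zfs-import-cache.service zfs-mount.service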

Due to problems with the machine's host ID being unavailable to the system at boot, the initramfs image needs to be adjusted to store the host ID. The easiest way to do this is to run a small program which saves the host ID so that it is included in the image. The alternative is to pass the host ID to the bootloader as an additional kernel option.
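
As a sketch of the bootloader alternative, the value reported by the hostid command can be passed through the spl.spl_hostid kernel parameter in the bootloader entry; the value below is made up, use your own.

options   zfs=vault/ROOT/default rw spl.spl_hostid=0xa8c00802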

For the program approach, you can clone the file from my github.

[root]# git clone https://gist.github.com/8ae4bc7e2f5236c714f8e822001ac842.git writehostid

Or copy it to a file in the new system.

/* writehostid.c - writes the current host ID to /etc/hostid so that it is
 * available early at boot. Must be run as root. */
#include <stdio.h>
#include <errno.h>
#include <unistd.h>

int main() {
    int res;
    res = sethostid(gethostid());
    if (res != 0) {
        switch (errno) {
            case EACCES:
            fprintf(stderr, "Error! No permission to write the"
                         " file used to store the host ID.\n"
                         "Are you root?\n");
            break;
            case EPERM:
            fprintf(stderr, "Error! The calling process's effective"
                            " user or group ID is not the same as"
                            " its corresponding real ID.\n");
            break;
            default:
            fprintf(stderr, "Unknown error.\n");
        }
        return 1;
    }
    return 0;
}

Compile the program, give it execute permissions, and execute it.

[root]# gcc -o writehostid writehostid.c
[root]# chmod +x writehostid
[root]# ./writehostid

Now that the system will properly remember its host ID, the initramfs should be regenerated.

[root]# mkinitcpio -p linux

Problems

That should conclude the process of setting up ZFS on Arch Linux. Make sure the system boots properly and that all datasets are mounted at boot. If some datasets do not seem to be mounting properly, make sure their properties are set correctly.

I have had trouble getting the home and tmp datasets mounted by the fstab and have had better success mounting them with ZFS. If they are not being mounted, make sure their property is set to mountpoint=<mountpoint>.

ZFS properties can be queried with zfs get <property> <dataset>, so the home and tmp datasets can be checked with:

[root]# zfs get mountpoint vault/tmp
[root]# zfs get mountpoint vault/home

A property can be set with zfs set <property>=<value> <dataset>.
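
For example, to switch the home dataset back to ZFS managed mounting:

[root]# zfs set mountpoint=/home vault/home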

Follow-up

After getting an installation working, there are plenty of features to play with in ZFS, which I get into in part 3 of this series, Arch Linux on ZFS - Part 3: Backups, Snapshots and Other Features. A few of the key features to take a look at are (a quick sketch of the commands follows the list):

  • snapshots - Take atomic snapshots of a system that can be used as a source of backups, or saved and rolled back to in an emergency.
  • rollback - Revert a dataset back to the state it was in at a snapshot. Can be useful for reverting system-breaking changes.
  • send and receive - Facilities built directly into ZFS for sending and receiving a stream of data. Can be used in combination with snapshots to send a stream of data over SSH and do incremental backups.
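
A quick taste of these commands; the snapshot name, remote host, and target dataset below are placeholders.

[root]# zfs snapshot vault/ROOT/default@before-upgrade
[root]# zfs rollback vault/ROOT/default@before-upgrade
[root]# zfs send vault/ROOT/default@before-upgrade | ssh backuphost zfs receive backup/arch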

All of the code used in this post is available on my github. I have split the code up into three parts: the code used to set up before the chroot, the code used in the chroot, and the code used after reboot. The scripts are not runnable as-is, but they are a good reference.