Use Your Storage Space More Effectively with ZFS: Exploring vdevs

Prerequisites

Log in to the ECS Cloud Console, create a new instance, and choose the newest Ubuntu LTS release available. LTS (Long Term Support) versions are the .04 releases published in even-numbered years, such as 16.04, 18.04, and 20.04; they receive five years of updates, while interim releases are only supported for nine months and don't have the same level of stability. Assign at least 2GB of memory to the instance, and go with 4GB or more if you want better performance: the more memory ZFS has, the more data it can cache, so frequently accessed information is delivered much faster to users and applications.

Install the ZFS Utilities

Connect to your instance with an SSH client and log in as root. Upgrade all packages to bring your system up to date with the latest bug and security fixes, reboot so that a potentially updated kernel is loaded, then install the ZFS utilities:

apt update && apt upgrade
systemctl reboot
apt install zfsutils-linux

Create Disk vdevs

A disk vdev is the simplest type: a single storage device used on its own, without any redundancy. When a pool contains several disk vdevs, ZFS stripes data across all of them.

The benefits of this structure are:

  1. You get to use all of the disks' capacity to store data since you don't have to dedicate any space to redundancy (usable storage space is maximized).
  2. Adding multiple vdevs of this type to a single pool gives you the largest increases in read/write performance.
  3. A disk vdev can be converted to a redundant vdev later on by attaching a device and creating a mirror (zpool attach). You will have to mirror each disk vdev in the pool, though.

The drawbacks are:

  1. Only one copy of the data exists, so if any disk vdev in a pool fails, you lose the entire pool.
  2. ZFS can't automatically repair corrupted data (no self-healing benefits).
List the available block devices to identify the disks you can use:

lsblk

Then create a pool named first out of two of them:

zpool create -f first vdb vdc

The -f flag forces creation. Without it, zpool refuses to use devices that contain leftover data, such as labels from a previously destroyed pool or old MBR partition information:

root@ubuntu:~# zpool create first vdb vdc
invalid vdev specification
use '-f' to override the following errors:
/dev/vdb1 is part of potentially active pool 'fourth'
root@ubuntu:~# zpool create second vdd vde
invalid vdev specification
use '-f' to override the following errors:
/dev/vdd does not contain an EFI label but it may contain partition
information in the MBR.
/dev/vde does not contain an EFI label but it may contain partition
information in the MBR.
Check the state of the pool:

zpool status

root@ubuntu:~# zpool status
  pool: first
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        first       ONLINE       0     0     0
          vdb       ONLINE       0     0     0
          vdc       ONLINE       0     0     0

errors: No known data errors
View summary information about the pool, and then a detailed breakdown per vdev:

zpool list
zpool list -v

Grow the pool by adding two more disk vdevs, then destroy it so the devices can be reused in the next section:

zpool add -f first vdd vde
zpool destroy first
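Since losing any disk vdev loses the entire pool, reliability drops as you stripe across more disks. A minimal sketch of the arithmetic (not from the article; the 2% annual failure rate is an illustrative assumption):

```shell
# Probability that a striped pool survives one year, assuming each disk
# fails independently with the same annual failure rate (AFR).
# Losing any one disk vdev loses the whole pool, so:
#   survival = (1 - AFR) ^ number_of_disks
pool_survival() {
    # $1 = per-disk annual failure rate (e.g. 0.02), $2 = number of disks
    awk -v afr="$1" -v n="$2" 'BEGIN { printf "%.4f\n", (1 - afr) ^ n }'
}

pool_survival 0.02 1   # single disk:   0.9800
pool_survival 0.02 4   # 4-disk stripe: 0.9224
pool_survival 0.02 8   # 8-disk stripe: 0.8508
```

The wider the stripe, the more likely at least one member fails, which is why redundant vdev types exist.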

Create Mirror vdev

A mirror vdev stores an identical copy of the data on every device it contains.

Mirrors have the following advantages:

  1. Compared to Raid-Z, resilvering (rebuilding lost data) is faster and less stressful on the system and physical devices.
  2. The structure is flexible: you can add devices to a mirror to increase reliability, and you can remove devices when more than two are used (with zpool detach).
  3. Since all of the physical devices contain the same data, ZFS can read it faster by splitting requests across every disk.

The drawbacks are:

  1. A large part of the storage capacity has to be dedicated to redundancy, giving you the least amount of usable space out of all vdev types.
  2. Since every physical device has to store the same data, write performance doesn't increase (you get the same performance you would get out of a single disk).
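The usable-space cost of mirroring is easy to quantify: an n-way mirror of identical disks only exposes one disk's worth of capacity. A small sketch (not from the article):

```shell
# Usable fraction of raw capacity for an n-way mirror of identical disks:
# only one copy's worth of space is usable, the rest holds redundancy.
mirror_usable() {
    # $1 = number of devices in the mirror
    awk -v n="$1" 'BEGIN { printf "%.0f%%\n", 100 / n }'
}

mirror_usable 2   # 2-way mirror: 50%
mirror_usable 3   # 3-way mirror: 33%
```

Every device you attach buys more redundancy and read throughput, but never more space.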
zpool create -f second mirror vdb vdc
root@ubuntu:~# zpool status
  pool: second
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        second      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            vdb     ONLINE       0     0     0
            vdc     ONLINE       0     0     0

errors: No known data errors
Attach a third device to the mirror to further increase redundancy:

zpool attach -f second vdb vdd

To get both redundancy and better performance, recreate the pool with two mirror vdevs; ZFS will stripe data across them:

zpool destroy second
zpool create -f second mirror vdb vdc
zpool add -f second mirror vdd vde
root@ubuntu:~# zpool status
  pool: second
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        second      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            vdb     ONLINE       0     0     0
            vdc     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            vdd     ONLINE       0     0     0
            vde     ONLINE       0     0     0

errors: No known data errors
zpool destroy second

Create Raid-Z vdev

In a Raid-Z array, parity information is added to all data so that it can be reconstructed after partial failures. Parity is distributed across all physical devices; there is no dedicated device that stores it. With single parity (Raid-Z1), data can be recovered after losing any one of the storage devices. With double parity (Raid-Z2) you can lose two, and with triple parity (Raid-Z3) you can lose three.

The benefits are:

  1. While mirrors made of 2 devices cut usable space to 50%, 3 devices to 33% and so on, Raid-Z offers more usable space as you use more devices per vdev. For example, a single-parity array (Raid-Z1) made of 3 identical devices offers 66% of usable storage (⅔: 2 devices out of 3 for data, 1 for parity), 4 devices give you 75% (¾: 3 out of 4 for data, 1 for parity), and so on. Avoid using too many devices per vdev though, since that increases the potential for failure as well as the computational cost of parity calculations and resilvering.
  2. Better write performance in comparison to mirrors.

The drawbacks are:

  1. Resilvering Raid-Z is slower than resilvering mirrors.
  2. It is inflexible: you can't add devices to or remove devices from a Raid-Z vdev after it has been created.
  3. Read performance is slightly lower than what you get with mirrors.
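The capacity figures above follow from a single formula: usable fraction = (devices − parity) / devices. A small sketch (not from the article; note it rounds ⅔ to 67% where the text above truncates it to 66%):

```shell
# Usable fraction of raw capacity for a Raid-Z vdev of identical disks:
#   usable = (n - parity) / n
raidz_usable() {
    # $1 = number of devices, $2 = parity level
    #      (1 = Raid-Z1, 2 = Raid-Z2, 3 = Raid-Z3)
    awk -v n="$1" -v p="$2" 'BEGIN { printf "%.0f%%\n", 100 * (n - p) / n }'
}

raidz_usable 3 1   # Raid-Z1, 3 disks: 67%
raidz_usable 4 1   # Raid-Z1, 4 disks: 75%
raidz_usable 6 2   # Raid-Z2, 6 disks: 67%
```

Adding devices raises the usable fraction, while raising the parity level lowers it in exchange for tolerating more failures.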
Create a pool with a single Raid-Z1 vdev made of four devices, then destroy it:

zpool create -f third raidz1 vdb vdc vdd vde
zpool destroy third

Create Log (SLOG), Cache (L2ARC) and Spare vdevs

As previously mentioned, log and cache vdevs are useful only if the physical devices used to back them are faster than those used for storing data. Example: you use hard-disks in a Raid-Z2 array and you add an SSD to a log vdev and another SSD to a cache vdev to significantly increase your storage pool performance. This type of structure is useful to decrease financial costs, especially when you need enormous amounts of storage. But if cost is not an issue or you don’t need to store very large amounts of data, it’s easier to use faster storage devices in your pool, exclusively. This will make your array perform slightly better than a hybrid hard-disk + SSD array. To simplify:

  1. A pool consisting of slow storage devices in disk, mirror or Raid-Z vdevs, without SLOG and L2ARC, has the weakest performance.
  2. A pool made of hard disks plus faster devices like SSDs in log and cache vdevs will perform significantly better.
  3. A pool made entirely of fast storage devices, without log and cache devices, will perform best. However, the difference between "2" and "3" is less impressive than the difference between "1" and "2".
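Since log and cache vdevs only help when they sit on faster media than the data disks, it's worth checking which of your devices the kernel considers rotational. A sketch of one way to do this on Linux, reading sysfs (not from the article):

```shell
# Print each block device with its rotational flag as reported by the
# kernel (1 = spinning hard disk, 0 = SSD/NVMe). Devices reporting 0
# are candidates for dedicated log (SLOG) and cache (L2ARC) vdevs.
list_rotational() {
    for dev in /sys/block/*/queue/rotational; do
        [ -e "$dev" ] || continue            # no block devices visible
        name=${dev#/sys/block/}
        name=${name%/queue/rotational}
        printf '%s rotational=%s\n' "$name" "$(cat "$dev")"
    done
}

list_rotational
```

Note that virtualized disks may not report this flag accurately, so treat it as a hint rather than a guarantee.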
Create a striped pool, then add a dedicated log (SLOG) device and a cache (L2ARC) device to it:

zpool create -f fourth vdb vdc
zpool add -f fourth log vdd
zpool add -f fourth cache vde
root@ubuntu:~# zpool status
  pool: fourth
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        fourth      ONLINE       0     0     0
          vdb       ONLINE       0     0     0
          vdc       ONLINE       0     0     0
        logs
          vdd       ONLINE       0     0     0
        cache
          vde       ONLINE       0     0     0

errors: No known data errors

Alibaba Cloud

Follow me to keep abreast of the latest technology news, industry insights, and developer trends. Alibaba Cloud website: https://www.alibabacloud.com