My adventure into SSD caching with ZFS (Home NAS)

Recently I decided to throw away my defunct 2009 MacBook Pro, which had been rotting in my cupboard, and before doing so I retrieved the only useful part: the 80GB Intel SSD I had installed a few years earlier. Initially I thought about simply adding it to my desktop as a bit of extra space, but in 2017 80GB really isn’t worth it. Then I had a brainwave… let’s see if we can squeeze some additional performance out of my HP Microserver Gen8 NAS running ZFS by installing it as a cache disk.

I installed the SSD in the CD-ROM tray of the Microserver using a floppy-to-SATA power adapter and a SATA cable. Unfortunately the CD-ROM SATA port on the motherboard is only a 3Gbps port, although that didn’t matter much as this is an older 3Gbps SSD anyway. Next I booted up the machine and, to my surprise, the disk was not found by my FreeBSD install. Then I realised that the SATA port for the CD drive is actually provided by the RAID controller, so I rebooted into Intelligent Provisioning and added an additional RAID0 array containing just this one disk to act as my cache. In fact all of the disks in this machine are configured as individual RAID0 arrays, so the controller presents them as just a bunch of disks (JBOD) and ZFS handles the redundancy itself, since ZFS offers additional functionality over normal RAID (mainly scrubbing, deduplication and compression).
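
As a quick sanity check that FreeBSD can now see the new logical drive before partitioning it (it shows up as da6 on my system), a couple of stock commands are enough:

[root@netdisk] ~# camcontrol devlist
[root@netdisk] ~# geom disk list da6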

Configuration

Let’s have a look at the zpool before adding the cache drive to make sure there are no errors or ugliness…

[root@netdisk] ~# zpool status vol0
  pool: vol0
 state: ONLINE
  scan: scrub repaired 0 in 3h21m with 0 errors on Wed Apr 26 16:40:16 2017
config:

	NAME                                            STATE     READ WRITE CKSUM
	vol0                                            ONLINE       0     0     0
	  mirror-0                                      ONLINE       0     0     0
	    gptid/b8ebf047-25cd-11e7-b7e2-000c29ccceef  ONLINE       0     0     0
	    gptid/c0df3410-25cd-11e7-b7e2-000c29ccceef  ONLINE       0     0     0
	  mirror-1                                      ONLINE       0     0     0
	    gptid/c8b062cb-25cd-11e7-b7e2-000c29ccceef  ONLINE       0     0     0
	    gptid/355426bd-25ce-11e7-b7e2-000c29ccceef  ONLINE       0     0     0

errors: No known data errors

Now let’s prep the drive for use in the zpool using gpart. I want to split the SSD into two separate partitions: one for the L2ARC (read caching) and one for a dedicated ZIL/SLOG device (often described as write caching, although strictly it only holds synchronous writes before they are committed to the pool). I have decided to split the disk into 20GB for the ZIL and 50GB for the L2ARC. Be warned: using a single SSD like this is considered unsafe, because it is a single point of failure for synchronous writes that have not yet reached the main pool (a mirrored pair of SSDs would be more appropriate), and the heavy write cycles the ZIL puts on the SSD are likely to kill it over time.

[root@netdisk] ~# gpart create -s gpt da6
[root@netdisk] ~# gpart show da6
=>       34  150994877  da6  GPT  (72G)
         34  150994877       - free -  (72G)
[root@netdisk] ~# gpart add -t freebsd-zfs -s 20G da6
da6p1 added
[root@netdisk] ~# gpart add -t freebsd-zfs -s 50G da6
da6p2 added
[root@netdisk] ~# gpart show da6
=>       34  150994877  da6  GPT  (72G)
         34   41943040    1  freebsd-zfs  (20G)
   41943074  104857600    2  freebsd-zfs  (50G)
  146800674    4194237       - free -  (2.0G)
[root@netdisk] ~# glabel list da6p1
Geom name: da6p1
Providers:
1. Name: gptid/a3d641e5-2c03-11e7-9146-000c29ccceef
   Mediasize: 21474836480 (20G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 17408
   Mode: r0w0e0
   secoffset: 0
   offset: 0
   seclength: 41943040
   length: 21474836480
   index: 0
Consumers:
1. Name: da6p1
   Mediasize: 21474836480 (20G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 17408
   Mode: r0w0e0
[root@netdisk] ~# glabel list da6p2
Geom name: da6p2
Providers:
1. Name: gptid/a6dd9f4d-2c03-11e7-9146-000c29ccceef
   Mediasize: 53687091200 (50G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 17408
   Mode: r0w0e0
   secoffset: 0
   offset: 0
   seclength: 104857600
   length: 53687091200
   index: 0
Consumers:
1. Name: da6p2
   Mediasize: 53687091200 (50G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 17408
   Mode: r0w0e0

Now that the two partitions are configured on the disk as described above, they can be added to the ZFS zpool: the 20GB partition as a log (ZIL) device and the 50GB partition as a cache (L2ARC) device.

[root@netdisk] ~# zpool add vol0 log gptid/a3d641e5-2c03-11e7-9146-000c29ccceef
[root@netdisk] ~# zpool add vol0 cache gptid/a6dd9f4d-2c03-11e7-9146-000c29ccceef
[root@netdisk] ~# zpool status vol0
  pool: vol0
 state: ONLINE
  scan: scrub repaired 0 in 3h21m with 0 errors on Wed Apr 26 16:40:16 2017
config:

	NAME                                            STATE     READ WRITE CKSUM
	vol0                                            ONLINE       0     0     0
	  mirror-0                                      ONLINE       0     0     0
	    gptid/b8ebf047-25cd-11e7-b7e2-000c29ccceef  ONLINE       0     0     0
	    gptid/c0df3410-25cd-11e7-b7e2-000c29ccceef  ONLINE       0     0     0
	  mirror-1                                      ONLINE       0     0     0
	    gptid/c8b062cb-25cd-11e7-b7e2-000c29ccceef  ONLINE       0     0     0
	    gptid/355426bd-25ce-11e7-b7e2-000c29ccceef  ONLINE       0     0     0
	logs
	  gptid/a3d641e5-2c03-11e7-9146-000c29ccceef    ONLINE       0     0     0
	cache
	  gptid/a6dd9f4d-2c03-11e7-9146-000c29ccceef    ONLINE       0     0     0

errors: No known data errors

Now it’s time to see whether adding the cache has made much of a difference. I suspect not, as my home NAS sucks: it is an HP Microserver Gen8 with the crappy Celeron CPU and only 4GB of RAM. Anyway, let’s test it and find out. First off, let’s throw fio at the mount point for this zpool and see what happens both with the ZIL and L2ARC in place and with them removed.
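
The contents of disk_perf.fio aren’t shown here, but the parameters fio echoes back below (rw=rw, 4k blocks, posixaio, iodepth 1, a single 1GB file, one thread) correspond to a job file roughly like this sketch; the directory path is only a placeholder for wherever the pool is mounted:

[random_rw]
rw=rw
bs=4k
size=1024m
ioengine=posixaio
iodepth=1
thread
; placeholder: point this at the zpool's mount point
directory=/mnt/vol0

For the “caching disabled” run, the log and cache vdevs can be detached with zpool remove vol0 <gptid> and put back with the zpool add commands from earlier, which makes this sort of A/B comparison easy.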

Caching Disabled FIO Results

[root@netdisk] ~# fio disk_perf.fio
random_rw: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=posixaio, iodepth=1
fio-2.14
Starting 1 thread
random_rw: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [M(1)] [100.0% done] [10806KB/10580KB/0KB /s] [2701/2645/0 iops] [eta 00m:00s]
random_rw: (groupid=0, jobs=1): err= 0: pid=101627: Fri Apr 28 12:21:47 2017
  read : io=522756KB, bw=10741KB/s, iops=2685, runt= 48669msec
    slat (usec): min=2, max=2559, avg= 9.31, stdev=21.63
    clat (usec): min=2, max=69702, avg=34.66, stdev=563.31
     lat (usec): min=12, max=69760, avg=43.97, stdev=567.97
    clat percentiles (usec):
     |  1.00th=[    2],  5.00th=[    2], 10.00th=[    3], 20.00th=[    3],
     | 30.00th=[    3], 40.00th=[    3], 50.00th=[    3], 60.00th=[   14],
     | 70.00th=[   14], 80.00th=[   15], 90.00th=[   15], 95.00th=[   15],
     | 99.00th=[   27], 99.50th=[   59], 99.90th=[ 9536], 99.95th=[10304],
     | 99.99th=[15040]
  write: io=525820KB, bw=10804KB/s, iops=2701, runt= 48669msec
    slat (usec): min=2, max=8124, avg=15.87, stdev=43.54
    clat (usec): min=2, max=132195, avg=305.77, stdev=1973.41
     lat (usec): min=21, max=132485, avg=321.64, stdev=1977.32
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    3],
     | 30.00th=[    3], 40.00th=[    3], 50.00th=[    4], 60.00th=[   24],
     | 70.00th=[   24], 80.00th=[   25], 90.00th=[   25], 95.00th=[   35],
     | 99.00th=[10304], 99.50th=[10560], 99.90th=[11712], 99.95th=[19584],
     | 99.99th=[50432]
    lat (usec) : 4=49.44%, 10=4.09%, 20=21.92%, 50=22.63%, 100=0.23%
    lat (usec) : 250=0.08%, 500=0.02%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.01%, 10=0.90%, 20=0.63%, 50=0.02%
    lat (msec) : 100=0.01%, 250=0.01%
  cpu          : usr=3.08%, sys=2.92%, ctx=265022, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=130689/w=131455/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=522756KB, aggrb=10741KB/s, minb=10741KB/s, maxb=10741KB/s, mint=48669msec, maxt=48669msec
  WRITE: io=525820KB, aggrb=10804KB/s, minb=10804KB/s, maxb=10804KB/s, mint=48669msec, maxt=48669msec
[root@netdisk] ~# 

Caching Enabled FIO Results

[root@netdisk] ~# fio disk_perf.fio
random_rw: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=posixaio, iodepth=1
fio-2.14
Starting 1 thread
random_rw: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [M(1)] [100.0% done] [10902KB/10854KB/0KB /s] [2725/2713/0 iops] [eta 00m:00s]
random_rw: (groupid=0, jobs=1): err= 0: pid=101640: Fri Apr 28 12:24:18 2017
  read : io=522756KB, bw=10462KB/s, iops=2615, runt= 49967msec
    slat (usec): min=2, max=23575, avg=12.84, stdev=155.63
    clat (usec): min=2, max=135652, avg=34.24, stdev=644.41
     lat (usec): min=12, max=135654, avg=47.08, stdev=663.37
    clat percentiles (usec):
     |  1.00th=[    2],  5.00th=[    2], 10.00th=[    3], 20.00th=[    3],
     | 30.00th=[    3], 40.00th=[    3], 50.00th=[    3], 60.00th=[    3],
     | 70.00th=[   13], 80.00th=[   14], 90.00th=[   15], 95.00th=[   15],
     | 99.00th=[   27], 99.50th=[   70], 99.90th=[ 9920], 99.95th=[10432],
     | 99.99th=[11968]
  write: io=525820KB, bw=10523KB/s, iops=2630, runt= 49967msec
    slat (usec): min=2, max=139302, avg=22.47, stdev=556.00
    clat (usec): min=2, max=119615, avg=306.30, stdev=1969.21
     lat (usec): min=20, max=139752, avg=328.78, stdev=2046.20
    clat percentiles (usec):
     |  1.00th=[    2],  5.00th=[    3], 10.00th=[    3], 20.00th=[    3],
     | 30.00th=[    3], 40.00th=[    3], 50.00th=[    3], 60.00th=[    3],
     | 70.00th=[   23], 80.00th=[   24], 90.00th=[   25], 95.00th=[   32],
     | 99.00th=[10176], 99.50th=[10560], 99.90th=[19072], 99.95th=[24192],
     | 99.99th=[44800]
    lat (usec) : 4=62.92%, 10=2.93%, 20=15.75%, 50=16.37%, 100=0.28%
    lat (usec) : 250=0.14%, 500=0.03%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.01%, 10=0.93%, 20=0.57%, 50=0.04%
    lat (msec) : 100=0.01%, 250=0.01%
  cpu          : usr=3.21%, sys=2.51%, ctx=265920, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=130689/w=131455/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=522756KB, aggrb=10462KB/s, minb=10462KB/s, maxb=10462KB/s, mint=49967msec, maxt=49967msec
  WRITE: io=525820KB, aggrb=10523KB/s, minb=10523KB/s, maxb=10523KB/s, mint=49967msec, maxt=49967msec
[root@netdisk] ~# 

Observations

OK, so the initial result is a little disappointing, but hardly unexpected: my NAS sucks and there are lots of bottlenecks, namely the CPU, the memory and the fact that only two of the SATA ports are 6Gbps. There is no real performance difference between the two runs; the IOPS, bandwidth and latency all appear very similar. However, let’s bear in mind that fio is a pretty hardcore disk benchmark utility, so how about some real-world use cases?

Next I decided to test a few typical file transactions that this NAS is used for: Samba shares to my workstation. For the first test I wanted to read a 3GB file over the network with the cache enabled and then disabled, running each transfer multiple times to ensure the data was hot in the L2ARC and that the test was somewhat repeatable. The network itself is an uncongested 1Gbit link and I am copying onto the secondary SSD in my workstation. The dataset for these tests has compression and deduplication disabled.
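
Turning those features off is just a couple of dataset property changes; the dataset name below is a placeholder for whichever dataset backs the share:

[root@netdisk] ~# zfs set compression=off vol0/share
[root@netdisk] ~# zfs set dedup=off vol0/share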

Samba Read Test

Attempt    without caching    with caching
1          48.1MB/s           52.2MB/s
2          49.6MB/s           66.4MB/s
3          47.4MB/s           65.6MB/s

Not bad; once the data becomes hot in the L2ARC, reads appear to gain a decent advantage over reading from the disks directly. How does it perform when writing the same file back across the network with the ZIL versus without it?
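
To confirm the L2ARC really is being hit, rather than taking the speed-up on faith, FreeBSD exposes the relevant counters via sysctl; for example:

[root@netdisk] ~# sysctl kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses kstat.zfs.misc.arcstats.l2_size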

Samba Write Test

Attempt    without caching    with caching
1          34.2MB/s           57.3MB/s
2          33.6MB/s           55.7MB/s
3          36.7MB/s           57.1MB/s

Another good result in the real-world test; this certainly helps the write transfer speed. I do wonder what would happen if you filled the ZIL while transferring a very large file, but that is unlikely with my use case: I typically only deal with a couple of files of several hundred megabytes at any given time, so a 20GB ZIL should suit me reasonably well.
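
An easy way to keep an eye on how much of the log and cache devices is actually in use during a large transfer is to watch the per-vdev statistics:

[root@netdisk] ~# zpool iostat -v vol0 5

The alloc column against the log device gives a rough idea of how close the ZIL is to filling up during sustained writes.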

Is ZIL and L2ARC worth it?

I would imagine that on a big beefy ZFS server running in a company somewhere, with a large disk pool and lots of users, multiple enterprise-grade SSDs for the ZIL and L2ARC would be well worth the investment; at home, however, I am not so sure. Yes, I did see an increase in read speeds for cached data and a general increase in write speeds, but it is very use-case dependent. In my case I rarely access the same file frequently, as my NAS primarily serves as a backup target and archive, and although the faster writes are cool I am not sure they are a deciding factor. If I built a new home NAS today I’d probably concentrate the budget on a better CPU, more RAM (for the ARC) and more disks. However, if I had a use case where I frequently accessed the same files and needed them quickly, then yes, I’d probably invest in an SSD for caching.

I think if you have a spare SSD lying around and want something fun to do with it, sure, chuck it into your ZFS-based NAS as a cache. If you were planning on buying an SSD specifically for caching, then I’d really consider your needs and decide whether the money would be better spent on alternatives that would improve your experience with your NAS. I know my NAS would benefit more from an extra stick of RAM and a more powerful CPU, but as a quick evening project with parts I had hanging around, adding some SSD cache was worth a go.
