Wednesday, April 19, 2017

Replacing a failed hard drive in an (encrypted) ZFS Pool on FreeNAS

Disk Failure Email Alert

Device: /dev/ada5, failed to read SMART Attribute Data
Device: /dev/ada5, not capable of SMART self-check
Device: /dev/ada5, Read SMART Self-Test Log Failed
The volume tank (ZFS) state is DEGRADED: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state.
Device: /dev/ada5, Read SMART Error Log Failed
 
The disk went to UNAVAIL on FreeNAS. It was a complete failure: the drive would not even power on.

Checking the status of the ZFS pools:
[..]

  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
 the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 3h22m with 0 errors on Sat Mar 11 15:55:33 2017
config:

 NAME                                                STATE     READ WRITE CKSUM
 tank                                                DEGRADED     0     0     0
   raidz1-0                                          DEGRADED     0     0     0
     gptid/2dc7c53c-2b99-11e6-94d8-0015177adaa2.eli  ONLINE       0     0     0
     gptid/2e7247c8-2b99-11e6-94d8-0015177adaa2.eli  ONLINE       0     0     0
     gptid/2f1f7b49-2b99-11e6-94d8-0015177adaa2.eli  ONLINE       0     0     0
     9773863262585734395                             UNAVAIL      0     0     0  was /dev/gptid/2fa84005-2b99-11e6-94d8-0015177adaa2.eli
     gptid/306d3755-2b99-11e6-94d8-0015177adaa2.eli  ONLINE       0     0     0

errors: No known data errors
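
On a pool with many disks, the one bad row is easy to miss in that table. Here is a minimal sketch of a filter that prints every row whose STATE is not ONLINE. The function name unhealthy_devices is my own invention, not a ZFS tool, and it assumes the table layout shown above (it also lists the pool and vdev rows, since they are DEGRADED too).

```shell
# Hypothetical helper (not part of ZFS): list rows of the config table
# whose STATE column is anything other than ONLINE.
# Reads `zpool status` text on stdin, so it also works on saved output.
unhealthy_devices() {
  awk '
    $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # header row starts the table
    /^errors:/                    { in_config = 0 }         # the errors line ends it
    in_config && NF >= 5 && $2 != "ONLINE" { print $1, $2 } # name and state only
  '
}

# On the NAS itself (pool name tank assumed, as in this post):
# zpool status tank | unhealthy_devices
```

The failed disk shows up as the long numeric ID together with UNAVAIL, which is the row you will later pick in Volume Status.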

 Here's how to replace the failed disk:
  1. Offline the disk if you can by going to Storage > Volumes > Select your pool (e.g. tank) > Volume Status (on the status bar at the bottom). I could not do this since the disk was already unavailable and hence offline.
  2. Note the disk's name, like ada5xx.
  3. Next, navigate to Storage > View Disks and jot down the disk's serial number. You will need this to physically identify the disk later.
  4. Shut down your computer.
  5. Physically remove the disk and replace it with a new one.
  6. Boot up your machine.
  7. Go back to Storage > Volumes > Select your pool (e.g. tank) > Volume Status.
  8. Select the defective disk's slot and click on Replace.
  9. If it's an encrypted pool, enter your passphrase as well.
  10. The resilvering process should start.
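
If you would rather look up serial numbers from the shell than from Storage > View Disks, here is a rough sketch. It assumes smartctl is available (FreeNAS sent the SMART alerts above, so it should be) and that the surviving disks are ada0 through ada4, which is a guess on my part. The helper name serial_of is made up. A completely dead drive will not answer, so the idea is to list the serials of the working disks and pull the one that is not on the list.

```shell
# Hypothetical helper: extract the serial number from `smartctl -i` output on stdin.
serial_of() {
  awk -F': *' '/^Serial Number:/ { print $2 }'
}

# On the NAS itself (device names are assumptions, adjust to your system):
# for d in ada0 ada1 ada2 ada3 ada4; do
#   printf '%s %s\n' "$d" "$(smartctl -i /dev/$d | serial_of)"
# done
#
# `glabel status` shows which gptid/... label belongs to which adaX partition.
```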
For encrypted pools, once resilvering is completed, do the following to avoid losing the pool on the next reboot. Go to Storage > Volumes > Select your pool containing the new disk:

  1. Re-key your pool's encryption passphrase.
  2. Re-create a passphrase.
  3. Download your key.
  4. Add a recovery key.
For more detailed information, refer to the FreeNAS documentation.

 You can watch the pool rebuild (resilver) either in the GUI or from the CLI:

[root@freenas] ~# zpool status tank
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Apr 19 21:42:55 2017
        49.1G scanned out of 5.89T at 301M/s, 5h39m to go
        9.79G resilvered, 0.81% done
config:

        NAME                                                STATE     READ WRITE CKSUM
        tank                                                ONLINE       0     0     0
          raidz1-0                                          ONLINE       0     0     0
            gptid/2dc7c53c-2b99-11e6-94d8-0015177adaa2.eli  ONLINE       0     0     0
            gptid/2e7247c8-2b99-11e6-94d8-0015177adaa2.eli  ONLINE       0     0     0
            gptid/2f1f7b49-2b99-11e6-94d8-0015177adaa2.eli  ONLINE       0     0     0
            gptid/c3e3e9d1-2583-11e7-8b27-0015177adaa2.eli  ONLINE       0     0     0  (resilvering)
            gptid/306d3755-2b99-11e6-94d8-0015177adaa2.eli  ONLINE       0     0     0

errors: No known data errors
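
Rather than re-running zpool status by hand, you can pull out just the progress figure. This is a small sketch that assumes the "resilvered, N% done" wording shown above, which may differ on other ZFS versions. The function name resilver_percent is my own.

```shell
# Hypothetical helper: print the "N% done" number from `zpool status` text on stdin.
resilver_percent() {
  sed -n 's/.*resilvered, \([0-9.]*\)% done.*/\1/p'
}

# Polling sketch for the NAS itself (pool name tank assumed, as in this post):
# while zpool status tank | grep -q 'resilver in progress'; do
#   echo "resilver: $(zpool status tank | resilver_percent)% done"
#   sleep 60
# done
```

The loop ends once the scan line no longer says "resilver in progress". At that point, check zpool status once more to confirm the pool is ONLINE with no errors before doing the encryption steps.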