Wednesday 28 September 2011

choosing disks and controllers for ZFS


I looked at my ZFS setup and realised that I have two RAID-1 (mirror) pools, each built from two disks of the same model:

server ~ # zpool status
  pool: mir-2tb
 state: ONLINE
 scrub: none requested
config:

        NAME                                                       STATE     READ WRITE CKSUM
        mir-2tb                                                    ONLINE       0     0     0
          mirror-0                                                 ONLINE       0     0     0
            disk/by-id/ata-Hitachi_HDS5C3020ALA632_ML0220F30SHWWD  ONLINE       0     0     0
            disk/by-id/ata-Hitachi_HDS5C3020ALA632_ML0220F30U4RKD  ONLINE       0     0     0

errors: No known data errors

  pool: mir-2tb2
 state: ONLINE
 scrub: none requested
config:

        NAME                                                       STATE     READ WRITE CKSUM
        mir-2tb2                                                   ONLINE       0     0     0
          mirror-0                                                 ONLINE       0     0     0
            disk/by-id/ata-SAMSUNG_HD204UI_S2H7J9BZC05894          ONLINE       0     0     0
            disk/by-id/ata-SAMSUNG_HD204UI_S2H7J9BZC05884          ONLINE       0     0     0

errors: No known data errors
server ~ #
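
A quick way to spot this without reading the listing by eye is to let a script group the disks per pool by model. The following is only a rough sketch: the function name same_model_pools is made up for illustration, and it assumes the by-id names look like ata-<MODEL>_<SERIAL> (as in the listing above) with no underscore inside the serial number.

```shell
#!/bin/sh
# Reads `zpool status` output on stdin and prints every pool whose disks
# all share a single model. Hypothetical helper, not part of ZFS.
# Assumption: device names look like .../ata-<MODEL>_<SERIAL>.
same_model_pools() {
    awk '
        $1 == "pool:" { pool = $2 }
        $1 ~ /^ata-/ || $1 ~ /\/ata-/ {
            model = $1
            sub(/^.*ata-/, "", model)   # drop the path and the "ata-" prefix
            sub(/_[^_]*$/, "", model)   # drop the trailing _<SERIAL>
            disks[pool]++
            if (!((pool, model) in seen)) {
                seen[pool, model] = 1
                models[pool]++
            }
        }
        END {
            for (p in disks)
                if (disks[p] > 1 && models[p] == 1)
                    print p
        }
    ' | sort
}

# Usage (on the real machine):
#   zpool status | same_model_pools
```

Fed the listing above, this should name both pools, since each contains only one disk model.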

So that's one ZFS mirror with two Hitachi disks and one ZFS mirror with two Samsung disks. That's bad, because if there is a problem with a certain type of disk (such as a firmware problem with Samsung HD204UIs, see here), there is a high chance that both disks in the mirror will fail together. ZFS can then detect the problem, but possibly not recover from it.

I quickly bought another Hitachi disk and added it to the Samsung mirror. That allowed me to move a Samsung disk to the Hitachi mirror pool. So now I have:

server ~ # zpool status
  pool: mir-2tb
 state: ONLINE
 scrub: none requested
config:


        NAME                                                       STATE     READ WRITE CKSUM
        mir-2tb                                                    ONLINE       0     0     0
          mirror-0                                                 ONLINE       0     0     0
            disk/by-id/ata-Hitachi_HDS5C3020ALA632_ML0220F30SHWWD  ONLINE       0     0     0
            disk/by-id/ata-SAMSUNG_HD204UI_S2H7J9BZC05894          ONLINE       0     0     0
            disk/by-id/ata-Hitachi_HDS5C3020ALA632_ML0220F30U4RKD  ONLINE       0     0     0

errors: No known data errors

  pool: mir-2tb2
 state: ONLINE
 scrub: none requested
config:

        NAME                                                       STATE     READ WRITE CKSUM
        mir-2tb2                                                   ONLINE       0     0     0
          mirror-0                                                 ONLINE       0     0     0
            disk/by-id/ata-Hitachi_HDS5C3020ALA632_ML0220F30TG22D  ONLINE       0     0     0
            disk/by-id/ata-SAMSUNG_HD204UI_S2H7J9BZC05884          ONLINE       0     0     0

errors: No known data errors
server ~ #


So now one pool is a three-way mirror and the other a two-way mirror, and each mixes Hitachi and Samsung disks. Availability even improved a bit.
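
The post does not show the commands used for the shuffle, so the following is only one plausible sequence, built from the standard zpool attach and zpool detach subcommands. The function name plan_swap, the variable names, and the /dev/disk/by-id prefix are assumptions. The function only prints the commands, so nothing touches a pool until you run them yourself.

```shell
#!/bin/sh
# Prints (does not run) one possible command sequence for the disk swap.
# Assumption: devices are addressed through /dev/disk/by-id.
plan_swap() {
    by_id=/dev/disk/by-id
    new_hitachi=$by_id/ata-Hitachi_HDS5C3020ALA632_ML0220F30TG22D
    moved_samsung=$by_id/ata-SAMSUNG_HD204UI_S2H7J9BZC05894
    kept_samsung=$by_id/ata-SAMSUNG_HD204UI_S2H7J9BZC05884
    kept_hitachi=$by_id/ata-Hitachi_HDS5C3020ALA632_ML0220F30SHWWD

    # 1. Attach the new Hitachi to the Samsung mirror (temporarily three disks).
    echo "zpool attach mir-2tb2 $kept_samsung $new_hitachi"
    # 2. Once zpool status shows the resilver has finished, release one Samsung.
    echo "zpool detach mir-2tb2 $moved_samsung"
    # 3. Attach the freed Samsung to the Hitachi mirror as a third disk.
    echo "zpool attach mir-2tb $kept_hitachi $moved_samsung"
}

plan_swap
```

The order matters: detaching the Samsung before the new Hitachi has finished resilvering would leave that pool without redundancy for a while.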

Great, problem fixed? Not quite! Just making sure the disks in a pool are of mixed species is not enough.
It's not visible in the zpool status output, but the disks of each pool were all on the same SATA controller. I have an onboard SATA controller and a Promise SATA controller. Here is a selective lspci:

00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] (rev 40)
05:05.0 Mass storage controller: Promise Technology, Inc. PDC40718 (SATA 300 TX4) (rev 02)

So if a controller were to die, corrupt data or have intermittent problems, the pool could again be lost, because ZFS would have no copy with correct data left to repair from. The fix is easy: make sure that the disks in a ZFS pool are spread out over different controllers. That way the chance of problems diminishes again.
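
To see which controller a disk actually hangs off, one option on Linux is to follow the disk's sysfs path, which contains the PCI address that lspci prints (with a 0000: domain prefix). This is a sketch under assumptions: the helper names controller_of and list_controllers are invented here, and it relies on the usual sysfs layout plus readlink -f and sed -E being available.

```shell
#!/bin/sh
# Extracts the PCI address of the controller from a resolved sysfs device path.
# The last PCI-address-shaped path component is taken to be the controller,
# so disks behind a PCI bridge report the card and not the bridge.
controller_of() {
    printf '%s\n' "$1" |
        sed -nE 's#.*/([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])/.*#\1#p'
}

# Prints "<pci-address> <disk>" for every disk path given as an argument.
# Hypothetical helper; assumes Linux sysfs under /sys/class/block.
list_controllers() {
    for link in "$@"; do
        dev=$(basename "$(readlink -f "$link")")
        sys=$(readlink -f "/sys/class/block/$dev")
        printf '%s %s\n' "$(controller_of "$sys")" "$link"
    done
}

# Usage (on the real machine), one line per disk of a pool:
#   list_controllers /dev/disk/by-id/ata-Hitachi_HDS5C3020ALA632_ML0220F30SHWWD
```

If every disk of a pool prints the same address, that pool still depends on a single controller.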

Too paranoid? Not really. Before I replaced my server a couple of months ago, I was using an onboard SATA controller that was corrupting the data transferred to the disks. I never knew until I switched to ZFS.

Don't blow your bits. Spread them around.
