Friday, July 29, 2022

Dell H755N RAID controller's NVMe performance

NVMe drives are very fast compared to SSDs using the SAS interface, especially the newest PCIe 4 generation. They also have a drawback: they are new(ish), which means that not all the features we have come to depend on are always available. One such feature is hardware RAID support. That makes some sense, since NVMe drives connect 'directly' to the PCIe bus. I write it with quotes because there is often some kind of switch between the NVMe drives and the motherboard, so that the limited number of available PCIe buses isn't eaten up by all the drives.

For example, the Intel Xeon Gold 6334 has 64 PCIe lanes. For an Intel P5600 NVMe drive, 4 lanes are combined into one bus. Theoretically that CPU could therefore support 64/4=16 NVMe's. But many server chassis can hold 24 NVMe drives, and some PCIe lanes are needed for the network and other communication, so it's not possible to give each NVMe drive its own dedicated lanes. Hence some switching matrix is needed to share the lanes between several NVMe's.
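As a rough back-of-the-envelope illustration of that lane budget (the number of lanes reserved for NICs and other peripherals is just an assumption here, not a spec of these machines):

```python
# Illustrative lane-budget arithmetic; the reserved-lane count is an assumption.
TOTAL_LANES = 64       # PCIe lanes on an Intel Xeon Gold 6334
LANES_PER_NVME = 4     # an Intel P5600 uses a x4 link
RESERVED_LANES = 16    # assumed: NICs, boot device, other peripherals
DRIVE_BAYS = 24        # a typical 24-bay NVMe chassis

dedicated_drives = (TOTAL_LANES - RESERVED_LANES) // LANES_PER_NVME
print(f"Drives that can get dedicated lanes: {dedicated_drives} of {DRIVE_BAYS}")
```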


By now, some hardware vendors have implemented NVMe support on their RAID controllers. One of them is the Dell H755N (actually a Broadcom / LSI MegaRAID 12GSAS/PCIe Secure SAS39xx). All communication between the NVMe's and the motherboard/CPU flows through the RAID controller. This is similar to how an NVMe PCIe switch works, but with additional features.

Such features can slow down the performance of the storage devices. For example, when using ZFS, a software storage implementation that works better than hardware RAID, the best way to reach the hard disks or SSDs is through an HBA rather than a RAID controller. Many RAID controllers do implement a 'passthrough' mode, though, which should in theory let the OS reach the disk without any overhead.
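One crude way to see how the OS ends up addressing the drives in a given setup is to look at which block device nodes the kernel exposes. A minimal sketch (it only lists device names; it says nothing about actual overhead):

```python
# List block devices to see whether drives show up as native NVMe namespaces
# (/dev/nvmeXnY) or as generic SCSI devices (/dev/sdX) behind a controller.
import glob

nvme_devices = sorted(glob.glob("/dev/nvme*n1"))
scsi_devices = sorted(glob.glob("/dev/sd?"))

print("Native NVMe block devices:", nvme_devices or "none")
print("SCSI block devices:       ", scsi_devices or "none")
```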


I wanted to figure out what that looks like with the Dell H755N controller.

I am comparing 2 different systems. They are similar, but not exactly the same:

- Dell R650 with H755N (8GB NVRAM, firmware 52.16.1-4074) and 2 Intel P5600 1.6TB NVMe's.

- Dell R640 with H750 (8GB NVRAM, firmware 52.16.1-4074) and 2 Intel P5600 1.6TB NVMe's.


The only proper way to benchmark a storage device in a way that can be compared with other results is to use fio. There is a wrapper around fio, ezfio, that makes it easier to run the tool many times with different parameters.
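ezfio ultimately just drives fio with many combinations of block size, queue depth and thread count. A minimal sketch of one such measurement point, wrapped in Python (the device path and job parameters are examples; adjust them for the drive under test, and note that this reads from the raw device):

```python
# Run one fio point measurement (4k random read) and print the resulting IOPS.
import json
import subprocess

DEVICE = "/dev/nvme0n1"  # assumption: the NVMe under test

cmd = [
    "fio",
    "--name=randread-4k",
    f"--filename={DEVICE}",
    "--ioengine=libaio",
    "--direct=1",            # bypass the page cache and hit the device itself
    "--rw=randread",
    "--bs=4k",
    "--iodepth=32",
    "--numjobs=4",
    "--runtime=30",
    "--time_based",
    "--group_reporting",
    "--output-format=json",
]

result = subprocess.run(cmd, capture_output=True, text=True, check=True)
job = json.loads(result.stdout)["jobs"][0]
print(f"4k random read: {job['read']['iops']:.0f} IOPS")
```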

I've run ezfio in 3 different configurations:

- on the R640 with the H750, where the NVMe does not go through the RAID controller but connects directly to the PCIe bus. This is the baseline to compare against.

- on the R650 with the H755N, with the RAID controller configured in passthrough mode for the NVMe

- on the R650 with the H755N in RAID-1 configuration.


Here are the results:


R640 + H750 passthrough:



R650 + H755N passthrough:


R650 + H755N RAID-1:




TLDR?
Using the H755N RAID controller for NVMe's is slower than a non-RAID controller in passthrough mode. The difference is especially noticeable (a 50-100% performance drop) with small block sizes (anything under 32KB) and small numbers of threads. There is also a difference with larger block sizes and larger numbers of threads, but it is smaller; that remaining difference could possibly be explained by the different CPU generation and the small difference in turbo frequency. Using RAID-1 mode over 2 NVMe's cuts performance by 70-80%.