I finally bought myself an SSD. I've been using spinning rust forever, and recently we started using SSDs for the servers at work, so I just had to go and buy one for myself. The Windows boot time was starting to really annoy me.
At €99,- for 60GB it's not cheap, but it should be fast. The box says up to 535MB/sec read and 490MB/sec write. I doubt that will ever be achieved in real-world situations. Of course I do want to try :)
So first I hooked it up to one of the ports of my Promise SATA300 TX4 controller. This was the hdparm result:
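The numbers below come from hdparm's built-in read test; the exact invocation isn't in the transcript, but it would have been along the lines of:
hdparm -tT /dev/sdl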
/dev/sdl:
Timing cached reads: 2514 MB in 2.00 seconds = 1257.10 MB/sec
Timing buffered disk reads: 274 MB in 3.00 seconds = 91.33 MB/sec
That's... not any faster than my harddisks... :(
I couldn't believe it. So I moved the SSD over to my onboard ATI SATA controller, and hdparm reported something quite different:
/dev/sdm:
Timing cached reads: 2830 MB in 2.00 seconds = 1414.96 MB/sec
Timing buffered disk reads: 1074 MB in 3.00 seconds = 357.86 MB/sec
That's better! Not quite there yet, but at least now I get the feeling I spent my money wisely.
Especially compared to my OS disk, and a newer 2TB disk I bought:
/dev/hda: (pata disk)
Timing cached reads: 2516 MB in 2.00 seconds = 1257.80 MB/sec
Timing buffered disk reads: 166 MB in 3.02 seconds = 54.90 MB/sec
/dev/disk/by-id/ata-Hitachi_HDS5C3020ALA632_ML0220F30TG22D:
Timing cached reads: 2222 MB in 2.00 seconds = 1111.31 MB/sec
Timing buffered disk reads: 300 MB in 3.02 seconds = 99.50 MB/sec
I find this huge performance difference between my two SATA controllers a bit disconcerting. Especially since the Promise controller is a SATA-II capable device (300MB/sec) yet doesn't even reach SATA-I speeds (150MB/sec), while the onboard controller is SATA-III capable (600MB/sec) and doesn't achieve that either, although it does get a little above the SATA-II spec. What could cause this?
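One quick sanity check (not something I captured in these notes) is what link speed each port actually negotiated; the kernel logs it when the drive is detected:
dmesg | grep -i 'SATA link up'
That prints lines like 'ataX: SATA link up 3.0 Gbps', which tells you whether the drive is talking SATA-I, II or III to that particular controller.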
One potential limiting factor is the PCI bus the Promise card sits on: that caps throughput at 266MB/sec. But we're only seeing 91MB/sec, well below that, so the bus itself isn't the bottleneck. The SATA cables, according to the SATA Wikipedia page, should all be good up to SATA-III. Maybe the rest of the system is too slow to keep up? Let's do a simpler test:
server ~ # mount /dev/disk/by-id/ata-OCZ-VERTEX3_OCZ-797F5Y0407KD83KC /mnt/ssd
server ~ # dd if=/dev/zero of=/mnt/ssd/zero-write count=20000000
20000000+0 records in
20000000+0 records out
10240000000 bytes (10 GB) copied, 41.3909 s, 247 MB/s
server ~ # dd if=/mnt/ssd/zero-write of=/dev/null count=20000000
20000000+0 records in
20000000+0 records out
10240000000 bytes (10 GB) copied, 30.1224 s, 340 MB/s
server ~ #
While doing this test the IO utilisation ('iostat -x 1 sdm') was at 40-60% when writing and about 75% when reading. So the device could handle more; it was just waiting for the system. In fact, taking that 247MB/sec write speed and correcting for the IO utilisation (247MB/sec at roughly 50% utilisation), the device indeed seems to be able to handle about 500MB/sec. Wow.
So why isn't the device being utilised 100%? Looking at 'top' while running these tests shows one of the two cores 100% busy when writing to the device, but only 50% busy when reading.
During the read test the CPU breakdown was 40% IO wait and 60% system time, with 0% idle. Part of the problem is dd's default block size of only 512 bytes: the CPU spends its time issuing huge numbers of tiny requests. Increasing the block size to 1MB increases the read speed to 360MB/sec.
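That run isn't in the transcript; it would have been the same read with an explicit block size, something like:
dd if=/mnt/ssd/zero-write of=/dev/null bs=1048576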
Switching to direct and non-blocking IO, the speed increases further:
server ~ # dd if=/mnt/ssd/zero-write of=/dev/null count=20000000 bs=1048576 iflag=direct,nonblock
9765+1 records in
9765+1 records out
10240000000 bytes (10 GB) copied, 24.5059 s, 418 MB/s
and writing:
server ~ # dd if=/dev/zero of=/mnt/ssd/zero-write count=5000 bs=1048576 oflag=direct,nonblock
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 14.8777 s, 352 MB/s
75% IO wait, 23% system time, 75% utilisation.
So using a larger block size with non-blocking and direct IO, the throughput increases a bit more. But we're still not where we should be.
Let's cut ext4 out of the loop:
server ~ # dd if=/dev/disk/by-id/ata-OCZ-VERTEX3_OCZ-797F5Y0407KD83KC of=/dev/null count=20000000 bs=1048576 iflag=direct,nonblock
57241+1 records in
57241+1 records out
60022480896 bytes (60 GB) copied, 136.918 s, 438 MB/s
server ~ #
This results in 90% utilisation, with the CPU at 87% IO wait and 12% system time.
write:
server ~ # dd if=/dev/zero of=/dev/disk/by-id/ata-OCZ-VERTEX3_OCZ-797F5Y0407KD83KC count=20000 bs=1048576 oflag=direct,nonblock skip=2
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 58.2396 s, 360 MB/s
server ~ #
80% utilisation, 75% IO wait, 25% system time.
Based on all of this, I think the performance of this SSD is CPU bound.
The difference between the Promise and the onboard SATA controller may be related to AHCI: the stock kernel driver for the Promise controller doesn't support AHCI.
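A quick way to check which kernel driver each controller ends up with (and thus whether AHCI is in play) is lspci, for example:
lspci -k | grep -i -E -A3 'sata|promise'
The 'Kernel driver in use:' lines should show ahci for the onboard controller and sata_promise for the TX4; I'm going from memory on the driver names, so double-check on your own hardware.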
Now that I'm done, I'm cleaning up the SSD before handing it over to my Windows PC:
i=0; while [ $i -lt 117231408 ]; do echo $i:64000; i=$((i+64000)); done | hdparm --trim-sector-ranges-stdin --please-destroy-my-drive /dev/disk/by-id/ata-OCZ-VERTEX3_OCZ-797F5Y0407KD83KC
This makes sure that all data is 'trimmed', i.e. discarded without writing to the device. That keeps the SSD fast, because a write to a block that already contains data is much slower than a write to a block that is empty, i.e. trimmed.
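For readability, here is the same one-liner spelled out, with the same numbers: 117231408 sectors of 512 bytes on the 60GB drive, trimmed in ranges of 64000 sectors (each TRIM range is capped at 65535 sectors, hence the 64000):
DEV=/dev/disk/by-id/ata-OCZ-VERTEX3_OCZ-797F5Y0407KD83KC
SECTORS=117231408   # total 512-byte sectors on the drive
RANGE=64000         # sectors per TRIM range
i=0
while [ $i -lt $SECTORS ]; do
    echo $i:$RANGE              # emit "start:count" for hdparm to read on stdin
    i=$((i+RANGE))
done | hdparm --trim-sector-ranges-stdin --please-destroy-my-drive $DEV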
This is what bonnie++ has to say:
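The exact bonnie++ invocation isn't in my transcript; judging from the concurrency (4) and the 16-files setting in the results below, it would have been something like this, with the target directory changed per device:
bonnie++ -d /mnt/ssd -c 4 -n 16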
(bonnie++ 1.96; all runs used concurrency 4, a 6576M test size and 16 for the files parameter. Throughput values are K/sec, seeks and file operations are /sec, with %CPU in parentheses; '+++++' means the test finished too quickly for bonnie++ to report a reliable figure. Each result row is followed by the latencies bonnie++ reported for the same operations.)

Sequential output, sequential input and random seeks:

| Test | Output per-chr | Output block | Rewrite | Input per-chr | Input block | Random seeks |
| --- | --- | --- | --- | --- | --- | --- |
| hda-pata | 640 (94%) | 26899 (11%) | 14683 (4%) | 1828 (98%) | 45783 (6%) | 145.1 (7%) |
| latency | 36690us | 2432ms | 4584ms | 17325us | 229ms | 2529ms |
| hda-pata | 681 (93%) | 26794 (10%) | 14465 (4%) | 1453 (98%) | 43385 (6%) | 135.2 (7%) |
| latency | 13723us | 2087ms | 3965ms | 20442us | 267ms | 2308ms |
| hda-pata | 492 (96%) | 26432 (10%) | 14338 (4%) | 1921 (94%) | 39876 (4%) | 120.6 (6%) |
| latency | 38599us | 2195ms | 2586ms | 20359us | 404ms | 2291ms |
| zfs-2 | 20 (22%) | 18509 (11%) | 15498 (10%) | 1598 (98%) | 64493 (8%) | 142.7 (4%) |
| latency | 940ms | 2100ms | 2053ms | 27966us | 268ms | 1879ms |
| zfs-2 | 30 (19%) | 19664 (12%) | 15545 (10%) | 998 (98%) | 65135 (7%) | 164.6 (5%) |
| latency | 432ms | 1860ms | 2054ms | 25254us | 218ms | 1825ms |
| zfs-2 | 31 (19%) | 18794 (12%) | 15585 (10%) | 1655 (98%) | 59008 (9%) | 165.1 (5%) |
| latency | 384ms | 1890ms | 1993ms | 21819us | 236ms | 1747ms |
| ssd-promise | 604 (99%) | 97059 (24%) | 47298 (15%) | 1700 (99%) | 119279 (19%) | 5231 (220%) |
| latency | 38396us | 721ms | 804ms | 10871us | 9170us | 81333us |
| ssd-promise | 766 (98%) | 97783 (24%) | 47870 (15%) | 1810 (99%) | 120092 (20%) | 4713 (200%) |
| latency | 11147us | 661ms | 787ms | 11471us | 3283us | 97766us |
| ssd-promise | 774 (98%) | 97767 (24%) | 47757 (15%) | 1464 (99%) | 120094 (20%) | 4672 (191%) |
| latency | 11036us | 226ms | 775ms | 12233us | 2866us | 91878us |
| ssd-ati | 648 (99%) | 454837 (43%) | 201613 (29%) | 1316 (99%) | 482448 (35%) | 12924 (381%) |
| latency | 38218us | 175ms | 109ms | 12974us | 124ms | 3048us |
| ssd-ati | 669 (99%) | 466455 (47%) | 202751 (28%) | 2659 (99%) | 475295 (38%) | 11540 (333%) |
| latency | 38867us | 127ms | 111ms | 4088us | 121ms | 2753us |
| ssd-ati | 625 (99%) | 467713 (50%) | 176020 (57%) | 1345 (99%) | 484839 (47%) | 11420 (342%) |
| latency | 39029us | 113ms | 54983us | 11317us | 74753us | 4429us |

Sequential and random file creation:

| Test | Seq. create | Seq. read | Seq. delete | Rand. create | Rand. read | Rand. delete |
| --- | --- | --- | --- | --- | --- | --- |
| hda-pata | 8091 (9%) | +++++ | 23202 (22%) | 30607 (40%) | +++++ | +++++ |
| latency | 12317us | 849us | 828us | 107us | 67us | 73us |
| hda-pata | 6404 (25%) | +++++ | 23371 (60%) | 18094 (66%) | +++++ | 24027 (60%) |
| latency | 13195us | 2236us | 2291us | 252us | 136us | 67us |
| hda-pata | 4261 (4%) | +++++ | 17685 (13%) | 21240 (21%) | +++++ | 23140 (17%) |
| latency | 12161us | 633us | 895us | 1212us | 257us | 3930us |
| zfs-2 | 1362 (21%) | +++++ | 6838 (23%) | 2714 (21%) | +++++ | 8875 (28%) |
| latency | 134ms | 6559us | 4142us | 66330us | 105us | 10113us |
| zfs-2 | 2227 (19%) | 30995 (8%) | 6253 (24%) | 3056 (18%) | +++++ | 9731 (22%) |
| latency | 97645us | 2929us | 10128us | 76339us | 82us | 18389us |
| zfs-2 | 1337 (22%) | 27141 (17%) | 2681 (26%) | 1413 (20%) | +++++ | 8789 (27%) |
| latency | 61531us | 3114us | 7322us | 339ms | 105us | 42091us |
| ssd-promise | 15393 (17%) | +++++ | +++++ | +++++ | +++++ | +++++ |
| latency | 83us | 531us | 549us | 97us | 8us | 41us |
| ssd-promise | 13357 (15%) | +++++ | +++++ | +++++ | +++++ | +++++ |
| latency | 210us | 531us | 547us | 85us | 10us | 46us |
| ssd-promise | 10697 (19%) | +++++ | +++++ | +++++ | +++++ | +++++ |
| latency | 439us | 531us | 549us | 90us | 9us | 64us |
| ssd-ati | 12573 (14%) | +++++ | +++++ | +++++ | +++++ | +++++ |
| latency | 90us | 524us | 556us | 87us | 7us | 30us |
| ssd-ati | 15038 (17%) | +++++ | +++++ | +++++ | +++++ | +++++ |
| latency | 256us | 524us | 558us | 91us | 63us | 63us |
| ssd-ati | 21595 (24%) | +++++ | +++++ | +++++ | +++++ | +++++ |
| latency | 207us | 534us | 573us | 86us | 65us | 66us |
As you can see, the SSD is much, much faster than any of the disks. ZFS is reaaaalllly slow, but that's because I'm running it through FUSE, so all IO has to go through userspace. That's OK though; I don't need speed on those disks, only safety. And once zfsonlinux gets around to finishing their code, it should speed up tremendously.
If I get to buy a faster PC I'll update this doc with new measurements.