Tuesday, 18 October 2011

Scrubbing bits under Linux; making sure your data is safe

Let's say you've got a couple of photos you'd like to keep. Actually, as the years have gone by, it's more like a few hundred, perhaps a few thousand photos you've collected and would really hate to lose. And let's assume you've lost data before, due to a hard disk crash or otherwise. You swore never to be troubled by that problem again, so you set up a mirror (RAID-1) on your computer.
And let's assume you use Linux to manage that. Windows could too, but in a minute I'll show you why Windows is no good for a mirror.

It all has to do with bit rot. You see, data on your disks isn't safe. It's not just that disks can die and leave you unable to access them. It's that data on your disks slowly rots: 1s can turn into 0s, and 0s can turn into 1s. That makes your data corrupt. This happens because these days the bits on hard disks are so tiny that small influences can change them. Possibly cosmic rays. Or a magnet slightly too close to the disk. Perhaps the magnetic field of one of the computer fans. Or, more likely, a small patch of magnetic substrate that was just a bit off at production and isn't able to hold its contents for more than a month. Whatever the reason, the fact is that data slowly rots.

Now you have a mirror. A RAID set. RAID-1 even. You're not safe, but you're a bit safer. What you could do is have your computer run a check on the two disks, to make sure that their contents are still exactly the same. You could have it run every night. This is called a 'data scrub'.
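To make the idea concrete, here is a minimal sketch of what such a check boils down to: read both copies block by block and note where they differ. It works on two ordinary files standing in for the two disks; the function name and the 4096-byte block size are my own choices for illustration, and a real scrub is done by the RAID layer itself, not by a script like this.

```python
def find_mismatches(path_a, path_b, block_size=4096):
    """Return the byte offsets of all blocks that differ between two files."""
    mismatches = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                break  # both copies fully read
            if block_a != block_b:
                # We know the copies disagree here, but not which one is right.
                mismatches.append(offset)
            offset += block_size
    return mismatches
```

Note that the result only tells you where the two copies disagree. It says nothing about which copy holds the correct data.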

And then one night, maybe a year from now, your computer will beep and state it has found a difference. Perhaps it's only one bit. But what now? Which of the disks is right? Disk A says it's a 0. Disk B says it's a 1. If you're lucky you have backups and you can restore the file. If not... it's anyone's guess. Oh, did I mention that backups can suffer from the same problem? This is not just limited to hard disks. CDs rot too. So do tapes, floppies, Zip disks, Jaz disks, etc.

Now, please, humor me. Do a Google search for 'windows data scrub' and one for 'linux data scrub'. Notice any difference? There are no data scrubbing tools for Windows. But there are for Linux. The reason is that when Windows finds a difference, it will use the 'master' disk as the source to write to the 'slave' disk. Even when the master is wrong.

So under Windows, having a mirror is not much help. You won't even notice that there was a problem. Under Linux you can have the scrub tell you there was a problem, and you can decide to do a file restore from backup.

If you are using RAID-5 instead of RAID-1, you have the added benefit of being able to repair the bad data. Read here on how to set up scrubbing for md-raid software RAID setups under Linux.
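As a rough sketch of what an md scrub involves: the Linux md driver exposes a sync_action file per array in sysfs, and writing "check" to it starts a read-only consistency check; the mismatch_cnt file next to it reports how many sectors were found to disagree. The array name md0 is an assumption (use your own array's name), the helper names are made up for this example, and the sysfs_root parameter exists only so the functions can be tried against a scratch directory. Writing to the real sysfs file needs root.

```python
from pathlib import Path

def md_dir(array="md0", sysfs_root="/sys/block"):
    """Directory holding the md control files for one array (md0 is assumed)."""
    return Path(sysfs_root) / array / "md"

def start_md_check(array="md0", sysfs_root="/sys/block"):
    """Ask the md driver to start a read-only scrub of the array."""
    (md_dir(array, sysfs_root) / "sync_action").write_text("check\n")

def scrub_is_idle(array="md0", sysfs_root="/sys/block"):
    """True when no check, repair or resync is currently running."""
    state = (md_dir(array, sysfs_root) / "sync_action").read_text().strip()
    return state == "idle"

def read_mismatch_count(array="md0", sysfs_root="/sys/block"):
    """Number of mismatched sectors counted by the last check."""
    return int((md_dir(array, sysfs_root) / "mismatch_cnt").read_text().strip())
```

A nightly cron job could call start_md_check(), wait until scrub_is_idle() returns True, and mail you when read_mismatch_count() is not zero.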

And if you're really smart, you're using a checksumming filesystem, such as ZFS or Btrfs. These calculate a checksum of the data, which allows the OS to figure out which of the two disks has the correct data and overwrite the bad data with the good data. Automatically. Problem fixed.
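The trick is easy to sketch: once you have a checksum that was stored when the data was written, the tie between disk A and disk B is broken. The example below is only an illustration of that principle, not how ZFS or Btrfs are implemented; SHA-256 is a stand-in for whatever checksum the filesystem actually uses, and the function names are made up.

```python
import hashlib

def checksum(block: bytes) -> str:
    """Checksum stored alongside the data when it was written (SHA-256 as a stand-in)."""
    return hashlib.sha256(block).hexdigest()

def heal_mirror(copy_a: bytes, copy_b: bytes, expected: str):
    """Return (good_block, copy_to_repair), where copy_to_repair is 'a', 'b' or None."""
    a_ok = checksum(copy_a) == expected
    b_ok = checksum(copy_b) == expected
    if a_ok and b_ok:
        return copy_a, None   # both copies are fine, nothing to repair
    if a_ok:
        return copy_a, "b"    # overwrite the rotten copy on disk B
    if b_ok:
        return copy_b, "a"    # overwrite the rotten copy on disk A
    raise ValueError("both copies fail the checksum; restore from backup")
```

With a plain mirror you only know that the two copies differ. With the stored checksum you know which one to trust, and only when both copies are bad do you still need the backup.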

If you are not scrubbing yet, you don't value your photos enough.

1 comment:

  1. nice article...the problem for many home users: raid-5 is costly. and there ain't no cure for that...
