It's been a while since I took some time to analyze the following problem:
server ~ # zpool status -v
pool: mir-2tb
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: scrub completed after 9h26m with 1 errors on Mon Sep 26 09:01:29 2011
config:
NAME STATE READ WRITE CKSUM
mir-2tb ONLINE 0 0 1
mirror-0 ONLINE 0 0 2
disk/by-id/ata-Hitachi_HDS5C3020ALA632_ML0220F30SHWWD ONLINE 0 0 2
disk/by-id/ata-SAMSUNG_HD204UI_S2H7J9BZC05894 ONLINE 0 0 2
disk/by-id/ata-Hitachi_HDS5C3020ALA632_ML0220F30U4RKD ONLINE 0 0 2
errors: Permanent errors have been detected in the following files:
/mnt/origin/mir-2tb/diskfailure/sdb1.image.dd
So there is a problem here. What this is saying is that the same block of data is not readable from 3 different disks. And the file sdb1.image.dd experienced data loss because of this.
Every week on monday, the counters for the amount of checksum errors will go up. That's because I'm running a scrub every monday morning. And once the scrub is done, the counter is 2 checksum errors higher for each disk, and one checksum error higher for the pool. So it's not like there are more and more sections of data that can't be read.
How did this happen? How small is that chance, that 3 drives are unreadable for the same specific block of data? It's infinitesimally small. And that's not what's going on here. There are 2 possible reasons for this:
- What's going on here is that the ZFS pool was created initially with one source disk, with apparantly 2 bad sectors or bad blocks. Data was copied to that pool before there was a 2nd disk to mirror the data. Once the data was copied and the source disk was empty, the source disk was added to the pool as a mirror. But because of the bad blocks/sectors, 2 blocks could not be copied to the other disk. Those 2 sectors therefore also contain invalid data. Later a 3rd disk was added, but is has the same problem as the 2nd disk. Because the bad sectors can not be read, this will not resolve itself. The data is lost.
- What's going on here is that there is a very specific type of data that triggers a bug in ZFS where the data cannot be stored or the checksum cannot correctly be calculated. The data is there but the checksum fails because of this bug. There is no actual data lost.
I can't remember whether I started this pool with the scenario of having 1 disk and adding more later. I no longer have the source file so I cannot overwrite the data with it. The only option I have is to try and find the block that is broken and force a write to it. This will force the disk to overwrite the data, making it readable again or in case of not being able to write will force the disk to re-allocate the sector.
However, the disks are reporting no problems at all regarding pending sector re-allocations, the only thing I notice is:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 0
2 Throughput_Performance 0x0026 055 055 000 Old_age Always - 19396
3 Spin_Up_Time 0x0023 068 066 025 Pre-fail Always - 9994
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 35
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 5924
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 70
181 Program_Fail_Cnt_Total 0x0022 098 098 000 Old_age Always - 48738283
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 37
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 052 000 Old_age Always - 31 (Min/Max 14/48)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 86
223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 70
And that's not very helpfull either. There is one counter (Program_Fail_Cnt_Total) that's huge but some googling turns up that many users of this disk report that counter and it's not a problem for any of them. That doesn't worry me.
server ~ # zpool status -v
pool: mir-2tb
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: scrub completed after 9h26m with 1 errors on Mon Sep 26 09:01:29 2011
config:
NAME STATE READ WRITE CKSUM
mir-2tb ONLINE 0 0 1
mirror-0 ONLINE 0 0 2
disk/by-id/ata-Hitachi_HDS5C3020ALA632_ML0220F30SHWWD ONLINE 0 0 2
disk/by-id/ata-SAMSUNG_HD204UI_S2H7J9BZC05894 ONLINE 0 0 2
disk/by-id/ata-Hitachi_HDS5C3020ALA632_ML0220F30U4RKD ONLINE 0 0 2
errors: Permanent errors have been detected in the following files:
/mnt/origin/mir-2tb/diskfailure/sdb1.image.dd
So there is a problem here. What this is saying is that the same block of data is not readable from 3 different disks. And the file sdb1.image.dd experienced data loss because of this.
Every week on monday, the counters for the amount of checksum errors will go up. That's because I'm running a scrub every monday morning. And once the scrub is done, the counter is 2 checksum errors higher for each disk, and one checksum error higher for the pool. So it's not like there are more and more sections of data that can't be read.
How did this happen? How small is that chance, that 3 drives are unreadable for the same specific block of data? It's infinitesimally small. And that's not what's going on here. There are 2 possible reasons for this:
- What's going on here is that the ZFS pool was created initially with one source disk, with apparantly 2 bad sectors or bad blocks. Data was copied to that pool before there was a 2nd disk to mirror the data. Once the data was copied and the source disk was empty, the source disk was added to the pool as a mirror. But because of the bad blocks/sectors, 2 blocks could not be copied to the other disk. Those 2 sectors therefore also contain invalid data. Later a 3rd disk was added, but is has the same problem as the 2nd disk. Because the bad sectors can not be read, this will not resolve itself. The data is lost.
- What's going on here is that there is a very specific type of data that triggers a bug in ZFS where the data cannot be stored or the checksum cannot correctly be calculated. The data is there but the checksum fails because of this bug. There is no actual data lost.
I can't remember whether I started this pool with the scenario of having 1 disk and adding more later. I no longer have the source file so I cannot overwrite the data with it. The only option I have is to try and find the block that is broken and force a write to it. This will force the disk to overwrite the data, making it readable again or in case of not being able to write will force the disk to re-allocate the sector.
However, the disks are reporting no problems at all regarding pending sector re-allocations, the only thing I notice is:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 0
2 Throughput_Performance 0x0026 055 055 000 Old_age Always - 19396
3 Spin_Up_Time 0x0023 068 066 025 Pre-fail Always - 9994
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 35
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 5924
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 70
181 Program_Fail_Cnt_Total 0x0022 098 098 000 Old_age Always - 48738283
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 37
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 052 000 Old_age Always - 31 (Min/Max 14/48)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 86
223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 70
And that's not very helpfull either. There is one counter (Program_Fail_Cnt_Total) that's huge but some googling turns up that many users of this disk report that counter and it's not a problem for any of them. That doesn't worry me.
I wish that zfs had some more tools to make this analysis easier. Why does the CRC occur? What block exactly is having the problem.
So I'm at a loss here... there is a checksum error, but the
disks don't report a problem with data on the disks. Is the problem
therefore with ZFS, or with one of the disks that has (via ZFS)
propagated to the other disks ? I would be very interested to hear
anyones feedback on this.