If you think your data is protected against bit rot because it’s on a RAID array, think again.
Bit rot is a real problem. All storage media are at risk. Magnetic media such as disks and tapes can lose data integrity over time. Even RAM is susceptible to cosmic rays. You may not even know it has happened, because the disk keeps working while a bit or a byte here and there has silently changed.
A typical home user with a 2TB disk has an almost 100% chance of bit rot. If it just changes a pixel in a video, it’s not going to be a problem. But if it’s in your database or operating system software, that failure could be much more significant.
Today there is no protection. The problem goes undetected, and what you cannot detect you cannot recover from.
You might think that RAID is the answer but it’s not. An Amplidata white paper, The RAID Catastrophe, explains why:
- It doesn’t do hashing or CRC to verify the data.
- It doesn’t track bit rot or correct it (nor do most operating systems).
- There are no ‘previous versions’ to restore; just the current version of each file.
- Storage volumes are increasing dramatically, so a rebuild after a total drive failure can be very lengthy (even a couple of days on 2TB HDD arrays).
- Increasing the number of drives or the size of each drive increases the overall risk of failure, even if the MTBF for any individual drive seems high.
- If one drive in an array fails, it is quite likely that a second one will also fail soon, because RAID writes to all drives in the array at the same time and so the drives age together.
- RAID-6 double parity is a partial answer to this but it imposes a big performance overhead.
- Forensic data recovery from RAIDs is very, very difficult, time-consuming and expensive. It’s like trying to turn an omelette back into the original eggs.
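The first point in the list, verifying data with a hash or CRC, is easy to illustrate. The following is a minimal Python sketch, not anything taken from the white paper: it stores a SHA-256 digest alongside some data, flips a single bit to simulate bit rot, and shows that the digest no longer matches. The helper names (`checksum`, `flip_bit`, `is_intact`) and the sample payload are hypothetical, chosen only for this example.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate bit rot by inverting one bit (hypothetical helper)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def is_intact(data: bytes, expected_digest: str) -> bool:
    """Compare the data's current digest with the one stored at write time."""
    return checksum(data) == expected_digest


# Made-up payload standing in for a file on disk.
original = b"customer ledger, row 42" * 1000
stored_digest = checksum(original)   # recorded when the data was written

rotted = flip_bit(original, 12345)   # one bit changes; the size does not

print("same size:", len(original) == len(rotted))
print("original intact:", is_intact(original, stored_digest))
print("rotted intact:", is_intact(rotted, stored_digest))
```

Note that a checksum like this only detects the change. Repairing it still requires a second good copy or some form of redundancy to restore from.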
There is an additional problem with RAID storage. If you have a problem with a drive, replace it, and the rebuild then fails for some reason, you have no way of recovering the data on the array. At least on a single disk, you can recover all the data except what was on the failed sector(s). This is another example of the IT industry focusing on MTBF rather than on complexity or recovery time.
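The point about MTBF and drive count can be made concrete with a little arithmetic. The sketch below is a rough model under stated assumptions, not a figure from the white paper: it assumes each drive fails independently at a constant rate, and it uses a hypothetical MTBF of 1,000,000 hours. It then estimates the chance that at least one drive in an array fails within a year.

```python
import math

HOURS_PER_YEAR = 24 * 365

# Hypothetical figure for illustration only; not taken from any datasheet.
ASSUMED_MTBF_HOURS = 1_000_000


def drive_failure_probability(mtbf_hours: float, hours: float) -> float:
    """Chance one drive fails within `hours`, assuming a constant failure rate."""
    return 1.0 - math.exp(-hours / mtbf_hours)


def array_failure_probability(n_drives: int, mtbf_hours: float, hours: float) -> float:
    """Chance at least one of n independent drives fails within `hours`."""
    p_single = drive_failure_probability(mtbf_hours, hours)
    return 1.0 - (1.0 - p_single) ** n_drives


for n in (1, 4, 12, 48):
    p = array_failure_probability(n, ASSUMED_MTBF_HOURS, HOURS_PER_YEAR)
    print(f"{n:>2} drives: {p:.2%} chance of at least one failure in a year")
```

The independence assumption is optimistic. If drives in the same array wear out together, as argued in the list above, the real risk of a second failure during a rebuild is higher than this model suggests.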
The IT industry needs a radical rethink about storage. The old orthodoxy isn’t good enough. It may be heresy to admit it but perhaps we need to move beyond RAID and towards new, unbreakable storage technology like Amplidata’s.
