Sunday, October 22, 2006

It seems a mad idea.

I've been trying to write an md personality; in fact I had hoped to be coding it, but it's a Saturday afternoon and I had a lazy morning.

The Sky Plus box was running out of disk space again, so there was some important telly watching to do to ensure it didn't fill up - or so SWMBO told me.

Back to the new md personality - zerobad. The aim of this personality is to mask bad sectors and other errors from the higher layers. In general this is not a good thing, as it prevents any code in the higher layers from doing its own error recovery - and for journalling filesystems it would break the guarantees the journalling subsystems are trying to provide.

So why do it then?

The rationale for this device has come from the amount of time I have been spending doing block copies of hard drives with bad sectors on them, e.g.
dd if=/dev/src of=out conv=noerror,sync

The reason for doing this copy is almost always filesystem recovery of failing disks, and by doing this first we get four really useful features.

  1. We protect ourselves from the disk failing any further during the recovery process as we have a stable copy of the filing system.
  2. We have a mechanism with which we can undo our work by making another copy.
  3. We can always give the disk back to our customer in the same state as it arrived in, excepting any further failures (see 1).
  4. Some recovery tools don't work well on disks with bad sectors in some critical places. This seems to be particularly true for chkdsk.exe - and worse, sometimes Windows can refuse to recognise these volumes.

However, this approach has a real downside: it takes a substantial amount of time to make this copy, and this slows down how fast we can get a user's data back for them. The time to first file recovered is a critical one, particularly if the user can specify which files are most important.

So as an alternative it would clearly be good if we could mount the original disk in a way which did the same translation as dd's conv=noerror,sync. And that is exactly what zerobad is intended to do.

Specifically, zerobad will only mask read errors and will still propagate write errors, so that as far as possible journalling and similar filesystem guarantees can be met.
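To make the intended behaviour concrete, here is a minimal userspace sketch in Python. It is not the md personality itself (that would be kernel C), and `FlakyDisk`, `ZeroBad` and the 512-byte `SECTOR_SIZE` are all hypothetical names and values picked for illustration. It only shows the idea: failed reads come back as zeros, failed writes still raise.

```python
import errno

SECTOR_SIZE = 512  # assumed sector size in bytes


class FlakyDisk:
    """Hypothetical in-memory disk that fails on chosen sectors."""

    def __init__(self, sectors, bad):
        self.sectors = list(sectors)  # one bytes object per sector
        self.bad = set(bad)           # sector numbers that raise EIO

    def read(self, n):
        if n in self.bad:
            raise OSError(errno.EIO, "read error")
        return self.sectors[n]

    def write(self, n, data):
        if n in self.bad:
            raise OSError(errno.EIO, "write error")
        self.sectors[n] = data


class ZeroBad:
    """Mask read errors with zeros; let write errors through."""

    def __init__(self, disk):
        self.disk = disk

    def read(self, n):
        try:
            return self.disk.read(n)
        except OSError:
            # Roughly what dd conv=noerror,sync gives: a zero-filled block.
            return bytes(SECTOR_SIZE)

    def write(self, n, data):
        # Deliberately no try/except: the caller must see write failures.
        self.disk.write(n, data)


disk = FlakyDisk([b"A" * SECTOR_SIZE, b"B" * SECTOR_SIZE], bad={1})
dev = ZeroBad(disk)
good = dev.read(0)    # the original data
masked = dev.read(1)  # zeros instead of an exception
```

A write to sector 1 through `dev` would still raise `OSError`, which is the asymmetry that lets the layers above keep whatever guarantees they can.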

It's worth looking at which of the advantages we originally noted for our copy approach can still be supported by our new approach.

  1. This clearly isn't possible, as we are now accessing the disk in a random access manner, but more on this later.
  2. It seems at first sight we might lose this advantage as well, but in fact the Linux kernel already provides precisely what we need to keep this feature - writeable snapshots. If we always use a writeable snapshot for any recovery changes, we can keep our undoability - in fact we can possibly do better than before if we have multiple snapshots.
  3. Given the above we certainly have this, although our access pattern has changed, which might affect the chances of failure. I'm not sure there is any way to judge this easily - in simple terms we expect more long seeks, but fewer actual disk reads than before. I don't have any figures on how this will affect disk failures.
  4. As long as the disk recovery tools can operate on our snapshot / md personality this is given too.
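
The undo argument in point 2 can be sketched as a toy copy-on-write overlay. This is only an illustration of the idea, not how the kernel implements snapshots; the `Snapshot` class and the sector contents are made up for the example.

```python
class Snapshot:
    """Hypothetical copy-on-write overlay over a read-only base."""

    def __init__(self, base):
        self.base = base   # sequence of sectors, never modified
        self.changes = {}  # sector number -> new contents

    def read(self, n):
        # Prefer our own changes, fall back to the untouched base.
        return self.changes.get(n, self.base[n])

    def write(self, n, data):
        # Writes never reach the base; they only land in the overlay.
        self.changes[n] = data


base = (b"boot", b"fat ", b"data")  # stands in for the zerobad device
attempt1 = Snapshot(base)
attempt1.write(1, b"FAT!")          # a repair tool modifies sector 1

attempt2 = Snapshot(base)           # a fresh snapshot "undoes" attempt1
```

Each attempt only stores the sectors it changed, which is where the saving in disk space over a full copy comes from.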

In fact you can also see we can achieve (1) by doing a background copy of the drive to a safe store, although in this case we still need enough space to store the entire disk image we are working on. That removes the advantage that being able to use snapshots would otherwise give us, of only needing enough disk space to store the changes.

Another use for this personality has been suggested recently - raid5 recovery. The raid5 system won't automatically recover when a disk has failed and a second disk has bad sectors - in fact it could fail the second disk, making recovery and even mounting of the array impossible. With the zerobad personality you could mask the bad sector, allowing recovery and mounting to continue.

You would have corrupted data in the bad sectors of the array, but at least you would have the majority of the information intact.
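
A small sketch of why only the bad sectors suffer, assuming a simplified three-disk layout with parity always on the third disk (real raid5 rotates parity, and the byte values here are arbitrary):

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


# Two stripes on a hypothetical three-disk array: data, data, parity.
d0 = [b"\x11\x22", b"\x33\x44"]
d1 = [b"\xa0\xb0", b"\xc0\xd0"]
parity = [xor(a, b) for a, b in zip(d0, d1)]

# Disk 0 has failed outright; sector 1 of disk 1 is unreadable and
# is returned as zeros by the masking layer.
d1_masked = [d1[0], bytes(2)]

# Rebuild disk 0 from what is left: surviving data XOR parity.
rebuilt_d0 = [xor(a, p) for a, p in zip(d1_masked, parity)]

stripe0_ok = rebuilt_d0[0] == d0[0]  # intact stripe rebuilds correctly
stripe1_ok = rebuilt_d0[1] == d0[1]  # masked stripe rebuilds as garbage
```

The stripe with the masked sector is lost on both disks, but every other stripe rebuilds as normal.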

This use is not for the faint-hearted, but any sort of disk recovery is problematic, and I think tools like zerobad, while being dangerous in the hands of the inexperienced, can be a godsend at other times.

And this is for unix; if we wanted to be hand-held we'd probably not be running linux in the first place.

Finally, the eagle-eyed amongst you will have noted I mentioned a recovery tool for a different operating system which I would like to run on the new volume. There are a number of approaches, such as virtualisation and iSCSI, which make this possible, but the discussion of this is outside the scope of what I wanted to talk about here.