ESTool kills raid superblock
Hopefully useful information, use at your own risk
The Event
When one of my two Samsung disks reported an "Offline uncorrectable" error, I decided to use Samsung's ESTool to check and possibly repair the disk. It came as a convenient bootable CD image and was straightforward to use. I checked both disks, and both were reported OK. Most of the space on these disks is used for an md raid1 (md1 on sda3 and sdb3), which holds the home directories. After I rebooted, that raid1 was gone.
What happened?
Rereading the text on the download page revealed that ESTool actually performs write tests. In addition, Google told me that this had happened to other people before. It seems that ESTool assumes that some space at the beginning of the 3rd partition is unused.
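A quick way to confirm this kind of damage is to ask mdadm whether each raid member still carries an md superblock. The following is only a sketch, assuming the device names from this page and root privileges; the MDADM variable and the check_superblock function are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: report which raid members still carry an md superblock.
# MDADM is a hypothetical override so the check can be tried without real disks.
MDADM="${MDADM:-mdadm}"

check_superblock() {
    # mdadm --examine exits non-zero when it finds no md superblock
    # (or when it cannot read the device, e.g. when not run as root)
    if "$MDADM" --examine "$1" >/dev/null 2>&1; then
        echo "$1: md superblock present"
    else
        echo "$1: no md superblock found"
    fi
}

# Device names as used on this page; adjust to your own layout.
for dev in /dev/sda3 /dev/sdb3; do
    check_superblock "$dev"
done
```

If both members report no superblock, the array cannot be assembled at boot, which matches what I saw.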
Recovery
When I took a closer look at my system, I found it was actually working.
LVM2 sits on top of my /dev/md1. LVM had discovered sda3 and sdb3 as
identical physical volumes and had used one of them. Since it looked
as if the data was still OK, I started recreating the raid superblock.
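Before recreating anything, it helps to know which partition LVM actually picked. A minimal sketch, assuming the LVM2 tools are installed and run as root; the PVS variable and the lvm_pv_in_use function are hypothetical names, not something I used at the time.

```shell
#!/bin/sh
# Sketch: list the physical volumes LVM currently reports, one per line.
# PVS is a hypothetical override for the pvs command.
PVS="${PVS:-pvs}"

lvm_pv_in_use() {
    # --noheadings drops the header row; tr strips the column padding
    "$PVS" --noheadings -o pv_name 2>/dev/null | tr -d ' '
}

lvm_pv_in_use
```

With two identical physical volumes, LVM typically also prints a warning about duplicates on stderr, which this sketch hides.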
mdadm --create only writes the superblock and leaves the content intact, so I ran
mdadm --create /dev/md1 --assume-clean --level=raid1 --raid-devices=2 /dev/sda3 missing
on the disk that had not been used by LVM. Note that --assume-clean is probably not necessary, but that's what I did. After fscking the recreated md device, I added the second device to the array.
mdadm --manage --add /dev/md1 /dev/sdb3
This of course triggered a resync. Since the other disk had been in use in the meantime, this was unavoidable.
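The progress of the resync can be followed in /proc/mdstat. The snippet below is a small sketch that prints only the progress lines; the resync_status function and the MDSTAT variable are hypothetical names used for illustration.

```shell
#!/bin/sh
# Sketch: show resync/recovery progress lines from an mdstat-style file.
resync_status() {
    # The kernel labels the rebuild of a newly added member "recovery"
    # and a full array check "resync"; match either one.
    grep -E 'resync|recovery' "$1" 2>/dev/null \
        || echo "no resync or recovery in progress"
}

# MDSTAT is a hypothetical override; the default is the real kernel file.
resync_status "${MDSTAT:-/proc/mdstat}"
```

Running mdadm --detail /dev/md1 gives the same information in more detail, including the state of each member.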