[ale] Many thanks to Phil Turmel
Phil Turmel
philip at turmel.org
Mon Aug 6 14:26:06 EDT 2018
Hi Derek,
On 08/06/2018 01:46 PM, Derek Atkins wrote:
> Phil Turmel via Ale <ale at ale.org> writes:
>
>> You're welcome, Malcolm.
>>
>> Very interesting and unusual bit of corruption, on all but the first
>> superblock, and precisely on the single 512-byte sectors of those other
>> superblocks. Never seen anything like it.
>
> So how did you debug it? And how did you fix it?
I used xfs_db, based on a clue from an old mailing list entry with a
similar error message.
Within xfs_db, "sb 0" moves the cursor to the first superblock. From
there, "print" shows its fields, "fsb" reports the filesystem block
number, and "daddr" reports the 512-byte sector number. I repeated that
with "sb 1", "sb 2", and "sb 3".
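As a cross-check on the numbers "daddr" reports, the expected sector of
each superblock can be worked out from the geometry that "print" shows
for superblock 0, since each allocation group begins with its own
superblock. This is only a sketch: it assumes the filesystem starts at
sector 0 of the device, and the blocksize, agblocks, and agcount values
below are made up for illustration and need replacing with the real
fields.

```shell
# Hypothetical geometry; read the real values from "sb 0" then "print" in xfs_db.
blocksize=4096     # superblock field "blocksize", in bytes (assumed value)
agblocks=1000000   # superblock field "agblocks", blocks per allocation group (assumed value)
agcount=4          # superblock field "agcount" (assumed value)

# Print the 512-byte sector at which the superblock of allocation group $1 starts.
sb_sector() {
    echo $(( $1 * agblocks * (blocksize / 512) ))
}

ag=0
while [ "$ag" -lt "$agcount" ]; do
    echo "AG $ag superblock at sector $(sb_sector "$ag")"
    ag=$((ag + 1))
done
```

If these disagree with what "daddr" prints, trust xfs_db and recheck the
assumed values.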
With the sector numbers, I could get hex for the superblock and
surrounding sectors with:
dd if=/dev/whatever bs=512 skip=sector count=16 | hexdump -C
That showed me scrambled data in just one sector at each of the later
superblocks, with proper data structures following it.
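One quick way to tell a scrambled superblock sector from a healthy one
is the magic number: a valid XFS superblock begins with the four ASCII
bytes "XFSB". The sketch below is not the procedure used in this case.
It is a hypothetical helper (has_xfs_magic is an invented name),
exercised on a scratch image file rather than a real device.

```shell
# has_xfs_magic IMAGE SECTOR
# Succeeds if the 512-byte sector starts with the XFS superblock magic "XFSB".
has_xfs_magic() {
    magic=$(dd if="$1" bs=512 skip="$2" count=1 2>/dev/null |
        od -An -tx1 -N4 | tr -d ' \n')
    [ "$magic" = "58465342" ]   # hex for the ASCII bytes X F S B
}

# Demo on a scratch file: 16 zeroed sectors with a fake magic at sector 0 only.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=16 2>/dev/null
printf 'XFSB' | dd of="$img" bs=512 conv=notrunc 2>/dev/null

for s in 0 8; do
    if has_xfs_magic "$img" "$s"; then
        echo "sector $s: magic present"
    else
        echo "sector $s: magic missing"
    fi
done
rm -f "$img"
```

A present magic only shows that the first four bytes are intact; the hex
dump is still needed to judge the rest of the sector.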
I then used dd to extract the good superblock, which is sector 0:
dd if=/dev/whatever bs=512 count=1 of=tempsb.dat
And wrote it to the other locations:
dd if=tempsb.dat bs=512 count=1 seek=sector of=/dev/whatever
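For anyone wanting to rehearse this on an image file before touching a
device, here is a rough sketch of the same copy. The function names,
the backup file naming, and the scratch image are all invented for
illustration. Note conv=notrunc: when the output is a regular file
rather than a block device, dd would otherwise truncate it at the seek
offset.

```shell
# sector_sum IMAGE SECTOR: checksum of one 512-byte sector.
sector_sum() {
    dd if="$1" bs=512 skip="$2" count=1 2>/dev/null | cksum
}

# copy_sector IMAGE FROM TO: overwrite sector TO with a copy of sector FROM.
copy_sector() {
    img=$1; from=$2; to=$3
    # Save the sector about to be overwritten so the change can be undone.
    dd if="$img" bs=512 skip="$to" count=1 of="$img.sector$to.bak" 2>/dev/null || return 1
    # Extract the source sector to a temporary file, as in the dd step above.
    dd if="$img" bs=512 skip="$from" count=1 of="$img.src.tmp" 2>/dev/null || return 1
    # conv=notrunc stops dd from truncating a regular file at the seek offset.
    dd if="$img.src.tmp" bs=512 count=1 seek="$to" of="$img" conv=notrunc 2>/dev/null || return 1
    rm -f "$img.src.tmp"
    # Confirm the two sectors now match.
    [ "$(sector_sum "$img" "$from")" = "$(sector_sum "$img" "$to")" ]
}

# Demo on a scratch file: 16 zeroed sectors with some marker data in sector 0.
scratch=$(mktemp)
dd if=/dev/zero of="$scratch" bs=512 count=16 2>/dev/null
printf 'good superblock' | dd of="$scratch" bs=512 conv=notrunc 2>/dev/null
copy_sector "$scratch" 0 8 && echo "sector 8 now matches sector 0"
rm -f "$scratch" "$scratch.sector8.bak"
```

Keeping the overwritten sector in a .bak file costs nothing and leaves a
way back if the wrong sector number was used.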
xfs_repair then worked, though it made a handful of corrections because
the filesystem could not be mounted to replay the log.
> If it's that regular a pattern it could be anything from a rotary issue
> in the HDD to a failed memory stick.
The original failing device was an M.2 mini-PCIe SSD. It was already
failing, and it gave up the ghost completely later.
I have no idea what failure mode made it possible to write just the one
scrambled 512-byte sector to the beginning of each allocation group,
except the first. Smells like an offset calculation bug to me.
Phil