[ale] Bad SATA interactions

Sun Nov 4 14:11:54 EST 2012

Interesting. Unfortunately, btrfs isn't suitable for my workload due to
performance issues, so I tend to use it only when I need to handle files
in ways where btrfs uniquely boosts performance. Sadly, it doesn't work
(well) for anything that depends on SQLite or heavy fsync usage unless you
disable the fsync call (such as by using libeatmydata, which converts fsync
into a no-op, or by using a similar approach that rate-limits calls to
fsync so they happen less often).
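
To make the rate-limiting idea concrete, here is a minimal sketch in Python. It is an assumption-laden illustration, not how libeatmydata works (that library intercepts the C-level calls): the class name RateLimitedSync, the min_interval parameter and the injectable clock are all hypothetical names made up for this example.

```python
import os
import tempfile
import time


class RateLimitedSync:
    """Forward fsync to the OS at most once per `min_interval` seconds.

    Hypothetical helper for illustration only; not part of any library.
    """

    def __init__(self, fileobj, min_interval=5.0, clock=time.monotonic):
        self.fileobj = fileobj
        self.min_interval = min_interval
        self.clock = clock          # injectable so the demo needs no sleeping
        self.last_sync = None       # time of the last real fsync, None if never

    def fsync(self):
        """Flush Python's buffer; call os.fsync only if the interval elapsed.

        Returns True when a real fsync was issued, False when it was skipped.
        """
        self.fileobj.flush()
        now = self.clock()
        if self.last_sync is None or now - self.last_sync >= self.min_interval:
            os.fsync(self.fileobj.fileno())
            self.last_sync = now
            return True
        return False


def demo():
    """Issue four sync requests at fake times 0, 1, 2 and 6 seconds."""
    ticks = iter([0.0, 1.0, 2.0, 6.0])
    results = []
    with tempfile.TemporaryFile() as f:
        sync = RateLimitedSync(f, min_interval=5.0, clock=lambda: next(ticks))
        for _ in range(4):
            f.write(b"record\n")
            results.append(sync.fsync())
    return results
```

The obvious trade-off: anything written between two real syncs can be lost if the machine goes down in that window, which is exactly the guarantee fsync callers such as SQLite are asking for.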

Dpkg is particularly bad about that; a system using it can take tens of
times longer on btrfs vs. ext4. Mozilla apps, too, but not as bad.

I think I remember reading something about a recent commit to change that,
but the kernel I am presently using is pretty old, so I probably won't try
again until F18, depending on what kernel it has.

On Nov 4, 2012 1:48 PM, "Phil Turmel" <philip at turmel.org> wrote:

> On 11/04/2012 12:38 PM, Michael Trausch wrote:
> > So I had an interesting few days... Aside from the fact that I have been
> > sick, it turns out I have had an interesting problem appear.
> >
> > I changed motherboards recently, to test UEFI and so forth out. When I did
> > so, I started having some problems that traditionally scream "memory
> > errors", except my RAM was just fine.
> >
> > I hadn't immediately thought to check the drive's SMART log because I am
> > used to distributions signaling via the UI when such events happen. Well,
> > it turns out that Fedora doesn't do SMART monitoring by default!
> >
> > I had an apparently bad SATA cable (am running tests now to see if the new
> > cable is actually the solution here). The symptom was UDMA CRC error counts
> > through the roof; the drive detected each error and aborted the
> > corresponding command.
> >
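
As a rough illustration of why a bad cable shows up as CRC errors rather than as silently wrong data, the sketch below attaches a CRC to a payload, flips one bit "in transit", and checks it on arrival. It uses zlib.crc32 as a stand-in; it is not claimed to match the polynomial or framing used on the actual SATA link, and the function names are made up for this example.

```python
import zlib


def send_with_crc(payload: bytes):
    """Return the payload together with the CRC the sender computed."""
    return payload, zlib.crc32(payload)


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate line noise by inverting a single bit of the payload."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def crc_matches(payload: bytes, crc: int) -> bool:
    """Receiver side: recompute the CRC and compare with the one sent."""
    return zlib.crc32(payload) == crc


payload, crc = send_with_crc(b"\x00" * 512)     # one 512-byte "sector"
clean_ok = crc_matches(payload, crc)            # intact transfer
damaged_ok = crc_matches(flip_bit(payload, 100), crc)  # one bit flipped
```

Here clean_ok is True and damaged_ok is False: the receiver notices the mismatch and can reject the transfer, which is the kind of event that gets counted and then surfaces as an aborted command.
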
> > I mention this as we recently had a thread on silent corruption.
> >
> > So, to the question part: even with smartctl and friends not installed and
> > running, shouldn't modern file systems be storing checksums to catch this
> > sort of thing without obscure errors? I thought that ext4 had such support,
> > but I would appear to be incorrect there.
>
> Btrfs has content checksums.  Ext4 has experimental journal checksums,
> but those have been the subject of recent bugs, and are not yet
> recommended for production.
>
> The key issue is that many of the efficiency gains in modern I/O systems
> are based upon buffering / implicit write-back caching, where multiple
> small writes to the same sector of a file coalesce into a single, later,
> actual write.  Since many applications depend on this for performance,
> it cannot be disabled by default.  Filesystems that attempt to generate
> checksums between those writes must either abort them when a subsequent
> write comes, or keep multiple versions of the sector in memory.  Either
> way, the checksum must then be written to the inodes in a way that
> synchronizes with the actual sector write itself.
>
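
The coalescing described above can be sketched with a toy write-back cache. This is only a model under simple assumptions: the "device" is a plain dict from sector number to bytes, and WriteBackCache is a hypothetical name, not a kernel interface.

```python
class WriteBackCache:
    """Toy write-back cache: writes land in memory and reach the device on flush."""

    def __init__(self, device):
        self.device = device        # dict: sector number -> bytes (stand-in for a disk)
        self.dirty = {}             # sectors modified since the last flush
        self.device_writes = 0      # how many writes actually hit the device

    def write(self, sector, data):
        # A later write to the same sector replaces the earlier one in memory.
        self.dirty[sector] = data

    def read(self, sector):
        # Readers see the newest data, whether or not it has been flushed yet.
        return self.dirty.get(sector, self.device.get(sector))

    def flush(self):
        # Each dirty sector costs one device write, however often it was rewritten.
        for sector, data in sorted(self.dirty.items()):
            self.device[sector] = data
            self.device_writes += 1
        self.dirty.clear()


disk = {}
cache = WriteBackCache(disk)
for chunk in (b"a", b"b", b"c"):
    cache.write(7, chunk)           # three small writes to the same sector
cache.flush()
```

After the flush, disk holds only the last version of sector 7 and device_writes is 1. A checksum computed for either of the first two versions would never have matched anything on disk, which is the synchronization problem in a nutshell.
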
> Btrfs can maintain synchronization because it doesn't rewrite in
> place--it always allocates new space for rewritten sectors, eventually
> garbage-collecting the superseded ones.  For filesystems that rewrite
> file contents in place, I haven't yet seen a solution.  I'm not entirely
> sure there is one, at a usable level of performance.
>
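
A toy comparison of the two strategies, to show where the synchronization problem comes from. This is a deliberately simplified model and not how btrfs is implemented; InPlaceStore, CowStore and the crash flags are hypothetical names for this sketch, and zlib.crc32 stands in for whatever checksum a real filesystem uses.

```python
import zlib


class InPlaceStore:
    """Data and checksum live in separate places and are updated in two steps."""

    def __init__(self, data):
        self.block = data
        self.checksum = zlib.crc32(data)

    def write(self, data, crash_between=False):
        self.block = data                       # step 1: overwrite data in place
        if crash_between:
            return                              # simulated crash before step 2
        self.checksum = zlib.crc32(data)        # step 2: update the checksum

    def verify(self):
        return zlib.crc32(self.block) == self.checksum


class CowStore:
    """New data goes to new space; one pointer update makes it live."""

    def __init__(self, data):
        self.blocks = {0: (data, zlib.crc32(data))}
        self.root = 0                           # which block is currently live
        self.next_id = 1

    def write(self, data, crash_before_commit=False):
        new_id = self.next_id
        self.next_id += 1
        self.blocks[new_id] = (data, zlib.crc32(data))   # data + checksum together
        if crash_before_commit:
            return                              # old block is still the live one
        old_id, self.root = self.root, new_id   # the single "commit" step
        del self.blocks[old_id]                 # garbage-collect the old version

    def read(self):
        return self.blocks[self.root][0]

    def verify(self):
        data, crc = self.blocks[self.root]
        return zlib.crc32(data) == crc


in_place = InPlaceStore(b"old")
in_place.write(b"new", crash_between=True)      # data changed, checksum stale

cow = CowStore(b"old")
cow.write(b"new", crash_before_commit=True)     # live block untouched
```

After the simulated crashes, in_place.verify() is False even though nothing was corrupted by hardware, while cow.verify() is True and cow.read() still returns the old contents.
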
> Note that btrfs has a mount option, "nodatacow", to disable data
> copy-on-write for performance reasons on certain applications, like
> large database files.  This also disables checksums, as the FS can no
> longer ensure synchronization.
>
> HTH,
>
> Phil