[ale] slocate updatedb locked files?
Jeff Lightner
jlightner at water.com
Fri Jan 12 17:18:39 EST 2007
No. The slocate job runs at 4 AM and the backup runs at 9 PM. As noted, it appears the hang shows up first in an slocate job, so later, when the backup tries to run, it hangs on the already locked file as well. This is an OS-only backup (the DB is backed up separately during the day and is not on ext3 filesystems). Also, I'd think that if the issue were two regularly scheduled jobs hitting the same file at the same time, it would either be the same file each time or something in close proximity. As mentioned, in these two events the files were in different filesystems. Of course, they all share the same RAID (PERC on Dell) disks.
It could be some issue with the PERC or the physical drives, but since it has only occurred twice in 4 months I'm not ready to say it is. I was just wondering if anyone else had a similar experience with slocate/updatedb, since it started there both times, though it could be that those jobs were simply the first to find the lock rather than what actually caused it.
________________________________
From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of Warren Myers
To: ale at ale.org
Sent: Friday, January 12, 2007 3:35 PM
To: Atlanta Linux Enthusiasts
Subject: Re: [ale] slocate updatedb locked files?
I don't know if this will be on the right track, but since you asked for a response to your question, here's a thought (might be way off base, not sure).
How far apart are the cron jobs for slocate and the incremental (or full) backup?
I've had hangs in the past from cron jobs taking too long and overlapping, though it's been pretty infrequent since anacron came along. Since moving to CentOS 4 from 3, I haven't had any overlapping cron jobs hang against each other.
Warren
On 1/10/07, Jeff Lightner <jlightner at water.com> wrote:
Some time back (mid-October) we noticed a dramatic increase in CPU load during a weekend. On researching this, we were able to trace it back to a file on an ext3 filesystem that would cause any process that attempted to access it to hang (thereby adding to the run queue). Since the slocate cron job ran every day, as did an incremental or full backup, those jobs would also hang and add to the run queue. This was the first time such an issue had been seen in over a year of running this RH AS 3 system. The issue was solved by rebooting the server, and the file in question was easily accessible after the reboot.
Overnight our incremental backup failed, and I see once again that slocate/updatedb has hung on a file, but it is not the same file, nor even the same filesystem, as the prior one, though it is also an ext3 filesystem.
We're going to reboot to clear the problem, but I'm wondering whether this is being caused by slocate/updatedb or whether it is just the first thing that finds the problem. If the latter, what is causing the initial file lock?
P.S. Before anyone suggests it: of course we have tried progressive kills on all processes referencing the file. The kills, including kill -9, do not work.
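A process blocked inside the kernel waiting on I/O sits in uninterruptible sleep (state "D" in ps output). Such processes count toward the load average and generally do not act on signals, SIGKILL included, until the I/O returns, which is consistent with kill -9 having no effect here. As a minimal sketch, assuming a Linux /proc layout, the Python below lists processes currently in state D so you can see which jobs (e.g. updatedb or the backup) are stuck; `parse_stat` and `uninterruptible_procs` are hypothetical helper names, not part of slocate or any existing tool.

```python
import os

def parse_stat(text):
    """Parse the first fields of /proc/<pid>/stat: 'pid (comm) state ...'.

    The command name may itself contain spaces or parentheses, so it is
    taken as everything between the first '(' and the LAST ')'.
    """
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar].strip())
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]
    return pid, comm, state

def uninterruptible_procs(proc_root="/proc"):
    """Return (pid, comm) for every process whose state is 'D'."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no /proc here (e.g. not a Linux system)
    result = []
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-PID entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while scanning, or unreadable entry
        if state == "D":
            result.append((pid, comm))
    return result

if __name__ == "__main__":
    for pid, comm in uninterruptible_procs():
        print(pid, comm)
```

Running it periodically (for example from cron) would show whether updatedb is consistently the first process to enter state D, which may help tell whether it is the cause or just the first victim.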
_______________________________________________
Ale mailing list
Ale at ale.org
http://www.ale.org/mailman/listinfo/ale
--
http://warrenmyers.com
"God may not play dice with the universe, but something strange is going on with the prime numbers." --Paul Erdős
"It's not possible. We are the type of people who have everything in our favor going against us." --Ben Jarhvi, Short Circuit 2