[ale] Dealing with really big log files....
Jeff Hubbs
jeffrey.hubbs at gmail.com
Sun Mar 22 12:55:19 EDT 2009
Just some thoughts thrown out...
A 114GiB log file certainly will compress like mad, either via gzip or
bzip2 - the former is faster to compute; the latter generally gives
smaller output. Once you've done that and pulled over the compressed
copy for local use, use rsync -z to keep your local copy synced to the
server's.
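Something along these lines, for instance (paths and host name are
placeholders):

  # one-time: compress a copy server-side, pull it over, unpack it
  gzip -c /var/log/mysql/mysql.log > /tmp/mysql.log.gz
  scp server:/tmp/mysql.log.gz . && gunzip mysql.log.gz

  # afterwards: keep the local copy in step with the server's,
  # compressing the traffic in transit
  rsync -avz --partial server:/var/log/mysql/mysql.log ./mysql.log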
You might want to consider loading the whole schmeer into an RDBMS
locally for further analysis.
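If you go that route, sqlite3 is probably the lowest-friction choice.
A rough sketch, assuming you've already parsed the log into
pipe-separated timestamp/query records (file and column names are
made up):

  sqlite3 querylog.db 'CREATE TABLE entries(ts TEXT, thread INTEGER, cmd TEXT, arg TEXT);'
  sqlite3 querylog.db '.import parsed.psv entries'
  sqlite3 querylog.db "SELECT count(*) FROM entries WHERE ts LIKE '090322%';"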
Kenneth Ratliff wrote:
>
> On Mar 22, 2009, at 10:15 AM, Greg Freemyer wrote:
>
>
>> If you have the disk space and few hours to let it run, I would just
>> "split" that file into big chinks. Maybe a million lines each.
>>
>
> Well, I could just sed the range of lines I want out in about the same
> amount of time, and keep the result in a single log file, which is my
> preference. I've got about 400 gigs of space left on the disk, so I
> have some room. I don't really care about the data that comes before;
> that should have been vaporized to the ether long ago. I just need to
> isolate the section of the log I do want so I can parse it and give an
> answer to a customer.
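> Something along these lines, with line numbers and output path made up:
>
>   sed -n '200000000,250000000p' mysql.log > /data/november_chunk.log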
>
>
>> I'd recommend the source and destination of your split command be on
>> different physical drives if you can manage it. Even if that means
>> connecting up an external USB drive to hold the split files.
>>
>
> Not a machine I have physical access to, sadly. I'd love to have a
> local copy to play with and leave the original intact on the server,
> but pulling 114 gigs across a transatlantic link is not really an
> option at the moment.
>
>
>> If you don't have the disk space, you could try something like:
>>
>> head -2000000 my_log_file | tail -50000 > /tmp/my_chunk_of_interest
>>
>> Or grep has an option to grab lines before and after a line that has
>> the pattern in it.
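>> For example (pattern and context counts made up):
>>
>>   grep -B 100 -A 100 'some pattern' my_log_file > /tmp/my_chunk_of_interest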
>>
>> Hopefully one of those 3 will work for you.
>>
>
> mysql's log file is very annoying in that it doesn't lend itself to
> easy grepping by line count. It doesn't time stamp every entry; it's
> more of a heartbeat thing (every second or every couple of seconds, it
> injects the date and time in front of the process it's currently
> running). There's no set number of lines between heartbeats, so one
> heartbeat might cover a 3-line select query, while the next might
> cover 20 different queries, including a 20-line update.
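> Since those timestamps get injected at the front of the entry,
> something like this should at least find the line numbers to cut
> between (the date format is from memory, so the pattern may need
> adjusting):
>
>   grep -n '^090320' mysql.log | head -1
>   grep -n '^090321' mysql.log | head -1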
>
> I do have a script that will step through the log file and parse out
> what updates were made to what database and what table at what time,
> but it craps out when run against the entire log file, so I'm mostly
> just trying to pare the log file down to a size where it'll work with
> my other tools :)
>
>
>> FYI: I work with large binary data sets all the time, and we use split
>> to keep each chunk to 2 GB. Not strictly needed anymore, but if you
>> get a read error etc., it is just the one 2 GB chunk you have to
>> retrieve from backup. It also affords you the ability to copy the
>> data to a FAT32 filesystem for portability.
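>> E.g. (file names and mount point made up):
>>
>>   split -b 2G big_dataset.bin /mnt/fat32/big_dataset.part_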
>>
>
> Normally, we rotate logs nightly and keep about a week's worth, so
> neither total space nor individual file size is usually an issue. In
> this case, logrotate busted for mysql sometime back in November and
> the beast just kept eating.