<table cellspacing="0" cellpadding="0" border="0" ><tr><td valign="top" style="font: inherit;">I would check out Splunk; it's uber 1337<br><br><div><div>&nbsp;</div> <div>&nbsp;</div> <div>-----==-=====--==-=====--==-=====--==-</div><div>Tomorrow’s security today!<br>http://rmccurdy.com </div><div>-----==-=====--==-=====--==-=====--==- </div><div>&nbsp;</div></div><br><br>--- On <b>Sun, 3/22/09, scott mcbrien <i>&lt;smcbrien@gmail.com&gt;</i></b> wrote:<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px;">From: scott mcbrien &lt;smcbrien@gmail.com&gt;<br>Subject: Re: [ale] Dealing with really big log files....<br>To: ale@ale.org<br>Date: Sunday, March 22, 2009, 12:35 PM<br><br><div id="yiv1076444877">You could write a Perl script to break it apart for you. &nbsp;The pseudocode would look something like:<div><br></div><div>open original log file</div><div><br></div><div>while input from

 file</div><div>&nbsp;&nbsp;read first line</div>

<div>&nbsp;&nbsp;pattern match for the thing that looks like a date</div><div><div>&nbsp;&nbsp;open a different file (probably with date as part of the name)</div><div><br></div><div>&nbsp;&nbsp;while read line contains date<br></div></div><div>&nbsp;&nbsp; &nbsp;write out the line<br>

</div><div>&nbsp;&nbsp; &nbsp;read the next line</div><div><br></div><div>&nbsp;&nbsp;close the file&nbsp;<br></div><div><br></div><div>close the original log file</div><div><br></div><div>variations would include adding some directory structure around where to place the logs when they're broken apart, or instead of separating by day, separating by month or year.</div>
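<div>A minimal sketch of the pseudocode above, written in Python rather than Perl. It assumes a dated line starts with a "YYMMDD HH:MM:SS" stamp, which is a guess at the log format and not something stated in this thread. It also treats undated lines as belonging to the most recent date seen. The names split_by_date and DATE_RE are made up for illustration.</div>
<pre>
```python
import re
from pathlib import Path

# Assumption: a dated line starts with "YYMMDD HH:MM:SS" (hypothetical format).
DATE_RE = re.compile(r"^(\d{6})\s+\d{1,2}:\d{2}:\d{2}")


def split_by_date(log_path, out_dir):
    """Stream log_path and write one file per date into out_dir.

    Lines without a date stamp go to the file of the most recent date
    seen; lines before the first date go to 'undated.log'.
    Returns the output paths in the order they were first created.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    created = []
    current_date = None
    out = None
    try:
        # latin-1 maps every byte to one character, so odd bytes round-trip.
        with open(log_path, "r", encoding="latin-1") as src:
            for line in src:
                match = DATE_RE.match(line)
                date = match.group(1) if match else current_date
                if out is None or date != current_date:
                    if out is not None:
                        out.close()
                    name = f"{date}.log" if date else "undated.log"
                    path = out_dir / name
                    # Truncate on first use, append if the date reappears.
                    mode = "a" if path in created else "w"
                    out = open(path, mode, encoding="latin-1")
                    if path not in created:
                        created.append(path)
                    current_date = date
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return created
```
</pre>
<div>Usage would be something like split_by_date("mysql.log", "by_date"), where both names are placeholders. Only one line is held in memory at a time, so the size of the original file should not matter beyond the time it takes to read it.</div>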

<div><br></div><div>-Scott</div><div><br><div class="gmail_quote">On Sun, Mar 22, 2009 at 10:54 AM, Kenneth Ratliff <span dir="ltr">&lt;<a rel="nofollow" target="_blank" href="mailto:lists@noctum.net">lists@noctum.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="im">-----BEGIN PGP SIGNED MESSAGE-----<br>

Hash: SHA1<br>

<br>

</div><div class="im">On Mar 22, 2009, at 10:15 AM, Greg Freemyer wrote:<br>

<br>

&gt; If you have the disk space and a few hours to let it run, I would just<br>

&gt; "split" that file into big chunks. &nbsp;Maybe a million lines each.<br>

<br>

</div>Well, I could just sed the range of lines I want out in the same time<br>

frame, and keep the result in one log file as well, which is my<br>

preference. I've got about 400 gigs of space left on the disk, so I've<br>

got some room. I mean, I don't really care about the data that goes<br>

before, that should have been vaporized to the ether long before, I<br>

just need to isolate the section of the log I do want so I can parse<br>

it and give an answer to a customer.<br>
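<div>For illustration, pulling a range of lines into a single file can be sketched in Python as below. This assumes the first and last line numbers of the wanted section are already known (for example from grep -n); the function name copy_line_range is hypothetical. It behaves roughly like sed -n 'M,Np;Nq', in that it stops reading once the last wanted line is written.</div>
<pre>
```python
from itertools import islice


def copy_line_range(src_path, dst_path, first, last):
    """Copy lines first..last (1-based, inclusive) from src_path to dst_path.

    The file is streamed in binary mode, so memory use stays flat and
    reading stops as soon as the last wanted line has been written.
    Returns the number of lines copied.
    """
    if 1 > first or first > last:
        raise ValueError("need 1 on or before first, and first on or before last")
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # islice skips first - 1 lines, then yields up to line number last.
        for line in islice(src, first - 1, last):
            dst.write(line)
            copied += 1
    return copied
```
</pre>
<div>The skipped lines still have to be read from disk, so a range near the end of a 114 GB file will take about as long as a full pass.</div>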

<div class="im"><br>

&gt; I'd recommend the source and destination of your split command be on<br>

&gt; different physical drives if you can manage it. &nbsp;Even if that means<br>

&gt; connecting up an external USB drive to hold the split files.<br>

<br>

</div>Not a machine I have physical access to, sadly. I'd love to have a<br>

local copy to play with and leave the original intact on the server,<br>

but pulling 114 gigs across a transatlantic link is not really an<br>

option at the moment.<br>

<div class="im"><br>

&gt; If you don't have the disk space, you could try something like:<br>

&gt;<br>

&gt; head -2000000 my_log_file | tail -50000 &gt; /tmp/my_chunk_of_interest<br>

&gt;<br>

&gt; Or grep has an option to grab lines before and after a line that has<br>

&gt; the pattern in it.<br>

&gt;<br>

&gt; Hopefully one of those 3 will work for you.<br>

<br>

</div>mysql's log file is very annoying in that it doesn't lend itself to<br>

easy grepping by line count. It doesn't time stamp every entry, it's<br>

more of a heartbeat thing (like once a second or every couple seconds,<br>

it injects the date and time in front of the process it's currently<br>

running). There's no set number of lines between heartbeats, so one<br>

heartbeat might have a 3 line select query, the next heartbeat might<br>

be processing 20 different queries including a 20 line update.<br>
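<div>Since the number of lines between heartbeats varies, one way to isolate a section is by time rather than by line count. The sketch below is only an illustration under assumptions: heartbeat lines are taken to start with "YYMMDD HH:MM:SS", stamps are taken to be in increasing order, and the names stamp_key and extract_window are made up.</div>
<pre>
```python
import re

# Assumption: heartbeat lines begin with "YYMMDD HH:MM:SS"; other lines have no stamp.
STAMP_RE = re.compile(r"^(\d{6})\s+(\d{1,2}):(\d{2}):(\d{2})")


def stamp_key(line):
    """Return a sortable 'YYMMDDHHMMSS' string for a heartbeat line, else None."""
    match = STAMP_RE.match(line)
    if match is None:
        return None
    date, hour, minute, second = match.groups()
    return f"{date}{int(hour):02d}{minute}{second}"


def extract_window(src_path, dst_path, start, end):
    """Copy every line whose governing heartbeat is at or after start and before end.

    start and end are 'YYMMDDHHMMSS' strings. Unstamped lines inherit the
    most recent heartbeat. Reading stops at the first heartbeat at or past
    end, so the rest of the file is never scanned.
    Returns the number of lines copied.
    """
    copied = 0
    inside = False
    # latin-1 maps every byte to one character, so odd bytes round-trip.
    with open(src_path, "r", encoding="latin-1") as src, \
            open(dst_path, "w", encoding="latin-1") as dst:
        for line in src:
            key = stamp_key(line)
            if key is not None:
                if key >= end:
                    break
                inside = key >= start
            if inside:
                dst.write(line)
                copied += 1
    return copied
```
</pre>
<div>Multi-line queries stay intact because a line without a stamp is kept or dropped along with the heartbeat above it. The string comparison only sorts correctly within a single century, which is enough for a log that started growing in November.</div>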

<br>

I do have a script that will step through the log file and parse out<br>

what updates were made to what database and what table at what time,<br>

but it craps out when run against the entire log file, so I'm mostly<br>

just trying to pare the log file down to a size where it'll work with<br>

my other tools :)<br>

<div class="im"><br>

&gt; FYI: I work with large binary data sets all the time, and we use split<br>

&gt; to keep each chunk to 2 GB. &nbsp;Not specifically needed anymore, but if<br>

&gt; you have a read error etc., it is just the one 2 GB chunk you have to<br>

&gt; retrieve from backup. &nbsp;It also affords you the ability to copy the<br>

&gt; data to a FAT32 filesystem for portability.<br>

<br>

</div>Normally, we rotate logs nightly and keep about a week's worth, so the<br>

space or individual size comparisons are usually not an issue. In this<br>

case, logrotate busted for mysql sometime back in November and the<br>

beast just kept eating.<br>

<div class="im">-----BEGIN PGP SIGNATURE-----<br>

Version: GnuPG v2.0.9 (Darwin)<br>

<br>

</div>iEYEARECAAYFAknGUTIACgkQXzanDlV0VY53YgCgkJxWJK6AAOZ+c2QTPN/gYLJH<br>

v/YAoPZXNIBckyfhfbMGrAZ6TNEqcIxV<br>

=IOjT<br>

<div><div></div><div class="h5">-----END PGP SIGNATURE-----<br>

<br>

_______________________________________________<br>

Ale mailing list<br>

<a rel="nofollow" target="_blank" href="mailto:Ale@ale.org">Ale@ale.org</a><br>

<a rel="nofollow" target="_blank" href="http://mail.ale.org/mailman/listinfo/ale">http://mail.ale.org/mailman/listinfo/ale</a><br>

</div></div></blockquote></div><br></div>

</div></blockquote></td></tr></table><br>