[analog-help] Merging log files
Aengus
analog07 at eircom.net
Tue May 1 16:11:03 PDT 2007
Mick Burrell <mick at midtowncottages.co.uk> wrote:
> On 1 May 2007, at 22:07, Aengus wrote:
>
>> Mick Burrell <mick at midtowncottages.co.uk> wrote:
>>> I have a site on a server which produces log files in what seems to
>>> be plain text (type .log) and compressed (type .gz). Analog handles
>>> these just fine but the server only keeps them for three weeks. I'd
>>> like to be able to download these on a weekly basis and merge the
>>> files to produce a monthly (or longer period) report. I realise I
>>> can't just duplicate the files or I'd get repeat entries.
>>
>> You don't need to merge the logfiles - Analog can read multiple
>> logfiles.
>
> So if I download a file containing data for weeks 1, 2 and 3 and the
> next week download the file which by then will contain weeks 2, 3 and
> 4, do you mean that Analog will not count double for weeks 2 and 3?
No. If you're downloading overlapping data, then you'd have to remove
the overlap yourself. Analog will assume that overlapped data is coming
from different web servers, and will not "filter it out". I'm not sure
why you would need to do that though - don't you have access to "closed"
logfiles? Most web servers "rotate" their logs either on a fixed
schedule (hourly, daily, weekly or monthly) or when the logfile reaches a
fixed size. While you might want to include the "live" logfile in your
analysis (especially if the rotation schedule is fairly long), you
shouldn't archive these "live" logs, just the ones that have been
rotated.
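
If you do end up with overlapping downloads, removing the overlap yourself could look something like the Python sketch below. It is only an illustration, not part of Analog: the function name merge_snapshots is made up, and it assumes both downloads are snapshots of the same log, in the same order, with the newer one running at least to the end of the older one.

```python
def merge_snapshots(old_lines, new_lines):
    """Return old_lines plus only those lines of new_lines that extend past it.

    Assumes the end of the older snapshot reappears, unchanged, as the
    start of the newer snapshot (e.g. weeks 2-3 of "weeks 1-3" and "weeks 2-4").
    """
    old_lines = list(old_lines)
    new_lines = list(new_lines)
    # Look for the longest tail of the old snapshot that the new one starts with.
    for start in range(len(old_lines)):
        tail = old_lines[start:]
        if new_lines[:len(tail)] == tail:
            return old_lines + new_lines[len(tail):]
    # No overlap found: the snapshots are simply consecutive.
    return old_lines + new_lines


# Toy data standing in for real log lines.
week_1_to_3 = ["week1 hit", "week2 hit", "week3 hit"]
week_2_to_4 = ["week2 hit", "week3 hit", "week4 hit"]
merged = merge_snapshots(week_1_to_3, week_2_to_4)
```

Matching whole runs of lines rather than discarding every repeated line keeps genuine repeats, such as two identical requests in the same second.
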
From your initial description, I imagine that the .gz files are the
"rotated" logs - they are "done" and don't overlap. You could consider
the .log file as "temporary": it is constantly being updated until the
end of the day, when it is gzipped and a new .log is created. In that
situation, you only want to archive the .gz files.
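
As a rough sketch of that routine in Python (the function name and directory layout are hypothetical, and it assumes the weekly download lands in one local directory), you could copy only the .gz files into an archive directory and skip anything already archived:

```python
import shutil
from pathlib import Path


def archive_rotated_logs(download_dir, archive_dir):
    """Copy closed (.gz) logs into archive_dir, ignoring live .log files.

    Returns the names of the files copied on this run.
    """
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(download_dir).glob("*.gz")):
        dest = archive / src.name
        if not dest.exists():  # already archived by an earlier run
            shutil.copy2(src, dest)
            copied.append(src.name)
    return copied
```

Because files that already exist in the archive are skipped, running it every week against the server's three-week window should not produce duplicate copies, provided the server does not reuse a file name for a different day's log.
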
That's a fairly common scenario for managing web server logs - I'd be
surprised if your server is using .log and .gz files any differently.
>> The simplest solution is add the "historical" files to a zip file,
>> and just run Analog against this zip file.
>
> The files are small at present but I'm not sure I understand what you
> mean by this. Sorry - I'm no doubt being dim.
If you have daily log files, it's often handy to stick a month's worth of
logs into a zip file, so that you have one file of 20MB, instead of 30
files of 10-15MB each. If you want to archive your log files on a
network drive, Analog will run much faster reading a 20MB .zip file over
the network than 400MB of plain text logfiles. (Log files will often
achieve 20:1 compression).
Even if the size of the logs isn't an issue, if the only reason you're
keeping them is to run Analog against them, then collecting them in zip
files can be a tidy way to maintain archived logs.
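
Any zip tool will do for this. Purely as an illustration, here is a small Python sketch using the standard zipfile module. The function name zip_month and the file pattern in the usage note are made up, and it assumes the daily logs are plain-text files sitting in one directory.

```python
import zipfile
from pathlib import Path


def zip_month(log_dir, pattern, zip_path):
    """Store every log in log_dir matching pattern in one compressed zip.

    Returns the member names written, in sorted order.
    """
    members = sorted(Path(log_dir).glob(pattern))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for log in members:
            # arcname drops the directory part so the zip holds bare file names.
            zf.write(log, arcname=log.name)
    return [log.name for log in members]
```

With a hypothetical naming scheme, zip_month("logs", "access-2007-04-*.log", "2007-04.zip") would collect April's daily logs into a single file.
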
Aengus