Note: Apache2Piwik, described below, is deprecated; we recommend using import_logs.py instead.
Log files contain a wealth of information about activity on a website and are usually analyzed with tools such as AWStats or Webalizer. Transferring this information to Piwik, a powerful web analytics tool, can greatly enhance data mining and presentation. This, in turn, means more control over your web property, better-informed decisions and greater potential for optimization.
This page contains the following sections: Apache2Piwik requirements, How to use guide, List of missing reports when using log files, Performance of the script, and Credits.
Apache2Piwik requirements
- access to a Piwik installation
- read access to the Apache log files (the log format can be specified in settings.py)
- Python 2.6 with the MySQLdb, GeoIP for Python and httpagentparser modules
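Because the log format is configurable in settings.py, it helps to know what a single log line contains. The standalone Python 3 sketch below parses one line in Apache's standard "combined" format with a regular expression. It only illustrates the idea and is not Apache2Piwik's own parser. The function name parse_combined is hypothetical.

```python
import re
from datetime import datetime

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Hypothetical helper: return a dict of fields, or None if the line does not match."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamp looks like "10/Oct/2000:13:55:36 -0700"; keep the UTC offset as text.
    stamp, _, offset = entry["time"].partition(" ")
    entry["time"] = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S")
    entry["utc_offset"] = offset
    entry["status"] = int(entry["status"])
    # %b logs "-" when no response body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # Split the request line ("GET /path HTTP/1.0") into method, path and protocol.
    parts = entry["request"].split()
    if len(parts) == 3:
        entry["method"], entry["path"], entry["protocol"] = parts
    else:
        entry["method"] = entry["path"] = entry["protocol"] = None
    return entry


if __name__ == "__main__":
    sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
              '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
    entry = parse_combined(sample)
    print(entry["host"], entry["status"], entry["path"], entry["time"])
```

If your server uses a custom LogFormat, the regular expression has to be adjusted to match it.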
How to import Apache logs into Piwik
Follow these steps for a test import with Apache2Piwik:
- Important: create a backup of your Piwik MySQL database.
- create `settings.py` as a copy of settings.py.sample and edit the Piwik MySQL database configuration
- run apache2piwik.py – see the examples below
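The backup step can be scripted. The following is a minimal sketch that assumes the standard mysqldump client is installed. The user and database names are placeholders that you should replace with the values from your own Piwik configuration, and backup_command and run_backup are hypothetical helper names.

```python
import subprocess
from datetime import date


def backup_command(user, database, outfile):
    """Hypothetical helper: build a mysqldump command line for the Piwik database."""
    return [
        "mysqldump",
        "--user=" + user,
        "--password",            # no value given: mysqldump prompts for it
        "--single-transaction",  # consistent snapshot of InnoDB tables
        "--result-file=" + outfile,
        database,
    ]


def run_backup(cmd):
    """Run the dump; raises subprocess.CalledProcessError if mysqldump fails."""
    subprocess.check_call(cmd)


if __name__ == "__main__":
    # Placeholder names: use the user and database from your Piwik config.
    outfile = "piwik-backup-%s.sql" % date.today().isoformat()
    cmd = backup_command("piwik_user", "piwik", outfile)
    print(" ".join(cmd))
    # run_backup(cmd)  # uncomment to actually create the dump
```

Passing the command as a list avoids shell quoting issues. Leaving --password without a value keeps the password out of the process list.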
Example 1 – importing log files, with all settings defined in settings.py:
$ python2.6 ./apache2piwik.py
Started processing /path/to/file/logfile1 file...
Finished in 2m16s.
Started processing /path/to/file/logfile2 file...
Finished in 2m59s.
$ python2.6 ./apache2piwik.py start
$ python2.6 ./apache2piwik.py stop
$ python apache2piwik.py -g
- Image files are ignored automatically. You can customize the ignored extensions in settings.py, where you can also ignore specific log lines with regular expressions.
- Search bots are not excluded at this stage. We might add a feature to exclude bots in a future version.
- When you import data for past dates, or when you want to reprocess your reports from the logs, you can delete the piwik_archive_* tables so that the reports are rebuilt. See the Piwik FAQ for more information.
- Apache2Piwik imports data into the idsite specified in settings.py. You can override this with the -i [idsite] command-line parameter.
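Skipping image requests and regex-matched lines comes down to a filter applied to each requested path. The sketch below shows that idea using hypothetical IGNORED_EXTENSIONS and IGNORED_PATTERNS settings. Apache2Piwik's actual settings.py names and value formats may differ, so check settings.py.sample for the real ones.

```python
import re

# Hypothetical settings: Apache2Piwik's own settings.py names and formats may differ.
IGNORED_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico")
IGNORED_PATTERNS = [re.compile(p) for p in (r"^/server-status", r"/healthcheck$")]


def should_ignore(path):
    """Return True if a requested path should be left out of the import."""
    # Drop the query string before checking the file extension (case-insensitively).
    base = path.split("?", 1)[0].lower()
    if base.endswith(IGNORED_EXTENSIONS):
        return True
    # Otherwise ignore the request if any configured pattern matches the full path.
    return any(p.search(path) for p in IGNORED_PATTERNS)


if __name__ == "__main__":
    requests = ["/", "/img/logo.png", "/server-status?auto", "/blog/post?id=3"]
    print([p for p in requests if not should_ignore(p)])  # prints ['/', '/blog/post?id=3']
```

Checking the cheap extension test first means the regular expressions only run for requests that survive it.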
Apache log import performance
- If your URLs contain session IDs, add a regular expression to the URL_REGEXPR directive in settings.py to strip them out.
- If you have monitoring or cron scripts that request certain URLs every few minutes, add those URLs to the IGNORED_LOGS directive in settings.py.
- The script is designed mainly for a single website or a few websites. We haven't tested it under the load of a web hosting environment yet, but we hope to in the future.
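Stripping session IDs normalizes URLs, so the same page is not recorded under many distinct addresses. The sketch below only illustrates that regular-expression technique on query-string parameters. The parameter names (PHPSESSID, JSESSIONID, sid) are assumptions, strip_session_id is a hypothetical helper, and the exact syntax URL_REGEXPR expects in settings.py is not shown here.

```python
import re

# Assumed session parameter names; adjust them to what your site actually uses.
SESSION_RE = re.compile(r"([?&])(?:PHPSESSID|JSESSIONID|sid)=[^&]*&?", re.IGNORECASE)


def strip_session_id(url):
    """Hypothetical helper: remove session-id query parameters from a URL."""
    # Keep the leading "?" or "&" and drop the parameter plus its trailing "&".
    cleaned = SESSION_RE.sub(r"\1", url)
    # Tidy up a dangling "?" or "&" left when the session id was the last parameter.
    return cleaned.rstrip("?&")


if __name__ == "__main__":
    print(strip_session_id("/page.php?PHPSESSID=abc123&id=5"))
```

Requiring a preceding "?" or "&" and a following "=" keeps unrelated parameters such as sidebar=1 intact.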
Credits
The project was initially developed for CLANMO GmbH, an award-winning mobile interactive agency from Köln, Germany.
If you have any suggestions, bug reports or feedback about Apache2Piwik, please leave a comment directly on the page above.