This page explains the requirements for the Piwik Log Analytics tool and how to run it to import your server logs into Piwik.

Requirements

  • Install Piwik (or update). This should take around 5 minutes.
  • To execute the script you need access to the server via SSH, or some other way of executing scripts on your server.
  • Python 2.6 is required. Note: the script that loads and parses the log files is written in Python, but Piwik itself, behind the API, is written in PHP5.
  • You will also need one or more log files to parse and analyze with Piwik (inside each log file, the log lines must be ordered by date).
  • Note: we recommend that you use the extended log format, which includes the user agent, the referrer URL, and full URLs (including hostnames) in the logs. If these fields are missing from the logs, analytics data in Piwik will be less accurate.
  • Set up Geo Location for accurate country and city detection. Piwik normally guesses a visitor’s country from the browser language, but access logs do not contain this information, so Geo Location is a must-have.
  • Piwik 1.7.2 at minimum is required, but we always recommend updating to the latest version.
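
The requirement that log lines be ordered by date can be checked before an import. The following is a minimal sketch, not part of Piwik: it assumes Python 3 and Apache/NCSA-style timestamps such as [10/Oct/2012:13:55:36 -0700], and the helper name first_out_of_order_line is made up for illustration. Other log formats would need a different pattern.

```python
import re
from datetime import datetime

# Assumption: timestamps look like [10/Oct/2012:13:55:36 -0700], as in
# Apache/NCSA-style access logs. Adjust the pattern for other formats.
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def first_out_of_order_line(lines):
    """Return the 1-based number of the first line whose timestamp is
    earlier than the previous timestamp, or None if all lines are in order.
    Lines without a recognizable timestamp are skipped."""
    previous = None
    for number, line in enumerate(lines, start=1):
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue
        # %b expects English month abbreviations in the default C locale.
        current = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if previous is not None and current < previous:
            return number
        previous = current
    return None
```

The function accepts any iterable of lines, so an open file object can be passed directly. A return value of None means no out-of-order line was found.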

Differences between using Log Analytics and using the JavaScript client

When using the server logs import (compared to JavaScript Tracking), a few user data points are missing: screen resolutions, browser plugins, and page titles are not available (the report Actions > Page Titles will be mostly empty). Tracking cookies cannot be used, which also results in a few missing data points. See also this FAQ.

How to: run the Log File analysis script with default options

Once you have Piwik running, you will find the script in misc/log-analytics/import_logs.py

$ python /path/to/piwik/misc/log-analytics/import_logs.py

This will display the help information. The only required parameter is

--url=http://analytics.example.com

to specify the Piwik base URL. Then, you can specify one or many log files to import.

There are many more options available. See the help output and the README for more information and explanations of the available parameters.

For example, if you wish to track all requests (static files, bot requests, HTTP errors, HTTP redirects), you would use the following command:

$ python /path/to/piwik/misc/log-analytics/import_logs.py --url=http://analytics.example.com access.log \
    --idsite=1234 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static \
    --enable-bots
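
When imports are scripted, it can help to assemble this command programmatically. The sketch below is an illustration under stated assumptions rather than an official wrapper: it assumes Python 3, and the helper name build_import_command is hypothetical. It uses only options shown on this page. When no recorder count is given, it falls back to the number of CPU cores, in line with the recommendation for --recorders.

```python
import os
import shlex


def build_import_command(script_path, piwik_url, log_files, idsite=None,
                         recorders=None, extra_flags=()):
    """Return the argument list for one import_logs.py run."""
    if not log_files:
        raise ValueError("at least one log file is required")
    if recorders is None:
        # Default to one recorder thread per CPU core.
        recorders = os.cpu_count() or 1
    command = ["python", script_path, "--url=" + piwik_url]
    if idsite is not None:
        command.append("--idsite=%d" % idsite)
    command.append("--recorders=%d" % recorders)
    command.extend(extra_flags)
    command.extend(log_files)
    return command


command = build_import_command(
    "/path/to/piwik/misc/log-analytics/import_logs.py",
    "http://analytics.example.com",
    ["access.log"],
    idsite=1234,
    recorders=4,
    extra_flags=["--enable-http-errors", "--enable-http-redirects",
                 "--enable-static", "--enable-bots"],
)
# Print the command in a shell-safe form for review before running it.
print(" ".join(shlex.quote(part) for part in command))
```

To run the import from Python instead of only printing the command, the list can be passed to subprocess.check_call(command).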

How to: import more data including bots, static files, and HTTP errors tracking

By default, the script does not track static files (JS, CSS, images, etc.) and excludes all bot traffic.

You can enable these, and tune the import, using the following options:

  • --enable-bots

will track search/spam bots in Piwik, using a custom variable with the name of the bot. When enabled, the log file will take longer to process since all bot page views are sent to Piwik.

[Screenshot: example of the Custom Variables report showing bot user agents]

  • --enable-static

will track all static files (images, JS, CSS) in Piwik. This adds some time to the overall log file processing.

  • --enable-http-errors

will track HTTP errors (4xx and 5xx statuses) as page views in Piwik, with a custom variable HTTP-code set to the status code (404, 500, etc.). The page title for such a page view shows the referrer URL if it is present in the log file, which can help you find out which pages link to a 404, for example.

  • --enable-http-redirects

will track HTTP redirects (301, 302, and other 3xx statuses) as page views, each with a custom title and a custom variable. Note: HTTP status 304 responses (“Not modified”) are tracked as regular page views.

  • --enable-reverse-dns

will enable reverse DNS lookups (used to generate the Visitors > Providers report). Expect a big performance hit, as reverse DNS is very slow.

  • --recorders=N

specifies the number of threads. We recommend setting it to the number of CPU cores in the system (or slightly more or less, depending on your server configuration).

  • --recorder-max-payload-size=N

sets how many page views (or log lines) are sent to Piwik in a single request. The importer uses the bulk tracking feature of Piwik to achieve greater speed. By default, 300 page views will be sent to Piwik at once. You can experiment with this number to try to achieve better performance, but there is an upper limit to the speed you can get.
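
Two of the behaviors described above can be pictured with a short sketch. It is an illustration only and does not reproduce the code of import_logs.py: it assumes Python 3, and the function names classify_status and batches are invented for this example. The first function mirrors how the options above describe HTTP statuses, and the second shows the general idea of grouping hits into payloads of at most N entries.

```python
from itertools import islice


def classify_status(status):
    """Classify an HTTP status code as the options above describe it."""
    if status == 304:
        return "page view"   # "Not modified" counts as a regular page view
    if 300 <= status < 400:
        return "redirect"    # imported only with --enable-http-redirects
    if 400 <= status < 600:
        return "error"       # imported only with --enable-http-errors
    # Assumption of this sketch: every other status is an ordinary hit.
    return "page view"


def batches(hits, max_payload_size=300):
    """Yield lists of at most max_payload_size hits, preserving order."""
    iterator = iter(hits)
    while True:
        batch = list(islice(iterator, max_payload_size))
        if not batch:
            return
        yield batch
```

With the default of 300, a file of 650 importable lines would be sent as three payloads: two of 300 hits and one of 50. Larger payloads mean fewer requests, which is why this number is worth experimenting with.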

How to: exclude some particular log lines

There are several ways to exclude particular log lines or visitors from being tracked.

  • you can exclude specific IP addresses or IP ranges from being tracked. To configure excluded IPs, log in to Piwik as Super User, then click Settings > Websites.
  • the script provides an option to exclude visits with specific User Agent HTTP headers — via

    --useragent-exclude

  • the script provides an option to enforce a whitelist of all URL hostnames that should be considered — all other log lines with a hostname not in the list will not be imported. See the option

    --hostname

  • it is also possible to exclude log lines whose URL path matches a given path. See the option --exclude-path

For example, to exclude all files under the URL example.org/assets/ you would write --exclude-path="/assets"

To exclude two paths you would write: --exclude-path="path1/here" --exclude-path="/sub/path2"
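
When many exclusions are needed, the options can be generated rather than typed by hand. This is a minimal sketch assuming Python 3, and the helper name exclusion_args is hypothetical. Repeating --exclude-path is shown above. Repeating --hostname and --useragent-exclude, and passing their values with "=", are assumptions here that should be checked against the help output.

```python
def exclusion_args(exclude_paths=(), hostnames=(), useragents=()):
    """Build the exclusion-related options for import_logs.py."""
    args = []
    for path in exclude_paths:
        # One --exclude-path option per path, as in the example above.
        args.append("--exclude-path=" + path)
    for hostname in hostnames:
        # Assumption: --hostname may be given once per allowed hostname.
        args.append("--hostname=" + hostname)
    for useragent in useragents:
        # Assumption: --useragent-exclude may be given once per user agent.
        args.append("--useragent-exclude=" + useragent)
    return args


print(exclusion_args(exclude_paths=["path1/here", "/sub/path2"]))
```

Because the result is a list of separate arguments, it can be appended to an argument list passed to subprocess without any shell quoting.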

Frequently Asked Questions

For more information and guides, check out our Log Analytics tool FAQs.

If you have feature requests for better Server Log processing with Piwik, please let us know using the feedback form below. We look forward to your feedback and hope Piwik will deliver huge value for all server logs.