If your website has more than a few hundred visits per day (bravo!), waiting for Piwik to process your data may take a few minutes. The best way to avoid these waiting times is to set up a cron job on your server so that your data is automatically processed every hour.

To trigger Piwik archiving automatically, you can set up a script that runs every hour.

There are instructions below for Linux/Unix systems using a crontab, for Windows users with the Windows Task Scheduler, and for tools such as CPanel. If you don’t have access to the server, you can also set up a web cron.

Linux/Unix: How to Set up a Crontab to Automatically Archive the Reports

Cron is a time-based scheduling service on Unix-like servers, and a crontab is the file that lists the jobs it runs. The archive script requires php-cli or php-cgi to be installed. You will also need SSH access to your server in order to set it up. Let’s create a new crontab with the text editor nano:

nano /etc/cron.d/piwik-archive

and then add the lines:

MAILTO="youremail@example.com"
5 * * * * www-data /usr/bin/php5 /path/to/piwik/console core:archive --url=http://example.org/piwik/ > /home/example/piwik-archive.log

The Piwik archive script will run every hour (at 5 minutes past). Generally, it completes in less than one minute. On larger websites (10,000 visits and more), Piwik archiving can take up to 30 minutes.

Breakdown of the parameters:

  • MAILTO="youremail@example.com": if there is an error during the script execution, the script output and error messages will be sent to the youremail@example.com address.
  • www-data is the user that the cron job will be executed by. It should generally be your web server user, which is sometimes “apache”.
  • /usr/bin/php5 is the path to your PHP executable. It varies depending on your server configuration and operating system. You can execute the command “which php5” or “which php” in a Linux shell to find out the path of your PHP5 executable. If you don’t know the path, ask your web host or sysadmin.
  • --url=http://example.org/piwik/ is the only required parameter in the script. It must be set to your Piwik base URL, e.g. http://analytics.example.org/ or http://example.org/piwik/

  • > /home/example/piwik-archive.log is the path where the script will write its output. You can replace this path with /dev/null if you prefer not to log the output of the last Piwik cron run. The script output contains useful information such as which websites are archived and how long processing takes for each date and website.
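The PHP path lookup described in the list above can also be scripted. This is only a minimal sketch for a POSIX shell: the function name first_available is made up for illustration, and the candidate names php5 and php are the ones mentioned in the text.

```shell
#!/bin/sh
# Hypothetical helper (not part of Piwik): print the full path of the
# first command name, among the arguments, that exists on this system.
first_available() {
    for name in "$@"; do
        # `command -v` prints the path of an executable found in PATH
        if found=$(command -v "$name" 2>/dev/null); then
            printf '%s\n' "$found"
            return 0
        fi
    done
    return 1   # none of the candidates was found
}

# Example: try php5 first, then php.
# first_available php5 php
```

The printed path is what you would put in place of /usr/bin/php5 in the cron line. If nothing is printed, PHP is probably not in the PATH of the current user.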

Description of the ‘linux cron’ utility: The cron utility uses two different types of configuration files: the system crontab and user crontabs. The only difference between these two formats is the sixth field.

  • In the system crontab, the sixth field is the name of a user for the command to run as. This gives the system crontab the ability to run commands as any user.
  • In a user crontab, the sixth field is the command to run, and all commands run as the user who created the crontab; this is an important security feature.

If you set up your crontab as a user crontab, you would instead write:

5 * * * * /usr/bin/php5 /path/to/piwik/console core:archive --url=http://example.org/piwik/ > /dev/null

This cron job will trigger the day/week/month/year archiving process at 5 minutes past every hour. This will make sure that when you visit your Piwik dashboard, the data has already been processed; Piwik will load quickly.
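To make the difference between the two crontab formats concrete, here is a small sketch that prints the same hourly job in either format. The function name cron_line is hypothetical, and the paths and URL are the placeholders already used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print an hourly cron entry for the archive command.
# $1 = minute of the hour (0-59)
# $2 = user name for a system crontab, or "" for a user crontab
cron_line() {
    minute=$1
    user=$2
    cmd='/usr/bin/php5 /path/to/piwik/console core:archive --url=http://example.org/piwik/ > /dev/null'
    if [ -n "$user" ]; then
        # System crontab (e.g. a file in /etc/cron.d): the sixth field is the user
        printf '%s * * * * %s %s\n' "$minute" "$user" "$cmd"
    else
        # User crontab (edited with `crontab -e`): the sixth field starts the command
        printf '%s * * * * %s\n' "$minute" "$cmd"
    fi
}

cron_line 5 www-data   # system crontab format
cron_line 5 ""         # user crontab format
```

The first five fields are minute, hour, day of month, month and day of week, so "5 * * * *" means minute 5 of every hour.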

Test the cron command: Make sure the crontab will actually work by running the script as the crontab user in the shell:

# su www-data -c "/usr/bin/php5 /path/to/piwik/console core:archive --url=http://example.org/piwik/"

You should see the script output with the list of websites being archived, and a summary at the end stating that there was no error.

Windows: How to Set up Auto-Archiving Using Windows Scheduler

To open the task scheduler on Windows (XP, 7, 2003/2008 Server), click All Programs, point to Accessories, point to System Tools, and then click Scheduled Tasks.

Click ‘Add Scheduled Task’ and name the task, e.g. “Piwik Auto Archiving”. Click the ‘Trigger’ tab and add a new trigger. Set the trigger to run on a schedule, daily and repeating every hour. Confirm the settings and switch to the ‘Action’ tab.

In the task properties, in the “Run” input field, enter the command that runs the archiving script, for example:

C:\xampp\php\php.exe "D:\www\piwik\console" core:archive --url=http://piwik.example.org/

See also this screenshot of the Piwik archiving scheduled task properties window:

[Screenshot: Piwik archive scheduled task on Windows]

CPanel: How to Set up the Cron Script Using CPanel

It is easy to set up automatic archiving if you use a user interface such as CPanel, Webmin or Plesk. Here are the instructions for CPanel:

  1. Log in to CPanel for the domain with the Piwik installation
  2. Click on “Cron Jobs”
  3. Leave email blank
  4. In ‘Minutes’ put 00 and leave the rest blank.
  5. You then need to paste in the path to the PHP5 executable, then the path to the Piwik /console script followed by core:archive, then the parameter with your Piwik base URL, e.g. --url=piwik.example.org/
    Here is an example for a Hostgator install (in this example you would need to change ‘yourcpanelsitename’ to whatever your particular domain’s CPanel username is):

    /usr/local/bin/php -f /home/yourcpanelsitename/public_html/piwik/console core:archive --url=example.org/piwik/ > /home/example/piwik-archive-output.log
    

    “yourcpanelsitename” tends to be the first eight letters of your domain (unless you changed it when you set up your CPanel account).
  6. Click “Add New Cron Job”

Piwik will process your reports automatically at the top of every hour.

Web Cron When Your Web Host Does Not Support Cron Tasks

If possible, we highly recommend that you run a cron or scheduled task. However, on some shared hosting, or on particular server configurations, running a cron or scheduled task may not be easy or possible.

Some web hosts let you set up a web cron, which is a simple URL that the host will automatically visit at a scheduled time. If your web host lets you create a web cron, you can input the following URL in their hosting interface:

http://your-server.org/path/to/piwik/misc/cron/archive.php?token_auth=XYZ

Replace XYZ with the super user’s 32-character token_auth. To find the token_auth, log in as a super user in Piwik and click on the API link at the top; the token_auth is displayed on the page.

You can test the web cron by pasting the URL in your browser, waiting a few minutes for processing to finish, and then checking the output.

The web cron should be triggered at least once per hour. You may also use a ‘Website Monitoring’ service (free or paid) to automatically request this page every hour.
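If you have another machine that can run scheduled commands, the hourly request can also be made with curl. This is a minimal sketch that assumes curl is installed. The function names are hypothetical, and the host, path and XYZ token are the placeholders from the URL above.

```shell
#!/bin/sh
# Hypothetical helper: build the web cron URL.
# $1 = Piwik base URL, $2 = super user token_auth
web_cron_url() {
    base=${1%/}   # drop one trailing slash, if present
    printf '%s/misc/cron/archive.php?token_auth=%s\n' "$base" "$2"
}

# Hypothetical helper: request the web cron URL and print the response.
# -f makes curl return a non-zero status on HTTP errors;
# -sS hides the progress bar but still shows error messages.
trigger_web_cron() {
    curl -fsS "$(web_cron_url "$1" "$2")"
}

# Example (placeholders, not real values):
# trigger_web_cron "http://your-server.org/path/to/piwik/" "XYZ"
```

Keep in mind that the token_auth appears in the URL, so treat any script or log that contains it as sensitive.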

Important Tips for Medium to High Traffic Websites

Disable browser triggers for Piwik archiving and limit Piwik reports to updating every hour

After you have set up the automatic archive script as explained above, you can set up Piwik so that requests in the user interface do not trigger archiving, but instead read the pre-archived reports. Log in as the super user, click on Settings > General Settings, and select:

  • Allow Piwik archiving to trigger when reports are viewed from the browser: No
  • Reports for today will be processed at most every: 3600 seconds

Click save to save your changes. Now that you have set up the archiving cron and changed these two settings, you can enjoy fast pre-processed near real-time reports in Piwik!

[Screenshot: the General Settings page with the options above highlighted]

Today’s statistics will have a one-hour lifetime, which ensures the reports are processed every hour (near real time).

Increase PHP Memory Limit

If you receive this error:

Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate X bytes)

you must increase the memory allocated to PHP. Update the memory limit in your php.ini file, located at

/etc/php5/cli/php.ini

and/or at

/etc/php5/apache/php.ini

To give Piwik enough memory to process your web analytics reports, increase the memory limit to 512M:

memory_limit = 512M
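The number in the error message is the current limit in bytes: 16777216 bytes is 16M. Here is a small sketch for converting PHP’s shorthand sizes (K, M, G) to bytes so that the two can be compared. The function name to_bytes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a PHP shorthand size (e.g. 16M) to bytes.
to_bytes() {
    value=$1
    number=${value%[KkMmGg]}   # strip a trailing unit letter, if any
    case $value in
        *[Kk]) echo $((number * 1024)) ;;
        *[Mm]) echo $((number * 1024 * 1024)) ;;
        *[Gg]) echo $((number * 1024 * 1024 * 1024)) ;;
        *)     echo "$number" ;;   # already a plain number of bytes
    esac
}

to_bytes 16M    # prints 16777216, the limit shown in the error above
to_bytes 512M   # prints 536870912

# To see the limit that the command-line PHP actually uses:
# php -r 'echo ini_get("memory_limit"), PHP_EOL;'
```

The command-line PHP used by cron and the PHP used by the web server can read different php.ini files, which is why both locations are listed above.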

More High Traffic Server Tips!

It is possible to track millions of pages per month on hundreds or thousands of websites using Piwik. Once you have set up cron archiving as explained above, there are other important and easy steps to improve Piwik performance.

For more information, see the High traffic server FAQ.

More Information About Piwik Archiving

  • If you run archiving several times per day, it will re-archive today’s reports, as well as any reports for a date range which includes today: current week, current month, etc.
  • Your Piwik database size will grow over time; this is normal. Piwik will delete archives that were processed for incomplete periods (i.e. when you archived a week in the middle of that week), but will not delete other archives. This means that you will have archives for every day, every week, every month and every year in the MySQL tables. This ensures a very fast UI response and data access, but does require disk space. We’d love to see a plugin that would delete some of the old data (for example, only keep the top 50 rows for each report).
  • Piwik archiving for today’s reports is not incremental: running the archiving several times per day will not lower the memory requirement for weekly, monthly or yearly archives. Piwik will read all logs for the full day to process a report for that day.
  • Once a day/week/month/year is complete and has been processed, it will be cached and not re-processed by Piwik.
  • If you don’t set up archiving to run automatically, archiving will occur when a user requests a Piwik report. This can be slow and provide a bad user experience (users would have to wait N seconds). This is why we recommend that you set up auto-archiving for medium to large websites as explained above.