If your website has more than a few hundred visits per day (bravo!), waiting for Piwik to process your data may take a few minutes. The best way to avoid these waiting times is to set up a cron job on your server so that your data is automatically processed every hour.
To trigger Piwik archiving automatically, you can set up a script that runs every hour.
There are instructions below for Linux/Unix systems using a crontab, for Windows users with the Windows Task Scheduler, and for tools such as CPanel. If you don’t have access to the server, you can also set up a web cron.
Linux/Unix: How to Set up a Crontab to Automatically Archive the Reports
A crontab is a time-based scheduling service on Unix-like servers. The crontab requires php-cli or php-cgi to be installed, and you will need SSH access to your server to set it up. Using a text editor, create a new system crontab file (for example, a file in the /etc/cron.d/ directory) and add the following lines:
MAILTO="email@example.com"
5 * * * * www-data /usr/bin/php5 /path/to/piwik/console core:archive --url=http://example.org/piwik/ > /home/example/piwik-archive.log
The Piwik archive script will run every hour (at 5 minutes past). Generally, it completes in less than one minute. On larger websites (10,000 visits and more), Piwik archiving can take up to 30 minutes.
Breakdown of the parameters:
- MAILTO="email@example.com": if there is an error during the script execution, the script output and error messages will be sent to the email@example.com address.
- www-data is the user the cron job runs as. It should generally be your web server user, which is sometimes “apache”.
- /usr/bin/php5 is the path to your PHP executable. It varies depending on your server configuration and operating system. You can run “which php5” or “which php” in a Linux shell to find the path of your PHP executable. If you don’t know the path, ask your web host or sysadmin.
- --url=http://example.org/piwik/ is the only required parameter of the script. It must be set to your Piwik base URL, e.g. http://analytics.example.org/ or http://example.org/piwik/
- > /home/example/piwik-archive.log is the path where the script writes its output. You can replace this path with /dev/null if you prefer not to log the output of the last Piwik cron run. The script output contains useful information, such as which websites are archived and how long processing takes for each date and website.
- 2> /home/example/piwik-archive-errors.log is the optional path where the script writes its error messages. If you omit it from the crontab, errors will be emailed to your MAILTO address; if you include it, errors will be logged to the specified error log file instead.
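Putting these parameters together, a complete system crontab file that keeps the output log and also sends errors to a separate file might look like the sketch below. The file location /etc/cron.d/piwik-archive is an assumption; use whatever location your cron setup reads. Because standard error is redirected here, errors go to the error log rather than to the MAILTO address.

```shell
# /etc/cron.d/piwik-archive (assumed location; system crontab format, so the sixth field is the user)
MAILTO="email@example.com"
# At 5 minutes past every hour, run archiving as www-data:
# standard output goes to piwik-archive.log, error messages go to piwik-archive-errors.log
5 * * * * www-data /usr/bin/php5 /path/to/piwik/console core:archive --url=http://example.org/piwik/ > /home/example/piwik-archive.log 2> /home/example/piwik-archive-errors.log
```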
About the Linux cron utility: cron uses two different types of configuration files, the system crontab and user crontabs. The only difference between these two formats is the sixth field.
- In the system crontab, the sixth field is the name of a user for the command to run as. This gives the system crontab the ability to run commands as any user.
- In a user crontab, the sixth field is the command to run, and all commands run as the user who created the crontab; this is an important security feature.
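To see the difference in practice, the small shell helper below prints the sixth field of a crontab entry. It is only an illustration: the function name sixth_field is hypothetical, and it assumes the entry starts with the five time fields rather than a shortcut such as @hourly.

```shell
#!/bin/sh
# Print the sixth whitespace-separated field of a crontab entry.
# Assumes the entry begins with the five time fields (minute hour day-of-month month day-of-week).
sixth_field() {
  printf '%s\n' "$1" | awk '{ print $6 }'
}

# System crontab entry: the sixth field is the user (prints "www-data").
sixth_field '5 * * * * www-data /usr/bin/php5 /path/to/piwik/console core:archive --url=http://example.org/piwik/'

# User crontab entry: the sixth field is the start of the command (prints "/usr/bin/php5").
sixth_field '5 * * * * /usr/bin/php5 /path/to/piwik/console core:archive --url=http://example.org/piwik/ > /dev/null'
```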
If you set up your crontab as a user crontab, you would instead write:
5 * * * * /usr/bin/php5 /path/to/piwik/console core:archive --url=http://example.org/piwik/ > /dev/null
This cron job will trigger the day/week/month/year archiving process at 5 minutes past every hour. This will make sure that when you visit your Piwik dashboard, the data has already been processed; Piwik will load quickly.
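If you use a user crontab, one way to install the entry without opening an editor is to pipe the current crontab through a filter that appends the line only once. The add_cron_line function below is a hypothetical helper, sketched under the assumption that your crontab command accepts - to read the new crontab from standard input, as common cron implementations do.

```shell
#!/bin/sh
# add_cron_line LINE: read a crontab from standard input and write it to
# standard output, appending LINE only if an identical line is not already present.
add_cron_line() {
  line=$1
  existing=$(cat)
  # Re-emit the existing entries unchanged.
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  # -x: match whole lines, -F: treat LINE literally (it contains '*').
  if ! printf '%s\n' "$existing" | grep -qxF -- "$line"; then
    printf '%s\n' "$line"
  fi
}

# Usage (assumes `crontab -` reads the new crontab from standard input):
#   LINE='5 * * * * /usr/bin/php5 /path/to/piwik/console core:archive --url=http://example.org/piwik/ > /dev/null'
#   crontab -l 2>/dev/null | add_cron_line "$LINE" | crontab -
```

Running the usage pipeline twice leaves a single copy of the entry, because the line is only appended when it is missing.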
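Test the cron command: make sure the crontab will actually work by running the script as the crontab user from a root shell. The -s /bin/sh option lets su start a shell even when the web server user has no login shell, which is common for www-data:
su -s /bin/sh www-data -c "/usr/bin/php5 /path/to/piwik/console core:archive --url=http://example.org/piwik/"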
You should see the script output with the list of websites being archived, and a summary at the end stating that there was no error.
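To rerun exactly the same command by hand and see at a glance whether it succeeded, you can wrap it in a small shell function. This is a sketch: run_archive and the PHP_BIN, PIWIK_CONSOLE and PIWIK_URL variables are hypothetical names whose defaults copy the example paths above, and it assumes the command follows the usual Unix convention of a non-zero exit status on failure.

```shell
#!/bin/sh
# Hypothetical defaults copied from the examples above; override them via the environment.
PHP_BIN=${PHP_BIN:-/usr/bin/php5}
PIWIK_CONSOLE=${PIWIK_CONSOLE:-/path/to/piwik/console}
PIWIK_URL=${PIWIK_URL:-http://example.org/piwik/}

# run_archive: run core:archive once and pass its exit status through,
# printing a short message on standard error if the status is non-zero.
run_archive() {
  "$PHP_BIN" "$PIWIK_CONSOLE" core:archive --url="$PIWIK_URL"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "core:archive exited with status $status" >&2
  fi
  return "$status"
}
```

For example, save the function in a file and, as root, run su -s /bin/sh www-data -c '. /path/to/that/file && run_archive'.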
Windows: How to Set up Auto-Archiving Using Windows Scheduler
To open the task scheduler on Windows (XP, 7, 2003/2008 Server), click Start, point to All Programs, point to Accessories, point to System Tools, and then click Scheduled Tasks (called Task Scheduler on Windows 7 and Server 2008).
Click ‘Add Scheduled Task’ and name the task, e.g. “Piwik Auto Archiving”. Click the ‘Triggers’ tab and add a new trigger. Create a trigger on a schedule that runs daily and repeats every hour. Confirm the settings and switch to the ‘Actions’ tab.
In the task properties, enter the command that runs the archiving script in the “Run” input field, for example:
C:\xampp\php\php.exe "D:\www\piwik\console" core:archive --url=http://piwik.example.org/
See also this screenshot of the Piwik archiving scheduled task properties window:
CPanel: How to Set up the Cron Script Using CPanel
It is easy to set up automatic archiving if you use a user interface such as CPanel, Webmin or Plesk. Here are the instructions for CPanel:
- Log in to CPanel for the domain with the Piwik installation
- Click on “Cron Jobs”
- Leave email blank
- In ‘Minutes’ put 00 and leave the rest blank.
- Paste in the path to the PHP5 executable, then the path to the Piwik /console script, then the --url parameter with your Piwik base URL, e.g. --url=http://piwik.example.org/
Here is an example for a Hostgator install (in this example, replace ‘yourcpanelsitename’ with your domain’s CPanel username):
/usr/local/bin/php -f /home/yourcpanelsitename/public_html/piwik/console core:archive --url=http://example.org/piwik/ > /home/yourcpanelsitename/piwik-archive-output.log
“yourcpanelsitename” tends to be the first eight letters of your domain (unless you changed it when you set up your CPanel account).
- Click “Add New Cron Job”
Piwik will now process your reports automatically every hour, on the hour.
Web Cron When Your Web Host Does Not Support Cron Tasks
If possible, we highly recommend that you run a cron or scheduled task. However, on some shared hosting, or on particular server configurations, running a cron or scheduled task may not be easy or possible.
Some web hosts let you set up a web cron, which is a simple URL that the host will automatically visit at a scheduled time. If your web host lets you create a web cron, you can input the following URL in their hosting interface:
Replace XYZ with the super user’s 32-character token_auth. To find the token_auth, log in to Piwik as a super user, click the Administration link in the top menu, then click the API link in the left menu; the token_auth is displayed on that page.
You can test the web cron by pasting the URL into your browser, waiting a few minutes for processing to finish, and then checking the output.
The web cron should be triggered at least once per hour. You may also use a ‘Website Monitoring’ service (free or paid) to automatically request this page every hour.
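If you have access to another machine that does support cron, it can act as the scheduler for the web cron URL. The sketch below checks that the token looks like a 32-character token_auth (assuming it is lowercase hexadecimal), substitutes it for the XYZ placeholder, and requests the URL with curl. The function names are hypothetical, and you must supply the web cron URL from your own setup.

```shell
#!/bin/sh
# is_token_auth TOKEN: succeed if TOKEN is exactly 32 characters long.
# Assumption: the token consists of lowercase hexadecimal characters.
is_token_auth() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{32}$'
}

# fill_token URL TOKEN: replace the XYZ placeholder in the web cron URL with TOKEN.
fill_token() {
  printf '%s\n' "$1" | sed "s/XYZ/$2/"
}

# trigger_web_cron URL: request the URL once. --fail makes HTTP errors return a
# non-zero status; the long --max-time leaves room for archiving to finish.
trigger_web_cron() {
  curl --silent --show-error --fail --max-time 3600 "$1"
}

# Usage (WEB_CRON_URL and TOKEN are placeholders for your own values):
#   is_token_auth "$TOKEN" && trigger_web_cron "$(fill_token "$WEB_CRON_URL" "$TOKEN")"
```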
Important Tips for Medium to High Traffic Websites
Disable browser triggers for Piwik archiving and limit Piwik reports to updating every hour
After you have set up the automatic archive script as explained above, you can configure Piwik so that requests in the user interface do not trigger archiving, but instead read the pre-archived reports. Log in as the super user, click Administration > General Settings, and select:
- Allow Piwik archiving to trigger when reports are viewed from the browser: No
- Reports for today will be processed at most every: 3600 seconds
Click Save to save your changes. Now that you have set up the archiving cron and changed these two settings, you can enjoy fast, pre-processed, near real-time reports in Piwik!
Today’s statistics will have a one-hour lifetime, which ensures the reports are processed every hour (near real time).
Increase PHP Memory Limit
If you receive this error:
Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate X bytes)
you must increase the memory allocated to PHP by raising the memory limit in your php.ini file.
To give Piwik enough memory to process your web analytics reports, increase the memory limit to 512M:
memory_limit = 512M
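PHP reads shorthand sizes such as 512M in powers of 1024, so the 16777216 bytes in the error message correspond to memory_limit = 16M. The hypothetical to_bytes helper below performs this conversion so you can compare limits. Keep in mind that the command-line PHP used by the cron job may load a different php.ini from your web server; the comments at the end show standard PHP CLI commands for checking this.

```shell
#!/bin/sh
# to_bytes VALUE: convert a php.ini size such as 512M to bytes.
# K, M and G (either case) are powers of 1024; -1 means "no limit".
to_bytes() {
  value=$1
  case $value in
    -1) printf '%s\n' -1 ;;
    *[kK]) printf '%s\n' "$(( ${value%?} * 1024 ))" ;;
    *[mM]) printf '%s\n' "$(( ${value%?} * 1024 * 1024 ))" ;;
    *[gG]) printf '%s\n' "$(( ${value%?} * 1024 * 1024 * 1024 ))" ;;
    *) printf '%s\n' "$value" ;;
  esac
}

to_bytes 16M   # 16777216, the limit in the error message
to_bytes 512M  # 536870912, the recommended limit

# To see which php.ini the command-line PHP loads, and its current limit:
#   php --ini
#   php -r 'echo ini_get("memory_limit"), PHP_EOL;'
```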
More High Traffic Server Tips!
It is possible to track millions of pages per month on hundreds or thousands of websites using Piwik. Once you have set up cron archiving as explained above, there are other important and easy steps to improve Piwik performance.
For more information, see the High traffic server FAQ.
More Information About Piwik Archiving
- If you run archiving several times per day, it will re-archive today’s reports, as well as any reports for a date range which includes today: current week, current month, etc.
- Your Piwik database size will grow over time; this is normal. Piwik deletes archives that were processed for incomplete periods (e.g. when you archived a week in the middle of that week), but does not delete other archives. This means that you will have archives for every day, week, month and year in the MySQL tables. This ensures very fast UI response and data access, but requires disk space.
- Piwik archiving for today’s reports is not incremental: running archiving several times per day will not lower the memory requirement for weekly, monthly or yearly archives. Piwik reads all logs for the full day to process a report for that day.
- Once a day/week/month/year is complete and has been processed, it will be cached and not re-processed by Piwik.
- If you don’t set up archiving to run automatically, archiving will occur when a user requests a Piwik report. This can be slow and provide a bad user experience (users would have to wait N seconds). This is why we recommend that you set up auto-archiving for medium to large websites, as explained above.
By default, when you disable browser triggers for Piwik archiving, it does not completely disable the trigger of archiving as you might expect. Users browsing Piwik will still be able to trigger processing of archives in one particular case: when a Custom segment is used. To ensure that users of your Piwik will never trigger any data processing, in your config.ini.php file you must add the following setting below the
; disable browser trigger archiving for all requests (even those with a segment)
browser_archiving_disabled_enforce = 1
Help for core:archive command
Here is the help output for this command:
$ ./console help core:archive
Usage:
 core:archive [--url="..."] [--force-all-websites] [--force-all-periods[="..."]] [--force-timeout-for-periods[="..."]] [--skip-idsites[="..."]] [--force-idsites[="..."]] [--force-periods[="..."]] [--force-date-last-n="..."] [--force-date-range[="..."]] [--concurrent-requests-per-website[="..."]] [--disable-scheduled-tasks] [--accept-invalid-ssl-certificate] [--xhprof]

Options:
 --url                              Mandatory option as an alternative to '--piwik-domain'. Must be set to the Piwik base URL. For example: --url=http://analytics.example.org/ or --url=https://example.org/piwik/
 --force-all-websites               If specified, the script will trigger archiving on all websites. Use with --force-all-periods=[seconds] to also process those websites that had visits in the last [seconds] seconds. ##Launching several processes with this option will make them share the list of sites to process.
 --force-all-periods                Limits archiving to websites with some traffic in the last [seconds] seconds. For example --force-all-periods=86400 will archive websites that had visits in the last 24 hours. If [seconds] is not specified, all websites with visits in the last 604800 seconds (7 days) will be archived.
 --force-timeout-for-periods        The current week/ current month/ current year will be processed at most every [seconds]. If not specified, defaults to 3600.
 --skip-idsites                     If specified, archiving will be skipped for these websites (in case these website ids would have been archived).
 --skip-all-segments                If specified, all segments will be skipped during archiving.
 --force-idsites                    If specified, archiving will be processed only for these Sites Ids (comma separated)
 --force-periods                    If specified, archiving will be processed only for these Periods (comma separated eg. day,week,month)
 --force-date-last-n                This script calls the API with period=lastN. You can force the N in lastN by specifying this value.
 --force-date-range                 If specified, archiving will be processed only for periods included in this date range. Format: YYYY-MM-DD,YYYY-MM-DD
 --force-idsegments                 If specified, only these segments will be processed (if the segment should be applied to a site in the first place). Specify stored segment IDs, not the segments themselves, eg, 1,2,3. Note: if identical segments exist w/ different IDs, they will both be skipped, even if you only supply one ID.
 --concurrent-requests-per-website  When processing a website and its segments, number of requests to process in parallel (default: 3)
 --disable-scheduled-tasks          Skips executing Scheduled tasks (sending scheduled reports, db optimization, etc.).
 --accept-invalid-ssl-certificate   It is _NOT_ recommended to use this argument. Instead, you should use a valid SSL certificate! It can be useful if you specified --url=https://... or if you are using Piwik with force_ssl=1
 --xhprof                           Enables XHProf profiler for this archive.php run. Requires XHPRof (see tests/README.xhprof.md).
 --help (-h)                        Display this help message
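As an illustration of combining the options listed above, the sketch below re-processes only selected sites and periods within a date range. The archive_range function and the PHP_BIN, PIWIK_CONSOLE and PIWIK_URL variables are hypothetical; the options themselves are the ones shown in the help output.

```shell
#!/bin/sh
# Hypothetical defaults copied from the examples above; override them via the environment.
PHP_BIN=${PHP_BIN:-/usr/bin/php5}
PIWIK_CONSOLE=${PIWIK_CONSOLE:-/path/to/piwik/console}
PIWIK_URL=${PIWIK_URL:-http://example.org/piwik/}

# archive_range IDSITES PERIODS FROM TO
# Process only the given site IDs (comma separated) and periods (e.g. day,week),
# limited to the date range FROM..TO in YYYY-MM-DD format.
archive_range() {
  "$PHP_BIN" "$PIWIK_CONSOLE" core:archive \
    --url="$PIWIK_URL" \
    --force-idsites="$1" \
    --force-periods="$2" \
    --force-date-range="$3,$4"
}

# Example: sites 1 and 3, day and week periods, January 2014.
#   archive_range 1,3 day,week 2014-01-01 2014-01-31
```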