How to set up auto-archiving of your reports every night?
If your website receives more than a few hundred visits per day, waiting for Piwik to process your data could take more than a couple of seconds. You can set up a nightly cron job so that your data is automatically processed every night.
How to set up a crontab to archive automatically overnight?
To trigger Piwik archiving automatically at night, you can set up a crontab. Cron is the time-based job scheduler on Unix-like servers, and a crontab is the table of jobs it runs. You need SSH access to your server to set it up. As root, open the system crontab, /etc/crontab (or create a new file in /etc/cron.d/), and add the lines:
MAILTO="youremail@example.com"
5 0 * * * www-data /path/to/piwik/misc/cron/archive.sh > /dev/null
www-data is the user the cron job runs as. It should generally be your web server user, which on some systems is "apache" instead. All error messages from the cron job are sent to the youremail@example.com address (regular output is discarded by > /dev/null).
The cron utility uses two types of configuration files: the system crontab and user crontabs. The only difference between the two formats is the sixth field. In the system crontab, the sixth field is the name of the user the command runs as, which lets the system crontab run commands as any user. In a user crontab, the sixth field is the command itself, and all commands run as the user who owns the crontab; this is an important security feature. If you set up a user crontab instead (edited with crontab -e, or as root with crontab -u www-data -e for the web server user), you would write:
5 0 * * * /path/to/piwik/misc/cron/archive.sh > /dev/null
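To make the sixth-field difference concrete, here is a minimal Python sketch that splits a crontab line into its five schedule fields, the optional user and the command. The function name split_crontab_line is hypothetical (it is not part of Piwik or cron), and it assumes simple lines: no @daily-style shortcuts and no VAR=value lines such as MAILTO.

```python
def split_crontab_line(line, system_crontab):
    """Split a crontab entry into (schedule, user, command).

    Hypothetical helper for illustration. Assumes the classic five time
    fields (minute, hour, day of month, month, day of week); '@daily'-style
    shortcuts and VAR=value lines such as MAILTO are not handled.
    """
    # A system crontab line has one extra field (the user) before the command.
    max_splits = 6 if system_crontab else 5
    fields = line.split(None, max_splits)
    if len(fields) != max_splits + 1:
        raise ValueError("not enough fields for this crontab format: %r" % line)
    schedule = fields[:5]
    if system_crontab:
        # System crontab: field 6 is the user, the remainder is the command.
        return schedule, fields[5], fields[6]
    # User crontab: field 6 already starts the command, which runs as the
    # owner of the crontab, so there is no user field.
    return schedule, None, fields[5]
```

Reading the system-format line as if it were a user crontab shows what goes wrong when it is pasted into crontab -e: the command becomes "www-data /path/to/piwik/misc/cron/archive.sh > /dev/null", so cron would try to run a program called www-data.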
This cron job triggers the day, week, month and year archiving process at 00:05 (12:05 AM) every day. This ensures that when you visit the Piwik interface, the data has already been processed, so Piwik loads fast.
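The schedule fields 5 0 * * * read as "minute 5, hour 0, any day of month, any month, any weekday". The following Python sketch computes when such an entry fires next. The name next_daily_run is hypothetical, and the sketch only covers a fixed minute and hour with * in the other three fields. It is not a general cron parser and ignores daylight-saving transitions.

```python
from datetime import datetime, timedelta

def next_daily_run(now, minute=5, hour=0):
    """Return the next time a 'minute hour * * *' cron entry fires after `now`.

    Only covers a fixed minute and hour with '*' in the day-of-month, month
    and day-of-week fields, the shape of the '5 0 * * *' entry above. Uses
    naive datetimes, so daylight-saving transitions are ignored.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is right now): run tomorrow.
        candidate += timedelta(days=1)
    return candidate

# At 23:00 on 14 March 2010, the next run is 00:05 on 15 March.
print(next_daily_run(datetime(2010, 3, 14, 23, 0)))  # prints 2010-03-15 00:05:00
```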
Make sure the cron job will actually work by running the following command in a shell:
# sh /path/to/piwik/misc/cron/archive.sh
You should see XML output containing your number of visits for each date.
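If you want to repeat this check from a script, a minimal Python sketch could run the command once and flag obvious failures. The function check_archive_run is hypothetical. It is unknown whether archive.sh reports failures through its exit status, so the sketch also scans the output for a PHP "Fatal error" message such as the memory error covered in the tip on increasing memory.

```python
import subprocess

def check_archive_run(cmd):
    """Run the archiving command once and flag obvious failures.

    `cmd` is an argument list, e.g.
    ["sh", "/path/to/piwik/misc/cron/archive.sh"].
    Returns (ok, output). ok is False when the command exits with a
    non-zero status or when its output contains a PHP "Fatal error"
    message; the output scan is a safeguard because the exit status
    alone may not reveal a failure.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Combine both streams so errors printed to either one are caught.
    output = result.stdout + result.stderr
    ok = result.returncode == 0 and "Fatal error" not in output
    return ok, output
```

You would call it as check_archive_run(["sh", "/path/to/piwik/misc/cron/archive.sh"]) and then inspect the returned output for the expected XML.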
Tip: Increase the memory
If you get the error "Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate X bytes)", you can increase the memory available to PHP. Edit the PHP CLI configuration file, /etc/php5/cli/php.ini on Debian-style systems (php --ini shows which file your CLI actually loads), and set, for example:
memory_limit = 128M
This only affects the PHP command-line interpreter (CLI); your web server's PHP configuration stays unchanged. Archiving from cron should now work, and Piwik should be very fast!
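The byte count in the error message and the 128M setting are easy to compare once PHP's shorthand is expanded: the K, M and G suffixes in php.ini are multiples of 1024. The helper below is a hypothetical Python sketch of that conversion, not a PHP function. It shows that 16777216 bytes is exactly 16M, so 128M raises the ceiling eightfold.

```python
def php_shorthand_to_bytes(value):
    """Convert a php.ini size such as '128M' into a number of bytes.

    Hypothetical helper. Handles the K, M and G suffixes (case-insensitive),
    which PHP treats as multiples of 1024, and plain byte counts. The
    special value '-1' (no limit) passes through unchanged as -1.
    """
    value = value.strip()
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = value[-1].lower()
    if suffix in multipliers:
        return int(value[:-1]) * multipliers[suffix]
    return int(value)

# The limit in the error message is exactly 16M (16 * 1024 * 1024 bytes),
# so memory_limit = 128M gives PHP eight times as much memory.
print(php_shorthand_to_bytes("128M") // php_shorthand_to_bytes("16M"))  # prints 8
```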
Tip for medium- to high-traffic websites
In your Piwik configuration file, config/config.ini.php, you can add the following settings to the [General] section (adding the section header if the file does not already have one):
[General]
time_before_today_archive_considered_outdated = 3600
enable_browser_archiving_triggering = false
This disables archiving triggered from the browser, so your Piwik users don't start the heavy archiving process. Today's statistics will also have a one-hour lifetime (3600 seconds), which ensures the reports are not processed too often.
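The combined effect of the two settings can be modelled as a small decision function. This is a Python sketch for illustration only, not Piwik's actual code. The function should_process_today and its parameters are hypothetical, and whether Piwik treats an archive that is exactly one hour old as outdated is not something the sketch claims.

```python
from datetime import datetime, timedelta

def should_process_today(from_browser, browser_triggering_enabled,
                         archived_at, now, ttl_seconds=3600):
    """Decide whether today's reports should be (re)processed.

    Illustrative model of the two settings above, not Piwik's code:
    - enable_browser_archiving_triggering = false: a browser request never
      starts archiving; only the cron job does.
    - time_before_today_archive_considered_outdated = ttl_seconds: an
      archive of today older than ttl_seconds is treated as outdated.
    `archived_at` is None when today has not been archived yet.
    """
    if from_browser and not browser_triggering_enabled:
        # Browser requests only read what the cron job already produced.
        return False
    if archived_at is None:
        return True
    return now - archived_at > timedelta(seconds=ttl_seconds)
```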
More information about Piwik archiving
- Archiving several times per day only results in today's reports being updated more often. It does not lower the memory requirement for other periods, since Piwik archiving is not incremental.
- Growing data size is expected. Piwik deletes archives that were processed for incomplete periods (i.e., when you archived a week in the middle of that week), but does not delete other archives. You will therefore have archives for every day, every week, every month and every year in the MySQL tables. They ensure very fast UI response and data access, but require disk space. In the future, one can imagine a plugin that would delete some of the old data (for example, keeping only the top 50 rows for each report).
- Currently, archiving doesn't delete the logs. In the future, these logs will either be deleted or rotated into other tables or files.
- There is a known issue with memory usage during Piwik archiving, which might be a problem on very large Piwik installations.
- If you don't set up archiving to run automatically, archiving occurs when a user requests a Piwik report. This is often slow and makes for a bad user experience (users have to wait N seconds), which is why we recommend setting up automatic archiving for medium to large websites, as explained above.
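To put the disk-space point in numbers: a full calendar year contains 365 or 366 days, 52 or 53 weeks, 12 months and 1 year, so each site accumulates roughly 430 period archives per year, each holding its own set of reports. The Python sketch below counts only periods, not bytes. The function periods_per_year is hypothetical, and it counts ISO weeks, an assumption that may not match exactly how Piwik delimits weeks.

```python
from datetime import date

def periods_per_year(year):
    """Count the day, week, month and year periods in a calendar year.

    Hypothetical helper. Weeks are counted as ISO weeks (an assumption for
    illustration). 28 December always falls in the last ISO week of its
    year, so its ISO week number equals the number of ISO weeks.
    """
    days = (date(year, 12, 31) - date(year, 1, 1)).days + 1
    weeks = date(year, 12, 28).isocalendar()[1]
    return {"day": days, "week": weeks, "month": 12, "year": 1}

counts = periods_per_year(2010)
print(counts, sum(counts.values()))
# prints {'day': 365, 'week': 52, 'month': 12, 'year': 1} 430
```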