If you need to access Piwik RAW data and import it in your data warehouse there are two options: HTTP API data export, or direct database export.
1) Export all data using our HTTP API
Using our HTTP API you can export all your individual users data and all users actions (visits, pages, clicks, events, downloads, custom dimensions, user location, user information, and more). The API used to export all your Raw data is called Live.getLastVisitsDetails and lets you export all the user and clickstream data for a given website and a given date. Optionally a Custom Segment when you need to export specific segment of users). For example if you wanted to export the full dataset for yesterday, you would use the following API call:
- Replace the domain name
demo.piwik.orgby your Piwik Analytics URL
anonymousby the token_auth of your API user. We recommend you create a new user in Piwik, for example called ‘api-data-warehouse-export’ and grant this user a ‘view’ permission on the website you’re exporting to data warehouse, and then use this user’s token_auth in your API calls.
&filter_limit=-1parameter makes sure that all the data for this day will be returned.
- The data can be exported in JSON, XML, CSV, and more. Replace
&formal=xmlby your favorite data format.
- This API could be called once a day and all output data can be imported in your data warehouse.
Important note about high traffic Piwik servers
if you export a lot of data, for example more than 10,000 visits and their associated actions and clickstream, then the HTTP request above may take a long time or even time out (depending on your server and PHP settings). Therefore when you export a lot of data, we recommend that you only export 10,000 at a time and use our paging
&filter_offset= feature. Here is how this works: to export the first 10,000 records, instead of
filter_limit=-1 you would write
filter_limit=10000&filter_offset=0. Then, to export the next dataset, you would write
filter_limit=10000&filter_offset=10000. Then to export the next dataset, you will write
filter_limit=10000&filter_offset=20000. You repeat this until there is no more result in the dataset.
2) Direct read-only access to the MySQL database
Another solution is to directly access the Piwik Analytics MySQL database where all your data is stored. This solution should be faster to import a lot of data, especially when your data warehouse supports data import from MySQL. We recommend to create a Read-only mysql user who can only accces the Piwik database and read data from it (not allowed to write). You will find more information about the Piwik database structure in our developer guide. Note that the database schema may change in the future.
If you need more information about Data warehousing best practises, please contact the Piwik professional support team.