This document provides an introduction to the database schema of Piwik, explaining the different tables and architecture choices, and linking to the related source files when necessary.
The DB schema changes regularly due to added functionality and performance improvements. We havent got an up to date DB schema, if you generate one please let us know so we can include it here
The database has been designed with emphasis on simplicity, efficiency and data modularity. The database contains different sections
Piwik tracks visitors, page views (ie ‘actions’), conversions and Ecommerce products. Tracking is implemented in the most efficient way possible, each SQL query should use an INDEX or do minimal work on the DB. However, tracking in Piwik issues many UPDATE and INSERT statements.
An action (page view) is defined by a name (“homepage”, “/blog/hello-world”) and a type (an integer that defines page/download/etc) (see the class Piwik_Tracker_Action for more information).
A new action by a visitor creates a record in log_link_visit_action containing the idaction and the idvisit. This table also contains a field idaction_ref and time_spent_ref_action used to process the “time on page” metric, among others.
URLs are recorded in a “lookup table” piwik_action. Using a hash matching algorithm, URLs are uniquely identified with an integer instead of duplicating the URL for each page view.
When tracking Goals, each conversion is recorded in the table piwik_log_conversion.
When tracking Ecommerce orders and abandoned carts, the visitor cart is kept as a row in piwik_log_conversion.
Individual items (products) are recorded in piwik_log_conversion_items. When a cart is updated, the actual row for this visit’s cart in piwik_log_conversion is updated. Items in piwik_log_conversion_items can be set to deleted=1
A user is defined by:
A token_auth is generated and is used to sign API calls.
A user has an access level (‘view’ or ‘admin’ or ‘no access’ or the user is ‘super admin’ and has admin access by default) on a given idsite.
A website is defined by an idsite, a main_url and is linked to site_url so it can have several alias url.
An archive in Piwik is the aggregate of data for a given period. It’s the result of the logs being processed into meaningful data (see the section Statistics Logger).
A row in this archive_* table contains some data for a given date/period on a website. For example, a record could contain the list of countries on the website idsite = 3 for the week of the January, 7th 2008.
There are two different tables because there are two data types possible in Piwik archives:
The table archive_numeric_* is used to store plain numbers. The value field has a FLOAT type which means you can save integers and floats. For example, it is used to record the number of visitors over a given period and the number of distinct search engines keywords used.
The table archive_blob_* stores anything that is not a number. A BLOB is a binary data type that can contain anything from strings and compressed strings to serialized arrays and serialized objects. For example, it is used to store the search engine keywords that the visitors used over a given period and the visitors’ browsers.
A record (row) in these archive tables is automatically handled by the classes Piwik_ArchiveProcessing_Record*. There are different classes in this file:
The archiving logic can be found in the class Piwik_ArchiveProcessing:
It is important to note that most of the real archiving processing is actually done within plugins that hook on special events (see such hooks in ArchiveProcessing_Day search for the call Piwik_PostEvent()).
Loading an archive (and launching the archive processing, if necessary) is done via Piwik_Archive.
For performance reasons, the tables are partitioned by month. This means that new tables will be created for each month to ensure that the data is evenly partitioned. If there was only one table containing all the data, it would become huge and lookups would be very slow. Partitioning is done by the class Piwik_TablePartitioning.
See also the monthly partitioning class used for this table: Piwik_TablePartitioning_Monthly.
A different table structure is used for FLOAT and BLOB because it makes it very fast to look up the integer/float values. SQL SELECT is very fast because the tables are light (and the rows of archive_numeric_* have a fixed length). For example, we need to select the number of visitors for the last 30 days very quickly.
The tables logger_error, logger_message, logger_api_call and logger_exception are used to log miscellaneous information.
The table log_profiling is used to store profiling information about the SQL queries. You can enable profiling using Piwik_Tracker_Db::enableProfiling() and output the profiling analysis using Piwik::printSqlProfilingReportTracker().