Introduction to QA in Piwik

Like any piece of good software, Piwik comes with a comprehensive QA suite that includes unit and integration tests. The unit tests make sure core components of Piwik work properly. The integration tests make sure Piwik’s tracking, report aggregation, and APIs work properly.

To complete our QA suite, we’ve recently added a new type of test: screenshot tests, which we use to make sure Piwik’s controller and JavaScript code work properly.

This blog post will explain how they work and describe our experiences setting them up; we hope to show you an example of innovative QA practices in an active open source project.

Screenshot Tests

As the name implies, our screenshot tests (1) first capture a screenshot of a URL, then (2) compare the result with an expected image. This lets us test the code in Piwik’s controllers and Piwik’s JavaScript simply by specifying a URL.

Contrast this with conventional UI tests that test for page content changes. Such tests require writing large amounts of test code that, at most, check for changes in HTML. Our tests, on the other hand, can reveal regressions in CSS and JavaScript rendering logic with a bare minimum of testing code.

Capturing Screenshots

Screenshots are captured using a third-party tool. We tried several tools before settling on PhantomJS. PhantomJS executes a JavaScript file in an environment that allows it to create WebKit-powered web views. When capturing a screenshot, we supply PhantomJS with a script that:

  • opens a web page view,
  • loads a URL,
  • waits for all AJAX requests to be completed,
  • waits for all images to be loaded,
  • and waits for all JavaScript to be run.

Then it renders the completed page to a PNG file.

  • To see how we use PhantomJS see capture.js.
  • To see how we wait for AJAX requests to complete and images to load see override.js.
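
The waiting steps listed above can be modelled, independently of PhantomJS, as polling a set of counters until nothing is pending. The following is only an illustrative sketch in Python, not the code in capture.js or override.js: the PendingWork class, the wait_until_idle function, and the timeout and polling values are all hypothetical names and assumptions chosen for this example.

```python
import time
from typing import Callable


class PendingWork:
    """Hypothetical counters for in-flight AJAX requests and unloaded images."""

    def __init__(self) -> None:
        self.ajax = 0
        self.images = 0

    def is_idle(self) -> bool:
        # The page counts as "done" only when nothing is outstanding.
        return self.ajax == 0 and self.images == 0


def wait_until_idle(
    pending: PendingWork,
    timeout: float = 30.0,          # assumed value, not taken from Piwik
    poll_interval: float = 0.05,    # assumed value, not taken from Piwik
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until nothing is pending; return False if the timeout expires."""
    deadline = clock() + timeout
    while not pending.is_idle():
        if clock() >= deadline:
            return False
        sleep(poll_interval)
    return True
```

In a real capture script the counters would be incremented and decremented by hooks on request start and completion. Passing the clock and sleep functions as parameters is a design choice that lets the loop be exercised without real delays.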

Comparing Screenshots

Once a screenshot is generated we test for UI regressions by comparing it with an expected image. There is no sort of fuzzy matching involved. We just check that the images consist of the same bytes.
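
A byte-for-byte check like the one described here needs very little code. As a minimal sketch in Python (the function name and the file paths are assumptions made for illustration, not Piwik's actual test code):

```python
from pathlib import Path


def screenshots_match(expected_path: str, actual_path: str) -> bool:
    """Return True only if both files contain exactly the same bytes."""
    expected = Path(expected_path).read_bytes()
    actual = Path(actual_path).read_bytes()
    return expected == actual
```

Note that byte equality is stricter than pixel equality: two PNG files that decode to the same pixels can still differ in metadata or compression, and would be reported as a mismatch by a check like this.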

If a screenshot test fails, we use ImageMagick’s compare command-line tool to generate an image diff:

[Figure: image diff showing a pixel-by-pixel comparison of QA test screenshots]

In the example above, there was a change that caused the search box to be hidden in the data table. This resulted in the whole data table report being shifted up a few pixels. The differences are highlighted in red, which gives developers rapid feedback on what changed in the last commit.
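
Generating such a diff from a test runner could look roughly like the sketch below. It assumes ImageMagick's compare tool is on the PATH and uses its basic form, which takes two input images followed by an output file. The function names and file names are hypothetical and do not come from Piwik's test code.

```python
import subprocess
from typing import List


def build_compare_command(expected: str, actual: str, diff: str) -> List[str]:
    """Build the argument list: compare <expected> <actual> <diff output>."""
    return ["compare", expected, actual, diff]


def write_image_diff(expected: str, actual: str, diff: str) -> int:
    """Run ImageMagick's compare and return its exit status.

    check=False is deliberate: compare exits with a non-zero status when
    the images differ, which is exactly the case this helper is used for.
    """
    result = subprocess.run(build_compare_command(expected, actual, diff), check=False)
    return result.returncode
```

A failing test could then call write_image_diff("expected.png", "processed.png", "diff.png") and attach the resulting diff image to its failure report.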

Screenshot Tests on Travis

We experienced trouble generating identical screenshots on different machines, so our tests were not initially automated by Travis. Once we overcame this hurdle, we created a new GitHub repo to store our UI tests and screenshots and then enabled the Travis build for it. We also made sure that every time a commit is pushed to the Piwik repo, our Travis build pushes a commit to the UI test repo to run the UI tests.

We decided to create a new repository so the main repository wouldn’t be burdened with the large screenshot files (which git would not handle very well). We also made sure the Travis build would upload all the generated screenshots to a server so debugging failures would be easier.

Problems we experienced

Getting generated screenshots to render identically on separate machines was quite a challenge. It took months to figure out how to get it right. Here’s what we learned:

Fonts will render identically on different machines, but different machines can pick the wrong fonts. When we first tried getting these tests to run on Travis, we noticed small differences in the way fonts were rendered on different machines. We thought this was an insurmountable problem caused by differences in the libraries installed on these machines. It turns out the machines were just picking the wrong fonts. After installing certain fonts during our Travis build, everything started working.

Different versions of GD can generate slightly different images. GD is used in Piwik to, among other things, generate sparkline images. Different versions of GD will result in slightly different images. They look the same to the naked eye, but some pixels will have slightly different colors. This is, unfortunately, a problem we couldn’t solve. We couldn’t make sure that everyone who runs the tests uses the same version of GD, so instead we disabled sparklines for UI testing.

What we learned about existing screenshot capturing tools

We tried several screenshot capturing tools before finding one that would work adequately. Here’s what we learned about them:

  • CutyCapt: This is the first screenshot capturing tool we tried. CutyCapt is a C++ program that uses QtWebKit to load and take a screenshot of a page. It can’t be used to capture multiple screenshots in one run, and it can’t be used to wait for all AJAX requests, images, and JavaScript to complete or load (at least not currently).

  • PhantomJS: This is the solution we eventually chose. PhantomJS is a headless scriptable browser that currently uses WebKit as its rendering engine.

    For the most part, PhantomJS is the best solution we found. It reliably renders screenshots, allows JavaScript to be injected into pages it loads, and since it essentially just runs JavaScript code that you provide, it can be made to do whatever you want.

  • SlimerJS: SlimerJS is a clone of PhantomJS that uses Gecko as the rendering engine. It is meant to function similarly to PhantomJS. Unfortunately, due to some limitations hard-coded in Mozilla’s software, we couldn’t use it.

    For one, SlimerJS is not headless. There is, apparently, no way to do that when embedding Mozilla. You can run it through xvfb, but the fact that it has to create a window means some odd things can happen. When using SlimerJS, we would sometimes end up with images where tooltips would display as if the mouse was hovering over an element. This inconsistency meant we couldn’t use it for our tests.

One tool we didn’t try was Selenium WebDriver. Although Selenium is traditionally used to create tests that check for HTML content, it can be used to generate screenshots. (Note: PhantomJS supports using a remote WebDriver.)

Our Future Plans for Screenshot Testing

At the moment we render a couple dozen screenshots. We test how our PHP code, JavaScript code and CSS make Piwik’s UI look, but we don’t test how it behaves. This is our next step.

We want to create Screenshot Unit Tests for each UI control Piwik uses (for example, the Data Table View or the Site Selector). These tests would use the Widgetize plugin to load a control by itself, then execute JavaScript that simulates events and user behavior, and finally take a screenshot. This way we can test how our code handles clicks and hovers and all sorts of other behavior.

Screenshot tests will make Piwik more stable and keep us agile and able to release early and often. Thank you for your support and for spreading the word about Piwik!


Benaka M.

Benaka is a talented Software Developer and an active member of the Piwik development team. He has contributed several new features and countless bug fixes and performance improvements.