From Google Analytics to Matomo: Preserving Patron Privacy in Web Analytics

Aligning analytics with libraries' values

In March 2022, Google upended the world of web analytics. The version of Google Analytics in wide use since 2012 was given an expiration date of July 2023. At that point, Google’s long-standing Universal Analytics system would be replaced by Google Analytics 4 (GA4). GA4 isn’t just an interface overhaul; it has challenged webmasters’ assumptions about the gathering and reporting of web traffic data. Migrating to GA4 is a complicated process of reinventing your web analytics wheel. Basic reports, like the number of people who clicked a link or how long patrons spend on a page, need to be rebuilt and restructured for GA4. With this work barreling toward us, it is a good opportunity to consider other web analytics platforms.

Matomo, an open source competitor to Google Analytics, is a more viable option than ever. It aligns with the values and ethics of libraries while still providing access to data and insights about how patrons use our digital services.

The way things were

Google Analytics was built on some assumptions that don’t mesh well with the principles and practices of libraries. It expects that we’re okay with:

  • Working around a system that’s built for e-commerce and sales goals

  • Sending info about our patrons’ web use to Google 

  • Relying on Google’s mysterious sampling algorithms to build our analytics reports and tables

The sampling algorithms were especially problematic on webpages with low levels of traffic. The resulting numbers had a strange affinity for multiples of six: at UNC Libraries, we saw 6, 12, and 18 page views in traffic reports so often that I no longer considered the data reliable.


Matomo: a better option

Matomo bills itself as a “Google Analytics alternative that protects your data and your customers’ privacy,” and it lives up to that claim.

Matomo offers a similar model to WordPress and other open source tools: you can pay someone to host it for you, or you can put it on your own server. Either way, you own the data that Matomo collects. You can move the data, analyze it, or even delete it. If you choose to host Matomo on your own server, you know with absolute certainty that you’re not sending your basic web traffic data to Google or anybody else. Matomo also builds its reports based on 100 percent of traffic data, instead of extrapolating via sampling algorithms as GA4 does.
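
Owning the data also means you can pull it into your own tools. As a rough sketch (not a definitive implementation), the Python below prepares a request to Matomo's Reporting API for a daily visits summary. The server address `analytics.example.edu`, the site ID `1`, and the token `YOUR_TOKEN` are placeholders I've assumed for illustration; check your own installation's documentation for the exact methods and authentication options your version supports.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_report_request(base_url, site_id, token,
                         method="VisitsSummary.get",
                         period="day", date="yesterday"):
    """Return (endpoint, params) for a Matomo Reporting API call.

    All argument values are supplied by the caller; nothing here is
    specific to any real installation.
    """
    endpoint = base_url.rstrip("/") + "/index.php"
    params = {
        "module": "API",        # route the request to the Reporting API
        "method": method,       # which report to fetch
        "idSite": site_id,      # the numeric ID of the tracked website
        "period": period,
        "date": date,
        "format": "JSON",
        "token_auth": token,    # API token created in Matomo's settings
    }
    return endpoint, params


def fetch_report(endpoint, params):
    """POST the request and decode the JSON reply (needs network access)."""
    body = urlencode(params).encode("utf-8")
    with urlopen(Request(endpoint, data=body)) as response:
        return json.load(response)


# Hypothetical values for illustration only.
endpoint, params = build_report_request(
    "https://analytics.example.edu/", 1, "YOUR_TOKEN"
)
print(endpoint)
# Show the query without echoing the token.
print(urlencode({k: v for k, v in params.items() if k != "token_auth"}))
```

Calling `fetch_report(endpoint, params)` against a real server would return the report as a Python dictionary. Sending the parameters in a POST body, as this sketch does, keeps the token out of URLs that might end up in server logs.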

Google Analytics | Matomo
Google hosts the data | You host the data
Google provides limited support | Matomo provides limited support
Custom reporting is included | Custom reporting has a recurring cost
Info about our users goes to Google | Info about our users stays with us
Reports are based on mysterious sampling methods | Reports are based on 100% of collected web usage data

Table: Comparing Google Analytics and Matomo at a glance

The care and feeding of Matomo

Hosting your own web analytics platform admittedly comes with responsibilities. Google Analytics has limited customization and doesn’t always line up with libraries’ professional ethics, but it’s also simple to set up and has no cost. Matomo inverts much of this: it lines up ethically, but requires more care and feeding to run and maintain.

By choosing Matomo, you’re committing the time and work of Library IT staff in support of your professional principles.

Matomo has two main requirements: (1) a web server running PHP and (2) a MySQL database. Anybody who has set up their own WordPress site, or worked with similar open source tools, will be familiar with these requirements.

Keep in mind that a web analytics database will, by its nature, require significant disk space. To keep that pile of data from growing endlessly, Matomo will prompt you to think about how much data you want to retain and for how long.
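
A back-of-the-envelope calculation can help frame the retention decision. The sketch below is only an illustration: the traffic figure of 200,000 page views per month and the storage cost of 1,000 bytes per page view are assumptions I've made up for the example, not numbers from Matomo. Substitute measurements from your own server before drawing conclusions.

```python
def estimate_storage_gb(pageviews_per_month, months_retained,
                        bytes_per_pageview=1_000):
    """Estimate raw-data storage in gigabytes for a retention period.

    bytes_per_pageview is a hypothetical placeholder; measure your own
    database growth to get a realistic value.
    """
    total_bytes = pageviews_per_month * months_retained * bytes_per_pageview
    return total_bytes / 1_000_000_000


# Hypothetical traffic level, for illustration only.
MONTHLY_PAGEVIEWS = 200_000

for months in (6, 12, 24, 60):
    size = estimate_storage_gb(MONTHLY_PAGEVIEWS, months)
    print(f"Keeping {months} months of data: about {size:.1f} GB")
```

Because the estimate grows linearly with the retention period, halving how long you keep raw data roughly halves the disk space it needs.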

Kicking the tires

After the setup work, Matomo is thankfully straightforward to learn. Your past experience with Google Analytics will translate well to Matomo’s reports. You’ll see familiar terminology like Pageviews, Bounce Rate, and Time on Page. And you can be confident in the data: Matomo builds its reports based on all visitors’ traffic, with no sampling algorithm involved.

[Screenshot: Matomo’s dashboard, showing pages, page views, average time on page, and other data]

Setup tips

Once you’re up and running, consider these configuration changes and enhancements to improve your web analytics workflows:

Conclusion

Matomo isn’t a comprehensive, one-step solution to patron privacy, but it’s a big step in the right direction. Google can still get some patron data from vendors who use Google Analytics in their subscription resources, and vendors themselves may be harvesting patron data for their own analytics uses. These are issues that call for continued advocacy. Looking for a perfect solution shouldn’t keep you from taking smaller steps in the meantime. Keeping libraries’ website usage data out of big tech analytics data pools is still an important win for patron privacy. Matomo makes it possible.

