Data Dictionary: How to Build One for Your SaaS Business

At the beginning of every startup, the most helpful data usually isn’t quantitative; it’s qualitative. You can’t measure very much in the early days for the simple reason you just don’t have enough data.

Analytics products like Google Analytics, Hotjar or Mixpanel won’t help you understand what problems your users really need to solve or what features you should prioritize after launch. At this stage, the only way for you to have a reasonable perception of the needs and the trends is to go out of the office and talk to people.

But when companies start to grow, they rapidly shift from a qualitative to a quantitative approach to data collection. They quickly start paying more attention to statistically significant numerical data than to descriptive data, which is usually harder to analyze.

Some of these companies have enough engineering workforce and knowledge to plan a solid data infrastructure; others will have to hire a team of data engineers and data analysts to take charge of the overall data strategy.

In less than a year, a few of these companies will grow to the point where their team includes very different profiles and roles. Product managers, marketers, customer support leads, customer experience experts, sales managers and many more will come on board.

Different people, in different teams, with different know-how and different goals. In this context, it is no surprise that when all these different mindsets start to think about user data and metrics, each one has their own interpretation, and this is when a lot of inconsistencies start to surface.

In the most severe cases, these divergences might end up being a total mess.

What the data team calls Active Users, the marketing people call Active Subscribers and the product team calls Engaged Users.

Different expressions designate the same metric, and the same metric is explained in different ways. It is just like being in the same company without speaking the same language.


The Need for a Shared Data Dictionary


First, what is a Shared Data Dictionary?


A data dictionary is a shared collection of descriptions, expressions and definitions of the data objects for the benefit of product managers, data analysts, marketers, and others who need to refer to them. This collection is shared across each department of the company.
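To make this more concrete, here is a minimal Python sketch of what one entry of such a dictionary could look like, together with a lookup that resolves team-specific names to a single canonical one. The field names (`owner`, `aliases`), the helper functions and the sample entry are assumptions made for illustration, not a prescribed format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MetricDefinition:
    """One entry of the shared data dictionary (hypothetical structure)."""
    name: str                      # canonical name everyone should use
    definition: str                # plain-language description
    owner: str                     # team responsible for keeping it current
    aliases: Tuple[str, ...] = ()  # other names still heard around the company


def build_index(entries):
    """Map every canonical name and alias (case-insensitive) to its entry."""
    index = {}
    for entry in entries:
        for label in (entry.name, *entry.aliases):
            key = label.strip().lower()
            if key in index and index[key] is not entry:
                raise ValueError(f"'{label}' is defined twice")
            index[key] = entry
    return index


def canonical_name(index, label):
    """Return the canonical metric name for any known label."""
    return index[label.strip().lower()].name


# Illustrative entry: the owner and the wording are assumptions.
ENTRIES = [
    MetricDefinition(
        name="Active Users",
        definition="Users with at least one core interaction in the time window.",
        owner="data",
        aliases=("Active Subscribers", "Engaged Users"),
    ),
]
INDEX = build_index(ENTRIES)
```

With this in place, `canonical_name(INDEX, "Engaged Users")` resolves to "Active Users", and registering the same label for two different metrics raises an error instead of silently creating a second meaning.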


Second, why should your company adopt this?


  • Consistency

Everyone in the company will speak the same language. The metrics definitions are shared across all the teams. The data team will build up charts and dashboards, the product team will set product KPIs, the marketing team will build segments and user cohorts. They all will be using the same naming convention. Everyone is coordinated.

  • Easy Onboarding

When one document sums up all the user metrics, everyone benefits from it. New hires are no exception: it shortens their ramp-up period.

  • Move fast

While building a Data Dictionary will require some effort, it’s worth your time and ensures a great return on investment. The hard part, though, is not writing a PDF and sharing it on Google Drive; the hard part is making sure your data dictionary becomes one of the bedrocks of your company’s culture and that everyone recognizes it.

On Building a SaaS Metrics Shared Dictionary

At its core, every SaaS product needs a combination of these 4 fundamental ingredients:

  1. Attention — How to get the product into other people’s hands?
  2. Enrollment — How to get them to try the product as quickly as possible?
  3. Stickiness — How to turn the product into a natural habit for people?
  4. Conversions — How to get users to pay to keep that habit?

While in some cases there might be an overlap, we can argue that the first three are related to User Engagement and the fourth is more related to User Subscription.

To build a SaaS Shared Dictionary, we need to analyze the user lifecycle both from an Engagement perspective and from a Subscription perspective.

This will give you the full picture of how a user interacts with any given product along the way, from the moment they see the product for the first time until they become a loyal customer.


SaaS Engagement Lifecycle

The engagement lifecycle explains the logic and the dynamics by which the level of engagement varies as product adoption progresses.

The Lifecycle Visualization

Here’s a sample of what an engagement lifecycle might look like for a SaaS business.

Figure: SaaS Data Dictionary: engagement lifecycles

Primitive User Types

The primitive user types are the building blocks of this schema. They break down into:

  • Visitors — People who visited the website (Paid, Search, Ads)
  • Email Subscribers — Users who opted in with their email address in a given form
  • User Signups — Users who signed up on your website
  • Trial Users — Users who started a trial
  • Users in Trial — Users who are currently in a trial
  • Onboarded Users — Users who successfully completed the onboarding process
  • Active Users — Users who had some sort of core interaction with your product (e.g. uploaded a new photo, added a new friend, created a note) in a given time window
  • Inactive Users — Users who didn’t take any relevant action in your product in a given time window (it’s easy to mistake this for Churn)
  • Deleted Users — Users who deleted their account

While in this post we’re not going to talk about how you define what makes a user active or not, you should be really clear on how you define user activity and inactivity.
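Whatever definition you settle on, it helps to write it down as an explicit rule. The Python sketch below shows one possible shape for such a rule, assuming that activity means "at least one core event within the last N days". The event names and the 30-day default are hypothetical placeholders, not recommendations.

```python
from datetime import datetime, timedelta

# Assumption: which events count as "core" is product-specific. These names
# are hypothetical, modelled on the examples in the list above.
CORE_EVENTS = {"photo_uploaded", "friend_added", "note_created"}


def activity_status(events, now, window_days=30):
    """Classify a user from an iterable of (event_name, timestamp) pairs.

    Returns "active", "inactive" or "never_active".
    """
    core_times = [ts for name, ts in events if name in CORE_EVENTS]
    if not core_times:
        # No core interaction ever: this state does not depend on the window.
        return "never_active"
    window_start = now - timedelta(days=window_days)
    if max(core_times) >= window_start:
        return "active"
    return "inactive"
```

Keeping the window as a parameter makes the choice visible: the same user can be active on a 30-day window and inactive on a 7-day one, which is exactly the kind of ambiguity the dictionary should settle.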

Derivative User Types

Combinations of different primitive user types will bring up derivative user metrics and new user segments.


A. Inactive User Segment


  • User Signup — then Inactive

Users who signed up but have not taken any action in the product since then.

  • Trial User — then Inactive

Users who started a trial but became inactive after it ended, meaning they didn’t take any relevant action in a given time window.

  • User in Trial — then Inactive

This segment collects all the users who are currently in a trial and are inactive. Even though they’ve started a trial, they are not expressing any serious interest in your product, so they might be more inclined to cancel before the last day of the trial.

  • User Onboarded — then Inactive

Users who became disengaged, hence inactive, after the onboarding phase.

  • Active User — then Inactive

Users who were engaged initially but then stopped using the product and became inactive.

  • Never Active User

Users who have never been active in your product. This metric has no time frame.


B. Active User Segment


  • User Signup — then Active

Users who, after signing up for your product, engaged with it and became active in a given time frame.

  • Trial User — then Active

Users who started a trial and, after it ended, are still taking relevant actions in the product. They are expressing serious interest in your product, so they might be more inclined to convert into new customers.

  • User in Trial — then Active

Users who are currently in a trial and are active. They are expressing serious interest in your product, so they might be more inclined to convert into new customers.

  • User Onboarded — then Active

Users who ended up engaged, hence active, after the onboarding phase. This metric gives you a good sense of how effective your onboarding is.

  • User Activated — then Active

Users who were engaged with the product, hence active, and are still active (current state).
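Since every derived segment in the two lists above is a primitive stage combined with the user’s current activity, the naming can be generated rather than maintained by hand. The Python sketch below illustrates the idea. The `Stage` enum, the label format and both helper functions are assumptions made for illustration.

```python
from enum import Enum


class Stage(Enum):
    """Primitive lifecycle stages that can be combined with activity."""
    SIGNUP = "User Signup"
    TRIAL = "Trial User"
    IN_TRIAL = "User in Trial"
    ONBOARDED = "User Onboarded"
    ACTIVE = "Active User"


def derived_segment(stage, is_active_now):
    """Name the derived segment for a stage plus the current activity flag."""
    suffix = "Active" if is_active_now else "Inactive"
    return f"{stage.value}, then {suffix}"


def count_segments(users):
    """Count users per derived segment from (stage, is_active_now) pairs."""
    counts = {}
    for stage, is_active_now in users:
        label = derived_segment(stage, is_active_now)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Generating the labels from one place means a dashboard and a marketing segment cannot drift apart in naming. The Never Active segment is left out here because it has no time frame and therefore does not fit the stage-plus-flag pattern.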

SaaS Subscription Lifecycle

The user subscription lifecycle explains how the user interacts with the product from a subscription perspective. A lot of fundamental metrics for a SaaS business are derived from the subscription lifecycle. Here we will focus only on the metrics that are strictly related to users.

The Lifecycle Visualization

First, let’s visualize a common subscription lifecycle for a SaaS business.

Figure: SaaS Data Dictionary: subscription lifecycles

User Types

The user types, from a subscription perspective, are usually divided into two main groups:


A. Non-paying Users


  • Non-paying Users

This segment collects all the free users who are not paying at the moment. They might have paid in the past.

  • Users Never Paid

This segment collects all the free users who have never paid.

  • Expired Users

This segment collects all the users whose subscription has expired and has not been renewed.

  • Cancelled Users

This segment collects all the users who cancelled their subscription but whose billing period has not ended yet. The subscription of users who do not reactivate before the end of the billing cycle will expire.


B. Paying Users


  • First Time Paying User

Users who are in their first billing cycle.

  • Recurring Paying User

Users who are (at least) in their second billing cycle.
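The subscription segments above can also be written as a small decision rule. The Python sketch below is one possible reading of those definitions, under the simplifying assumption that a user has a single subscription described by three hypothetical fields: the number of paid billing cycles, a cancellation flag and the end date of the latest paid period.

```python
from datetime import date

# Following the grouping used in this post, cancelled users are listed with
# the non-paying group even though their current period is still running.
NON_PAYING = {"Never Paid", "Expired", "Cancelled"}


def subscription_segment(cycles_paid, cancelled, period_end, today):
    """Classify a user from a subscription perspective.

    cycles_paid -- number of billing cycles the user has paid for so far
    cancelled   -- True if the user asked to cancel the subscription
    period_end  -- last day of the latest paid billing period, or None
    today       -- the date to evaluate against
    """
    if cycles_paid == 0 or period_end is None:
        return "Never Paid"
    if today > period_end:
        return "Expired"
    if cancelled:
        return "Cancelled"  # period still running, but it will not renew
    if cycles_paid == 1:
        return "First Time Paying"
    return "Recurring Paying"
```

The order of the checks matters: expiry is tested before cancellation, so a cancelled user whose period has ended is reported as expired, which matches the definition of Cancelled Users above.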

Conclusion Wrap-up

  1. None of these metrics is set in stone, so while you should take the basic skeleton shown in this post, I’d recommend tweaking it according to your specific SaaS business.
  2. Keep in mind that the hard thing about data dictionaries is keeping them up to date over time. If your SaaS data dictionary goes out of date, it’ll be impossible to use and it will inevitably bring back the original chaos.
  3. Many people equate the engagement lifecycle with the subscription lifecycle. While there’s a certain overlap between the two, they are two different ways to represent your users’ product adoption.
  4. Of the two lifecycles, the one that is mistakenly underrated is engagement. You need to measure user engagement long before users become customers, and measure customer activity long before they churn, to stay ahead of the game.