The process of creating good KPIs (I)

At ecommerce websites, it's never enough to put some products on sale, with a good website and wait until you make millions... Well, you can think this will work, but in long runs you'll need some kind of analytics that checks out that things are really done on the right way. , that marketing spents are wisely planned, that the product guarantees the costumers expectations,...

All this stuff is usually managed by data scientists, whose job is to  collect all the info on huge databases, export it into useful information  and resume it into some cells of a spreadsheet to be presented to future investors, marketing teams, managers, accountants,...

Due to the fact that, creating something with so concentrate data usually brings lots of problems, some of them growing exponentially when some of the others are solved because there are always more and more questions that want to be solved. We'll see on the following articles the process that goes from the extraction and definition of the stored data until the calculation of KPis (Key Performance Indicators) as well as how can our current KPIs forecast the future ones with some sucess guarantees. This first part of the articles will go for the tratment of the raw data, which is usually the part that takes more time due to the huge amount of data that has to be extracted and processed from databases.

Diagrama de relaciones entre KPIs

Diagrama de relaciones entre KPIs

Chart of the relationships between the KPis we'll define in the future

Gimme, gimme some data

As stated before, everything starts by comparing some data and their evolution along the time. It's never so easy to say 'the previous month we billed 1.000$, this month we've billed 2.000$ so our business is growing up'. If the marketing investment of the previous month was only 100$ (from which the company has recovered 1.000$) and the current month the spent has grow up to 5.000$ (from which only 2.000$ have been recovered), we can state that marketing money is going, literaly, to the trash... At the same time, for instance, if we're dealing with cosmetics, it usually happens that a big part of the income comes from women between 40 and 50 years from a certain country (lets suppose it's Spain) and that, the marketing campaigns have brought young men from 18 to 25 from France instead (which it'll explain the bad second month of the previous example). We see then, that raw data is useless without any context to trat it, to compare it, so we'll first of all define the variables of the breakdown we want to use in our fantastic spreadsheet.

Breakdown variables

On ecommerce sites, there are so many breakdown variables as the imagination of the Data Scientist team that is in charge of make the computations and forecasts. It's always good to note that, the more the breakdown variables we have to analyze, the more the cases we have on our spreadsheet, and that increases the time of the analysis. It increases so dramatically that, it can go from little minutes in case we have no breakdown variables to weeks and weeks, which makes impossible to take a decision in time. Here is then our first dilemma: detail vs time.

Detail vs time

On a spreadsheet, where data is not breakdown with any variable, we lose a lot of detail that why something is as it is (we bill more from men or women, from which country and age, from which source are they coming to our website,...) but we have instead really fast analysis. On the other hand, on a spreadsheed with dozens or hundreds of breakdown variables, the analysis will be really detailed (analysts could even found the gold reef for that company) but it can take days and days to be performed and to explode so exhaustive amount of information.

Some breakdown variables

We've already stated that there can be a lot of different breakdown variables, but we'll only say here some of the more extended nowadays:

X=\{Countries\}\times\{Platforms\}\times\{Sources\}\times\{Genres\}\times \{AgeRanks\}\times\cdots

  • Country of the user: it's useful to see which countries are performing better than the others.
  • Platform: to know if the product is better on Android, on iOS, on website,...
  • Source: to know if it's better invest money on Google Adworks, on social networks, on television, radio,...
  • Gender: to know if it's better to give the product  a look for men or women.
  • Age range: to know what humor or sense we can give to the advertisement of the product.
  • ...

And it should be done month after month!

And all this breakdown information must be extracted, computed and analyzed month after month, so in a lot of cases, once the data scientists have finished a period of analysis, it starts the next one and they have to start all over again without having any break. Isn't it fantastic the life of a marketing Data Scientist?

Users data and monetization data

On the classical ecommerce business model, it usually starts with users that interact on the web and its products for free. From these people, just a few amount  them are finally converted to payers of the product. So, in this case, we can split the recopiled data onto two groups: the raw data  from the users (also called visitors) and the raw data from the payers (and both data also breakdown on all the variables considered on the previous step).  Let's see now the two kind of data an how we can extract information about each one of them.

Users data

Let's suppose we're analyzing month n, with n>1 (i.e. any month different from the first one). If we were dealing with the first month, all the variables related to the previous month are meaningless, so they can be set to zero for this purpose.

UserData_{n, x} =\begin{Bmatrix} Dau_n & Wau_n& Mau_n\\NewUsers_n& NewUsersRet_n&\\MktSpent_n&NewMktUsers_n&PayersRetained_n \end{Bmatrix}

with x \in X and n\in\mathbb{Z}^+

  1. Dau_n DAU o Daily Active Users, is the average of the amount of unique users that visit the site each day in the given month.
  2. Wau_n WAU o Weekly Active Users, is the average of the amount of unique users that visite the site in a week of the given month.
  3. Mau_n MAU o Monthly Active Users, is the number of unique users that visit the site of the given month.
  4. NewUsers_n New Users, is the amount of users that gets to the site for the first time in the given month.
  5. NewUsersRet_n New Users retained from previous month, are the users that arrived to the game the previous month (Nu_{n-1} ), and that have come to the site the current game (i.e. they are part of the current Mau_n ).
  6. MktSpent_n Marketing Spent, are the expenses  of the marketing team, that we'll suppose for simplicity that all of them come on the same currency (either €, $,... but all of them on the same one).
  7. NewMktUsers_n New users from marketing campaigns, are the new users of the current month that come to the site due to a marketing campaign with expenses.
  8. PayersRetained_n Retained Payers from previous month, are the users that payed on the website the previous month and that have come to the site again this month. Note that, despite that it has to deal with conversion, we only want here to come to the site, so this variable falls to the side of the users data.
Ejemplos de datos de usuarios y adquisición para crear unos buenos KPIs

Example of the users data and acquisition to create some good KPIs.

Conversion and monetization data

Similar to the previous case, we have here some variables we can look at ti see the conversion and monetization status from standard user to payer or premium user. We should note here that, there would be a new type of breakdown here: the month of the data, the breakdown variables and the type of the payment donde by the users:

UserData_{n, p, x} =\begin{Bmatrix}GrossRevenue_n&Payments_n&Payers_n\end{Bmatrix}

con x \in X, n\in\mathbb{Z}^+ y

p = (name, revShare, type)\in P = \{PaymentPlatforms\}

  1. PaymentPlatform  Payment Platform Name, a simple but useful way to categorize payers depending on their favorite payment way.
  2. RevShare RevenueShare, the percentage of the revenue we get to our bags after the part withdrawn by the payment platform (usually between 60 and 70% of the total).
  3. Type The type of the payment is usually set in one of the following categories:
    1. Direct, a direct payment through a credit card, Paypal or similar.
    2. Indirect,  is not really a true payment, but a way that has the user to earn something by doing some actions in the platform (completing a survey, ordering a pizza, registering at some website, watching some videos,...). This is usually known as Offer Walls.
    3. Ads, there are many companies that show ads at the websites so the company that owns the web earns a little amount of money for each ad seen.
  4. GrossRevenue_n, the gross income of the month  the give payment platform, p. . In fact, net revenue can be easily computed through the calculus GrossRevenue_n\times RevShare.
  5. Payments_n, the number of transactions of the current month.
  6. Payers_n, the number of unique payments of the current month. Obviously, each payer can make more than a payment, so the following equation must be ruled Payers_n\leq Payments_n.
Conversion data for KPis

Example of conversion and monetization data to create some good KPIs.

And we're ready to go dancing!

With all this information, we can infer and discover a huge amount of extra data that will help you to understand how a business works, if it grows up or gets stucked, if the marketing expenses are recovered, if the money is invested in a good way or not,... So, we'll start dancing on the following part:

Let's go dancing

Let's go dancing

At the second part of this article we will learn to combine all this raw data into some solid and good KPis. This way, with less than twenty numbers, we could combine them and find out more than fifty useful KPIs, indicators that will help us understanding the behaviour of our ecommerce.Its definition and construction will be hard, tedious, almost an odyssey, but since we have spoted the raw data at the beginning, and its all about its calculation, we'll be only combining them in a smart way to have good and useful data month after month.