
Uplift modelling, an analysis by Julie Vidalis

When considering corporate performance measures, financial indicators such as revenue growth and profitability remain the ones that boards and shareholders use most widely as part of the triple bottom line. These indicators are closely linked to customer satisfaction, innovation, direct and indirect costs, diversification of revenue streams and other well-documented and researched measures that have a direct impact on business growth and sustainability.


By Julie Vidalis


As a company continues its growth trajectory, complexity in its product mix is often introduced as it adjusts its revenue and cost model. This is particularly the case in the retail sector, where a company may have tens of thousands of different brands and product line variations on offer.

Consider a retail company whose objective is to increase total sales and profitability per shop by adjusting the product assortment in each individual shop. A poor product mix affects the customer's shopping experience and can lead to lost sales, both direct and indirect. The product mix is often the responsibility of Category Managers, who need to understand the real impact of their product range review decisions on customer satisfaction and ultimately on shop sales and profitability.

The reasons for an increase in sales cannot always be measured directly, as multiple factors can influence an increase or decrease in sales. Sales observed after the introduction of a new product range in the shop are not necessarily the exclusive result of a change in the product range. COVID-19 is a case in point, where an important external factor, beyond the retailers' control, affected thousands of in-store sales, irrespective of the product mix decisions made.

The solution is to use uplift modelling.

The objective of uplift modelling is to predict the differential impact of a single change, or treatment, on a specific dependent variable using predictive modelling, by isolating the effect of a change in an independent variable. This approach is based on Rubin's causal inference model, which states that:

$$\text{causal effect} = Y(\text{treated}) - Y(\text{control})$$

where Y is the outcome (dependent) variable, Y(treated) is the outcome with the treatment and Y(control) is the outcome without it.

A common way to quantify the causal effect is what this article calls the traditional approach. It takes the sales observed after the implementation of the new product mix as Y(treated). A machine learning forecasting algorithm then predicts what sales might have been had the product mix remained the same, and this forecast becomes Y(control). The final uplift is calculated by subtracting predicted sales from observed sales.

However, this is not necessarily an accurate representation of the increase in sales. The main drawback of this approach is that there is no concrete way to show that the change in product mix was the only variable affecting sales.
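The traditional calculation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sales figures are synthetic, and `naive_forecast` is a hypothetical stand-in (a historical mean) for whatever machine learning forecasting algorithm would be used in practice.

```python
from statistics import mean

# Synthetic weekly sales for one shop (illustrative numbers only).
pre_change_sales = [100.0, 104.0, 98.0, 102.0]  # before the new product mix
post_change_sales = [110.0, 115.0, 112.0]       # observed after the change


def naive_forecast(history, horizon):
    """Hypothetical stand-in forecaster: repeats the historical mean."""
    return [mean(history)] * horizon


# Counterfactual: predicted sales had the product mix stayed the same.
predicted_sales = naive_forecast(pre_change_sales, len(post_change_sales))

# Uplift = observed minus predicted, summed over the post-change weeks.
uplift = sum(obs - pred for obs, pred in zip(post_change_sales, predicted_sales))
print(uplift)
```

Note that everything that changed between the two periods ends up in `uplift`, not only the product mix, which is the weakness this approach suffers from.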

The uplift modelling approach was developed to overcome the drawback of the traditional approach, which takes no steps to isolate the effect of a single change. There has been a great deal of research on uplift modelling for classification problems, but it is not as commonly used in regression analyses such as sales evaluation.

One method that adapts uplift modelling well to a regression problem is the single-model approach with a treatment indicator variable.

In the case of a retailer changing its product range, the change that is introduced is a change in product range and the measurement parameter is sales, which is a continuous dependent variable. The sales data used are grouped according to whether they were collected before or after the implementation of the product line change, and are identified in the dataset by an independent variable called the treatment indicator.

In this approach, a machine learning forecasting algorithm is trained on a subset of the data that includes observations from both before and after the implementation of the new product line for a category. One of the features on which the model is trained is a treatment variable. This variable acts as a flag that distinguishes data points from the pre-implementation period from those in the post-implementation period. For this example, assume that the treatment variable is called the post-implementation indicator, and that it is set to one for all data points collected after the implementation of the new product line, and zero for data points collected before it.
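As a small illustration of how the post-implementation indicator could be attached to the data, the sketch below flags synthetic weekly records. The field names, the figures and `IMPLEMENTATION_DATE` are assumptions made for the example, not values from the analysis.

```python
from datetime import date

# Hypothetical go-live date of the new product line (an assumption).
IMPLEMENTATION_DATE = date(2021, 3, 1)

# Synthetic weekly sales records for one category.
records = [
    {"week_start": date(2021, 2, 15), "sales": 980.0},
    {"week_start": date(2021, 2, 22), "sales": 1010.0},
    {"week_start": date(2021, 3, 1), "sales": 1120.0},
    {"week_start": date(2021, 3, 8), "sales": 1150.0},
]

# Treatment variable: 1 from the go-live week onwards, 0 before it.
# Counting the go-live week itself as "post" is a choice made here.
for record in records:
    record["post_implementation"] = int(record["week_start"] >= IMPLEMENTATION_DATE)

flags = [record["post_implementation"] for record in records]
print(flags)
```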

A single trained model is then used to predict sales twice. The first set of forecasts is generated with the post-implementation indicator left at its recorded value of one or zero. The second set is generated after setting the treatment variable to zero for every data point, which represents the alternative scenario in which the product line change never occurred.

Because the treatment variable is the only input that differs between the two sets of forecasts, their difference reflects the effect of a change in one independent variable, in this case the introduction of a new product line. The sum of the differences between the first and second sets of forecasts is the estimated uplift in sales.
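The two-pass prediction can be sketched as follows. This is a minimal illustration, not the model used in the analysis: a plain least-squares regression stands in for the machine learning forecasting algorithm, the data are synthetic with a built-in effect of 15 units per week, and the helper names `fit_ols` and `predict` are hypothetical.

```python
def fit_ols(rows, targets):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    n = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(n):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [x - factor * p for x, p in zip(a[k], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(coefs, row):
    """Linear prediction for one feature row."""
    return sum(c * x for c, x in zip(coefs, row))


# Synthetic data: a weekly trend plus a known treatment effect.
TRUE_EFFECT = 15.0
weeks = range(10)
flags = [1 if w >= 6 else 0 for w in weeks]  # post-implementation indicator
# Feature rows: intercept, week index, treatment flag.
features = [[1.0, float(w), float(f)] for w, f in zip(weeks, flags)]
sales = [100.0 + 2.0 * w + TRUE_EFFECT * f for w, f in zip(weeks, flags)]

# One model, trained on pre- and post-implementation data together.
coefs = fit_ols(features, sales)

# First pass: flags as recorded. Second pass: every flag forced to zero.
with_flags = [predict(coefs, row) for row in features]
flags_zeroed = [predict(coefs, row[:2] + [0.0]) for row in features]

# Estimated uplift: sum of the differences between the two passes.
uplift = sum(a - b for a, b in zip(with_flags, flags_zeroed))
print(round(uplift, 6))
```

With a linear model the uplift reduces to the flag's coefficient multiplied by the number of treated weeks. A non-linear learner would let the effect vary from one data point to the next, which is one reason to predict twice rather than read off a single coefficient.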

Although various factors need to be taken into account when building the model to ensure that the uplift signal is not lost, it is an excellent method for isolating the effect of a decision that has been implemented, and therefore for better understanding how such decisions have affected the business.