
Uplift modelling, an analysis by Julie Vidalis


By Julie Vidalis

When we consider company performance measures, the metrics that executive boards and shareholders rely on most, even within a triple bottom line framework, are the financial indicators, i.e. sales revenue growth and profitability. These are closely tied to improved customer satisfaction, increased innovation, lower direct and indirect costs, diversified revenue streams, and other well-documented metrics that directly affect corporate growth and sustainability.

As a company continues on its growth trajectory, complexity in its product mix is often introduced as it adjusts its revenue and cost model. This is particularly the case in the retail sector, where a company may have many tens of thousands of different brands and variations of product ranges offered.

Let’s consider the above sector, with an example of a retail company’s goal of increasing total sales revenue and profitability per store, by adjusting its product mix in these individual stores. Getting the product mix wrong impacts on the customer’s shopping experience and can result in lost sales, both direct and indirect. This responsibility often falls on Category Managers, who need to understand the true impact of their product range review decisions on customer satisfaction, and ultimately, store sales and profitability.

The reasons for an uplift in sales cannot always be measured directly, as multiple factors could be driving an increase, or a decrease, in sales. The sales observed after a new product mix is introduced in store are not necessarily the result of the product mix change alone. COVID-19 is a case in point: a significant external factor outside retailers' control has affected sales in thousands of stores, regardless of the product mix decisions implemented.

The solution is to use uplift modelling.

The aim of uplift modelling is to predict the incremental impact of a single change, or treatment, on a specific dependent variable using predictive modelling, by isolating the effect of a change in one independent variable. This approach is based on the Rubin model of causal inference, which states:

Causal Effect = 𝑌(treated) − 𝑌(untreated)

where 𝑌 is the outcome (dependent) variable, 𝑌(treated) is its value when the change is applied, and 𝑌(untreated) is its value when the change is not applied.

A common way to quantify the Causal Effect is the Traditional approach. It takes the observed sales after the product mix has been implemented as 𝑌(treated). A machine learning forecasting algorithm then predicts sales to simulate what would have happened had the product mix remained the same, and this prediction becomes 𝑌(untreated), the outcome without the change. The final uplift is calculated by subtracting the predicted sales from the observed sales.
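The Traditional approach can be sketched in a few lines of Python. This is a minimal illustration only: the weekly sales figures are made up, and a straight-line trend fitted on pre-implementation data stands in for the machine learning forecasting algorithm, which the article does not specify. The names `fit_trend`, `observed_post_sales` and `predicted_post_sales` are hypothetical.

```python
def fit_trend(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Hypothetical weekly sales for one store; the new product mix starts in week 8.
pre_weeks = [1, 2, 3, 4, 5, 6, 7]
pre_sales = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0, 112.0]
post_weeks = [8, 9, 10]
observed_post_sales = [120.0, 123.0, 125.0]  # Y(treated): what actually happened

# Stand-in forecaster (an assumption): a trend fitted on pre-implementation data only.
slope, intercept = fit_trend(pre_weeks, pre_sales)

# Y(untreated): forecast of sales had the product mix stayed the same.
predicted_post_sales = [intercept + slope * week for week in post_weeks]

# Uplift = observed sales minus predicted sales, week by week and in total.
uplift_per_week = [
    observed - predicted
    for observed, predicted in zip(observed_post_sales, predicted_post_sales)
]
total_uplift = sum(uplift_per_week)
print(f"Estimated uplift over {len(post_weeks)} weeks: {total_uplift:.1f}")
```

Note that anything else that changed after week 7 (a promotion, a lockdown, a competitor closing) ends up inside `total_uplift` as well, because the forecast knows nothing about it.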

However, this is not necessarily an accurate representation of uplift in sales. The main disadvantage of this approach is that there is no concrete way to show that the change in product offering was the only variable affecting sales.

The uplift modelling approach was developed to address the drawback of the Traditional Approach which doesn’t take any measures to isolate the effect of a single change. Much research has gone into uplift modelling used in classification problems, but it is not as commonly used in regression analysis such as when assessing sales.

A method that adapts uplift modelling well to a regression problem is the Single Model With Treatment Variable Indicator approach.

In the case of a retailer changing the product mix, the treatment is the change in product mix and the metric is sales, which is a continuous dependent variable. The sales data is grouped according to whether it was collected Pre or Post Implementation of the change, and each group is identified in the data set by an independent variable called the Treatment Indicator.

In this approach, a machine learning forecasting algorithm is trained on a subset of data, where data is obtained from both Pre and Post Implementation of the new product range for a category. One of the indicators on which the model is trained is a treatment variable. This variable acts as a flag to highlight which data points come from Pre and Post Implementation. For this example, let’s assume the treatment variable is called the Post Implementation flag, and it is set to one for all data points collected after the implementation of the new product range, and zero for data points collected before implementation of the new product range.
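As an illustration of the treatment variable, the sketch below adds a Post Implementation flag to a handful of weekly sales records. The records, the go-live date and the helper name `add_post_implementation_flag` are all hypothetical, and treating the go-live week itself as Post Implementation is an assumption rather than something the approach prescribes.

```python
from datetime import date

# Hypothetical go-live date of the new product range for the category.
IMPLEMENTATION_DATE = date(2021, 3, 1)

# Hypothetical weekly sales records for one store and category.
records = [
    {"week_start": date(2021, 2, 15), "sales": 1040.0},
    {"week_start": date(2021, 2, 22), "sales": 1010.0},
    {"week_start": date(2021, 3, 1), "sales": 1125.0},
    {"week_start": date(2021, 3, 8), "sales": 1150.0},
]


def add_post_implementation_flag(rows, implementation_date):
    """Return copies of rows with post_implementation = 1 on or after go-live, else 0."""
    return [
        {**row, "post_implementation": int(row["week_start"] >= implementation_date)}
        for row in rows
    ]


flagged = add_post_implementation_flag(records, IMPLEMENTATION_DATE)
flags = [row["post_implementation"] for row in flagged]
print(flags)
```

The flag then sits alongside the other model inputs (seasonality, store attributes and so on) as one more column in the training data.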

A single trained model is used to predict sales twice. The first set of predictions is generated with the Post Implementation flags left at their actual values of one or zero. The process is then repeated with all the flags set to zero, to represent the alternative future in which the change in product range never occurred.

Because the two sets of predictions differ only in the treatment flag, the comparison isolates the effect of changing one independent variable. In this case, that is the effect on sales of introducing a new product range. The sum of the differences between the first and second sets of predictions becomes the uplift sales amount.
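The two prediction runs can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: the data is synthetic and noise-free, and a small ordinary least-squares regression written in plain Python stands in for the machine learning forecasting algorithm. The helper names (`solve_linear_system`, `fit_ols`, `predict`) are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(features, targets):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(features, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, features):
    """Predicted sales for each feature row."""
    return [sum(c * v for c, v in zip(coefs, row)) for row in features]


# Synthetic data (an assumption): a weekly trend plus a lift of 15 after go-live in week 7.
weeks = list(range(1, 11))
flags = [1 if week >= 7 else 0 for week in weeks]  # Post Implementation flag
sales = [100.0 + 2.0 * week + 15.0 * flag for week, flag in zip(weeks, flags)]

# One model trained on Pre and Post Implementation data: intercept, week, flag.
features = [[1.0, float(week), float(flag)] for week, flag in zip(weeks, flags)]
coefs = fit_ols(features, sales)

# First run: flags as observed. Second run: every flag forced to zero.
predicted_with_change = predict(coefs, features)
features_no_change = [[1.0, float(week), 0.0] for week in weeks]
predicted_without_change = predict(coefs, features_no_change)

uplift = sum(
    with_change - without_change
    for with_change, without_change in zip(predicted_with_change, predicted_without_change)
)
print(f"Estimated uplift in sales: {uplift:.1f}")
```

With a linear model the uplift reduces to the flag's coefficient multiplied by the number of Post Implementation rows. With a non-linear learner the difference can vary from row to row, which is why the two sets of predictions are compared point by point before summing.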

Even though there are various factors that must be considered during the development of the model to ensure that the uplift signal is not lost, it is a great method to isolate the effect of a decision that has been implemented, and therefore better understand how these decisions have impacted the business.