We have 4 different display advertising campaigns. We would like to evaluate how effective each advertising campaign is in generating sales
We have (anonymized) data which contains 10000 users who clicked on at least one of the display ads from 4 different campaigns a, b, c, or d. Purchase is indicated by the "Conversion" variable (i.e., equals 1 if there is purchase and 0 otherwise). The Value
column indicates the revenue in dollars earned from each purchase. The cost per click for each campaign:
- a: 7
- b: 5
- c: 4
- d: 2
The order of clicks is as indicated in the data. Data example:
user_id | click_1 | click_2 | click_3 | click_4 | click_5 | Conversion | Value |
---|---|---|---|---|---|---|---|
001 | a | b | NaN | d | NaN | 0 | 0 |
002 | c | c | a | b | a | 1 | 100 |
We would like to build a statistical attribution model to know:
- Which campaign is the most successful in terms of unit sales contributed?
- What is the return on investment for each campaign?
- How would you optimize the spend of a given budget of $1 million across all four campaigns?
We model the user's click sequence as a Markov Chain, where each state represents a campaign or a conversion state. The Markov Chain captures the transition probabilities between different states, allowing us to analyze the user's journey from initial ad clicks to potential conversion.
(Source: setosa.io)
We define the following states in our Markov Chain:
- Campaign states:
a
,b
,c
,d
- Null state:
Null
(representing no click) - Conversion states: 0 (no conversion), 1 (conversion)
Based on frequency, we can calculate the transition probabilities (when a state changes from one to another, or when a state remains). An example can be illustrated below:
The training state based on frequency can give us transition matrix
To determine the contribution of each campaign to the conversions and revenue, we calculate the removal effect of each campaign. The removal effect measures the impact of removing a campaign from the Markov Chain on the overall conversion probability. For each campaign, we remove it from the Markov Chain by setting the transition probabilities from the removed campaign to other states and from other states to the removed campaign to 0. We then calculate the conversion probability without the removed campaign and compare it to the original conversion probability. The difference between the two probabilities gives us the removal effect of that campaign.
Let
-
$X_t$ : the state of the Markov Chain at time step$t$ -
$X_0$ : the initial state of the Markov Chain -
$P(X_t = 1 | X_0 = \text{start})$ : Probability of reaching the conversion state (state 1) starting from the initial state -
$P(X_t = 1 | X_0 = \text{start}, P_{\text{removed}}^{(i)})$ : probability of reaching the conversion state starting from the initial state, with campaign$i$ removed from the Markov Chain
The removal effects of each campaign:
a
: -0.137 (Removing campaigna
would decrease the conversion probability by 13.7%)b
: -0.135 (Removing campaignb
would decrease the conversion probability by 13.5%)c
: -0.143 (Removing campaignc
would decrease the conversion probability by 14.3%)d
: -0.144 (Removing campaignd
would decrease the conversion probability by 14.4%)
Based on removal effects, we attribute the total conversions and revenue to each campaign. The attribution is proportional to the removal effect of each campaign. We calculate the attributed conversions and revenue for each campaign based on their relative contribution to the total removal effect.
Let
-
$\text{RE}(i)$ : removal effect of campaign$i$ -
$\sum_{j=1} \text{RE}(j)$ : sum of removal effects of all campaigns -
$\frac{\text{RE}(i)}{\sum_{j=1} \text{RE}(j)}$ : proportion of the removal effect of campaign$i$ relative to the total removal effect of all campaigns -
$\text{Total Conversions}$ : total conversions across all campaigns -
$\text{Total Revenue}$ : total revenue generated across all campaigns
The revenue attribution of each campaign:
a
: 21567 (~24%)b
: 21210 (~24%)c
: 22463 (~26%)d
: 22691 (~26%)
With total cost of each campaign ($ = \text{cost per click} \times \text{total number click}$), we can calculate return on investment (%) for each campaign:
a
: -58%b
: -44%c
: -24%d
: 54%
as can be seen, only campaign d
makes a profit. This is because campaign d
has the cheapest cost per click (CPC), while each campaign almost got the same number of total click (~7.5k). Nevertheless, this doesn't mean we only invest campaign d
and opt out of the others. As stated above, if we removed any single campaign, the conversion probabiliy would be decreased by 14%
To optimize the budget allocation across the campaigns, we formulate an optimization problem using linear programming. The objective is to maximize the total attributed revenue while satisfying the budget constraints.
Assuming retun on investment of each campaign is constant, the optimization problem is set up as follows:
Decision variables:
- Total budget constraint:
$x_a + x_b + x_c + x_d$ = total budget - Non-negativity constraints: all
$x_a$ ,$x_b$ ,$x_c$ ,$x_d$ > 0 - Removal effect constraints: Ensure that the attributed revenue for each campaign is consistent with the removal effects
Let:
-
$\mathbf{x} = [x_a, x_b, x_c, x_d]^T$ be the vector of decision variables representing the budget allocation for each campaign -
$\mathbf{r} = [r_a, r_b, r_c, r_d]^T$ be the vector of attributed revenue per unit budget for each campaign where$\mathbf{r} = (1 + \text{return on investment (per-unit)}) \times \text{cost}$ -
$\mathbf{RE} = [\text{RE}_a, \text{RE}_b, \text{RE}_c, \text{RE}_d]^T$ be the vector of removal effects for each campaign -
$\mathbf{AR} = [\text{AR}_a, \text{AR}_b, \text{AR}_c, \text{AR}_d]^T$ be the vector of attributed revenue for each campaign.
The optimization problem can be formulated as follows:
- Objective function:
$$\text{Maximize: } \mathbf{r}^T \mathbf{x}$$
Subject to:
- Total budget constraint (assuming total budget =
$1 million): $ $\mathbf{1}^T \mathbf{x} = \text{total budget}$$ - Removal effect constraints:
$$\mathbf{AR} = \frac{\mathbf{RE}}{\mathbf{1}^T \mathbf{RE}} \odot (\mathbf{r}^T \mathbf{x})$$
The optimization problem can be solved using linear programming techniques, to obtain the optimal budget allocation
The outcome gave us the allocated budget to each campaign:
a
: 384497 (38%)b
: 283741 (28%)c
: 221652 (22%)d
: 110110 (11%)
- High Variance: Markov chain model relies on frequency-based probabilities. It requires a large amount of data to accurately estimate the transition probabilities between states. If the data is limited or too many noises, the model's accuracy could be skrewed. New or rare occurrences can significantly impact the probabilities, making the model sensitive to outliers or changes in user behavior.
- Memoryless: This linear Markov chains assume that the future state depends only on the current state, not on the sequence of events that preceded it. In the context of marketing attribution, the user's previous interactions and the sequence of touchpoints may have a significant impact on their decision to convert. The Markov chain model does not capture this long-term dependency explicitly. User journeys and states may be more complex, with multiple touchpoints, non-linear paths with external factors.
- Stationarity: Markov chains often assume that the transition probabilities remain constant over time (stationarity assumption). In practice, user behavior and preferences may evolve, and the effectiveness of marketing campaigns may change over time.
- Independence: Removal effect of a campaign assume the independence on the overall conversion probability, i.e. the removal of a campaign does not affect the effectiveness of other campaigns. It may not capture the synergistic effects or interactions between campaigns.