-
Notifications
You must be signed in to change notification settings - Fork 14
/
README.rmd
227 lines (149 loc) · 10.5 KB
/
README.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
---
output:
github_document:
pandoc_args: --webtex
---
[![R CMD checks](https://github.com/bachmannpatrick/CLVTools/workflows/R-CMD-check/badge.svg?branch=development)](https://github.com/bachmannpatrick/CLVTools/actions)
[![Tests](https://github.com/bachmannpatrick/CLVTools/workflows/testthat-tests/badge.svg?branch=development)](https://github.com/bachmannpatrick/CLVTools/actions)
[![CRAN Status](http://www.r-pkg.org/badges/version/CLVTools)](https://cran.r-project.org/package=CLVTools)
[![Repo Status](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![CRAN Downloads](https://cranlogs.r-pkg.org/badges/CLVTools)](https://cran.r-project.org/package=CLVTools)
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
## The CLVTools Package
Today, customer lifetime value (CLV) is the central metric for valuing customers. It describes the long-term economic value of customers and gives managers an idea of how customers will evolve over time. To model CLVs in continuous non-contractual business settings such as retailers, probabilistic customer attrition models are the preferred choice in literature and practice.
The R package `CLVTools` provides an efficient and easy to use implementation framework for probabilistic customer attrition models in non-contractual settings. Building up on the learnings of other implementations, the package adopts S4 classes to allow constructing rich and rather complex models that nevertheless still are easy to apply for the end user. The framework is capable to accommodate a variety of probabilistic customer attrition models for non-contractual settings in continuous and discrete time.
Currently, CLVTools implements the following probabilistic models:
(1) Standard Pareto/NBD model (Schmittlein, Morrison & Colombo 1987)
(2) Pareto/NBD model with **time-invariant** contextual factors (Fader & Hardie 2007)
(3) Pareto/NBD model with **time-varying** contextual factors (Bachmann, Meierer & Näf 2021)
(4) Standard BG/NBD model (Fader, Hardie, & Lee 2005)
(5) BG/NBD model with **time-invariant** contextual factors (Fader & Hardie 2007)
(6) Standard Gamma/Gompertz/NBD (Bemmaor & Glady 2012)
(7) Gamma/Gompertz/NBD model with **time-invariant** contextual factors (Näf, Bachmann & Meierer 2020)
(8) Gamma/Gamma model to estimate customer spending (Colombo & Jiang 1999; Fader, Hardie & Lee 2005; Fader & Hardie 2013)
In future versions of `CLVTools` the following models are added. See [GitHub Issues](https://github.com/bachmannpatrick/CLVTools/projects) for a time-line.
(9) Standard BG/BB model (Fader, Hardie, & Shang 2010)
In addition the framework features a system of layers between the optimizer and the log-likelihood function to allow the flexible addition of model extensions during the model fitting process. Currently these layers include:
- Correlation of the purchase and the attrition process
- L2 regularization for parameters of contextual factors
- Equality constraints between parameters of contextual factors for the purchase and the attrition process.
## Installation
Install the most recent **stable release from CRAN**:
```
install.packages("CLVTools")
```
Install the **development version from GitHub** (using the `devtools` package):
```
devtools::install_github("bachmannpatrick/CLVTools", ref = "development")
```
To **compile the package from source**, please be advised that `CLVTools` relies on an external C++ library called `GSL`. This library has to be installed on your computer to be able to compile `CLVTools` from source. Follow these 3 steps:
1. Update to the latest version of R.
2. Install the external dependency (`GSL`):
*For Linux:*
:
```
apt-get update
apt-get install libgsl0-dev
```
If you are using an R Docker container with Linux (e.g. rocker/tidyverse), you can build up on these Docker images as follows
```
FROM rocker/tidyverse
RUN apt-get update -qq && apt-get -y install \
libgsl0-dev
```
Alternatively, follow the instruction in the section "Installing Dependencies external to the R system" at https://ropenscilabs.github.io/r-docker-tutorial/03-install-packages.html to install `GSL` in a running Docker container with Linux.
*For Mac with Intel chip:*
:
```
brew install gsl
```
*For Mac with Apple Silicon:*
:
1. Download gsl-latest.tar.gz from [https://ftp.gnu.org/gnu/gsl/](https://ftp.gnu.org/gnu/gsl/) and unzip it.
2. Navigate to the unziped folder, e.g. `cd ~/Downloads/gsl-2.7.1`
3. run the following commands line by line:
```
sudo make clean
sudo chown -R $USER .
./configure && make
make install
```
*For Windows:*
: First, install `RTools` through https://cran.r-project.org/bin/windows/Rtools/ (> v4.0). Next, use the new `RTools` package manager to install the `GSL` library (see https://github.com/r-windows/docs/blob/master/rtools40.md#readme) by using `pacman` through the `RTools Bash`:
```
pacman -S mingw-w64-{i686,x86_64}-gsl
```
3. Install the development version from source:
```
devtools::install_github("bachmannpatrick/CLVTools", ref = "development")
```
## General workflow
Independent of the latent attrition model applied in `CLVTools`, the general workflow consists of three main steps:
1. Create a `clv.data` object containing the dataset and required meta-information such as date formats and column names in the dataset. After initializing the object, there is the option to add additional information on covariates in a separate step.
2. Fit the model on the data provided.
3. Use the estimated model parameters to predict future customer purchase behavior.
![Workflow for `CLVTools`](vignettes/figures/WALKTHROUGH-steps.png)
`CLVTools` provides two ways for evaluating latent attrition models: you can use of the provided formula interface or you can use standard functions (non-formula interface). Both offer the same functionality, however the formula interface is especially helpful when covariates are included in the model. We will illustrate both options.
Reporting and plotting results is facilitated by the implementation of well-known generic methods such as `plot()`, `print()` and `summary()`. These commands adapt their output according to the model state and may be used at any point of the workflow.
## A Minimal Example
For detailed instructions and all available options and model variations see the detailed [walkthrough](https://www.clvtools.com/articles/CLVTools.html) and the [manual](https://www.clvtools.com/reference/index.html).
Start by loading the package:
```{r load-library}
library("CLVTools")
```
As Input data `CLVTools` requires customers' transaction history. Every transaction record consists of a purchase date and customer ID.
```{r load-data}
data("apparelTrans")
apparelTrans
```
Before we estimate a model, we are required to initialize a data object using the `clvdata()` command. The data object contains the prepared transactional data and is later used as input for model fitting. Additionally we specify options for the date and time units, estimation duration and variable names (see [Walkthrough](https://www.clvtools.com/articles/CLVTools.html) for details). Make sure to store the generated object in a variable, e.g. in our example `clv.apparel`.
```{r load-CreateObj}
clv.apparel <- clvdata(apparelTrans,
date.format="ymd",
time.unit = "week",
estimation.split = 40,
name.id = "Id",
name.date = "Date",
name.price = "Price")
```
Be aware that probabilistic models such as the ones implemented in CLVTools are usually applied to specific customer cohorts. That means, you analyze customer that have joined your company at the same time (usually same day, week, month, or quarter). For more information on cohort analysis, see also [here](https://en.wikipedia.org/wiki/Cohort_analysis). Consequently, the data apparelTrans in this example is not the full transaction records of a fashion retailer, but rather only the customer cohort of 250 customers purchasing for the first time at this business on the day of 2005-01-03.
As a first probabilistic latent attrition model we estimate the standard Pareto/NBD model and therefore, use the command `pnbd()` to fit the model and estimate model parameters. Other models such as the BG/NBD model (`bgnbd()`) and the GGomp/NBD (`ggomnbd()`) are also available.
### **Estimating the model using formula interface:**
```{r estimate-model-formula}
est.pnbd <- latentAttrition(~pnbd(),
data=clv.apparel)
est.pnbd
```
### **Estimating the model using non-formula interface:**
```{r estimate-model}
est.pnbd <- pnbd(clv.data = clv.apparel)
est.pnbd
```
You can always use `summary()` to get details on CLVTools object (also before they are estimated):
```{r param-summary}
#Full detailed summary of the parameter estimates
summary(est.pnbd)
```
Once the model parameters are estimated, we are able to predict future customer behavior on an individual level. To do so, we use `predict()` on the object with the estimated parameters (i.e. `est.pnbd`). In general, probabilistic customer attrition model predict three expected characteristics for every customer:
* "conditional expected transactions" (CET), which is the number of transactions to expect form a customer during the prediction period,
* "probability of a customer being alive" (PAlive) at the end of the estimation period and
* "discounted expected residual transactions" (DERT) for every customer, which is the total number of transactions for the residual lifetime of a customer discounted to the end of the estimation period.
If spending information was provided when initializing the clvdata-object, also "customer lifetime value" (CLV) is predicted.
```{r predict-model}
results <- predict(est.pnbd)
print(results)
```
`clvdata` objects may be plotted using the `plot()` command. Similar to `summary()`, the output of `plot()` adapts to the current modeling step.
```{r plot-model, fig.height=4.40, fig.width=9}
plot(est.pnbd)
```
## Contributions
Feedback and contributions to this package are welcome! Please use [GitHub Issues](https://github.com/bachmannpatrick/CLVTools/issues) for filing bug reports. Provide your contributions in the form of [Pull Requests](https://help.github.com/articles/about-pull-requests/). See also [these general guidelines](https://guides.github.com/activities/contributing-to-open-source/#contributing) to contribute to Open Source projects on GitHub.