Open Bandit Pipeline: a Python library for bandit algorithms and off-policy evaluation



[arXiv] [NeurIPS2021 Proceedings]

Open Bandit Pipeline: a research framework for off-policy evaluation and learning




Open Bandit Dataset (OBD)

Open Bandit Dataset is a public, real-world logged bandit dataset provided by ZOZO, Inc., the largest fashion e-commerce company in Japan. The company uses multi-armed bandit algorithms to recommend fashion items to users on ZOZOTOWN, its large-scale fashion e-commerce platform. The following figure shows the recommended fashion items, which serve as the actions; the recommendation interface has three positions.

Recommended fashion items as actions in the ZOZOTOWN recommendation interface

The dataset was collected during a 7-day experiment on three “campaigns,” corresponding to all, men's, and women's items, respectively. Each campaign randomly used either the Uniform Random policy or the Bernoulli Thompson Sampling (Bernoulli TS) policy for data collection. Open Bandit Dataset is unique in that it contains multiple logged bandit datasets collected by running different policies on the same platform. This enables, for the first time, realistic and reproducible experimental comparisons of different off-policy evaluation (OPE) estimators (see Section 5 of the reference paper for details of the protocol for evaluating OPE estimators with Open Bandit Dataset).

A small-sized version of our data is available at obd. We also release the full-size version of our data; please download it for research use. See obd/ for a detailed description of the dataset.

Open Bandit Pipeline (OBP)

Open Bandit Pipeline is an open-source Python library that includes modules for dataset preprocessing, policy learning methods, and OPE estimators. It provides a complete, standardized experimental procedure for OPE research, which helps keep performance comparisons fair and reproducible. It also enables fast and accurate OPE through a single unified interface, simplifying the practical use of OPE.

Overview of the Open Bandit Pipeline

Open Bandit Pipeline consists of the following main modules.

  • dataset module: This module provides a data loader for Open Bandit Dataset and a flexible interface for handling logged bandit data. It also provides tools to generate synthetic bandit data and transform multi-class classification data to bandit data.
  • policy module: This module provides interfaces for implementing new online and offline bandit policies. It also implements several standard policy learning methods.
  • simulator module: This module provides functions for conducting offline bandit simulation. It is needed only when you use the ReplayMethod to evaluate online bandit policies. Please refer to examples/quickstart/online.ipynb for a quickstart guide to OPE of online bandit algorithms.
  • ope module: This module provides generic abstract interfaces to support custom implementations so that researchers can evaluate their own estimators easily. It also implements several basic and advanced OPE estimators.
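The replay idea behind evaluating online policies with the ReplayMethod can be illustrated without OBP. Below is a minimal, non-contextual NumPy sketch under our own assumptions: the log was collected by a uniform random policy, the online policy is exposed through two hypothetical callbacks (`select_action` and `update`), and `replay_estimate` is an illustrative helper, not part of OBP's API. A logged round counts only when the online policy picks the same item as the log, and only then does the policy learn from that round.

```python
import numpy as np


def replay_estimate(logged_actions, logged_rewards, select_action, update):
    """Replay-style estimate of an online policy's average reward.

    Assumes the logged actions were drawn uniformly at random. A logged round is
    kept only when the policy's choice matches the logged action; the policy is
    then updated with that round's reward, mimicking online interaction.
    """
    matched_rewards = []
    for action, reward in zip(logged_actions, logged_rewards):
        if select_action() == action:
            matched_rewards.append(reward)
            update(action, reward)
    # Average reward over the rounds that survived the matching filter.
    return float(np.mean(matched_rewards)) if matched_rewards else float("nan")


# Toy non-contextual log: 3 items with hypothetical click rates, uniform random logging.
rng = np.random.default_rng(0)
n_rounds, n_actions = 30_000, 3
true_means = np.array([0.1, 0.3, 0.5])
logged_actions = rng.integers(n_actions, size=n_rounds)
logged_rewards = rng.binomial(1, true_means[logged_actions])

# Hypothetical online policy: greedy on empirical means, with unseen items scored 1.0.
counts = np.zeros(n_actions)
sums = np.zeros(n_actions)


def select_action():
    means = np.where(counts > 0, sums / np.maximum(counts, 1), 1.0)
    return int(np.argmax(means))


def update(action, reward):
    counts[action] += 1
    sums[action] += reward


print(replay_estimate(logged_actions, logged_rewards, select_action, update))
```

With uniform logging over three items, roughly one third of the logged rounds match the policy's choice, so the estimate is averaged over about 10,000 rounds here; a policy that rarely agrees with the log would leave far fewer usable rounds.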

Supported Bandit Algorithms and OPE Estimators

Bandit Algorithms
OPE Estimators

Please refer to Section 2 and the Appendix of the reference paper for the standard formulation of OPE and the definitions of a range of OPE estimators. Beyond the algorithms and estimators listed above, Open Bandit Pipeline provides flexible interfaces, so researchers can implement their own algorithms or estimators and evaluate them with our data and pipeline. It also provides an interface for handling real-world logged bandit data, so practitioners can combine their own logged data with Open Bandit Pipeline and evaluate the performance of bandit algorithms in their own settings with OPE.
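As a compact reference, the standard formulation can be written as follows in a simplified notation of our own, which drops the recommendation position used in Open Bandit Dataset. Given logged data D = {(x_i, a_i, r_i)}, i = 1, ..., n, collected by a behavior policy π_b, the value of an evaluation policy π_e and the DM, IPW, and DR estimators used in the examples below are shown here, where q̂(x, a) is a regression model of the expected reward E[r | x, a].

```latex
\begin{aligned}
V(\pi_e) &= \mathbb{E}_{x \sim p(x),\, a \sim \pi_e(a \mid x),\, r \sim p(r \mid x, a)}[r], \\
\hat{V}_{\mathrm{DM}}(\pi_e; \mathcal{D}, \hat{q}) &= \frac{1}{n} \sum_{i=1}^{n} \sum_{a \in \mathcal{A}} \pi_e(a \mid x_i)\, \hat{q}(x_i, a), \\
\hat{V}_{\mathrm{IPW}}(\pi_e; \mathcal{D}) &= \frac{1}{n} \sum_{i=1}^{n} w(x_i, a_i)\, r_i,
  \qquad w(x, a) = \frac{\pi_e(a \mid x)}{\pi_b(a \mid x)}, \\
\hat{V}_{\mathrm{DR}}(\pi_e; \mathcal{D}, \hat{q}) &= \hat{V}_{\mathrm{DM}}(\pi_e; \mathcal{D}, \hat{q})
  + \frac{1}{n} \sum_{i=1}^{n} w(x_i, a_i)\, \bigl(r_i - \hat{q}(x_i, a_i)\bigr).
\end{aligned}
```

Under the usual assumption that π_b(a | x) > 0 whenever π_e(a | x) > 0, IPW is unbiased when the true propensities π_b are used; DR stays unbiased in that case for any q̂ and typically has lower variance than IPW when q̂ is accurate, while the accuracy of DM depends entirely on q̂.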


Installation

You can install OBP with pip, Python's package manager.

pip install obp

You can also install OBP from source.

git clone
cd zr-obp
pip install .

Open Bandit Pipeline supports Python 3.7 or newer. See pyproject.toml for other requirements.


Usage

Example with Synthetic Bandit Data

The following example uses OPE to estimate the performance of IPWLearner as an evaluation policy, with Direct Method (DM), Inverse Probability Weighting (IPW), and Doubly Robust (DR) as the OPE estimators.

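# implementing OPE of the IPWLearner using synthetic bandit data
from sklearn.linear_model import LogisticRegression
# import open bandit pipeline (obp)
from obp.dataset import SyntheticBanditDataset
from obp.policy import IPWLearner
from obp.ope import (
    OffPolicyEvaluation,
    RegressionModel,
    InverseProbabilityWeighting as IPW,
    DirectMethod as DM,
    DoublyRobust as DR,
)

# (1) Generate Synthetic Bandit Data
dataset = SyntheticBanditDataset(n_actions=10, reward_type="binary")
bandit_feedback_train = dataset.obtain_batch_bandit_feedback(n_rounds=1000)
bandit_feedback_test = dataset.obtain_batch_bandit_feedback(n_rounds=1000)

# (2) Off-Policy Learning
eval_policy = IPWLearner(n_actions=dataset.n_actions, base_classifier=LogisticRegression())
# train the evaluation policy on the training split of the logged data
eval_policy.fit(
    context=bandit_feedback_train["context"],
    action=bandit_feedback_train["action"],
    reward=bandit_feedback_train["reward"],
    pscore=bandit_feedback_train["pscore"],
)
action_dist = eval_policy.predict(context=bandit_feedback_test["context"])

# (3) Off-Policy Evaluation
# reward regression model used by DM and DR
regression_model = RegressionModel(
    n_actions=dataset.n_actions,
    base_model=LogisticRegression(),
)
estimated_rewards_by_reg_model = regression_model.fit_predict(
    context=bandit_feedback_test["context"],
    action=bandit_feedback_test["action"],
    reward=bandit_feedback_test["reward"],
)
ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback_test,
    ope_estimators=[IPW(), DM(), DR()],
)
# plot the policy value of IPWLearner estimated by each OPE estimator
ope.visualize_off_policy_estimates(
    action_dist=action_dist,
    estimated_rewards_by_reg_model=estimated_rewards_by_reg_model,
)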
Performance of IPWLearner estimated by OPE

A formal quickstart example with synthetic bandit data is available at examples/quickstart/synthetic.ipynb. We also provide a script for experiments that evaluate OPE estimators with synthetic bandit data in examples/synthetic.
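To make the three estimators concrete, here is a minimal from-scratch NumPy sketch of DM, IPW, and DR on a small synthetic problem whose ground truth is known. It does not call OBP; the data-generating process (Gaussian contexts, logistic expected rewards, softmax policies), the variable names, and the deliberately noisy reward model `q_hat` are all assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(12345)
n_rounds, n_actions, dim_context = 10_000, 5, 3


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Hypothetical environment: Gaussian contexts and a logistic expected reward q(x, a).
contexts = rng.normal(size=(n_rounds, dim_context))
theta = rng.normal(size=(dim_context, n_actions))
q_true = 1.0 / (1.0 + np.exp(-contexts @ theta))  # shape (n_rounds, n_actions)

# Behavior policy pi_b: softmax over noisy expected rewards; log one action per round.
pi_b = softmax(q_true + rng.normal(scale=0.5, size=q_true.shape))
idx = np.arange(n_rounds)
actions = np.array([rng.choice(n_actions, p=p) for p in pi_b])
rewards = rng.binomial(1, q_true[idx, actions])
pscore = pi_b[idx, actions]  # pi_b(a_i | x_i), the logged propensity score

# Evaluation policy pi_e: a sharper softmax over the true expected rewards.
pi_e = softmax(3.0 * q_true)
# Ground truth on these contexts: sum_a pi_e(a | x_i) q(x_i, a), averaged over i.
ground_truth = float((pi_e * q_true).sum(axis=1).mean())

# Deliberately imperfect reward model q_hat: true q plus noise, clipped to [0, 1].
q_hat = np.clip(q_true + rng.normal(scale=0.1, size=q_true.shape), 0.0, 1.0)

w = pi_e[idx, actions] / pscore  # importance weights pi_e / pi_b
v_dm = float((pi_e * q_hat).sum(axis=1).mean())
v_ipw = float((w * rewards).mean())
v_dr = v_dm + float((w * (rewards - q_hat[idx, actions])).mean())

for name, value in [("dm", v_dm), ("ipw", v_ipw), ("dr", v_dr)]:
    rel_err = abs(value - ground_truth) / ground_truth
    print(f"{name}: estimate={value:.4f}, relative error={rel_err:.4f}")
```

The exact numbers depend on the seed, so none are listed here. Making `q_hat` noisier tends to hurt DM most, while IPW ignores `q_hat` entirely and DR only uses it as a control variate.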

Example with Multi-Class Classification Data

Researchers often use multi-class classification data to evaluate the estimation accuracy of OPE estimators. Open Bandit Pipeline facilitates this kind of OPE experiment with multi-class classification data, as follows.

# implementing an experiment to evaluate the accuracy of OPE using classification data
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
# import open bandit pipeline (obp)
from obp.dataset import MultiClassToBanditReduction
from obp.ope import OffPolicyEvaluation, InverseProbabilityWeighting as IPW

# (1) Data Loading and Bandit Reduction
X, y = load_digits(return_X_y=True)
dataset = MultiClassToBanditReduction(X=X, y=y, base_classifier_b=LogisticRegression(random_state=12345))
dataset.split_train_eval(eval_size=0.7, random_state=12345)
bandit_feedback = dataset.obtain_batch_bandit_feedback(random_state=12345)

# (2) Evaluation Policy Derivation
# obtain action choice probabilities of an evaluation policy
action_dist = dataset.obtain_action_dist_by_eval_policy(base_classifier_e=RandomForestClassifier(random_state=12345))
# calculate the ground-truth performance of the evaluation policy
ground_truth = dataset.calc_ground_truth_policy_value(action_dist=action_dist)

# (3) Off-Policy Evaluation and Evaluation of OPE
ope = OffPolicyEvaluation(bandit_feedback=bandit_feedback, ope_estimators=[IPW()])
# evaluate the estimation performance (accuracy) of IPW by the relative estimation error (relative-ee)
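relative_estimation_errors = ope.evaluate_performance_of_estimators(
    ground_truth_policy_value=ground_truth,
    action_dist=action_dist,
)
print(relative_estimation_errors)
# {'ipw': 0.01827255896321327} (the accuracy of IPW in OPE)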

A formal quickstart example with multi-class classification data is available at examples/quickstart/multiclass.ipynb. We also provide a script for experiments that evaluate OPE estimators with multi-class classification data in examples/multiclass.
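The multi-class reduction in the example above follows a simple recipe: class probabilities from a behavior classifier act as π_b over actions (classes), one action is sampled per example, and the reward is 1 only when the sampled action equals the true label. Because labels are known, the ground-truth value of any evaluation policy is just its expected accuracy. The NumPy sketch below illustrates that recipe under our own assumptions; it does not reproduce MultiClassToBanditReduction's exact construction, the class-probability matrices are random placeholders rather than fitted classifiers, and `noisy_class_probs` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(12345)
n_samples, n_classes = 2_000, 10
labels = rng.integers(n_classes, size=n_samples)


def noisy_class_probs(labels, strength, rng):
    """Hypothetical stand-in for a classifier: extra logit mass on the true label."""
    logits = rng.normal(size=(len(labels), n_classes))
    logits[np.arange(len(labels)), labels] += strength
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


pi_b = noisy_class_probs(labels, strength=1.0, rng=rng)  # behavior policy
pi_e = noisy_class_probs(labels, strength=2.0, rng=rng)  # evaluation policy

# Bandit reduction: sample an action from pi_b; reward is 1 iff it matches the label.
idx = np.arange(n_samples)
actions = np.array([rng.choice(n_classes, p=p) for p in pi_b])
rewards = (actions == labels).astype(float)
pscore = pi_b[idx, actions]

# Ground truth: expected accuracy of pi_e, computable because the labels are known.
ground_truth = float(pi_e[idx, labels].mean())

# IPW estimate from the bandit feedback only, and its relative estimation error.
v_ipw = float((pi_e[idx, actions] / pscore * rewards).mean())
relative_ee = abs(v_ipw - ground_truth) / ground_truth
print({"ipw": relative_ee})
```

Here the relative estimation error is taken as |V̂ - V| / V, which matches the "relative-ee" wording in the example above; the printed value depends on the seed.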

Example with Open Bandit Dataset

The following example uses Inverse Probability Weighting (IPW) to estimate the performance of BernoulliTS as an evaluation policy, using logged bandit data generated by the Random policy (the behavior policy) on the ZOZOTOWN platform.

# implementing OPE of the BernoulliTS policy using log data generated by the Random policy
from obp.dataset import OpenBanditDataset
from obp.policy import BernoulliTS
from obp.ope import OffPolicyEvaluation, InverseProbabilityWeighting as IPW

# (1) Data Loading and Preprocessing
dataset = OpenBanditDataset(behavior_policy='random', campaign='all')
bandit_feedback = dataset.obtain_batch_bandit_feedback()

# (2) Production Policy Replication
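evaluation_policy = BernoulliTS(
    n_actions=dataset.n_actions,
    len_list=dataset.len_list,
    is_zozotown_prior=True, # replicate the policy in the ZOZOTOWN production
    campaign="all",
    random_state=12345,
)
# simulate the policy many times to obtain its action choice probabilities
action_dist = evaluation_policy.compute_batch_action_dist(
    n_sim=100000, n_rounds=bandit_feedback["n_rounds"]
)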

# (3) Off-Policy Evaluation
ope = OffPolicyEvaluation(bandit_feedback=bandit_feedback, ope_estimators=[IPW()])
estimated_policy_value = ope.estimate_policy_values(action_dist=action_dist)

# estimated performance of BernoulliTS relative to the ground-truth performance of Random
relative_policy_value_of_bernoulli_ts = estimated_policy_value['ipw'] / bandit_feedback['reward'].mean()

A formal quickstart example with Open Bandit Dataset is available at examples/quickstart/obd.ipynb. We also provide a script for evaluating OPE estimators with Open Bandit Dataset in examples/obd. Please see our documentation for details of the protocol for evaluating OPE with Open Bandit Dataset.
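Step (2) of the example above turns BernoulliTS into a fixed action distribution by simulation. The underlying idea can be sketched in plain NumPy: draw Beta posterior samples for every item many times, rank items by their samples, and count how often each item lands in each of the `len_list` positions. This is our own simplified illustration, not OBP's `compute_batch_action_dist` implementation; `simulate_bernoulli_ts_action_dist` is a hypothetical helper, and the Beta parameters are placeholders rather than the ZOZOTOWN prior.

```python
import numpy as np


def simulate_bernoulli_ts_action_dist(alpha, beta, len_list, n_sim, random_state=0):
    """Estimate P(item a is shown at position k) for Bernoulli Thompson Sampling.

    alpha, beta: Beta posterior parameters per item, shape (n_actions,).
    Returns an array of shape (n_actions, len_list) whose columns sum to 1.
    """
    rng = np.random.default_rng(random_state)
    n_actions = len(alpha)
    # One row of posterior samples per simulated round.
    samples = rng.beta(alpha, beta, size=(n_sim, n_actions))
    # Items with the largest sampled values fill positions 0, 1, ..., len_list - 1.
    ranking = np.argsort(-samples, axis=1)[:, :len_list]
    counts = np.zeros((n_actions, len_list))
    for position in range(len_list):
        counts[:, position] = np.bincount(ranking[:, position], minlength=n_actions)
    return counts / n_sim


# Hypothetical posterior for 5 items shown in 3 positions.
alpha = np.array([12.0, 30.0, 8.0, 20.0, 5.0])
beta = np.array([88.0, 70.0, 92.0, 80.0, 95.0])
dist = simulate_bernoulli_ts_action_dist(alpha, beta, len_list=3, n_sim=100_000)
print(dist.round(3))

# Without posterior updates the distribution is the same in every round, so it can be
# broadcast to an (n_rounds, n_actions, len_list) array (n_rounds=1000 is arbitrary).
action_dist = np.broadcast_to(dist, (1000,) + dist.shape)
```

Because the posterior is frozen during evaluation, simulating once and reusing the result for every round is much cheaper than re-sampling per logged round, which is presumably why a large `n_sim` such as 100000 is affordable.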


If you use our dataset and pipeline in your work, please cite our paper:

Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita.
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation


  title={Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation},
  author={Saito, Yuta and Aihara, Shunsuke and Matsutani, Megumi and Narita, Yusuke},
  journal={arXiv preprint arXiv:2008.07146},

The paper has been accepted at the NeurIPS2021 Datasets and Benchmarks Track. The camera-ready version of the paper is available here.

Sister Package: pyIEOE

In addition to OBP, we have developed a Python package called pyIEOE, which allows practitioners to easily evaluate and compare the robustness of OPE estimators.

Please also see the following reference paper about IEOE (accepted at RecSys'21).

Yuta Saito, Takuma Udagawa, Haruka Kiyohara, Kazuki Mogi, Yusuke Narita, Kei Tateno.
Evaluating the Robustness of Off-Policy Evaluation

Google Group

If you are interested in the Open Bandit Project, please follow its updates via the Google Group:


Any contributions to Open Bandit Pipeline are more than welcome! Please refer to the contribution guidelines for general guidance on how to contribute to the project.


This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Project Team



For any questions about the paper, data, or pipeline, feel free to contact:


