OFCOURSE is a simulated environment that enables multi-agent reinforcement learning for order fulfillment.
This repository requires Python >= 3.7. Miniconda/Anaconda is our recommended Python distribution.
To get started:
- Clone this repository and move to the OFCOURSE directory:
>>> git clone https://github.com/GitYiheng/ofcourse.git && cd ofcourse
- Install the dependencies:
>>> pip install -r requirements.txt
- Run the provided experiment scripts for the two tasks (exp1 and exp2):
>>> sh ./run_exp/exp1/run_exp1_ppo.sh
>>> sh ./run_exp/exp1/run_exp1_happo.sh
>>> sh ./run_exp/exp1/run_exp1_ippo.sh
>>> sh ./run_exp/exp1/run_exp1_clo.sh
>>> sh ./run_exp/exp2/run_exp2_ppo.sh
>>> sh ./run_exp/exp2/run_exp2_happo.sh
>>> sh ./run_exp/exp2/run_exp2_ippo.sh
>>> sh ./run_exp/exp2/run_exp2_clo.sh
For these two tasks, the fulfillment agents are defined in env/define_exp1_env.py and env/define_exp2_env.py.
# file name: main.py
from algo.runner import Runner # import runner
from algo.arguments import get_args # import argument parser
args = get_args() # parse arguments
runner = Runner(args) # create a runner instance with specified arguments
runner.run() # start learning or evaluation
Train happo on exp1:
>>> python main.py --env=exp1 --algo=happo --mode=learn --log_dir=runs/exp1_happo --seed=10
Monitor the training progress with TensorBoard:
>>> tensorboard --logdir=runs
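The training command above can also be assembled programmatically, for example to launch the same configuration with several seeds. The sketch below is only an illustration: `build_train_command` is a hypothetical helper, the extra seeds are arbitrary, and the `_seed<N>` suffix on the log directory is our own addition to keep runs apart. It reuses only the flags shown in the command above and prints the commands instead of executing them.

```python
import sys


def build_train_command(env, algo, seed, log_root="runs"):
    """Hypothetical helper: return the argv list for one training run."""
    return [
        sys.executable, "main.py",
        f"--env={env}",
        f"--algo={algo}",
        "--mode=learn",
        # Assumption: a per-seed suffix keeps logs of different seeds separate.
        f"--log_dir={log_root}/{env}_{algo}_seed{seed}",
        f"--seed={seed}",
    ]


# Seeds other than 10 are arbitrary values chosen for illustration.
commands = [build_train_command("exp1", "happo", seed) for seed in (10, 11, 12)]
for argv in commands:
    # Print only; pass argv to subprocess.run(argv, check=True) to launch a run.
    print("python " + " ".join(argv[1:]))
```

Because every run writes to its own subdirectory of runs, a single TensorBoard instance pointed at runs can show all seeds side by side.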
OFCOURSE follows the OpenAI Gym interface, the standard API for communication between reinforcement learning algorithms and environments.
from env.exp1_env import Exp1Env # import env
env = Exp1Env() # create an env instance
obs = env.reset() # start a new episode
num_steps = 10 # number of steps
for _t in range(num_steps):
    sampled_actions = env.action_space.sample() # sample actions (not from algo)
    obs, rewards, dones, _ = env.step(sampled_actions) # interact with env
    if all(dones):
        obs = env.reset() # start a new episode when current one ends
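To try the same interaction loop without installing the repository, the sketch below uses a toy stand-in environment. `ToyMultiAgentEnv` and its `sample_actions` method are hypothetical and not part of OFCOURSE; they only mimic the Gym-style `reset`/`step` contract with one observation, reward, and done flag per agent.

```python
import random


class ToyMultiAgentEnv:
    """Hypothetical stand-in environment; not part of OFCOURSE."""

    def __init__(self, num_agents=2, episode_len=4, seed=0):
        self.num_agents = num_agents
        self.episode_len = episode_len
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        # Start a new episode and return one observation per agent.
        self.t = 0
        return [0.0 for _ in range(self.num_agents)]

    def sample_actions(self):
        # Random joint action: one discrete action (0 or 1) per agent.
        return [self.rng.randint(0, 1) for _ in range(self.num_agents)]

    def step(self, actions):
        # Advance one step; return per-agent obs, rewards, dones, and an info dict.
        assert len(actions) == self.num_agents
        self.t += 1
        obs = [float(self.t) for _ in range(self.num_agents)]
        rewards = [-float(a) for a in actions]  # toy cost: action 1 costs 1
        dones = [self.t >= self.episode_len] * self.num_agents
        return obs, rewards, dones, {}


env = ToyMultiAgentEnv()
obs = env.reset()
episodes_finished = 0
for _t in range(10):
    obs, rewards, dones, _ = env.step(env.sample_actions())
    if all(dones):
        episodes_finished += 1
        obs = env.reset()  # start a new episode when the current one ends
```

The loop has the same shape as the one above, so code written against the toy environment should need little change when swapped for a real OFCOURSE environment.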
Customized fulfillment systems can be constructed in OFCOURSE. Here, we use Task 1 (Fulfillment of Physical and Virtual Orders in One System) from the paper as an example.
import numpy as np
from env.resource import Resource
from env.order import Order
from env.container import Buffer, Inventory
from env.operation import OpStore, OpRoute, OpConsoRoute, OpDispatch
from env.fulfillment_unit import FulfillmentUnit
from env.agent import Agent
from env.order_source import OrderSource
Before defining the fulfillment system, we first define the buffer length and inventory capacity.
# ---------- PARAMS ---------- #
buffer_len = 5
inventory_limit = 32
There are two agents in the fulfillment system. Agent 0 consists of 6 fulfillment units and agent 1 consists of 4 fulfillment units; the two agents share the first three stages.
# ---------- AGENT 0 ---------- #
agent0 = Agent()
agent0.add_fulfillment_unit(agent0_layer5)
agent0.add_fulfillment_unit(agent0_layer4)
agent0.add_fulfillment_unit(agent0_layer3)
agent0.add_fulfillment_unit(agent0_layer2)
agent0.add_fulfillment_unit(agent0_layer1)
agent0.add_fulfillment_unit(agent0_layer0)
# ---------- AGENT 1 ---------- #
agent1 = Agent()
agent1.add_fulfillment_unit(agent1_layer3)
agent1.add_fulfillment_unit(agent1_layer2)
agent1.add_fulfillment_unit(agent1_layer1)
agent1.add_fulfillment_unit(agent1_layer0)
Take the third stage of agent 0 (the consolidation warehouse) as an example. It has two Containers and three Operations. Each Container has an associated Resource, so we define the Resource first and then attach it to the corresponding Container. Here, one Container is an Inventory and the other is a Buffer. As for Operations, one stores incoming Orders in the Inventory, and the other two consolidate and dispatch Orders toward their destination Buffers.
# 3RD STAGE IN AGENT 0
agent0_layer3 = FulfillmentUnit()
agent0_layer3_inventory_resource = Resource(constraint=32, normal_price=0.6, overage_price=2.0, occupied=0)
agent0_layer3_buffer0_resource = Resource(constraint=-1, normal_price=0.0, overage_price=0.0, occupied=0)
agent0_layer3_inventory = Inventory(resource=agent0_layer3_inventory_resource, inventory_limit=inventory_limit)
agent0_layer3_buffer0 = Buffer(resource=agent0_layer3_buffer0_resource, buffer_len=buffer_len)
agent0_layer3.add_container(container=agent0_layer3_inventory)
agent0_layer3.add_container(container=agent0_layer3_buffer0)
agent0_layer3_op0 = OpStore(buffers_orig=[agent0_layer3_buffer0], inventory_dest=agent0_layer3_inventory, op_price=0.1, op_time=1)
agent0_layer3_op1 = OpConsoRoute(buffers_orig=[agent0_layer3_buffer0], inventory_orig=agent0_layer3_inventory, buffer_dest=agent0_layer4_buffer0, op_price=4.0, op_time=3)
agent0_layer3_op2 = OpConsoRoute(buffers_orig=[agent0_layer3_buffer0], inventory_orig=agent0_layer3_inventory, buffer_dest=agent0_layer4_buffer1, op_price=8.0, op_time=2)
agent0_layer3.add_operation(operation=agent0_layer3_op0)
agent0_layer3.add_operation(operation=agent0_layer3_op1)
agent0_layer3.add_operation(operation=agent0_layer3_op2)
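The Resource arguments above (constraint, normal_price, overage_price, occupied) suggest a two-tier pricing scheme. The sketch below shows one way such a cost could be computed; it is an assumption for illustration, not the rule implemented in env/resource.py. `holding_cost` is a hypothetical function, and treating a negative constraint as "unlimited" is a guess based on the buffer's constraint=-1.

```python
def holding_cost(occupied, constraint, normal_price, overage_price):
    """Assumed two-tier cost; OFCOURSE's actual rule may differ."""
    if occupied < 0:
        raise ValueError("occupied must be non-negative")
    if constraint < 0:
        # Assumption: a negative constraint means the resource is unlimited.
        return occupied * normal_price
    within = min(occupied, constraint)    # units charged at the normal price
    over = max(occupied - constraint, 0)  # units charged at the overage price
    return within * normal_price + over * overage_price


# Inventory resource from the example: 32 units at 0.6 each, 2.0 beyond that.
inventory_cost = holding_cost(occupied=40, constraint=32,
                              normal_price=0.6, overage_price=2.0)
# Buffer resource from the example: no constraint and no price.
buffer_cost = holding_cost(occupied=5, constraint=-1,
                           normal_price=0.0, overage_price=0.0)
```

Under these assumptions, the 8 units above the limit cost more in total than the same number of units stored within it, which would give an agent a reason to avoid overfilling the inventory.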
The order source is a mechanism that takes in the simulation step as its input and generates a set of order instances as its output. Currently, orders are placed according to a prescribed repeating pattern. External order source management will be added soon.
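The step-in, orders-out behaviour described above can be sketched as follows. `ToyOrder` and `RepeatingOrderSource` are hypothetical names used for illustration only; the `OrderSource` and `Order` classes in the repository may have different constructors and fields.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToyOrder:
    """Hypothetical order record; OFCOURSE's Order class may differ."""
    created_at: int  # simulation step at which the order was placed
    quantity: int    # number of items in the order


class RepeatingOrderSource:
    """Hypothetical order source that cycles through a fixed pattern."""

    def __init__(self, pattern: Sequence[Sequence[int]]):
        # pattern[i] lists the order quantities placed at steps
        # i, i + len(pattern), i + 2 * len(pattern), and so on.
        if not pattern:
            raise ValueError("pattern must contain at least one step")
        self.pattern = [list(step_orders) for step_orders in pattern]

    def generate(self, step: int) -> List[ToyOrder]:
        # Map the simulation step onto the pattern with modular arithmetic.
        quantities = self.pattern[step % len(self.pattern)]
        return [ToyOrder(created_at=step, quantity=q) for q in quantities]


# Pattern of length 3: two orders, then none, then one order, repeated.
source = RepeatingOrderSource(pattern=[[2, 1], [], [3]])
orders_per_step = [len(source.generate(step)) for step in range(6)]
```

Keeping the source a pure function of the step makes episodes reproducible, and a data-driven source could later be substituted behind the same `generate(step)` call.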
The fulfillment systems presented in the paper are inspired by practical problems: experiment 1 (fulfillment of physical and virtual orders in one system) originates from Cainiao's domestic fulfillment business, and experiment 2 (cross-border order fulfillment) stems from the fulfillment business of AliExpress. Because of the company's data disclosure regulations, synthetic data is used for demonstration; it can be found in exp1 and exp2.
For details on the action and observation spaces, see docs/act_obs.md.
@inproceedings{zhu2023ofcourse,
title={OFCOURSE: A Multi-Agent Reinforcement Learning Environment for Order Fulfillment},
author={Yiheng Zhu and Yang Zhan and Xuankun Huang and Yuwei Chen and Yujie Chen and Jiangwen Wei and Wei Feng and Yinzhi Zhou and Haoyuan Hu and Jieping Ye},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2023},
url={https://openreview.net/forum?id=0RSQEh9lRG}
}