This is the official implementation for the paper "MOKA: Open-World Robotic Manipulation through Mark-based Visual Prompting" (RSS 2024).
Set up a conda environment and install MOKA and its dependencies:

```bash
conda create -n moka python=3.10
conda activate moka
pip install -r requirements.txt
pip install -e .
```
Clone the Grounded-SAM repository and set the following environment variables. Follow their instructions if you want to use Docker.
```bash
git clone git@github.com:IDEA-Research/Grounded-Segment-Anything.git
cd Grounded-Segment-Anything
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/path/to/cuda/
```
Then, install GroundingDINO and SAM. Some dependency versions must be pinned to stay compatible with MOKA, so please follow the instructions below:
```bash
python -m pip install -e segment_anything
pip install --no-build-isolation -e GroundingDINO
pip uninstall roboflow supervision
pip install roboflow supervision==0.15.0
```
Install Detectron2:
```bash
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
```
If you run into any issues, please refer to the original repositories for more detailed instructions.
Download the GroundingDINO and SAM checkpoints into the MOKA root directory:
```bash
mkdir ckpts && cd ckpts
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
```
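To sanity-check the downloads, you can try loading both checkpoints (a minimal sketch; the GroundingDINO config path assumes the clone layout from the steps above and may need adjusting):

```python
# Sanity check: load the SAM and GroundingDINO checkpoints from ckpts/.
from segment_anything import sam_model_registry
from groundingdino.util.inference import load_model

# SAM ViT-H checkpoint downloaded above.
sam = sam_model_registry["vit_h"](checkpoint="ckpts/sam_vit_h_4b8939.pth")

# The GroundingDINO config file ships inside the cloned repository;
# adjust this path if your clone lives elsewhere.
dino = load_model(
    "Grounded-Segment-Anything/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "ckpts/groundingdino_swint_ogc.pth",
)
print("Loaded SAM and GroundingDINO checkpoints.")
```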
Please refer to a customized version of DROID and follow the instructions in the repository to install the necessary dependencies.
Before you run MOKA, remember to set the following environment variable:

```bash
export OPENAI_API_KEY=your_openai_key
```
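MOKA queries the vision-language model through the OpenAI API, which reads this key from the environment. As a quick smoke test that the key works (a minimal sketch; the model name `gpt-4o` is an assumption, not necessarily the model MOKA calls):

```python
import os

from openai import OpenAI  # pip install openai

# The client picks up OPENAI_API_KEY from the environment by default.
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
client = OpenAI()

# Hypothetical smoke test; "gpt-4o" is an assumption, not necessarily
# the model MOKA uses.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```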
Convert the stored data to RLDS format by referring to the examples shown in this repo. The converted dataset is compatible with the dataloader in Octo and can be used to fine-tune the pre-trained model.
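Once converted, the dataset can be inspected with TensorFlow Datasets (a minimal sketch, assuming the RLDS data was written to the directory below; the per-step field names are illustrative and depend on your conversion script):

```python
import tensorflow_datasets as tfds

# Load an RLDS dataset written by the conversion script; the directory
# path is a placeholder for wherever your converted data lives.
builder = tfds.builder_from_directory("/path/to/dataset_to_be_saved")
ds = builder.as_dataset(split="train")

# RLDS stores one episode per record, each holding a nested "steps" dataset.
for episode in ds.take(1):
    for step in episode["steps"]:
        # "observation" and "action" follow the RLDS convention, but the
        # exact keys depend on how you wrote the conversion.
        print(step["observation"].keys(), step["action"])
```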
Check out our demo for a quick start on visual prompting. Try it out on your own tasks!
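The core idea of mark-based visual prompting is to overlay numbered candidate marks on the observation and let the VLM select among them by index. A rough illustration of the annotation step (a generic sketch, not MOKA's actual demo code; in MOKA the candidate points come from the Grounded-SAM perception pipeline):

```python
from PIL import Image, ImageDraw


def draw_marks(image: Image.Image, points: list[tuple[int, int]]) -> Image.Image:
    """Overlay numbered circular marks on candidate points."""
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    for i, (x, y) in enumerate(points):
        r = 12
        draw.ellipse((x - r, y - r, x + r, y + r), fill="white", outline="black")
        draw.text((x - 4, y - 7), str(i), fill="black")
    return annotated


# Illustrative usage with placeholder pixel coordinates.
image = Image.open("observation.png")
candidates = [(120, 200), (340, 260), (410, 150)]
draw_marks(image, candidates).save("observation_marked.png")
# The marked image is then sent to the VLM, which answers with mark indices.
```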
Run the full MOKA pipeline on a physical robot using the following command:
```bash
python franka_eval.py --config /path/to/config.yaml --data_dir /path/to/dataset_to_be_saved
```
If you are not using the Franka robot and the DROID platform, you can modify moka/policies/franka_policy.py for your own setup; a rough sketch of what such an adapter might look like is shown below.
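A custom policy would play the same role as the Franka policy: translating MOKA's predicted keypoints and waypoints into commands for your robot. This is a hypothetical sketch; the class and method names are illustrative, not the actual interface in moka/policies/franka_policy.py:

```python
import numpy as np


class MyRobotPolicy:
    """Hypothetical adapter mapping MOKA's predictions to a custom robot.

    The method names here are illustrative; match them to however
    moka/policies/franka_policy.py is actually invoked in your setup.
    """

    def __init__(self, robot_client):
        # robot_client is whatever interface your robot stack exposes.
        self.robot = robot_client

    def execute(self, grasp_point: np.ndarray, waypoints: list[np.ndarray]) -> None:
        # Move to the grasp keypoint predicted by MOKA, close the gripper,
        # then follow the predicted motion waypoints.
        self.robot.move_to(grasp_point)
        self.robot.close_gripper()
        for waypoint in waypoints:
            self.robot.move_to(waypoint)
        self.robot.open_gripper()
```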
Our project wouldn't have been possible without the following great works:
- The perception pipeline in MOKA is adapted from Grounded-SAM, GroundingDINO and SAM.
- The robot platform is built on top of DROID.
- The visual prompting design takes inspiration from SoM.