This is a minimal implementation of the paper "SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on Scene Graphs" (ICRA 2024), available on arXiv.
conda env create -f environment.yml
cd extension
python setup.py install
Please also install PyTorch. We tested with PyTorch 1.12.1 and CUDA 11.6.
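For example, one way to install this PyTorch build with pip (adjust the CUDA suffix to your setup) is:
pip install torch==1.12.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116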
Please refer to this page to download the data used in the paper and for more information.
We set up two shape autoencoders, AtlasNet and AtlasNet2. AtlasNet is trained with full shapes under canonical coordinates, while AtlasNet2 is trained under the camera frame and provides shape priors to the goal scene graph to guide the imagination. We also provide trained models, which can be downloaded here: trained AtlasNet and trained AtlasNet2.
- For generating shapes
  - Train AtlasNet. You need to adjust --batchSize and --nepoch to make the training optimal.
    cd AtlasNet
    python training/train_AE_AtlasNet.py
  - Inference point clouds [optional]: run AtlasNet/inference/run_AE_AtlasNet.py. The generated points are stored under AtlasNet/log/atlasnet_separate_cultery/network.
  - Obtain point features for training Graph-to-3D: run AtlasNet/inference/create_features_gt.py; the features are stored in objs_features_gt_atlasnet_separate_cultery.json. The keys in the JSON file are the names of the objects, e.g., "cup_1", and the values are the latent features (128 dimensions). A loading sketch is shown after this list.
- For producing shape priors
  - Store partial points of the initial scenes under the camera frame: this data is used to train AtlasNet2. The files can be downloaded from here: partial_pcs. You can also modify the file path and run AtlasNet2/auxiliary/generate_partial_pc_for_object.py. The final outputs are stored as pickle files under AtlasNet2/partial_pc_data.
  - Split the trainval set: the function generate_train_sample in AtlasNet2/auxiliary/generate_partial_pc_for_object.py splits AtlasNet2/partial_pc_data into train (90%) and test (10%). The file names are stored in AtlasNet2/partial_pc_data_splits.json.
  - Train AtlasNet2: the procedure is the same as for AtlasNet.
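As a quick sanity check of the exported point features, a minimal Python snippet like the following can be used (assuming the JSON layout described above, i.e., object names mapped to 128-dimensional latent vectors):

import json

with open("objs_features_gt_atlasnet_separate_cultery.json") as f:
    feats = json.load(f)

# Keys are object names (e.g., "cup_1"); values are the latent features.
for name, feat in list(feats.items())[:3]:
    print(name, len(feat))  # expected length: 128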
We built the scene generator on Graph-to-3D, a GCN-VAE architecture. Unlike the original Graph-to-3D, we leverage a shape-aware scene graph so that the generated shapes are aligned with the shapes observed in the initial scene. The trained model is available here: trained graph_to_3d.
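For intuition, a shape-aware goal scene graph can be thought of as object nodes carrying a shape latent plus pairwise relation edges. The following is purely an illustrative sketch and not the repository's actual data format:

# Illustrative only: field names, categories, and relation names are made up for clarity.
scene_graph = {
    "nodes": [
        {"id": 0, "category": "plate", "shape_latent": [0.0] * 128},  # 128-d shape feature (e.g., from AtlasNet)
        {"id": 1, "category": "cup",   "shape_latent": [0.0] * 128},
    ],
    "edges": [
        {"subject": 1, "object": 0, "relation": "right of"},  # cup is right of plate
    ],
}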
If you want to retrain the network, --batchSize, --nepoch, and --exp need to be set to proper values.
cd graphto3d
python scripts/train_vaegan.py
More details can be found in the original repository.
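As an illustration, a retraining run with these flags set explicitly could look like the following; the values and the experiment name are placeholders, not the settings used in the paper:
cd graphto3d
python scripts/train_vaegan.py --batchSize 8 --nepoch 100 --exp my_experiment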
There are two modes, robot and oracle. The robot mode supports a robot arm manipulating the objects according to the imagination. This mode needs a grasping pose prediction network, for which we use Contact-GraspNet; this requires TensorFlow to be installed.
pip install tensorflow-estimator==2.7.0 tensorflow-gpu==2.7.0
The checkpoints can be downloaded from the original repository or here. After downloading the checkpoints, move them to ./contact_graspnet.
The oracle mode does not need an agent; it directly places the objects at their relative poses. To make the script work, modify the variable mode inside the script and then run:
python sgbot_pybullet.py
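The switch between the two modes is just a variable assignment inside sgbot_pybullet.py; as an illustrative sketch (the exact location in the script may differ):

mode = "oracle"  # set to "robot" to let the arm execute grasps predicted by Contact-GraspNet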
The results in the paper are obtained under the oracle mode, where we directly use the pre-defined scene graph as the goal.
We provide a recorded rosbag to demonstrate the performance. To run this trial, the Mask R-CNN checkpoint needs to be downloaded from here, and additional requirements need to be installed.
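For reference, the recorded bag can be replayed with the standard ROS tool; the file name below is a placeholder for the bag you downloaded:
rosbag play /path/to/recorded_demo.bag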