---
layout: common
permalink: /
categories: projects
---
# PRIME: Scaffolding Manipulation Tasks with Behavior Primitives for Data-efficient Imitation Learning
Imitation learning has shown great potential for enabling robots to acquire complex manipulation behaviors. However, these algorithms suffer from high sample complexity in long-horizon tasks, where compounding errors accumulate over the task horizon. We present PRIME (PRimitive-based IMitation with data Efficiency), a behavior-primitive-based framework designed to improve the data efficiency of imitation learning. PRIME scaffolds robot tasks by decomposing task demonstrations into primitive sequences, then learning a high-level control policy that sequences primitives through imitation learning. Our experiments demonstrate that PRIME achieves significant performance improvements in multi-stage manipulation tasks, with success rates 10-34% higher than state-of-the-art baselines in simulation and 20-48% higher on physical hardware.
We present a data-efficient imitation learning framework that scaffolds manipulation tasks with behavior primitives, breaking long human demonstrations down into concise sequences of simple behavior primitives. Given task demonstrations, we use a trajectory parser to parse each demonstration into a sequence of primitive types and their corresponding parameters. We then use imitation learning to train a high-level policy that predicts primitive types and parameters from observations.
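To make this interface concrete, here is a minimal PyTorch sketch of such a policy, with a classification head over primitive types and one parameter-regression head per type. The primitive set, parameter dimension, and network sizes below are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical primitive library and parameter dimension, for illustration only.
PRIMITIVES = ["reach", "grasp", "place", "push", "release", "atomic"]
PARAM_DIM = 7  # e.g., a target end-effector pose

class PrimitivePolicy(nn.Module):
    """Predicts a primitive type and its parameters from an observation."""

    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Classification head over primitive types.
        self.type_head = nn.Linear(hidden, len(PRIMITIVES))
        # One parameter-regression head per primitive type.
        self.param_heads = nn.ModuleList(
            [nn.Linear(hidden, PARAM_DIM) for _ in PRIMITIVES]
        )

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        type_logits = self.type_head(h)  # (B, n_types)
        params = torch.stack([head(h) for head in self.param_heads], dim=1)
        return type_logits, params       # (B, n_types, PARAM_DIM)
```

At execution time, a policy of this form would select the highest-scoring primitive type and run it with the parameters from the corresponding head.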
We develop a self-supervised data generation strategy that randomly executes sequences of behavior primitives in the environment. With the generated dataset, we train an inverse dynamics model (IDM) that maps the initial and final states of segments in task demonstrations to primitive types and their corresponding parameters. To derive optimal primitive sequences, we build a trajectory parser that segments task demonstrations into primitive sequences using dynamic programming. Finally, we train the policy on the parsed primitive sequences.
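The dynamic-programming step can be viewed as an optimal segmentation problem: the IDM scores every candidate segment, and the parser chooses segment boundaries that maximize the total log-likelihood. Below is a minimal sketch under that assumption; `idm_score` is a hypothetical helper returning the best-scoring (log-likelihood, primitive) pair for a candidate segment, not part of any released code.

```python
import math

def parse_demonstration(states, idm_score, max_len=50):
    """Segment a demonstration into a primitive sequence via dynamic programming.

    states:    list of observed states s_0, ..., s_{T-1} along the demonstration
    idm_score: hypothetical helper; idm_score(s_i, s_j) -> (log_likelihood, primitive),
               the best primitive (type + parameters) explaining the segment from
               s_i to s_j under the learned inverse dynamics model
    max_len:   maximum segment length considered
    """
    T = len(states)
    best = [-math.inf] * T   # best[j]: score of the best parse of s_0..s_j
    best[0] = 0.0
    back = [None] * T        # back[j]: (segment start i, chosen primitive)

    for j in range(1, T):
        for i in range(max(0, j - max_len), j):
            score, primitive = idm_score(states[i], states[j])
            if best[i] + score > best[j]:
                best[j] = best[i] + score
                back[j] = (i, primitive)

    # Backtrack from the final state to recover the primitive sequence.
    sequence, j = [], T - 1
    while j > 0:
        i, primitive = back[j]
        sequence.append(primitive)
        j = i
    return list(reversed(sequence))
```

Because each candidate segment is scored independently, the search costs O(T · max_len) IDM evaluations per demonstration.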
We evaluate on three tasks in the robosuite simulator. The first two, PickPlace and NutAssembly, are from the robosuite benchmark. We introduce a third task, TidyUp, to study long-horizon manipulation.
Our method significantly outperforms all baselines, achieving success rates above 95% across all tasks with remarkable robustness. This demonstrates that decomposing task demonstrations into concise primitive sequences reduces task complexity and enables data-efficient imitation learning.
We evaluate the performance of PRIME against an imitation learning baseline (BC-RNN) on two real-world CleanUp task variants: CleanUp-Bin and CleanUp-Stack. |
Our method significantly outperforms BC-RNN in both real-world tabletop tasks. Here we show rollouts of the two tasks (played at 8x):
For each task, we select five human demonstrations and visualize the segmented primitive sequences as interpreted by the trajectory parser. |