This repository contains code and technical details for the paper:
Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs (website)
Authors: Angelos Mavrogiannis, Dehao Yuan, Yiannis Aloimonos
Please cite our work if you find it useful:
@article{mavrogiannis2024discoveringobjectattributesprompting,
title={Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs},
author={Angelos Mavrogiannis and Dehao Yuan and Yiannis Aloimonos},
year={2024},
eprint={2409.15505},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2409.15505},
}
We describe our end-to-end framework for embodied attribute detection. The LLM receives as input a perception API with LLMs and VLMs as backbones, an action API built on a Robot Control API, a natural language (NL) instruction from a user, and a visual scene observation. It then produces a Python program that combines LLM and VLM function calls with robot actions to actively reason about attribute detection.
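To make the flow concrete, below is a minimal sketch of the kind of Python program the LLM could generate for an instruction such as "hand me the ripe banana". All function names here (detect_objects, fixate_on, vlm_query, pick_up) are illustrative stand-ins for perception and action primitives, not the actual function signatures of our API.

```python
def detect_objects(category: str) -> list[str]:
    """Placeholder perception call: return object IDs matching a category."""
    return ["banana_1", "banana_2"]

def fixate_on(object_id: str) -> None:
    """Placeholder action call: center the camera on the object."""
    print(f"fixating on {object_id}")

def vlm_query(question: str) -> str:
    """Placeholder VLM call: answer a visual question about the current view."""
    return "yes"

def pick_up(object_id: str) -> None:
    """Placeholder action call: grasp the object."""
    print(f"picking up {object_id}")

# The generated program interleaves perception queries with robot actions:
for obj in detect_objects("banana"):
    fixate_on(obj)                                   # act to get a better view
    if vlm_query("Is this banana ripe?") == "yes":   # query the attribute
        pick_up(obj)
        break
```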
The API consists of an
We integrate the perception-action API in different AI2-THOR household environments. In the Distance Estimation task (left), the robot has to identify which object is closer to its camera. We use the question "which one is closer to me?" followed by the objects in question. Our API invokes an active perception behavior that computes the distance to an object by fixating on it with a call to
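As a rough illustration of this task, the sketch below answers "which one is closer to me?" in AI2-THOR using the simulator's per-object distance metadata; this is a stand-in for our fixation-based active perception behavior, whose implementation is not shown here, and it assumes the scene contains the queried object types.

```python
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan1")  # example kitchen scene
event = controller.step(action="Pass")       # refresh the current observation

def distance_to(object_type: str) -> float:
    """Return the smallest agent-to-object distance reported by AI2-THOR metadata."""
    matches = [o for o in event.metadata["objects"] if o["objectType"] == object_type]
    return min(o["distance"] for o in matches)

# "Which one is closer to me?" for two candidate objects:
closer = min(["Apple", "Mug"], key=distance_to)
print(f"The {closer} is closer to the camera.")
```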
We integrate our framework into a RoboMaster EP robot with the
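For reference, a minimal connection sketch using the official `robomaster` Python SDK is shown below; this is an assumption about what the integration could look like, and the repository's actual robot interface may differ.

```python
from robomaster import robot

ep_robot = robot.Robot()
ep_robot.initialize(conn_type="sta")   # connect to the robot over Wi-Fi (station mode)

# Grab a camera frame that the perception API could pass to a VLM.
ep_robot.camera.start_video_stream(display=False)
frame = ep_robot.camera.read_cv2_image(strategy="newest")

# Example action primitive: drive 0.3 m forward.
ep_robot.chassis.move(x=0.3, y=0, z=0, xy_speed=0.5).wait_for_completed()

ep_robot.camera.stop_video_stream()
ep_robot.close()
```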