# Navigating the Physical World: A Survey of Embodied Navigation
This repository accompanies the embodied navigation survey led by the MSP group at Shanghai Jiao Tong University. Here you can learn the concept of embodied navigation and find state-of-the-art works on this topic.
If you find this repository helpful, please consider giving it a Star ⭐ or sharing it ⬆️.
## 1. Embodied Navigation Paradigm and Elements
Embodied navigation (EN) is the novel problem of an autonomous robot conducting advanced egocentric navigation tasks through state estimation, task cognition and motion execution. Compared with traditional navigation, EN is distinguished by three key features: 1) egocentric sensing, 2) interactive engagement with the environment through high-degree-of-freedom actions, and 3) high-level cognition for complex tasks.
Recently, an increasing number of works have been proposed to solve problems within the EN system. However, existing studies tend to focus on specific sub-issues and lack a coherent, unified framework. Consequently, the system-level development and real-world application of EN remain constrained by the absence of a comprehensive problem formulation. To this end, we propose a unified formulation for EN, structured into five stages: transition, observation, fusion, task reward construction and action skill optimization (TOFRA).
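As a rough illustration, the five TOFRA stages can be viewed as one closed perception-action loop executed at every navigation step. The Python sketch below is purely illustrative: all class and function names are our own hypothetical choices, not an API defined by the survey, and each stage is reduced to a toy 2-D example.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the five TOFRA stages for one navigation step.
# Names and state layout are illustrative assumptions, not from the survey.

@dataclass
class AgentState:
    pose: tuple                                  # estimated pose (x, y, yaw)
    belief: dict = field(default_factory=dict)   # fused world belief

def transition(state, action):
    """Stage 1 (T): propagate the agent's state under the chosen action."""
    x, y, yaw = state.pose
    dx, dy, dyaw = action
    return AgentState((x + dx, y + dy, yaw + dyaw), state.belief)

def observe(state, env):
    """Stage 2 (O): gather egocentric sensor readings (camera, LiDAR, IMU, ...)."""
    return env.sense(state.pose)

def fuse(state, observation):
    """Stage 3 (F): fuse new observations into the agent's world belief."""
    state.belief.update(observation)
    return state

def task_reward(state, goal):
    """Stage 4 (R): construct a task-level reward from the fused state."""
    gx, gy = goal
    x, y, _ = state.pose
    return -((x - gx) ** 2 + (y - gy) ** 2) ** 0.5   # negative distance-to-goal

def optimize_action(state, goal, candidate_skills):
    """Stage 5 (A): pick the action skill that maximizes the predicted reward."""
    return max(candidate_skills,
               key=lambda a: task_reward(transition(state, a), goal))
```

In a real EN system each stage is of course far richer (learned dynamics, multi-modal fusion, learned skills), but the loop structure is the same.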
| Embodied Navigation | Traditional Navigation |
| --- | --- |
| Ego-centric | Global Axis |
| Multi Nodes, n-DoF | Single Node, <=6-DoF |
| Evolved Motion Skills | Fixed Movement |
| Autonomous Task Decomposition and Multi-Task Joint Optimization | Manual Task Decomposition for Individual Optimization |
| First Principles | Engineering-Oriented Approach |
| Weak Metricity | Precise Metricity |
| Active Interaction Between Agent and Environment | Passive Perception |
## 2. Interactive Perception
### Surrounding Environment - Task
| Algorithm | Modality | Object Type | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- | --- |
| DCGNN | LiDAR | Single-state 3D object | 2023 | CAIS | Link | --- |
| ContrastZSD | CAM | Zero-shot object | 2024 | IEEE TPAMI | Link | --- |
| Gfocal | CAM | Dense object | 2023 | IEEE TPAMI | Link | --- |
| DeepGCNs | Point cloud | --- | 2023 | IEEE TPAMI | Link | code |
| GCNet | --- | --- | 2023 | IEEE TPAMI | Link | code |
| CNN hybrid module (CSWin + hybrid patch embedding module + slicing-based inference) | RGB image | Objects in UAV images | 2023 | J-STARS | link | --- |
| iS-YOLOv5 | RGB image | Small objects in autonomous driving | 2023 | Pattern Recognition Letters | link | --- |
| ASIF-Net | RGB-D | Salient object | 2021 | IEEE T Cybernetics | link | code |
| AdaDet (based on Early-Exit Neural Networks) | RGB image | ... | 2024 | IEEE T COGN DEV SYST | link | --- |
| Memory network + causal intervention + Mask R-CNN | RGB/grayscale image | Objects in different weather conditions | 2024 | IEEE TPAMI | link | --- |
| Res2Net | RGB image | Objects on 2D frames, especially salient objects | 2021 | IEEE TPAMI | link | code |
| Algorithm | Modality | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- |
| R2former | Cam | 2023 | CVPR | Link | Code |
| Eigenplaces | Cam | 2023 | ICCV | Link | Code |
| Anyloc | Cam | 2023 | RAL | Link | Code |
| Optimal transport aggregation for visual place recognition | Cam | 2023 | ArXiv | Link | Code |
| Seqot | LiDAR | 2022 | TIE | Link | Code |
| Lpd-net | LiDAR | 2019 | ICCV | Link | --- |
| Bevplace | LiDAR | 2023 | ICCV | Link | Code |
| Adafusion | Cam-LiDAR | 2022 | RAL | Link | --- |
| Mff-pr | Cam-LiDAR | 2022 | ISMAR | Link | --- |
| Lcpr | Cam-LiDAR | 2023 | RAL | Link | --- |
| Explicit Interaction for Fusion-Based Place Recognition | Cam-LiDAR | 2024 | ArXiv | Link | --- |
| Algorithm | Modality | Semantic Type | Date | Publication | Link |
| --- | --- | --- | --- | --- | --- |
| Reinforcement Learning with Phase Transition Mechanism | Visual | Object Recognition and Goal Navigation | 2023 | arXiv | arXiv |
| Active Neural SLAM with Semantic Segmentation | Visual | Object Classification and Goal Localization | 2022 | NeurIPS | NeurIPS Proceedings |
| Reinforcement Learning with Communication and Feature Fusion Modules | Visual and Semantic Maps | Object and Scene Understanding | 2021 | arXiv | arXiv |
| Multitask Learning with Attentive Architecture | Visual, Audio, and Text | Multi-Modal Object and Scene Classification | 2022 | NeurIPS | NeurIPS Proceedings |
| Self-supervised Learning with Multi-Head Attention | Visual and Language | 3D Object Recognition and Language Understanding | 2022 | arXiv | arXiv |
| Deep Reinforcement Learning | Visual | Scene and Object Classification | 2021 | CVPR | CVPR 2021 |
| Curriculum Learning | Visual | Object and Scene Recognition | 2020 | ICLR | ICLR 2020 |
| Vision-Language Models | Visual and Language | Object Detection and Language Understanding | 2023 | arXiv | arXiv |
| Semantic Mapping and Coordination | Visual and Semantic Maps | Object and Scene Classification | 2022 | IEEE Robotics and Automation Letters | IEEE Xplore |
| Scene Priors with Reinforcement Learning | Visual | Scene and Object Classification | 2021 | ICCV | ICCV 2021 |
| Algorithm | Modality | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- |
| Doppler-only Single-scan 3D Vehicle Odometry | Radar | 2023 | ArXiv | Link | --- |
| PhaRaO | Radar | 2020 | ICRA | Link | --- |
| RadarSLAM | Radar | 2020 | IROS | Link | --- |
| 4DRadarSLAM | Radar | 2023 | ICRA | Link | Code |
| LIC-Fusion | Lidar-IMU-Cam | 2019 | IROS | Link | --- |
| LIC-Fusion 2.0 | Lidar-IMU-Cam | 2020 | IROS | Link | --- |
| Faster-LIO | Lidar-IMU | 2022 | RAL | Link | Code |
| LOAM | Lidar | 2014 | RSS | Link | --- |
| LeGO-LOAM | Lidar | 2018 | IROS | Link | --- |
| Algorithm | Modality | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- |
| Fast-LIO | Lidar-IMU | 2021 | RAL | link | code |
| Swarm-LIO | LiDAR-IMU | 2023 | ICRA | link | code |
| Vision-UWB fusion framework | UGV-assisted | 2023 | IEEE ICIEA | link | --- |
| EKF + IGG robust estimation | GNSS+INS+IMU+Force sensor | 2024 | MEAS SCI TECHNOL | link | code |
| Omni-Swarm (multi-drone map-based localization + visual drone tracking) | VIO + UWB sensors + stereo wide-field-of-view cameras | 2022 | IEEE T ROBOT | link | code |
| EKF | ToF+? | 2024 | AMC | link | --- |
| HDVIO | VIO + IMU + dynamics module | 2023 | RSS | link | --- |
| Acoustic Inertial Measurement (AIM) | Acoustics (microphone array) | 2022 | ACM (SenSys) | link | --- |
| Algorithm | Modality | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- |
| Direct LiDAR Odometry | LiDAR | 2022 | RAL | Link | --- |
| MIPO | IMU-Kinematics | 2023 | IROS | Link | Code |
| Robust Legged Robot State Estimation Using Factor Graph Optimization | Cam-IMU-Kinematics | 2019 | RAL | Link | --- |
| VILENS | Cam-IMU-Lidar-Kinematics | 2023 | TRO | Link | --- |
| Invariant Smoother for Legged Robot State Estimation | IMU-Kinematics | 2023 | TRO | Link | --- |
| Cerberus | Cam-IMU-Kinematics | 2023 | ICRA | Link | Code |
| On State Estimation for Legged Locomotion Over Soft Terrain | IMU-Kinematics | 2021 | IEEE Sensors Letters | Link | --- |
| Event Camera-based Visual Odometry | Event Cam-RGBD | 2023 | IROS | Link | --- |
| Pronto | Cam-IMU-Lidar-Kinematics | 2020 | Frontiers in Robotics and AI | Link | Code |
| Legged Robot State Estimation With Dynamic Contact Event Information | IMU-Kinematics | 2021 | RAL | Link | --- |
| Vision-Assisted Localization and Terrain Reconstruction with Quadruped Robots | Depth-IMU-Lidar | 2022 | IROS | Link | --- |
| Algorithm | Modality | Semantic Type | Date | Publication | Link |
| --- | --- | --- | --- | --- | --- |
| Social Dynamics Adaptation (SDA) | Depth images, ResNet, Recurrent Policy Network | Human Trajectories, Motion Policy | 2024 | arXiv | Link |
| SMPL Body Model, Motion Retargeting | Motion Capture, SMPL Parameters | Human Motion, Humanoid Motion Imitation | 2024 | arXiv | Link |
| Humanoid Shadowing Transformer, Imitation Transformer | Optical Marker-based Motion Capture, RGB Camera | Human Body and Hand Data, Pose Estimation | 2024 | arXiv | Link |
| Remote Teleoperation Architecture | Fiber Optic Network, Virtual Reality Equipment | Teleoperation, Human-Robot Interaction | 2022 | arXiv | Link |
| POMDP, Reinforcement Learning | Motion Capture, Force-Controlled Actuators | Human Motion, Robot Locomotion | 2024 | arXiv | Link |
| Modular Learning Framework, Imitation Learning | Motion Capture, Human Demonstrations | Humanoid Behaviors, Task Learning | 2021 | IEEE Robotics and Automation Letters | Link |
| Zero-Shot Learning with CLIP Embeddings | RGB-D Camera | Object Navigation | 2022 | CVPR | Link |
| Reinforcement Learning | Visual Inputs (RGB Camera) | Open-World Navigation | 2021 | ICRA | Link |
| Reinforcement Learning with Gesture Recognition | Multimodal (Gestures, Visual Inputs) | Human-Robot Interaction | 2023 | CVPR | Link |
| Vision-Language Model, Self-Supervised Learning | Visual and Language Inputs | Instruction Following | 2020 | CVPR | Link |
| Simulation-Based Learning | Visual and Physical Simulation | Physical Interaction Prediction | 2020 | CVPR | Link |
### Collaborative Sensing - View
| Algorithm | Modality | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- |
| Graph-based subterranean exploration path planning using aerial and legged robots | Cam-Depth-Lidar-Thermal-IMU | 2020 | Journal of Field Robotics | Link | --- |
| Stronger Together | Cam-Lidar-GNSS | 2022 | RAL | Link | Code |
| VIO-UWB-Based Collaborative Localization and Dense Scene Reconstruction within Heterogeneous Multi-Robot Systems | Depth-Lidar | 2022 | ICARM | Link | --- |
| Heterogeneous Ground and Air Platforms, Homogeneous Sensing: Team CSIRO Data61's Approach to the DARPA Subterranean Challenge | Cam-Lidar | 2022 | Field Robotics | Link | --- |
| Aerial-Ground Collaborative Continuous Risk Mapping for Autonomous Driving of Unmanned Ground Vehicle in Off-Road Environments | Depth-Lidar-IMU | 2023 | TAES | Link | Code |
| Cooperative Route Planning for Fuel-constrained UGV-UAV Exploration | Cam-Lidar-GNSS | 2022 | ICUS | Link | --- |
| Energy-Efficient Ground Traversability Mapping Based on UAV-UGV Collaborative System | Cam-Lidar | 2022 | TGCN | Link | --- |
| Aerial-Ground Robots Collaborative 3D Mapping in GNSS-Denied Environments | Cam-Lidar | 2022 | ICRA | Link | --- |
| Autonomous Exploration and Mapping System Using Heterogeneous UAVs and UGVs in GPS-Denied Environments | Cam-Depth-Lidar | 2019 | TVT | Link | --- |
| Algorithm | Modality | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- |
| Joint Optimization of UAV Deployment and Directional Antenna Orientation | --- | 2023 | WCNC | Link | --- |
| Multi-UAV Collaborative Sensing and Communication: Joint Task Allocation and Power Optimization | --- | 2023 | TWC | Link | --- |
| Decentralized Multi-UAV Cooperative Exploration Using Dynamic Centroid-Based Area Partition | Depth | 2023 | Drones | Link | --- |
| Cooperative 3D Exploration and Mapping using Distributed Multi-Robot Teams | Lidar | 2024 | ICARSC | Link | --- |
| RACER | Cam-IMU | 2023 | TRO | Link | Code |
| Fast Multi-UAV Decentralized Exploration of Forests | Depth | 2023 | RAL | Link | Code |
| Next-Best-View planning for surface reconstruction of large-scale 3D environments with multiple UAVs | Depth | 2020 | IROS | Link | --- |
| An autonomous unmanned aerial vehicle system for fast exploration of large complex indoor environments | Cam-Lidar | 2021 | Journal of Field Robotics | Link | Code |
| Multi-MAV Autonomous Full Coverage Search in Cluttered Forest Environments | Cam-Lidar | 2022 | Journal of Intelligent & Robotic Systems | Link | --- |
| Algorithm | Modality | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- |
| Hybrid Stochastic Exploration Using Grey Wolf Optimizer and Coordinated Multi-Robot Exploration Algorithms | --- | 2019 | IEEE Access | Link | --- |
| MR-TopoMap | Cam-Lidar | 2022 | RAL | Link | --- |
| H2GNN | --- | 2022 | RAL | Link | --- |
| SMMR-Explore | Lidar-IMU | 2021 | ICRA | Link | --- |
| Distributed multi-robot potential-field-based exploration with submap-based mapping and noise-augmented strategy | Lidar | 2024 | Robotics and Autonomous Systems | Link | --- |
| CoPeD: Advancing Multi-Robot Collaborative Perception | Cam-Lidar-GNSS-IMU | 2024 | RAL | Link | --- |
| Collaborative Complete Coverage and Path Planning for Multi-Robot Exploration | --- | 2021 | Sensors | Link | --- |
| Voronoi-Based Multi-Robot Autonomous Exploration in Unknown Environments via Deep Reinforcement Learning | Lidar | 2020 | TVT | Link | --- |
| Multi-vehicle cooperative localization and mapping with the assistance of dynamic beacons in GNSS-denied environment | IMU-Lidar | 2024 | ISAS | Link | --- |
### Global/Local Space - Representation
| Algorithm | Based Structure | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- |
| Point Cloud Library (PCL) | Point cloud | 2011 | ICRA | link | --- |
| PointNet | Point cloud | 2017 | CVPR | link | --- |
| PCT | Point cloud | 2021 | Computational Visual Media | link | --- |
| TEASER | Point cloud | 2021 | IEEE T ROBOT | link | --- |
| SC-CNN | Point cloud + hierarchical + anisotropic spatial geometry | 2022 | TRGS | link | code |
| PMP-Net++ | Point cloud | 2023 | IEEE TPAMI | link | code |
| STORM | Point cloud | 2023 | IEEE TPAMI | link | --- |
| Registration by Graph Matching | Deep graph + point cloud | 2023 | IEEE TPAMI | link | code |
| CrossNet | RGB + grayscale + point cloud | 2024 | TMM | link | --- |
| PointConT | Point content-based Transformer | 2024 | JAS | link | code |
| Algorithm | Based Structure | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- |
| Direct Voxel Grid Optimization | Voxel grid | 2022 | CVPR | link | --- |
| NICE-SLAM | Multi-resolution voxel grid | 2022 | CVPR | link | --- |
| Instant neural graphics primitives with a multiresolution hash encoding | Voxel grid hash encoding | 2022 | ACM Transactions on Graphics | link | --- |
| Vox-Fusion | Voxel grid with octree | 2022 | ISMAR | link | --- |
| Occupancy Networks | Occupancy grid | 2019 | CVPR | link | --- |
| Algorithm | Based Structure | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- |
| NeRF | MLP | 2022 | ACM Transactions on Graphics | link | --- |
| 3D-GS | 3D-GS | 2023 | ACM Transactions on Graphics | link | --- |
| NeRF-LOAM | Neural SDF | 2023 | ICCV | link | --- |
| DeepSDF | MLP-SDF/TSDF | 2019 | CVPR | link | --- |
| Algorithm | Modality | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- |
| Egocentric Action Recognition by Automatic Relation Modeling | Egocentric RGB videos | 2023 | TPAMI | link | --- |
| Egocentric Human Activities Recognition With Multimodal Interaction Sensing | Egocentric RGB videos + IMU | 2024 | IEEE Sensors Journal | link | --- |
| Ego-Humans | Egocentric RGB videos | 2023 | ICCV | link | --- |
| E2(GO)MOTION | Egocentric event stream videos | 2022 | CVPR | link | --- |
| Towards Continual Egocentric Activity Recognition: A Multi-Modal Egocentric Activity Dataset for Continual Learning | Egocentric RGB videos + IMU | 2024 | IEEE Transactions on Multimedia | link | --- |
| MARS | IMU | 2021 | IEEE Internet of Things Journal | link | --- |
| Multi-level Contrast Network for Wearables-based Joint Activity Segmentation and Recognition | IMU | 2022 | Globecom | link | --- |
| Timestamp-Supervised Wearable-Based Activity Segmentation and Recognition With Contrastive Learning and Order-Preserving Optimal Transport | IMU | 2024 | TMC | link | --- |
| Algorithm | Modality | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- |
| MotionGPT | IMU | 2023 | NeurIPS | link | --- |
| IMUGPT 2.0 | IMU | 2024 | ArXiv | link | --- |
| APSR framework | Depth / 3D joint information / RGB frame / IR sequence | 2020 | IEEE TPAMI | link | --- |
| MS block + Res2Net | 2D RGB images | 2023 | IEEE TCSVT | link | --- |
| EM + Dijkstra | IMU | 2020 | IEEE T HUM-MACH SYST | link | --- |
| MotionLLM | --- | 2024 | arXiv | link | code |
| Seq2Seq + SeqGAN + RL + MC | CAM + Master Motor Map framework | 2021 | ICRA | link | --- |
| KIT (a dataset) | CAM + Master Motor Map framework | 2017 | Big Data | link | dataset |
| Motion Patches + ViT framework | 3D joint position + RGB 2D images | 2024 | arXiv | link | --- |
| No. | Algorithm | Modality | Semantic Type | Date | Publication | Link |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Deep Learning | Visual | Image Segmentation | 02/2022 | IEEE Transactions on Intelligent Transportation Systems | Link |
| 2 | CNNs | Visual | Image Understanding | 06/2021 | Neural Networks | Link |
| 3 | Semantic Localization and Mapping | Visual | Image Recognition | 03/2023 | Robotics and Autonomous Systems | Link |
| 4 | Vision-Based Learning | Visual | Image Recognition | 05/2023 | International Journal of Robotics Research | Link |
| 5 | Deep Learning | Visual | Image Analysis | 08/2021 | Pattern Recognition Letters | Link |
| 6 | Integrated Semantic Mapping | Visual | Image Recognition | 04/2022 | Robotics | Link |
| 7 | Deep Learning | Visual | Image Segmentation | 02/2023 | Journal of Field Robotics | Link |
| 8 | Advanced Semantic Analysis | Visual | Image Understanding | 06/2023 | Autonomous Robots | Link |
| No. | Algorithm | Modality | Semantic Type | Date | Publication | Link |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | PPO | Visual | Object Recognition | 09/2021 | arXiv | Link |
| 2 | XgX | Visual | Object Detection | 11/2023 | arXiv | Link |
| 3 | Deep RL | Visual | Object Recognition | 01/2022 | Journal of Intelligent & Robotic Systems | Link |
| 4 | Cross-Modal Learning | Visual/Textual | Object Detection | 04/2022 | IEEE Robotics and Automation Letters | Link |
| 5 | Goal-Oriented Exploration | Visual | Object Detection | 03/2021 | CVPR | Link |
| 6 | Deep RL | Visual | Object Segmentation | 07/2022 | Sensors | Link |
| 7 | Multi-Task Learning | Visual | Object Localization | 12/2021 | IEEE Transactions on Neural Networks and Learning Systems | Link |
| 8 | DCNNs | Visual | Scene Understanding | 10/2022 | Pattern Recognition | Link |
| 9 | Spatial Attention Mechanism | Visual | Object Detection | 06/2021 | Robotics and Autonomous Systems | Link |
| 10 | Real-Time Semantic Mapping | Visual | Object Recognition | 05/2023 | International Journal of Advanced Robotic Systems | Link |
| Algorithm | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- |
| Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection | 2018 | IJRR | Link | --- |
| Learning ambidextrous robot grasping policies | 2019 | Science Robotics | Link | --- |
| Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning | 2018 | IROS | Link | --- |
| GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping | 2020 | CVPR | Link | Code |
| AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains | 2023 | TRO | Link | --- |
| Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation | 2020 | CVPR | Link | --- |
| UMI on Legs: Making Manipulation Policies Mobile with Manipulation-Centric Whole-body Controllers | 2024 | arXiv | Link | Code |
| DrEureka: Language Model Guided Sim-To-Real Transfer | 2024 | RSS | Link | Code |
| Humanoid Locomotion as Next Token Prediction | 2024 | arXiv | Link | --- |
| Algorithm | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- |
| Learning compositional models of robot skills for task and motion planning | 2021 | ISRR | Link | Code |
| Learning Manipulation Skills via Hierarchical Spatial Attention | 2022 | TRO | Link | --- |
| Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models | 2024 | ICRA | Link | Code |
| SAGCI-System | 2022 | ICRA | Link | --- |
| Pedipulate: Enabling Manipulation Skills using a Quadruped Robot's Leg | 2024 | ICRA | Link | --- |
| PhyPlan | 2024 | arXiv | Link | Code |
| Practice Makes Perfect: Planning to Learn Skill Parameter Policies | 2024 | RSS | Link | Code |
| Extreme Parkour with Legged Robots | 2024 | ICRA | Link | Code |
| WoCoCo: Learning Whole-Body Humanoid Control with Sequential Contacts | 2024 | arXiv | Link | --- |
| HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation | 2024 | IROS | Link | --- |
| Robust and Versatile Bipedal Jumping Control through Multi-Task Reinforcement Learning | 2023 | RSS | Link | --- |
| Real-World Humanoid Locomotion with Reinforcement Learning | 2024 | Science Robotics | Link | --- |
| Algorithm | Modality | DoF | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- | --- |
| iPlanner | Depth | 2-D | 2023 | RSS | Link | --- |
| ViPlanner | RGB-D | 2-D | 2024 | ICRA | Link | --- |
| DTC: Deep Tracking Control | Depth | 1/2-D | 2024 | Science Robotics | Link | --- |
| Neural RRT* | RGB | 2-D | 2020 | IEEE Transactions on Automation Science and Engineering | Link | --- |
| Socially aware motion planning with deep reinforcement learning | Stereo RGB | 2-D | 2017 | IROS | Link | --- |
| Efficient Autonomous Exploration Planning of Large-Scale 3-D Environments | RGB | 3-D | 2019 | RAL | Link | --- |
| ArtPlanner: Robust Legged Robot Navigation in the Field | RGB-D | 2.5-D | 2021 | Journal of Field Robotics | Link | Code |
| Perceptive Whole Body Planning for Multi-legged Robots in Confined Spaces | RGB-D | 3-D | 2021 | Journal of Field Robotics | Link | --- |
| Versatile Multi-Contact Planning and Control for Legged Loco-Manipulation | RGB-D | 3-D | 2023 | Science Robotics | Link | --- |
| Learning to walk in confined spaces using 3D representation | RGB-D/LiDAR | 3-D | 2024 | ICRA | Link | Code |
| VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation | RGB-D | 2-D | 2024 | ICRA | Link | Code |
| Autonomous Navigation of Underactuated Bipedal Robots in Height-Constrained Environments | RGB-D | 3-D | 2023 | IJRR | Link | --- |
### Morphological Collaboration - Morphologic
| Algorithm | Morphologic | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- | --- |
| Learning Robust Autonomous Navigation and Locomotion for Wheeled-legged Robots | Wheel-leg | 2024 | Science Robotics | Link | --- |
| SytaB | Ground-Air | 2022 | RAL | Link | --- |
| Aerial-aquatic robots capable of crossing the air-water boundary and hitchhiking on surfaces | Ground-Air-Water | 2022 | Science Robotics | Link | --- |
| Advanced Skills through Multiple Adversarial Motion Priors in Reinforcement Learning | Wheel-leg | 2023 | ICRA | Link | --- |
| Curiosity-Driven Learning of Joint Locomotion and Manipulation Tasks | Wheel-leg | 2023 | PMLR | Link | --- |
| Offline motion libraries and online MPC for advanced mobility skills | Wheel-leg | 2022 | IJRR | Link | --- |
| Whole-body MPC and online gait sequence generation for wheeled-legged robots | Wheel-leg | 2021 | IROS | Link | --- |
| Skywalker | Ground-Air | 2023 | RAL | --- | --- |
| Autonomous and Adaptive Navigation for Terrestrial-Aerial Bimodal Vehicles | Ground-Air | 2022 | RAL | Link | --- |
| ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots | Quadrupedal | 2024 | ICRA | Link | --- |
| Body Transformer: Leveraging Robot Embodiment for Policy Learning | Legged | 2024 | arXiv | Link | --- |
| Learning Bipedal Walking on a Quadruped Robot via Adversarial Motion Priors | Legged | 2024 | arXiv | Link | --- |
| Algorithm | Date | Publication | Paper Link | Code |
| --- | --- | --- | --- | --- |
| RFUniverse: A Multiphysics Simulation Platform for Embodied AI | 2023 | arXiv | Link | Code |
## Call for Cooperators: Join Us in Advancing Embodied Navigation Research
We are excited to invite researchers and experts in the field of embodied navigation to collaborate on an innovative paper aimed at pushing the boundaries of autonomous navigation systems. Our goal is to explore the intersection of interactive perception, neuromorphic cognition, and evolutionary motion capabilities in the development of cutting-edge embodied navigation systems. (Contact: sjtu4742986@sjtu.edu.cn)