The embodied intelligence of Shenhao's humanoid robot Xiaohao

charliezcr/Xiaohao

Xiaohao: Humanoid Robot's Embodied Intelligence

Project Overview

Welcome to the GitHub repository of Xiaohao's embodied intelligence system. Xiaohao is a humanoid robot designed by Hangzhou Shenhao Technology that serves as a guide in the exhibition hall. It is a multi-agent system with a large language model and a vision language model at its core, supported by robotic arms, mobility wheels, and a camera. These agents communicate via ROS messages, allowing Xiaohao to respond to user prompts, make decisions, and interact with its environment.

In this repository, you will find the open-sourced code for the large language model and the camera, as well as the ROS publisher nodes responsible for orchestrating communications among the different components.

Repository Contents

  1. Language Processing Nodes:

    • cozy_chat.py: This node operates as a ROS subscriber to the 'wake' topic. Upon activation, it records audio, processes it with VAD (Voice Activity Detection), and converts the speech to text with the Paraformer speech recognition model. Paraformer and the other ModelScope models loaded in the initialization section are not bundled; please download them yourself from ModelScope by Alibaba. The text is then passed to the large language model (Qwen) to generate a response, which is converted back to speech with SAMBERT TTS to interact with users. This node also handles movement commands for the robot.
      • vad.py: Voice activity detection function used in cozy_chat.py.
      • silero_vad.onnx: Open-source VAD model used throughout the project.
      • cozy_tts.py: Text-to-speech function used in cozy_chat.py.
      • llm_prompt.md: System prompt for the large language model.
      • vl_prompt.md: System prompt for the vision language model.
  2. Camera and Vision Processing:

    • rs_cam.py: This node manages the Intel® RealSense™ Stereo depth camera. It subscribes to the 'camera' topic and, upon receiving commands, captures color and depth images, identifies objects, and communicates with mobility components to navigate towards them.
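The README does not say how rs_cam.py turns a detected object into a navigation target. One common approach, sketched here purely as an assumption, is to back-project the object's pixel and its depth reading into a 3D point with the pinhole camera model, then derive a bearing and range for the mobility components. The names `Intrinsics`, `deproject_pixel`, and `bearing_and_range` and the intrinsic values are hypothetical; lens distortion is ignored, which pyrealsense2's own `rs2_deproject_pixel_to_point` would handle on a real camera.

```python
import math
from typing import NamedTuple, Tuple


class Intrinsics(NamedTuple):
    """Pinhole intrinsics in pixels (hypothetical values are used below)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def deproject_pixel(u: float, v: float, depth_m: float,
                    k: Intrinsics) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) with depth in metres to camera coordinates.

    Camera frame: x to the right, y down, z forward. Distortion is ignored.
    """
    if depth_m <= 0:
        # A depth of 0 usually means the sensor has no reading at that pixel.
        raise ValueError("depth must be positive")
    x = (u - k.cx) / k.fx * depth_m
    y = (v - k.cy) / k.fy * depth_m
    return (x, y, depth_m)


def bearing_and_range(point: Tuple[float, float, float]) -> Tuple[float, float]:
    """Horizontal bearing (radians, positive to the right) and ground range (m)."""
    x, _, z = point
    return math.atan2(x, z), math.hypot(x, z)


# Example: an object centred at pixel (620, 240), 2 m away along the optical axis.
K = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
target = deproject_pixel(620, 240, 2.0, K)
bearing, distance = bearing_and_range(target)
```

With these assumed intrinsics the object sits 1 m to the right of the optical axis, so the robot would need to turn right by roughly 0.46 rad before driving forward.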

Flow Chart

image
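As a text companion to the flow chart, the language-processing path described for cozy_chat.py (wake message, recording, VAD, speech recognition, LLM, TTS) can be sketched as below. This is a minimal sketch, not the repository's code: `ChatPipeline`, its stage callables, the node name, and the `std_msgs/String` message type for the 'wake' topic are all assumptions, and the ROS wiring assumes ROS 1 with rospy. Movement-command handling is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ChatPipeline:
    """Wake-triggered voice loop: record -> VAD -> ASR -> LLM -> TTS.

    Every stage is an injected callable (hypothetical stand-ins), so the
    control flow can be exercised without a microphone, models, or ROS.
    """
    record: Callable[[], bytes]            # capture raw audio
    trim_speech: Callable[[bytes], bytes]  # VAD: keep only voiced audio
    transcribe: Callable[[bytes], str]     # ASR (Paraformer in the README)
    generate: Callable[[str, str], str]    # LLM: (system_prompt, text) -> reply
    speak: Callable[[str], None]           # TTS playback
    system_prompt: str = ""

    def on_wake(self, _msg: object = None) -> Optional[str]:
        """Handle one message on the 'wake' topic; return the reply, if any."""
        speech = self.trim_speech(self.record())
        if not speech:   # VAD found no voice, so there is nothing to answer
            return None
        text = self.transcribe(speech).strip()
        if not text:     # ASR produced an empty transcript
            return None
        reply = self.generate(self.system_prompt, text)
        self.speak(reply)
        return reply


def run_ros_node(pipeline: ChatPipeline) -> None:
    """Wire the pipeline to ROS 1. Needs a ROS installation providing rospy."""
    import rospy                     # imported lazily so the sketch loads without ROS
    from std_msgs.msg import String  # assumed message type for 'wake'

    rospy.init_node("cozy_chat_sketch")
    rospy.Subscriber("wake", String, pipeline.on_wake)
    rospy.spin()


# Usage with stub stages (no hardware or models needed):
spoken: List[str] = []
demo = ChatPipeline(
    record=lambda: b"\x00\x01",
    trim_speech=lambda audio: audio,
    transcribe=lambda audio: "where is the exit",
    generate=lambda prompt, text: f"You asked: {text}",
    speak=spoken.append,
)
reply = demo.on_wake()
```

Injecting the stages keeps the callback small and lets each model be swapped or stubbed independently; on the robot, `run_ros_node` would be called with real recording, VAD, ASR, LLM, and TTS functions.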
