Taking emotion analysis and intelligent speech as its research objects, and aiming to remedy the lack of emotional interaction in current smart home robots, this project presents an intelligent companion robot based on expression recognition and intelligent speech. The system consists of an expression-recognition camera, an intelligent voice module, and a full-color light-cube display system. In practical testing, the system responds quickly, recognizes accurately, and runs stably, providing a good user experience and making everyday life more intelligent and user-friendly.
- Expression Recognition Camera: Raspberry Pi 3B+, Pi Camera module
- Intelligent Voice Module: FPGA-based intelligent speech, microphone
- Full-color Light-cube Display System: includes the power-switch circuit, the control circuit, and the cascaded drive circuit, with enough headroom reserved for future circuit expansion. Notably, this display system comes from another of my projects: LightCube - A Design of 3D Dynamic Display System Based-on Voice Control
Emotion Cube
Functional positioning of the companion robot:
- Home entertainment: smart speakers equipped with voice recognition and a microphone make the companion robot more human-like and add happiness to daily life;
- Popular science education: the full-color light-cube animations stimulate children's imagination of three-dimensional space and their interest in intelligent technology, while spreading knowledge of 3D modeling;
- Emotional complement: based on the sentiment-analysis results, the light cube expresses feelings through different melodies and animations, making interaction more lively and interesting and helping to heal emotions.
Human-computer interaction that produces genuine emotional exchange is the real meaning of the smart home. This work takes emotion analysis and intelligent speech as its research objects and combines full-color light-cube 3D animation, facial expression recognition, and voice recognition, letting technology resonate with human emotions, relieving the pressures of daily life, healing emotions, and creating a warm and comfortable emotional home environment. Specifically, it:
- Improves on traditional companion robots, which are single-purpose and lack emotional communication;
- Integrates dynamic expression recognition with naked-eye 3D display technology, opening new opportunities for emotional communication in the smart home field and proposing a new direction for smart homes: emotional interaction;
- Obtains a person's true emotional state through the comprehensive evaluation of multiple emotional signals, which is more effective for regulating emotions and relieving stress;
- Delivers genuinely enjoyable visuals through 3D animation; the naked-eye 3D sensory experience satisfies people's desire for rich, immersive imagery;
- Solves, through the cascaded drive-circuit design of the full-color light cube, the problem of supplying enough drive current for tens of thousands of LEDs; the same design can scale to even larger 3D dot-matrix displays.
Simply put, facial expressions are part of human body language: a physical and psychological response usually used to convey emotion. Human expressions are numerous, and the six basic expressions (happiness, surprise, sadness, anger, disgust, and fear) can currently be recognized fairly reliably. Once an expression is recognized, flags and commands can be issued so that the human-machine interaction device performs the corresponding action. Because human emotions are complex, these expressions alone cannot fully determine the emotional fluctuations in a person's heart; to improve judgment accuracy, comprehensive evaluation such as heart-rate detection and voice processing is also required.
Figure 1. The effect of facial expression recognition
Speech recognition is realized with the LDV5 speech module plus FPGA-based speech sampling, filtering, and recognition. After a long period of debugging and parameter tuning, the recognition rate for commonly used sentences is high, and a voice-chat function is completed through local semantic analysis and retrieval. Functions such as intelligent voice answering, light-music playback, storytelling, and voice-controlled light-cube display are now fairly complete, although the speech recognition rate still needs improvement and the response time needs to be reduced.
Figure 2. 3D dynamic display effect diagram of the 12×12×12 light cube
After the Raspberry Pi is powered on, the camera is turned on to perform dynamic expression recognition. When a facial expression is recognized, background music that matches the emotion is played; at the same time, an instruction is sent to the STM32 over Bluetooth to make the light cube display a corresponding 3D animation. Together with the background music, this creates a relaxed, comfortable, and warm home environment that effectively relieves mental stress and soothes emotions.
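A minimal Python sketch of this control loop, assuming a pyserial link to an HC-05-style Bluetooth module; the port name, baud rate, command codes, and the `recognise()`/`play_music()` helpers are illustrative placeholders, not the project's actual code:

```python
# Sketch of the Raspberry Pi main loop: recognise an expression, play matching
# music, and send a one-byte animation command to the STM32 over a Bluetooth
# serial link. All names, ports and command codes below are placeholders.
import time
import serial  # pyserial

ANIMATION_CMD = {"happy": b"\x01", "sad": b"\x02", "surprise": b"\x03"}

def recognise():
    """Placeholder for the dlib-based expression pipeline described below."""
    return "happy"

def play_music(emotion):
    print(f"playing background music for: {emotion}")  # stand-in audio player

bt = serial.Serial("/dev/rfcomm0", 9600, timeout=1)    # assumed Bluetooth port

while True:
    emotion = recognise()
    if emotion in ANIMATION_CMD:
        play_music(emotion)                  # soothe the mood with music
        bt.write(ANIMATION_CMD[emotion])     # tell STM32 which animation to show
    time.sleep(0.5)                          # simple pacing between updates
```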
Figure 3. 3D dynamic display effect diagram of the 12×12×12 light cube
As shown in Figure 4, the robot uses a Raspberry Pi and an STM32 as the main controllers; the Raspberry Pi sends the expression-recognition result to the STM32, which controls the light cube to display full-color 3D animation. The Raspberry Pi also drives the pan-tilt servos so that the camera can track a person's movement, and it uses Dlib and NumPy for face recognition, facial feature-point extraction, and normalization, completing the facial expression recognition. The STM32F4 controls the LEDs at each 3D position of the 12-layer full-color light cube to display full-color animation, presenting a 3D picture and enhancing the visual effect. (Refer to Figure 5 for the detailed implementation flow.)
Figure 4. System Block Diagram
Figure 5. Program Flow Chart
First, dlib is used for face detection and to extract the 68 feature points of the face. A shape_predictor object is instantiated with dlib's trained facial-landmark model and used to calibrate the feature points. During calibration, OpenCV's circle method adds a watermark at each feature point's coordinates showing its index and position. Finally, the coordinates of these 68 feature points are combined into the judgment indices for each expression.
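The landmark-extraction step might look like the following sketch; the model file name is dlib's standard pre-trained 68-point predictor, and the camera index is an assumption:

```python
# Face detection + 68-point landmark extraction, with each point watermarked
# onto the frame via OpenCV. Model file and camera index are assumptions.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()   # HOG-based frontal face detector
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

cap = cv2.VideoCapture(0)                     # Pi Camera / USB camera
ret, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

for face in detector(gray, 1):                # one rectangle per detected face
    shape = predictor(gray, face)             # calibrated 68 landmarks
    for i in range(68):
        pt = (shape.part(i).x, shape.part(i).y)
        cv2.circle(frame, pt, 2, (0, 255, 0), -1)        # mark the point
        cv2.putText(frame, str(i), pt, cv2.FONT_HERSHEY_SIMPLEX,
                    0.3, (0, 0, 255), 1)                 # watermark its index

cv2.imwrite("landmarks.jpg", frame)
cap.release()
```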
According to the judgment indices we proposed, the mouth-open ratio is calculated first. Before selecting the standard value for this index, multiple photos of happy faces were analyzed to obtain the average mouth-open ratio when happy. The slope of a fitted straight line approximates the degree of eyebrow inclination. Similarly, the eyebrow height and mouth width are calculated for each expression, then classified and discussed to give discrimination thresholds. By analyzing data for multiple different expressions, a reference value for each index is obtained, and simple expression-classification rules can be written to complete the expression recognition.
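As an illustration, the judgment indices above can be computed from the 68 landmarks like this; the landmark indices follow dlib's standard 68-point layout, and all thresholds are placeholders rather than the empirically derived project values:

```python
# Judgment indices computed from the 68 landmarks (a (68, 2) NumPy array,
# standard dlib layout). Thresholds are placeholders; the project derives
# them by averaging measurements over many labelled photos.
import numpy as np

def expression_metrics(pts):
    face_width = np.linalg.norm(pts[16] - pts[0])                 # jaw span, for scale
    mouth_open = np.linalg.norm(pts[66] - pts[62]) / face_width   # inner-lip gap
    mouth_width = np.linalg.norm(pts[54] - pts[48]) / face_width  # corner to corner
    xs, ys = pts[22:27, 0], pts[22:27, 1]                         # left eyebrow points
    brow_slope = np.polyfit(xs, ys, 1)[0]                         # fitted-line slope
    brow_height = (np.mean(pts[36:42, 1]) -                       # eye centre minus
                   np.mean(pts[17:22, 1])) / face_width           # brow centre
    return mouth_open, mouth_width, brow_slope, brow_height

def classify(pts):
    mouth_open, mouth_width, brow_slope, brow_height = expression_metrics(pts)
    if mouth_open > 0.12 and mouth_width > 0.45:    # wide, open mouth: smiling
        return "happy"
    if mouth_open > 0.20:                           # mouth dropped open
        return "surprise"
    if brow_height < 0.06:                          # brows pulled down to eyes
        return "angry"
    return "neutral"
```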
For audio signal processing we use the LDV5 module. The FPGA communicates with the module over a bit-banged SPI interface to read its register values, applies a fast Fourier transform (FFT) for voice-signal processing, and issues the corresponding instruction based on the voice signal. After a long period of debugging and parameter tuning of the FPGA's voice sampling, filtering, and recognition, the recognition rate for commonly used sentences is high, and the voice-chat function is completed through local semantic analysis and retrieval.
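Although the FFT runs in FPGA logic on the robot, the spectral-analysis step it performs can be illustrated in a few lines of NumPy (the sampling rate and frame size here are assumptions):

```python
# NumPy illustration of the FFT stage; on the robot this runs in FPGA logic.
import numpy as np

fs = 8000                                    # assumed sampling rate, Hz
frame = np.random.randn(512)                 # stand-in for one voice frame
window = np.hanning(len(frame))              # taper to reduce spectral leakage
spectrum = np.abs(np.fft.rfft(frame * window))
freqs = np.fft.rfftfreq(len(frame), d=1 / fs)
peak_hz = freqs[np.argmax(spectrum)]         # dominant frequency of the frame
print(f"dominant component: {peak_hz:.1f} Hz")
```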
Twelve SM16126 serial-to-parallel, high-drive chips are cascaded to form 144 output ports. Input timing is controlled through the DataIN port, and PWM modulation produces the RGB color mixing. With the light-cube 3D-modeling host software we designed, different animations are created and the 6-bit data of each LED is marked; finally, the STM32 reads each animation frame of the 12-layer light cube from the SD card in DMA mode, realizing the full-color animation display of the light cube.
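To make the data path concrete, here is a hypothetical host-side sketch of packing one 12×12×12 frame for the SD card; the 6-bit layout is assumed to be 2 bits per R/G/B channel, which may differ from the project's actual file format:

```python
# Hypothetical host-side packing of one 12x12x12 frame into one byte per LED,
# assuming 2 bits per R/G/B channel (6 bits total); the project's real file
# format on the SD card may differ.
import numpy as np

def pack_frame(frame_rgb):
    """frame_rgb: (12, 12, 12, 3) array of 2-bit channel values (0-3)."""
    leds = frame_rgb.reshape(-1, 3)                  # 1728 LEDs in scan order
    codes = (leds[:, 0] << 4) | (leds[:, 1] << 2) | leds[:, 2]   # RRGGBB bits
    return codes.astype(np.uint8).tobytes()

frame = np.zeros((12, 12, 12, 3), dtype=np.uint8)
frame[6, 6, 6] = (3, 0, 0)                           # centre LED full red
with open("animation.bin", "wb") as f:
    f.write(pack_frame(frame))                       # STM32 streams this via DMA
```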
For the full documentation (``.doc``/``.pdf``, in Chinese), click here.
Here I would like to thank my partner, Shen Fuzhou, who stayed with me to carry this project forward and contributed a great deal to it. I am honored to have met such a great partner in college. I would also like to thank my instructor, Mr. Chen Lei, who provided us with guidance and secured financial support from the school. Without your help, this project could not have been completed successfully. PS: The whole project took half a year, and soldering the entire light cube alone took more than 20 days -_- (Here I also have to thank my roommates, especially Sun Jiqiao, for soldering with us!)
Because this is a funded project, it is open-sourced for the public good on the one hand, and on the other hand national patent protection 📑 has been applied for to secure copyright ownership. For any commercial use, please contact me.
If you have any questions or ideas, please let me know 📧 yidazhang1@gmail.com
Use this bibtex to cite this repository:
@misc{EmotionCube,
  title={Emotion Cube: Intelligent Speech Companion Robot Based-on Sentiment Analysis},
  author={Charmve},
  year={2019},
  month={12},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/Charmve/Intelligent-Speech-Sompanion-Robot-Based-on-Sentiment-Analysis}},
}
You can find how to contact me in the right sidebar, and you can follow me for more interesting projects.
If you like Charmve or my projects, you can buy me a ☕ coffee 🍉 / 🍦 or 🍰 cake at Charmve Sponsors to support me: just click the button. Your name will be shown at https://charmve.github.io/sponsor.html.
Code with ❤️ & ☕ By Charmve @ 2021