MorphFun is an interactive audio application that turns a sung or whistled melody into an instrument performance, using only a microphone and a webcam. This README describes the project's purpose, how it works, and how to install and run it.
MorphFun is a fun and interactive application that combines audio processing and computer vision to transform your melodies into a symphony of instruments. With the power of Timbre Transfer, Pose Estimation, and Pose Classification, MorphFun brings a new dimension to music creation.
Have you ever wanted to hear what your voice or whistle would sound like when played by a professional violinist or saxophonist? Sing or whistle into your microphone, and MorphFun transforms your performance into the sound of one of four instruments: Violin, Flute, Trumpet, or Tenor Saxophone.
MorphFun uses DDSP (Differentiable Digital Signal Processing) from Magenta, a Google Research project, for Neural Timbre Transfer. DDSP combines classic signal-processing components, such as oscillators and filters, with neural networks, so the timbre, pitch, and dynamics of an audio signal can be controlled separately.
With DDSP, MorphFun takes your recorded audio, keeps the melody, and re-renders it in each of the four supported timbres. Your vocal or whistled performance can then sound as if an instrumentalist were playing it.
MorphFun uses the MediaPipe Holistic model for Pose Estimation. Holistic combines body, face, and hand tracking in a single pipeline, so it extracts landmarks for your face, limbs, and hands, capturing even subtle movements in your performance.
This real-time Pose Estimation is the foundation of MorphFun's interactive experience, enabling the application to track your movements and translate them into musical instrument choices.
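As a rough illustration of what "key points and landmarks" can look like as model input, the sketch below flattens one frame of Holistic output into a single feature vector. The attribute names (`pose_landmarks`, `face_landmarks`, `left_hand_landmarks`, `right_hand_landmarks`) follow the MediaPipe Holistic results object, and the landmark counts are the model's defaults. The function names, the zero-filling of undetected parts, and the feature layout are assumptions made for this example, not necessarily what MorphFun does.

```python
from types import SimpleNamespace

# Default landmark counts for MediaPipe Holistic.
N_POSE, N_FACE, N_HAND = 33, 468, 21


def _flatten(landmark_list, count, with_visibility=False):
    """Return a flat list of coordinates, or zeros if the part was not detected."""
    width = 4 if with_visibility else 3
    if landmark_list is None:
        return [0.0] * (count * width)
    values = []
    for lm in landmark_list.landmark:
        values.extend([lm.x, lm.y, lm.z])
        if with_visibility:
            values.append(lm.visibility)
    return values


def frame_features(results):
    """Concatenate pose, face, and both hands into one vector per frame (hypothetical layout)."""
    return (
        _flatten(results.pose_landmarks, N_POSE, with_visibility=True)
        + _flatten(results.face_landmarks, N_FACE)
        + _flatten(results.left_hand_landmarks, N_HAND)
        + _flatten(results.right_hand_landmarks, N_HAND)
    )


# Stand-in for a Holistic result in which nothing was detected.
empty_frame = SimpleNamespace(
    pose_landmarks=None,
    face_landmarks=None,
    left_hand_landmarks=None,
    right_hand_landmarks=None,
)
features = frame_features(empty_frame)
```

Zero-filling keeps the vector length constant from frame to frame, which a sequence model needs; other strategies, such as repeating the last seen landmarks, would work as well.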
What sets MorphFun apart is its custom Pose Classification model. We have trained our own LSTM-based neural network to analyze the sequences of keypoints generated by Pose Estimation. This model is tailored specifically for recognizing your "musical pose" and classifying it into one of five categories: Violin, Flute, Trumpet, Tenor Saxophone, or "No Instrument" if you're not mimicking any instrument.
This personalized model ensures accurate and responsive instrument selection, making your MorphFun experience feel uniquely tailored to your performance.
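Because the classifier works on sequences of keypoints rather than single frames, the incoming frames have to be grouped into fixed-length windows before each prediction. The sketch below shows one way to do that with a rolling buffer. The class name `FrameWindow` and the window length of 30 frames are hypothetical; the README does not state the sequence length the model was trained on.

```python
from collections import deque

SEQUENCE_LENGTH = 30  # hypothetical window size, in frames


class FrameWindow:
    """Keep the most recent frames and expose them as one fixed-length sequence."""

    def __init__(self, length=SEQUENCE_LENGTH):
        self.length = length
        self._frames = deque(maxlen=length)  # oldest frames drop out automatically

    def push(self, features):
        """Add the feature vector of the newest frame."""
        self._frames.append(list(features))

    def ready(self):
        """True once enough frames have arrived to form a full sequence."""
        return len(self._frames) == self.length

    def sequence(self):
        """Return the buffered frames, oldest first, as a list of feature vectors."""
        return list(self._frames)


window = FrameWindow(length=30)
for frame_index in range(35):
    window.push([float(frame_index)])  # one-element vectors stand in for real keypoints
latest_sequence = window.sequence()
```

Once `ready()` returns true, `sequence()` yields a `(frames, features)` nested list that can be passed to the sequence model for each new prediction.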
MorphFun provides an easy-to-use GUI with three buttons:
- REC: Start and stop audio recording.
- Pause/Play: Pause or resume audio playback.
- Clear: Reset your session for a fresh start.
MorphFun's magic happens through a seamless interaction of its key features. Here's an overview of the user-application interaction flow:
- The user records an audio performance through the Graphical User Interface (GUI), singing or whistling a melody.
- The recorded audio and melody are processed through DDSP, creating four distinct audio outputs, each simulating the timbre of a different instrument: Violin, Flute, Trumpet, and Tenor Saxophone.
- Simultaneously, the user's performance is captured by the webcam, and Pose Estimation (powered by MediaPipe Holistic) extracts key landmarks to represent the user's pose, including their face, limbs, and hand movements.
- These pose landmarks are fed into a custom Pose Classification model, which classifies the user's performance as one of the four instrument categories or "No Instrument."
- Based on the Pose Classification result, the corresponding instrument's audio output is played in real time, so the sound follows the instrument you are miming.
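The last step of the flow above, choosing which pre-rendered track to play from the classifier output, can be sketched as follows. This is a minimal illustration under assumptions: the label order, the majority vote over recent predictions, the `InstrumentSelector` and `active_track` names, and the choice to stay silent for "No Instrument" are all hypothetical and not taken from the MorphFun source.

```python
from collections import Counter, deque

# The five classes named in this README; the index order is an assumption.
LABELS = ["Violin", "Flute", "Trumpet", "Tenor Saxophone", "No Instrument"]


class InstrumentSelector:
    """Smooth per-window predictions with a majority vote to avoid flickering."""

    def __init__(self, history=5):
        self._recent = deque(maxlen=history)

    def update(self, probabilities):
        """Record the most likely class and return the current majority label."""
        best = max(range(len(LABELS)), key=lambda i: probabilities[i])
        self._recent.append(LABELS[best])
        label, _count = Counter(self._recent).most_common(1)[0]
        return label


def active_track(label, tracks):
    """Return the pre-rendered track for the label, or None to stay silent."""
    return tracks.get(label)


# File names are placeholders for the four DDSP outputs.
tracks = {
    "Violin": "violin.wav",
    "Flute": "flute.wav",
    "Trumpet": "trumpet.wav",
    "Tenor Saxophone": "tenor_sax.wav",
}
selector = InstrumentSelector(history=3)
violin_like = [0.7, 0.1, 0.1, 0.05, 0.05]
flute_like = [0.1, 0.7, 0.1, 0.05, 0.05]
choices = [selector.update(p) for p in (violin_like, violin_like, flute_like, flute_like)]
```

The vote trades a little latency for stability: a single misclassified window does not switch instruments, but a sustained pose change does.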
MorphFun turns your webcam and microphone into a musical playground, offering endless possibilities for creative expression.
Before running the application, make sure the required dependencies are installed. Create a Conda environment with Python 3.8.18, then install the dependencies from the provided requirements.txt file.
If you don't have Conda installed, you can download and install it from Anaconda.
- Clone the repository to your local machine.
- Open a terminal or command prompt.
- Navigate to the project directory.
- Run the following command to create a Conda environment with Python 3.8.18:
conda create -n morphfun python=3.8.18
- After creating the Conda environment, activate it using the following command (on Windows, or with Conda 4.4 and later on any platform, run conda activate morphfun instead):
source activate morphfun
- Finally, install the project dependencies using pip and the provided requirements.txt file:
pip install -r requirements.txt
- Clone the repository to your local machine.
- Create and activate the Conda environment as instructed above.
- Execute the following command to start MorphFun:
python3 source/main.py