National Action Council for Minorities in Engineering (NACME) Artificial Intelligence - Machine Learning (AIML) Intensive Summer Bootcamp at the University of Southern California
We would like to thank the Apple NACME AIML Bootcamp program directors, our professor Dr. Corey Baker, our guide Abdulla Alshabanah, and our PI Murali Annavaram for giving us the opportunity to pursue research in an area that deeply interests us. Their support and guidance have been invaluable throughout this project, and their dedication to fostering a learning environment has contributed significantly to our growth in the field of artificial intelligence and machine learning. Thank you for this opportunity to explore and innovate.
Developed by:
- Alline Ayala, Electronic Systems Engineering Technology, Texas A&M
- Noah Teckle, Electrical, University of Southern California
- Jonathan Haile, Computer Science and Business Administration, University of Southern California
In this project, the team developed a Two-Tower Collaborative Filtering Model to build a music recommendation system. The experiments use Last.fm's API together with Apple Music and Spotify datasets from Kaggle. As a collaborative filtering model, it learns from the interactions of all users with songs and uses them to assign a score to each candidate song for a given user.
The model is built on a deep learning architecture: it uses an MLP and a SparseNN to make predictions from user and item embeddings. To view the code, we suggest starting with structs.py and sparsenn.py.
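To make the two-tower idea concrete, here is a minimal, untrained sketch in plain Python. It is not the code in structs.py or sparsenn.py: the layer sizes, the random weights, the song names, and the dot-product scoring are all assumptions chosen for illustration. Each tower is a small MLP that maps its input (a user embedding or an item feature vector) into a shared space, and a song's score for a user is the dot product of the two tower outputs.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Randomly initialised dense layer, returned as (weights, biases)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, relu=True):
    """Apply one dense layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out

def tower(x, layers):
    """Run an MLP tower: ReLU on hidden layers, linear output layer."""
    for i, layer in enumerate(layers):
        x = dense(x, layer, relu=(i < len(layers) - 1))
    return x

def score(user_vec, item_vec):
    """Dot product of the two tower outputs (assumed scoring function)."""
    return sum(u * v for u, v in zip(user_vec, item_vec))

rng = random.Random(0)
USER_DIM, ITEM_DIM, HIDDEN, OUT = 8, 12, 16, 4  # hypothetical sizes

user_tower = [make_layer(USER_DIM, HIDDEN, rng), make_layer(HIDDEN, OUT, rng)]
item_tower = [make_layer(ITEM_DIM, HIDDEN, rng), make_layer(HIDDEN, OUT, rng)]

# Stand-ins for a learned user embedding and text-derived song features.
user_embedding = [rng.gauss(0, 1) for _ in range(USER_DIM)]
song_features = {name: [rng.gauss(0, 1) for _ in range(ITEM_DIM)]
                 for name in ("song_a", "song_b", "song_c")}

user_vec = tower(user_embedding, user_tower)
scores = {name: score(user_vec, tower(feats, item_tower))
          for name, feats in song_features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because the weights are random, the printed ranking is meaningless; training would adjust both towers so that songs a user interacted with score higher than songs they did not.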
Below are diagrams showing the basic working of the two-tower MLP model:
Our model works with a Sentence Transformer and user/item embeddings. To produce results, run main.py with the appropriate command from step 11 or step 15. The output reports several scores. Hit Rate (HR)@10 measures whether the held-out test item appears in a user's top-k (here top-10) list. Normalized Discounted Cumulative Gain (NDCG)@10 additionally accounts for the position of the true item within that top-10 list, so a hit ranked higher scores more. This project used the following git repo, a recommendation system for MovieLens and BookCrossing, as a reference: repo
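The two metrics can be sketched as follows for the common case of one held-out test item per user. This is an illustrative implementation, not the evaluation code in main.py; the function names and the example song ids are hypothetical. With a single relevant item the ideal DCG is 1, so NDCG reduces to 1 / log2(rank + 1) for a 1-based rank.

```python
import math

def hit_rate_at_k(ranked_items, true_item, k=10):
    """1.0 if the held-out item is among the top-k recommendations, else 0.0."""
    return 1.0 if true_item in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, true_item, k=10):
    """NDCG@k with a single relevant item: 1 / log2(position + 2), 0-based."""
    top_k = ranked_items[:k]
    if true_item not in top_k:
        return 0.0
    position = top_k.index(true_item)  # 0 for the first slot
    return 1.0 / math.log2(position + 2)

# Hypothetical ranked list for one user; the true item sits in third place.
ranked = ["s3", "s7", "s1", "s9", "s4", "s2", "s8", "s5", "s6", "s0", "s11"]
print(hit_rate_at_k(ranked, "s1"))   # 1.0
print(ndcg_at_k(ranked, "s1"))       # 0.5, since 1 / log2(4) = 0.5
print(hit_rate_at_k(ranked, "s11"))  # 0.0, the item is ranked 11th
```

Averaging each function over all test users gives the reported HR@10 and NDCG@10.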
Using this code, we built our base model, which uses the song name as the primary embedded feature for recommendations. We then experimented with other item features and compared their results. The other features were:
- Song summaries for the Apple Music and Spotify datasets, generated by an LLM (Groq API)
- Truncated lyrics for both item datasets: the first 25 words of each song's lyrics (Lyrics OVH API)
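As an illustration of the lyrics feature, the truncation to the first 25 words could look like the sketch below. Splitting on whitespace is an assumption; the project's own preprocessing may tokenize or clean the lyrics differently, and the function name is hypothetical.

```python
def truncate_lyrics(lyrics, max_words=25):
    """Keep only the first max_words whitespace-separated words."""
    return " ".join(lyrics.split()[:max_words])

# Example with a made-up 40-word lyric.
sample = " ".join(f"word{i}" for i in range(40))
short = truncate_lyrics(sample)
print(len(short.split()))  # 25
```

Line breaks in the original lyrics are collapsed into single spaces, which keeps the text compact before it is passed to a sentence encoder.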
Read our paper:
NOTE: If you would like to access the completed data collection csv files as well as the Excel sheet with our data analysis, follow this link
- Fork this repo
- Change directories into your project
- On the command line, type
pip3 install -r requirements.txt
- Create a .env file in the data_collection directory
- Add the following line, with your Last.fm API key: LASTFM_API_KEY=YOUR_API_KEY
- Add ".env" to .gitignore
- Run pip install python-dotenv in your environment
- Download Apple Music Dataset into data_collection/DATASET
- For Apple Music, run last_fm_user.py -> apple_items.py -> apple_rating.py -> apple_datareader.py
- Change the dataset flag in main.py (lines 28 and 256) to 'apple_tracks'
- Run the following command: TOKENIZERS_PARALLELISM=False python3 main.py --dataset=apple_tracks --dataset_dir=/PATH_TO_DATACOLLECTION/ --device=cpu --batch_size=1024 --print_freq=32 --lr=2e-5 --epochs=5 --margin=1 --num_negatives=20 --warm_threshold=0.2 --num_workers=8
- Download Spotify Dataset into data_collection/DATASET
- For Spotify, run last_fm_user.py -> spotify_items.py -> spotify_rating.py -> spotify_data_reader.py
- Change the dataset flag in main.py (lines 28 and 256) to 'spotify_tracks'
- Run the following command: TOKENIZERS_PARALLELISM=False python3 main.py --dataset=spotify_tracks --dataset_dir=/PATH_TO_DATACOLLECTION/ --device=cpu --batch_size=1024 --print_freq=32 --lr=2e-5 --epochs=5 --margin=1 --num_negatives=20 --warm_threshold=0.2 --num_workers=8
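The .env steps above keep the Last.fm key out of version control. A minimal sketch of how a data collection script might read that key is shown below. The function name and the error message are hypothetical, and how last_fm_user.py actually loads the key is not shown here; the sketch only assumes the LASTFM_API_KEY variable name and the data_collection/.env location given in the steps.

```python
import os

def get_lastfm_api_key(env_path="data_collection/.env"):
    """Return LASTFM_API_KEY, loading it from the .env file when possible."""
    try:
        # python-dotenv is installed during setup; load_dotenv copies the
        # variables from the file into the process environment.
        from dotenv import load_dotenv
        load_dotenv(env_path)
    except ImportError:
        pass  # fall back to a variable already exported in the shell
    key = os.getenv("LASTFM_API_KEY")
    if not key:
        raise RuntimeError("LASTFM_API_KEY is not set; check your .env file")
    return key
```

Failing early with a clear error is a design choice: a missing key otherwise tends to surface later as an opaque API error in the middle of data collection.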
- Last.fm. (n.d.). Play music, find songs, and discover artists.
- Groq. (n.d.). Groq is fast AI inference.
- He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T.-S. (2017). Neural collaborative filtering.
- Gossi, D., & Gunes, M. (n.d.). Lyric-based music recommendation.