multimodal_ml_music.tsv

Year	Entrytype	Title	Author	Link	Code	Task	Reproducible	Dataset	Framework	Architecture	Dropout	Batch	Epochs	Dataaugmentation	Input	Dimension	Activation	Loss	Learningrate	Optimizer	Gpu	
2008	inproceedings	Multimodal Music Mood Classification using Audio and Lyrics	Laurier, Cyril and Grivolla, Jens and Herrera, Perfecto	http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.182.426&rep=rep1&type=pdf		mood classification																
2009	inproceedings	Combining audio content and social context for semantic music discovery	Turnbull, Douglas R. and Barrington, Luke and Lanckriet, Gert and Yazdani, Mehrdad	https://www.cs.swarthmore.edu/~turnbull/Papers/Turnbull_CombineMusicTags_SIGIR09.pdf		music retrieval																
2011	inproceedings	The need for music information retrieval with user-centered and multimodal strategies	Liem, Cynthia CS and M{\"u}ller, Meinard and Eck, Douglas and Tzanetakis, George and Hanjalic, Alan	https://dl.acm.org/doi/pdf/10.1145/2072529.2072531																		
2011	inproceedings	Musiclef: A benchmark activity in multimodal music information retrieval	Orio, Nicola and Rizo, David and Miotto, Riccardo and Montecchio, Nicola and Schedl, Markus and Lartillot, Olivier	https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.449.4173&rep=rep1&type=pdf																		
2013	incollection	Music emotion recognition: From content- to context-based models	Barthet, Mathieu and Fazekas, Gy{\"{o}}rgy and Sandler, Mark	https://qmro.qmul.ac.uk/xmlui/bitstream/handle/123456789/31911/Fazekas%20Music%20Emotion%20Recognition%202012%20Accepted.pdf;jsessionid=76AE783B989ED4CDBFB8B9C5CE013CE4?sequence=1		emotion recognition																
2013	inproceedings	Cross-modal Sound Mapping Using Deep Learning	Fried, Ohad and Fiebrink, Rebecca	https://www.ohadf.com/papers/FriedFiebrink_NIME2013.pdf																		
2016	unpublished	Towards Music Captioning: Generating Music Playlist Descriptions	Choi, Keunwoo and Fazekas, Gy{\"{o}}rgy and Sandler, Mark and Mcfee, Brian and Cho, Kyunghyun	https://arxiv.org/pdf/1608.04868.pdf		music captioning																
2016	inproceedings	Exploring customer reviews for music genre classification and evolutionary studies	Oramas, Sergio and Espinosa-Anke, Luis and Lawlor, Aonghus and Serra, Xavier and Saggion, Horacio	https://repositori.upf.edu/bitstream/handle/10230/33063/Oramas_ISMIR2016_expl.pdf?sequence=1&isAllowed=y		genre classification																
2017	inproceedings	Music emotion recognition via end-to-end multimodal neural networks	Jeon, Byungsoo and Kim, Chanju and Kim, Adrian and Kim, Dongwon and Park, Jangyeon and Ha, Jung Woo	http://ceur-ws.org/Vol-1905/recsys2017_poster18.pdf		emotion recognition																
2017	article	Learning neural audio embeddings for grounding semantics in auditory perception	Kiela, Douwe and Clark, Stephen	https://www.jair.org/index.php/jair/article/view/11101/26292																		
2017	inproceedings	A deep multimodal approach for cold-start music recommendation	Oramas, Sergio and Nieto, Oriol and Sordo, Mohamed and Serra, Xavier	https://dl.acm.org/doi/pdf/10.1145/3125486.3125492	https://github.com/sergiooramas/tartarus	music recommendation																
2018	inproceedings	Music mood detection based on audio and lyrics with deep neural net	Delbouys, R{\'{e}}mi and Hennequin, Romain and Piccoli, Francesco and Royo-Letelier, Jimena and Moussallam, Manuel	https://arxiv.org/pdf/1809.07276.pdf		mood classification																
2018	inproceedings	Cbvmr: content-based video-music retrieval using soft intra-modal structure constraint	Hong, Sungeun and Im, Woobin and Yang, Hyun S	https://dl.acm.org/doi/abs/10.1145/3206025.3206046	https://github.com/csehong/VM-NET	music retrieval																
2018	inproceedings	JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features	Liang, Hongru and Wang, Haozheng and Wang, Jun and You, Shaodi and Sun, Zhe and Wei, Jin-Mao and Yang, Zhenglu	http://arxiv.org/abs/1806.01483	https://github.com/mengshor/JTAV																	
2018	article	Multimodal Deep Learning for Music Genre Classification	Oramas, Sergio and Barbieri, Francesco and Nieto, Oriol and Serra, Xavier	https://transactions.ismir.net/articles/10.5334/tismir.10/	https://github.com/fvancesco/music_resnet_classification	genre classification																
2018	inproceedings	Image generation associated with music data	Qiu, Yue and Kataoka, Hirokatsu	https://openaccess.thecvf.com/content_cvpr_2018_workshops/papers/w49/Qiu_Image_Generation_Associated_CVPR_2018_paper.pdf		image generation																
2018	inproceedings	The Sound of Pixels	Zhao, Hang and Gan, Chuang and Rouditchenko, Andrew and Vondrick, Carl and Mcdermott, Josh and Torralba, Antonio	https://arxiv.org/pdf/1804.03160.pdf	https://github.com/hangzhaomit/Sound-of-Pixels	source separation																
2019	inproceedings	Query by Video: Cross-Modal Music Retrieval	Li, Bochen	www.gracenote.com		music retrieval																
2019	article	Creating a Multitrack Classical Music Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications	Li, Bochen and Liu, Xinzhao and Dinesh, Karthik and Duan, Zhiyao and Sharma, Gaurav	https://arxiv.org/pdf/1612.08727.pdf																		
2019	article	Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies	M{\"{u}}ller, Meinard and Arzt, Andreas and Balke, Stefan and Dorfer, Matthias and Widmer, Gerhard	https://arxiv.org/pdf/1902.04397.pdf		music retrieval																
2019	inproceedings	Multimodal music information processing and retrieval: Survey and future challenges	Simonetta, Federico and Ntalampiras, Stavros and Avanzini, Federico	https://arxiv.org/pdf/1902.05347.pdf																		
2019	inproceedings	Learning Affective Correspondence between Music and Image	Verma, Gaurav and Dhekane, Eeshan Gunesh and Guha, Tanaya	https://arxiv.org/pdf/1904.00150.pdf																		
2019	inproceedings	Query-by-Blending: a Music Exploration System Blending Latent Vector Representations of Lyric Word, Song Audio, and Artist	Watanabe, Kento and Goto, Masataka	https://archives.ismir.net/ismir2019/paper/000015.pdf		music retrieval																
2019	article	Deep cross-modal correlation learning for audio and lyrics in music retrieval	Yu, Yi and Tang, Suhua and Raposo, Francisco and Chen, Lei	https://arxiv.org/pdf/1711.08976.pdf		music retrieval																
2019	inproceedings	Audio-visual embedding for cross-modal music video retrieval through supervised deep CCA	Zeng, Donghuo and Yu, Yi and Oyama, Keizo	https://arxiv.org/pdf/1908.03744.pdf		music video retrieval																
2020	inproceedings	Music autotagging as captioning	Cai, Tian and Mandel, Michael I and He, Di	https://www.aclweb.org/anthology/2020.nlp4musa-1.14		music captioning																
2020	inproceedings	Musical word embedding: Bridging the gap between listening contexts and music	Doh, Seungheon and Lee, Jongpil and Park, Tae Hong and Nam, Juhan	https://arxiv.org/abs/2008.01190																		
2020	unpublished	Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags	Favory, Xavier and Drossos, Konstantinos and Virtanen, Tuomas and Serra, Xavier	https://arxiv.org/pdf/2010.14171.pdf	https://github.com/xavierfav/ae-w2v-attention																	
2020	inproceedings	Foley music: Learning to generate music from videos	Gan, Chuang and Huang, Deng and Chen, Peihao and Tenenbaum, Joshua B and Torralba, Antonio	https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123560732.pdf		music generation																
2020	inproceedings	Music gesture for visual sound separation	Gan, Chuang and Huang, Deng and Zhao, Hang and Tenenbaum, Joshua B and Torralba, Antonio	https://openaccess.thecvf.com/content_CVPR_2020/papers/Gan_Music_Gesture_for_Visual_Sound_Separation_CVPR_2020_paper.pdf		source separation																
2020	inproceedings	Large-Scale Weakly-Supervised Content Embeddings for Music Recommendation and Tagging	Huang, Qingqing and Jansen, Aren and Zhang, Li and Ellis, Daniel P. W. and Saurous, Rif A. and Anderson, John	https://ieeexplore.ieee.org/abstract/document/9053240		music recommendation																
2020	inproceedings	Tr{\"a}umerAI: Dreaming music with StyleGAN	Jeong, Dasaem and Doh, Seungheon and Kwon, Taegyun	https://arxiv.org/abs/2102.04680	https://github.com/jdasam/traeumerAI	music-to-image synthesis																
2020	inproceedings	MusicBERT - learning multi-modal representations for music and text	Rossetto, Federico and Dalton, Jeff	https://www.aclweb.org/anthology/2020.nlp4musa-1.13																		
2021	inproceedings	Music Playlist Title Generation: A Machine-Translation Approach	Doh, SeungHeon and Lee, Junwon and Nam, Juhan	https://arxiv.org/abs/2110.07354	https://github.com/SeungHeonDoh/ply_title_gen	playlist captioning																
2021	article	Enriched music representations with multiple cross-modal contrastive learning	Ferraro, Andres and Favory, Xavier and Drossos, Konstantinos and Kim, Yuntae and Bogdanov, Dmitry	https://arxiv.org/abs/2104.00437	https://github.com/andrebola/contrastive-mir-learning																	
2021	inproceedings	MusCaps: Generating Captions for Music Audio	Manco, Ilaria and Benetos, Emmanouil and Quinton, Elio and Fazekas, Gy{\"o}rgy	https://arxiv.org/abs/2104.11984	https://github.com/ilaria-manco/muscaps	music captioning																
2021	inproceedings	Multimodal metric learning for tag-based music retrieval	Won, Minz and Oramas, Sergio and Nieto, Oriol and Gouyon, Fabien and Serra, Xavier	https://arxiv.org/pdf/2010.16030.pdf	https://github.com/minzwon/tag-based-music-retrieval	music retrieval																
2022	article	Toward Universal Text-to-Music Retrieval	Doh, SeungHeon and Won, Minz and Choi, Keunwoo and Nam, Juhan	https://arxiv.org/abs/2211.14558	https://github.com/SeungHeonDoh/music-text-representation																	
2022	article	Clap: Learning audio concepts from natural language supervision	Elizalde, Benjamin and Deshmukh, Soham and Ismail, Mahmoud Al and Wang, Huaming	https://arxiv.org/abs/2206.04769	https://github.com/microsoft/CLAP																	
2022	inproceedings	Data-Efficient Playlist Captioning With Musical and Linguistic Knowledge	Gabbolini, Giovanni and Hennequin, Romain and Epure, Elena	https://preview.aclanthology.org/emnlp-22-ingestion/2022.emnlp-main.784	https://github.com/deezer/playntell	playlist captioning																
2022	article	RECAP: Retrieval Augmented Music Captioner	He, Zihao and Hao, Weituo and Song, Xuchen	https://arxiv.org/abs/2212.10901v1		music captioning																
2022	inproceedings	Mulan: A joint embedding of music audio and natural language	Huang, Qingqing and Jansen, Aren and Lee, Joonseok and Ganti, Ravi and Li, Judith Yue and Ellis, Daniel PW	https://arxiv.org/abs/2208.12415																		
2022	inproceedings	Learning music audio representations via weak language supervision	Manco, Ilaria and Benetos, Emmanouil and Quinton, Elio and Fazekas, Gy{\"o}rgy	https://arxiv.org/abs/2112.04214	https://github.com/ilaria-manco/mulap																	
2022	inproceedings	Contrastive audio-language learning for music	Manco, Ilaria and Benetos, Emmanouil and Quinton, Elio and Fazekas, Gy{\"o}rgy	https://arxiv.org/abs/2208.12208	https://github.com/ilaria-manco/muscall	music retrieval																
2022	inproceedings	Conversational Music Retrieval with Synthetic Data	Leszczynski, Megan Eileen and Ganti, Ravi and Zhang, Shu and Balog, Krisztian and Radlinski, Filip and Pereira, Fernando and Chaganty, Arun Tejasvi	https://research.google/pubs/pub51943/																		
2022	inproceedings	It's Time for Artistic Correspondence in Music and Video	Sur{\'\i}s, D{\'\i}dac and Vondrick, Carl and Russell, Bryan and Salamon, Justin	https://arxiv.org/abs/2206.07148		music-to-video retrieval																
2022	inproceedings	Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model	Zhang, Yixiao and Jiang, Junyan and Xia, Gus and Dixon, Simon	https://arxiv.org/abs/2208.11671		music retrieval