BuStop is an ML-based framework that uses multi-modal sensing to automatically detect different types of stay locations during intra-city public bus travel (Jupyter Notebook; updated Nov 20, 2022).
Exploring and Visualizing Referring Expression Comprehension (Bachelor's Thesis by David Álvarez Rosa)
Official implementation for MGN
Code for the paper Visual Explanations of Image–Text Representations via Multi-Modal Information Bottleneck Attribution
Encoder-decoder CNN-LSTM model with an attention mechanism for image captioning, trained on the Microsoft COCO dataset.
PyTorch implementation of a multimodal entailment baseline
Official PyTorch implementation of the Position-aware Location Regression Network (PLRN) for temporal video grounding, presented in the paper "Position-aware Location Regression Network for Temporal Video Grounding."
Socratic models for multimodal reasoning & image captioning
Interactive Multimodal Explanations for Easy Visual Question Answering
Emo-CLIM: Emotion-Aligned Contrastive Learning Between Images and Music [ICASSP 2024]
Multimodal LLM- and RAG-powered math tutor that combines NLP with a computer algebra system (CAS) to produce answers that are mathematically sound, hallucination-free, and easy to digest, with step-by-step solutions delivered in natural language. Supports front-end LaTeX rendering with libraries such as MathJax.
Public repository of our IGARSS 2023 submission
Showcases ongoing and completed projects across various research themes.
[EMNLP 2022] PyTorch code for "Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval"
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
Corpus of resources for multimodal machine learning with physiological signals
Official code for QA-oriented pretraining
Code for the project "Using Self-Supervised Learning to Classify Aerial Audiovisual Scenes with Remote Sensing Data"
VoiceGAN - Hallucinating Faces from Voices