Develop a visual question answering (VQA) model that answers natural-language questions about medical images, such as radiology scans.
-
Multimodal Feature Extraction
- Image Features: Used a modified VGG19 model pre-trained on ImageNet.
- Question Features: Encoded the question text with neural sequence models (RNN/LSTM or Transformer encoders).
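Before an RNN, LSTM, or Transformer can encode a question, the text usually has to be tokenized and turned into fixed-length sequences of integer ids. The sketch below shows one minimal way to do that in plain Python. The regex tokenizer, the `<pad>`/`<unk>` special tokens, and the helper names `tokenize`, `build_vocab`, and `encode_question` are hypothetical illustrations, not the project's actual preprocessing.

```python
import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens (ids 0 and 1)


def tokenize(text):
    """Lowercase and keep runs of letters/digits (a simple, assumed scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(questions, min_freq=1):
    """Assign an integer id to every token seen at least `min_freq` times."""
    counts = Counter(tok for q in questions for tok in tokenize(q))
    vocab = {PAD: 0, UNK: 1}
    for tok, freq in sorted(counts.items()):  # sorted for reproducible ids
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode_question(question, vocab, max_len=12):
    """Map a question to exactly `max_len` ids: unknown words -> <unk>, then truncate or pad."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(question)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


if __name__ == "__main__":
    train_questions = ["Is there a fracture?", "What organ is shown in this CT scan?"]
    vocab = build_vocab(train_questions)
    print(encode_question("Is there a lesion?", vocab, max_len=6))  # → [6, 10, 2, 1, 0, 0]
```

The padded id sequences would then feed an embedding layer followed by the sequence encoder; a Transformer setup would more likely reuse its own pretrained subword tokenizer instead of a hand-built vocabulary.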
-
Feature Fusion
- Combined the visual and textual feature vectors into a joint multimodal representation using advanced fusion methods.
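The summary does not name the specific fusion method, so the sketch below only illustrates two common baselines under that caveat: plain concatenation, and an element-wise (Hadamard) product after projecting both modalities to a shared dimension. The layer sizes, the tanh activation, and the random placeholder weights are assumptions; in a trained model the projection weights would be learned, and more elaborate schemes (bilinear pooling, attention) build on the same idea.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: y[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def random_layer(in_dim, out_dim, rng):
    """Placeholder (weights, bias); a real model would learn these during training."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def concat_fusion(img_vec, q_vec):
    """Simplest fusion: place both feature vectors end to end."""
    return list(img_vec) + list(q_vec)


def hadamard_fusion(img_vec, q_vec, img_layer, q_layer):
    """Project each modality to a shared size, squash with tanh, multiply element-wise."""
    img_proj = [math.tanh(v) for v in linear(img_vec, *img_layer)]
    q_proj = [math.tanh(v) for v in linear(q_vec, *q_layer)]
    return [a * b for a, b in zip(img_proj, q_proj)]


if __name__ == "__main__":
    rng = random.Random(0)
    img_feat = [rng.random() for _ in range(8)]  # stand-in for a CNN image embedding
    q_feat = [rng.random() for _ in range(5)]    # stand-in for a question embedding
    joint_dim = 4
    img_layer = random_layer(len(img_feat), joint_dim, rng)
    q_layer = random_layer(len(q_feat), joint_dim, rng)
    print(len(concat_fusion(img_feat, q_feat)))                        # → 13
    print(len(hadamard_fusion(img_feat, q_feat, img_layer, q_layer)))  # → 4
```

Concatenation leaves all cross-modal interaction to later layers, while the element-wise product forces each joint dimension to depend on both the image and the question, which is why multiplicative fusion is a popular VQA baseline.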
-
Predictions
- Framed answer prediction as classification over a fixed set of candidate answers, evaluated with accuracy and F1 score.
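Treating VQA as classification means each candidate answer gets a class index, the predicted answer is the highest-scoring class, and predictions are scored against reference answers. The sketch below computes accuracy and F1 in plain Python. The summary does not say how F1 was averaged, so macro averaging (the unweighted mean of per-class F1) is an assumption here, as are the toy answer list and logits; in practice scikit-learn's `f1_score(y_true, y_pred, average="macro")` would likely be used instead.

```python
def predict(logits_batch):
    """Return the index of the highest-scoring answer class for each example."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits_batch]


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the reference answer."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN) over classes seen in either list."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    answers = ["yes", "no", "lung"]  # hypothetical answer vocabulary (class ids 0, 1, 2)
    logits = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 0.2, 0.4], [0.1, 0.3, 2.2]]
    y_true = [0, 1, 2, 2]
    y_pred = predict(logits)
    print([answers[i] for i in y_pred])  # → ['yes', 'no', 'yes', 'lung']
    print(accuracy(y_true, y_pred))      # → 0.75
    print(macro_f1(y_true, y_pred))      # ≈ 0.778 (= 7/9)
```

Reporting F1 alongside accuracy matters for medical VQA because answer distributions are often imbalanced (for example, many yes/no questions), and macro F1 penalizes a model that ignores rare answers.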
The model extracted and fused visual and textual features effectively and produced accurate answers to medical questions about the input images. Future work will focus on improving the model architecture and expanding the dataset.