NutriFusionNet is an advanced machine learning framework designed for estimating nutritional facts from food images. By combining a Visual Question Answering (VQA) framework, Graph-Attentive Ingredient Embedding (GAIE), and multimodal attention mechanisms, NutriFusionNet offers state-of-the-art performance in predicting ingredient masses and nutritional content.
- Ingredient Mass Prediction: Uses VQA to predict individual ingredient masses, providing more detailed nutritional insights.
- Multimodal Attention Mechanism: Aligns visual features with ingredient embeddings through cross-attention.
- Graph-Attentive Ingredient Embedding (GAIE): Improves feature alignment by capturing co-occurrence patterns of ingredients (a minimal sketch of both components follows this list).
- Model Interpretability: Integrated Gradients and Grad-CAM visualizations for understanding model behavior.
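To make the two core components above concrete, here is a minimal, hypothetical sketch of how graph-attentive ingredient embeddings and multimodal cross-attention could fit together in PyTorch. Class names, dimensions, and the single-head graph attention are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class GraphAttentiveIngredientEmbedding(nn.Module):
    """Single-head graph attention over an ingredient co-occurrence graph
    (an illustrative simplification of GAIE)."""
    def __init__(self, num_ingredients: int, dim: int):
        super().__init__()
        self.embed = nn.Embedding(num_ingredients, dim)
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, cooccurrence: torch.Tensor) -> torch.Tensor:
        # cooccurrence: (N, N) binary matrix built from the training recipes
        h = self.embed.weight                               # (N, dim)
        n = h.size(0)
        adj = cooccurrence + torch.eye(n, device=h.device)  # self-loops avoid empty rows
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        scores = self.score(pairs).squeeze(-1)              # (N, N) attention logits
        scores = scores.masked_fill(adj == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return weights @ h                                  # co-occurrence-refined embeddings

class MultimodalAttention(nn.Module):
    """Cross-attention where ingredient embeddings query visual patch features."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, ingredients: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # ingredients: (B, N, dim); visual: (B, P, dim) backbone patch features
        fused, _ = self.cross(ingredients, visual, visual)
        return fused  # per-ingredient visual context for the mass regressor
```

In this sketch, the refined embeddings would be broadcast per batch and passed to `MultimodalAttention` together with patch features from a CNN or ViT backbone.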
- Dataset Reconstruction: Scripts for inspecting and preprocessing the Nutrition5k dataset.
- Models: Implementation of baseline and advanced models.
- Training and Evaluation: Modular scripts for training and testing.
- Embedding Generation: GNN-based ingredient embeddings, plus a failure analysis of BERT embeddings.
- Visualization: Methods for interpreting model decisions (an Integrated Gradients sketch follows this list).
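As an illustration of the interpretability tooling mentioned above, the following sketch runs Captum's Integrated Gradients against a stand-in model. The tiny `nn.Sequential` regressor and the output index are placeholders for the trained NutriFusionNet and a chosen ingredient.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Stand-in for the trained model: any image -> per-ingredient-mass regressor works here.
model = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 5))
model.eval()

ig = IntegratedGradients(model)
image = torch.randn(1, 3, 224, 224, requires_grad=True)  # dummy overhead image

# Attribute one output (e.g., one ingredient's predicted mass, index 0 here)
# back to the input pixels; delta estimates the approximation error.
attributions, delta = ig.attribute(image, target=0, return_convergence_delta=True)
saliency = attributions.abs().sum(dim=1)  # (1, H, W) map to overlay on the image
```

Grad-CAM visualizations would instead hook a convolutional layer's activations; Captum's `LayerGradCam` offers a comparable interface.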
NutriFusionNet is trained on the Nutrition5k dataset, which contains over 5,000 food images annotated with nutritional information. We reconstructed the dataset to focus on overhead images and valid labels (a filtering sketch follows the limitations below).
- Limited global diversity: The dataset is primarily U.S.-focused, excluding many international cuisines.
- Label inconsistencies: Some labels contain errors, requiring preprocessing and validation.
- Incremental dataset structure: Facilitates ingredient-level analysis but poses challenges for generalization.
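The reconstruction step described above amounts to filtering the metadata down to dishes that have a usable overhead image and sane labels. A minimal sketch, assuming hypothetical file paths and column names (the real Nutrition5k layout differs in detail):

```python
import os
import pandas as pd

# Hypothetical paths/columns: Nutrition5k's actual metadata layout may differ.
META_CSV = "nutrition5k/dish_metadata.csv"
OVERHEAD_DIR = "nutrition5k/imagery/realsense_overhead"

meta = pd.read_csv(META_CSV)

# Keep dishes whose overhead image actually exists on disk.
has_overhead = meta["dish_id"].apply(
    lambda d: os.path.exists(os.path.join(OVERHEAD_DIR, d, "rgb.png")))

# Drop rows with missing or implausible labels (e.g., non-positive total mass).
valid_labels = meta["total_mass"].notna() & (meta["total_mass"] > 0)

clean = meta[has_overhead & valid_labels].reset_index(drop=True)
clean.to_csv("nutrition5k/dish_metadata_clean.csv", index=False)
print(f"kept {len(clean)} of {len(meta)} dishes")
```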
NutriFusionNet achieves state-of-the-art performance:
- Mean Absolute Error (MAE): 24.66, a 61.38% improvement over the baseline (see the arithmetic check below).
- High classification accuracy: 0.86 for identifying ingredient presence.
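The baseline MAE is not stated in this section; the value below is back-computed from the reported 61.38% reduction and should be read as implied rather than reported.

```python
model_mae = 24.66
baseline_mae = model_mae / (1 - 0.6138)              # implied baseline MAE, about 63.85
improvement = (baseline_mae - model_mae) / baseline_mae
print(f"{improvement:.2%}")                          # -> 61.38%
```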
Key Insights:
- GAIE embeddings outperform BERT by better aligning ingredient relationships.
- NutriFusionNet (NFN) delivers results comparable to the original Nutrition5k model while using simpler inputs (overhead images only).
For more detailed results, see the performance comparison.
- Dataset Bias: The Nutrition5k dataset lacks global diversity, focusing primarily on U.S.-centric dishes.
- Impact on Users: Predictions may influence dietary decisions but are not intended for medical use.
- Expanding the dataset to include global cuisines.
- Incorporating 3D dish representations for volume-based analysis.
- Adding multi-task learning for ingredient presence detection.
- Enhancing attention mechanisms with state-of-the-art designs.
- Addressing biases in pretrained embeddings to further improve alignment and prediction quality.
For a complete list of references and detailed documentation, please refer to the project report.
We extend our gratitude to the creators of the Nutrition5k dataset and related works that inspired this project.
This project is licensed under the MIT License. See the LICENSE file for details.
For inquiries or feedback, please contact Yancheng Liu.