NutriFusionNet: A Graph-Attention Multimodal Approach for Nutritional Analysis

NutriFusionNet is a machine learning framework for estimating nutritional facts from food images. It combines a Visual Question Answering (VQA) framework, Graph-Attentive Ingredient Embedding (GAIE), and multimodal attention mechanisms to predict ingredient masses and nutritional content, and it reports state-of-the-art performance on both tasks.


Key Features

  • Ingredient Mass Prediction: Uses VQA to predict individual ingredient masses, providing more detailed nutritional insights.
  • Multimodal Attention Mechanism: Ensures precise alignment between visual features and ingredient embeddings.

  • Graph-Attentive Ingredient Embedding (GAIE): Improves feature alignment by capturing co-occurrence patterns of ingredients.

  • Model Interpretability: Integrated Gradients and Grad-CAM visualizations for understanding model behavior.
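To make the GAIE idea above more concrete, here is a minimal sketch of building an ingredient co-occurrence graph and running one attention-weighted aggregation over each ingredient's neighbors. It is an illustration under assumptions, not the project's implementation: the function names, the toy dishes, and the two-dimensional embeddings are all hypothetical, and the learned projection matrices that a real graph-attention layer would use are omitted.

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(dishes):
    """Count how often each ordered pair of ingredients shares a dish."""
    counts = defaultdict(int)
    for ingredients in dishes:
        for a, b in combinations(sorted(set(ingredients)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_neighbors(node, embeddings, counts):
    """Mix a node's embedding with those of its co-occurring neighbors.

    Scores are scaled dot products; only the node itself (self-loop) and
    ingredients that co-occur with it at least once take part.
    """
    neighbors = [node] + sorted(
        n for n in embeddings if n != node and counts.get((node, n), 0) > 0
    )
    dim = len(embeddings[node])
    scores = [
        sum(q * k for q, k in zip(embeddings[node], embeddings[n])) / math.sqrt(dim)
        for n in neighbors
    ]
    weights = softmax(scores)
    mixed = [
        sum(w * embeddings[n][i] for w, n in zip(weights, neighbors))
        for i in range(dim)
    ]
    return mixed, dict(zip(neighbors, weights))


# Hypothetical toy data, for illustration only.
dishes = [["rice", "chicken", "broccoli"], ["rice", "chicken"], ["bread", "butter"]]
embeddings = {
    "rice": [1.0, 0.0],
    "chicken": [0.8, 0.2],
    "broccoli": [0.5, 0.5],
    "bread": [0.0, 1.0],
    "butter": [0.1, 0.9],
}
counts = cooccurrence_graph(dishes)
rice_vec, rice_weights = attend_over_neighbors("rice", embeddings, counts)
```

In this toy example `rice` attends only to itself, `chicken`, and `broccoli`, because `bread` and `butter` never share a dish with it. Restricting attention to the co-occurrence graph is what lets the refined embedding reflect which ingredients tend to appear together.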


Project Structure

Code Implementations


Dataset

NutriFusionNet is trained on the Nutrition5k dataset, which contains over 5,000 food images annotated with nutritional information. We rebuilt the dataset to keep only overhead images with valid labels.

Dataset Challenges

  • Limited global diversity: The dataset is primarily U.S.-focused, excluding many international cuisines.
  • Label inconsistencies: Some labels contain errors, requiring preprocessing and validation.
  • Incremental dataset structure: Facilitates ingredient-level analysis but poses challenges for generalization.
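The preprocessing and validation mentioned above could look roughly like the following sketch, which keeps only overhead images whose labels pass basic sanity checks. The record fields (`dish_id`, `view`, `ingredient_masses`, `total_mass`) and the 5% tolerance are assumptions made for illustration; they are not the actual Nutrition5k schema or the rules this project applies.

```python
def is_valid(record, tolerance=0.05):
    """Hypothetical validity check for one dish record."""
    # Keep overhead views only.
    if record.get("view") != "overhead":
        return False
    masses = record.get("ingredient_masses", {})
    total = record.get("total_mass", 0.0)
    # Reject empty labels, non-positive dish mass, and negative ingredient masses.
    if not masses or total <= 0 or any(m < 0 for m in masses.values()):
        return False
    # Ingredient masses should add up to the dish mass within a relative tolerance.
    return abs(sum(masses.values()) - total) <= tolerance * total


# Hypothetical records, for illustration only.
records = [
    {"dish_id": "dish_a", "view": "overhead",
     "ingredient_masses": {"rice": 100.0, "chicken": 50.0}, "total_mass": 150.0},
    {"dish_id": "dish_b", "view": "side",
     "ingredient_masses": {"rice": 80.0}, "total_mass": 80.0},
    {"dish_id": "dish_c", "view": "overhead",
     "ingredient_masses": {"bread": 40.0}, "total_mass": 90.0},
    {"dish_id": "dish_d", "view": "overhead",
     "ingredient_masses": {"butter": -5.0}, "total_mass": 10.0},
]
kept = [r["dish_id"] for r in records if is_valid(r)]
```

Only `dish_a` survives here: `dish_b` is not an overhead view, `dish_c` has ingredient masses that do not add up to the dish mass, and `dish_d` contains a negative mass.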

Results

NutriFusionNet achieves state-of-the-art performance:

  • Mean Absolute Error (MAE): 24.66, a 61.38% improvement over the baseline.

  • High Classification Accuracy: 0.86 for identifying ingredient presence.
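For readers who want to see how such numbers are typically computed, the sketch below shows MAE, a presence accuracy, and a relative improvement. It rests on assumptions the README does not confirm: that "improvement" means a relative reduction in MAE, and that an ingredient counts as present when its mass exceeds a threshold. All function names and sample values are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def presence_accuracy(true_masses, pred_masses, threshold=0.0):
    """Fraction of ingredients whose presence (mass > threshold) is predicted correctly."""
    hits = sum(
        (t > threshold) == (p > threshold)
        for t, p in zip(true_masses, pred_masses)
    )
    return hits / len(true_masses)


def relative_improvement(baseline_mae, model_mae):
    """Percentage reduction in MAE relative to the baseline."""
    return 100.0 * (baseline_mae - model_mae) / baseline_mae


def implied_baseline(model_mae, improvement_pct):
    """Baseline MAE implied by a model MAE and a relative reduction in percent."""
    return model_mae / (1.0 - improvement_pct / 100.0)


# Hypothetical per-ingredient masses for one dish, for illustration only.
true_masses = [100.0, 0.0, 50.0, 0.0]
pred_masses = [90.0, 5.0, 55.0, 0.0]

mae = mean_absolute_error(true_masses, pred_masses)
accuracy = presence_accuracy(true_masses, pred_masses)
improvement = relative_improvement(100.0, 40.0)
baseline = implied_baseline(24.66, 61.38)
```

Under the relative-reduction reading, `implied_baseline(24.66, 61.38)` works out to a baseline MAE of about 63.85. The README does not state the baseline value, so this is only an inference from the two reported figures.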

Key Insights:

  • GAIE embeddings outperform BERT embeddings because they better capture relationships between ingredients.
  • NutriFusionNet delivers results comparable to the Nutrition5k model while using simpler inputs (overhead images only).

For more detailed results, see the performance comparison.


Ethical Considerations

  • Dataset Bias: The Nutrition5k dataset lacks global diversity, focusing primarily on U.S.-centric dishes.
  • Impact on Users: Predictions may influence dietary decisions but are not intended for medical use.

Future Directions

  • Expanding the dataset to include global cuisines.
  • Incorporating 3D dish representations for volume-based analysis.
  • Adding multi-task learning for ingredient presence detection.
  • Enhancing attention mechanisms with state-of-the-art designs.
  • Addressing biases in pretrained embeddings for better results.

References

For a complete list of references and detailed documentation, please refer to the project report.


Acknowledgments

We extend our gratitude to the creators of the Nutrition5k dataset and related works that inspired this project.


License

This project is licensed under the MIT License. See the LICENSE file for details.


For inquiries or feedback, please contact Yancheng Liu.