Undergraduate Honours Thesis (2023-2024): A Study on Knowledge-based Visual Question Generation

Welcome to the GitHub repository for my undergraduate honours thesis titled "K-VQG: A Study on Knowledge-based Visual Question Generation".

Overview

This repository serves as the home for the research conducted as part of an undergraduate honours thesis. The thesis explores the field of Visual Question Generation (VQG), focusing specifically on knowledge-based approaches.

Thesis Abstract

Extracting information from images is an increasingly prominent task due to its ever-expanding utility and applications. Generating knowledge-based questions (and answers) from images is a newer approach to this task that complements traditional image analysis techniques, offering a more interpretive method for understanding visual content. Knowledge-based questions are designed to prompt responses that extract relevant information from the subject image. In this study, we developed and examined various techniques for performing Knowledge-Based Visual Question Generation (K-VQG). These techniques included prompt engineering using the latest large multimodal model (GPT-4 Vision) and sequence-to-sequence (Seq2Seq) models with semantic role labels (SRLs). For the prompt-engineering approach, we employed various prompts aimed at providing context through both categorization and few-shot learning techniques. Alongside this, we implemented a Seq2Seq model for textual question generation, anchored in SRLs, using LLM-generated captions of the images as its input. We observed that the proposed 2-shot learning method delivered the best results on the quantitative metrics (Semantic Similarity, METEOR, etc.).
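
To make the 2-shot idea in the abstract concrete, here is a minimal sketch of how a few-shot prompt might be assembled. It is not the thesis code: the `Shot` class, the `build_few_shot_prompt` function, the instruction wording, and the example captions are all hypothetical, and the image is represented by a text caption as a stand-in (attaching real images to a multimodal model is not shown).

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """One worked example shown to the model (hypothetical structure)."""
    caption: str   # text stand-in for the example image
    question: str  # knowledge-based question written for it
    answer: str    # expected answer


def build_few_shot_prompt(shots, target_caption):
    """Assemble an instruction, k worked examples, and the target image."""
    parts = ["Generate a knowledge-based question and answer for the image."]
    for i, shot in enumerate(shots, start=1):
        parts.append(
            f"Example {i}:\nImage: {shot.caption}\n"
            f"Question: {shot.question}\nAnswer: {shot.answer}"
        )
    # The prompt ends on an open "Question:" slot for the model to fill in.
    parts.append(f"Now the target:\nImage: {target_caption}\nQuestion:")
    return "\n\n".join(parts)


# Two illustrative shots (assumed data, not taken from the thesis dataset).
shots = [
    Shot("a red double-decker bus on a city street",
         "In which country is this type of bus a well-known symbol?",
         "The United Kingdom"),
    Shot("a bowl of ramen with a soft-boiled egg",
         "Which country is this noodle dish most associated with?",
         "Japan"),
]
prompt = build_few_shot_prompt(shots, "a kangaroo standing in a dry field")
print(prompt)
```

Passing an empty list of shots gives a zero-shot prompt with the same layout, which makes it straightforward to compare shot counts under otherwise identical wording.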

Contents

  • Code: Contains the code implementations used for data collection, question generation, training, and evaluation.
  • Documentation: Thesis report, alongside supplementary materials such as resource guides and related literature.
  • Results: Evaluation metrics, analysis results, and visualizations.
  • Data: Ground-truth data and CSV files used in the study.

Author

  • Author: Ammar Hatiya
  • Field of Study: Computer Science
  • Degree: Bachelor of Science, Computer Science

Contact

For any inquiries or collaborations, feel free to contact the author:

Ammar Hatiya