- Abstract
- Repository Overview
- Project Objectives
- Installation
- Usage
- Prototype Details
- Images and Visuals
- Results and Findings
- Contributing
- License
- Contact Information
University websites are essential platforms for providing critical information. However, they often suffer from issues such as fragmented navigation, lack of personalization, and poor search interfaces. This project addresses these challenges by developing an AI-powered chatbot using advanced Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) techniques. The chatbot is designed to dynamically retrieve and generate accurate, domain-specific responses for course-related queries, improving the user experience for prospective students and other stakeholders.
The repository contains all the necessary resources for developing, fine-tuning, and deploying the chatbot. It includes the following directories and files:
- Data_Cleaning_and_Data_Analysis/: Scripts and notebooks for data preprocessing and exploratory data analysis.
- Google-Collab/: Experiments and notebooks conducted on Google Colab.
- **Large Language Models**: All LLMs are hosted on Hugging Face.
- Web_Scraping/scrape_uni/: Web scraping scripts for extracting university course data.
- prototype_website_chatbot/: The prototype implementation of the chatbot.
- app.py: The main Streamlit script for running the chatbot prototype.
- README.md: This documentation file.
- Development: Implement an AI chatbot capable of delivering accurate and contextually relevant course-related information.
- Improving User Experience: Simplify access to university course details through an intuitive conversational interface.
- Model Evaluation: Fine-tune and evaluate LLMs such as Mistral-7B, Gemma-2-9B, Phi-3.5, and Llama-3.2 using metrics such as SacreBLEU, CER (character error rate), and METEOR.
- Demonstrate Applications: Highlight the potential of AI chatbots in enhancing educational digital services.
- Clone the repository:
    git clone https://github.com/Abhinav330/MSC-Project.git
- Navigate to the directory:
    cd MSC-Project
- Set up a virtual environment (recommended):
    python -m venv env
    source env/bin/activate  # On Windows: env\Scripts\activate
- Install the dependencies:
    pip install -r requirements.txt
- Run the Prototype Application:
    streamlit run app.py
  This script launches the chatbot interface using Streamlit, enabling users to interact with the chatbot for course-related queries.
- Test Data Cleaning and Analysis:
  - Access the Data_Cleaning_and_Data_Analysis directory.
  - Use the provided Jupyter notebooks to explore data preparation steps.
- Web Scraping:
  - Use the Web_Scraping/scrape_uni scripts to extract updated course data from the university website.
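As a rough illustration of the web-scraping step, the sketch below pulls course titles out of a page using only Python's standard library. It is not the code in Web_Scraping/scrape_uni, which may use a different library. The `course-title` class, the `CourseTitleParser` and `extract_course_titles` names, and the sample HTML are all assumptions made for the example.

```python
from html.parser import HTMLParser


class CourseTitleParser(HTMLParser):
    """Collect the text of every <h2 class="course-title"> element.

    The tag and class name are hypothetical; a real university page
    will need its own selectors.
    """

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "h2" and ("class", "course-title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_title:
            self._in_title = False
            self.titles[-1] = self.titles[-1].strip()


def extract_course_titles(html):
    """Return the course titles found in an HTML string."""
    parser = CourseTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Made-up markup standing in for a downloaded course listing page.
SAMPLE_HTML = """
<div><h2 class="course-title">MSc Data Science</h2><p>One year, full time.</p></div>
<div><h2 class="course-title"> MSc Artificial Intelligence </h2></div>
<h2>Not a course heading</h2>
"""

if __name__ == "__main__":
    print(extract_course_titles(SAMPLE_HTML))
```

In practice the HTML would come from a fetched page rather than a string constant, and the extracted records would be written out for the cleaning notebooks to consume.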
- Core Functionality: Implemented in app.py, this Streamlit script integrates fine-tuned LLMs to respond dynamically to user queries about course details.
- Technologies: The chatbot uses LangChain for conversational flow, ChromaDB for efficient data retrieval, and Streamlit for the UI.
- Supported Models: Fine-tuned versions of Mistral-7B, Gemma-2-9B, Llama-3.2, and Phi-3.5.
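To make the retrieval-augmented flow concrete, here is a minimal, dependency-free sketch of the two steps that happen before a model generates an answer: retrieve the most relevant passages, then place them in the prompt. It is a stand-in, not the prototype's code. Word-count cosine similarity replaces ChromaDB's embedding search, no LLM is called, and the `retrieve` and `build_prompt` helpers and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return up to k documents most similar to the query, best first."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(doc))), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no tokens with the query.
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, context_docs):
    """Assemble a prompt that grounds the answer in the retrieved text."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


# Made-up passages standing in for scraped course data.
DOCUMENTS = [
    "MSc Data Science is a one year full time course.",
    "MSc Artificial Intelligence covers machine learning and robotics.",
    "The campus library opens at 8 am on weekdays.",
]

if __name__ == "__main__":
    question = "How long is the MSc Data Science course?"
    print(build_prompt(question, retrieve(question, DOCUMENTS, k=1)))
```

In the prototype, the retrieval step would be handled by ChromaDB and the assembled prompt passed to one of the fine-tuned models. The sketch only shows how retrieved context ends up in front of the model.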
Figure 1: The chatbot responding to user queries.
Figure 2: Exploratory Data Analysis showing International fee distribution.
- Model Performance:
  - Best Model: Llama-3.2 demonstrated the highest accuracy and fluency.
  - Metrics: SacreBLEU, CER, and METEOR scores validated the model outputs.
- User Experience:
  - Improved navigation and search efficiency on the university website.
  - Accurate, context-aware responses for course-related queries.
- Prototype Evaluation:
  - Robust response handling for paraphrased and complex user questions.
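For readers unfamiliar with CER, the sketch below shows how the metric is commonly defined: the character-level edit distance between a reference answer and a model output, divided by the reference length. This is a plain-Python illustration, not the evaluation code used in the project, which presumably relied on a library implementation. The function names are assumptions.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the reference length; lower is better."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(character_error_rate("kitten", "sitting"))
```

A CER of 0 means the output matches the reference exactly. Values can exceed 1 when the output is much longer than the reference.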
We welcome contributions to improve the chatbot. To contribute:
- Fork the repository.
- Create a new branch:
    git checkout -b feature/your-feature-name
- Commit your changes:
    git commit -m 'Add your feature'
- Push to the branch:
    git push origin feature/your-feature-name
- Submit a pull request.
This project is licensed under the MIT License.
For any inquiries, please contact Abhinav via abhinav33303@gmail.com.
This project is part of an MSc research initiative focused on improving digital services in higher education.