This repository contains data science projects built on the Game of Thrones universe. The projects analyze text data from the show using machine learning and natural language processing (NLP) techniques, with the goal of drawing insights from dialogue, characters, and other textual elements.
This project is a data-driven exploration of the Game of Thrones world, focusing on the unique language used by characters. By applying text processing, sentiment analysis, and word frequency techniques, we uncover the patterns in the characters' dialogue. It is ideal for those who want to explore text mining and NLP in a fun and practical way.
- Extracts and processes the unique dialogue of each major character.
- Goal: Understand the distinct speech patterns of different characters.
- Core Tasks: Text parsing, data cleaning, and outputting results to individual files.
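The parse, clean, and write steps listed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the repository's actual code: it assumes a hypothetical input format of one `SPEAKER: text` utterance per line, it reads "unique" as de-duplicated lines, and the names `parse_script`, `unique_lines`, and `write_character_files` are illustrative only.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed input format: one utterance per line, written as "SPEAKER: text".
LINE_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def parse_script(lines):
    """Group cleaned dialogue lines by speaker, skipping lines that do not match."""
    dialogue = defaultdict(list)
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # stage directions, blank lines, and similar
        speaker, text = match.groups()
        # Normalize speaker casing and collapse repeated whitespace.
        dialogue[speaker.title()].append(" ".join(text.split()))
    return dict(dialogue)


def unique_lines(dialogue):
    """Drop repeated lines per speaker, keeping first-seen order."""
    return {speaker: list(dict.fromkeys(texts)) for speaker, texts in dialogue.items()}


def write_character_files(dialogue, out_dir):
    """Write one text file per character into out_dir (created if missing)."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for speaker, texts in dialogue.items():
        filename = re.sub(r"\W+", "_", speaker).strip("_") + ".txt"
        (out_path / filename).write_text("\n".join(texts) + "\n", encoding="utf-8")


sample = [
    "TYRION: I drink and I know things.",
    "[Tyrion pours wine]",
    "JON:  I   know nothing.",
    "TYRION: I drink and I know things.",
]
by_character = unique_lines(parse_script(sample))
```

Calling `write_character_files(by_character, "output")` would then produce one file per character in a hypothetical `output` folder.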
- Analyzes the sentiment (positive, negative, neutral) of various characters' dialogues.
- Tools: TextBlob, VADER.
- Goal: Understand how the sentiment of the characters' language evolves throughout the series.
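One way to track sentiment over the series is to score each line, average the scores per character and season, and label the average. The sketch below assumes records shaped as `(character, season, text)` tuples and uses a toy scorer so it runs without extra installs; the function names and the record layout are illustrative assumptions. The 0.05 cutoff follows the convention commonly used for VADER's compound score and may need tuning for TextBlob polarity.

```python
from collections import defaultdict
from statistics import mean


def label_sentiment(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to positive, negative, or neutral."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def sentiment_by_season(records, score_fn):
    """Average the polarity of each character's lines per season.

    records: iterable of (character, season, text) tuples (assumed layout).
    score_fn: callable returning a polarity score in [-1, 1] for a string.
    """
    scores = defaultdict(list)
    for character, season, text in records:
        scores[(character, season)].append(score_fn(text))
    result = {}
    for key, values in sorted(scores.items()):
        average = mean(values)
        result[key] = {"mean_score": average, "label": label_sentiment(average)}
    return result


# Toy scorer for illustration only. With the real libraries installed, swap in:
#   TextBlob: score_fn = lambda t: TextBlob(t).sentiment.polarity
#   VADER:    analyzer = SentimentIntensityAnalyzer()
#             score_fn = lambda t: analyzer.polarity_scores(t)["compound"]
def toy_score(text):
    positive, negative = {"love", "honor"}, {"kill", "burn"}
    words = text.lower().split()
    hits = sum(w in positive for w in words) - sum(w in negative for w in words)
    return max(-1.0, min(1.0, hits / 2))


records = [
    ("Daenerys", 1, "I love my people"),
    ("Daenerys", 8, "burn them all"),
    ("Daenerys", 8, "kill the masters"),
]
trend = sentiment_by_season(records, toy_score)
```

Passing the scorer as a parameter keeps the aggregation independent of the sentiment library, so TextBlob and VADER results can be compared on the same records.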
- Identifies the words and phrases each character uses most often.
- Goal: Discover the most important or repeated themes.
- Tools: NLTK, Pandas.
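Word and phrase counting can be sketched with the standard library alone. The example below is a stand-in, not the project's NLTK/Pandas code: the regex tokenizer and the tiny stopword set are simplifying assumptions, and in practice NLTK's `word_tokenize` and its stopwords corpus would likely replace them. Phrases are approximated here as bigrams (adjacent word pairs).

```python
import re
from collections import Counter

# Tiny illustrative stopword list; NLTK's stopwords corpus is far more complete.
STOPWORDS = {"the", "a", "an", "and", "is", "of", "to", "i", "in"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(lines, n=3):
    """Return the n most common non-stopword tokens across the given lines."""
    counts = Counter(
        token for line in lines for token in tokenize(line) if token not in STOPWORDS
    )
    return counts.most_common(n)


def top_bigrams(lines, n=3):
    """Return the n most common adjacent word pairs, counted within each line."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)


lines = [
    "Winter is coming.",
    "The North remembers, and winter is coming.",
    "The night is dark.",
]
words = top_words(lines, n=2)
bigrams = top_bigrams(lines, n=2)
```

The resulting `(token, count)` pairs can be loaded into a Pandas DataFrame for sorting, filtering, or plotting.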
- Detects and categorizes named entities like people, places, and organizations in Game of Thrones texts.
- Goal: Build a list of key entities mentioned in the dialogues.
- Tools: SpaCy.
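Building the entity list might look like the sketch below. It assumes the small English spaCy model `en_core_web_sm` and a hand-picked subset of its labels; both are assumptions, as are the helper names `collect_entities` and `load_pipeline`. A general-purpose model is not trained on fantasy text, so fictional names and places may be missed or mislabeled.

```python
from collections import Counter

# Assumed subset of spaCy labels covering people, places, and organizations.
LABELS_OF_INTEREST = {"PERSON", "GPE", "LOC", "ORG"}


def collect_entities(texts, nlp, labels=LABELS_OF_INTEREST):
    """Count (entity text, label) pairs found by a spaCy-style pipeline.

    nlp: any callable returning an object whose .ents items expose
    .text and .label_, such as the result of spacy.load(...).
    """
    counts = Counter()
    for text in texts:
        doc = nlp(text)
        for ent in doc.ents:
            if ent.label_ in labels:
                counts[(ent.text, ent.label_)] += 1
    return counts


def load_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy model; imported lazily so collect_entities has no hard dependency."""
    import spacy

    return spacy.load(model_name)
```

With spaCy and the model installed (`python -m spacy download en_core_web_sm`), `collect_entities(lines, load_pipeline()).most_common(10)` would list the ten most frequently mentioned entities in `lines`.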
To use this repository, follow these steps:
- Clone the repository:
    git clone https://github.com/jarvismayur/Games-of-Thornes---Data-Scince-Projects.git
- Navigate to the project directory:
    cd Games-of-Thornes---Data-Scince-Projects
- Set up a Python virtual environment and install the required libraries:
    # Create virtual environment
    python -m venv venv
    # Activate the virtual environment
    # On Windows:
    venv\Scripts\activate
    # On macOS/Linux:
    source venv/bin/activate
    # Install dependencies
    pip install -r requirements.txt
- Python: Main programming language used.
- Pandas: Data manipulation.
- NLTK: Natural Language Processing.
- SpaCy: Named Entity Recognition (NER).
- TextBlob: Sentiment analysis.
- Matplotlib & Seaborn: Data visualization.
- Jupyter Notebook: For project development.
You can run any of the projects in this repository by navigating to the project folder and executing the corresponding Jupyter notebook or Python script. For example, to analyze character dialogues:
cd Unique-Character-Dialogues/
jupyter notebook dialogue_analysis.ipynb
If you'd like to contribute to this repository:
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Make your changes and commit them (git commit -m 'Add new feature').
- Push to the branch (git push origin feature-branch).
- Create a pull request.
This project is licensed under the Apache License 2.0. See the LICENSE file for more information.