TextTuring is a project for distinguishing human-written text from machine-generated text. It uses natural language processing (NLP) techniques and machine learning models to do so. As the volume of AI-generated content grows, TextTuring offers a way to identify and verify human-authored text.
TextTuring is inspired by the way chess cheaters who use AI engines during games are caught. A cheater is exposed when their moves consistently match the engine's top lines; similarly, TextTuring flags text that closely resembles AI-generated content.
- Data Collection: TextTuring provides a dataset covering a wide range of text samples, both human-written and AI-generated, to keep the model's training and evaluation diverse and accurate.
- Feature Engineering: The project extracts characteristic features from text data, including n-gram analysis and scores computed by a weak language model (LM).
- Threshold Calculation: TextTuring calculates threshold values dynamically from the provided data and uses them to separate human-written from machine-generated text.
- Model Evaluation: The project applies machine learning techniques to assess text samples against the calculated threshold. The result is a clear prediction that helps users judge the authenticity of a text.
- Scalability: TextTuring is designed to scale, so it can process large volumes of text data efficiently.
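
The feature, threshold, and evaluation steps above can be illustrated with a small sketch. This is not the project's actual implementation: it assumes a word-bigram model with add-one smoothing as a stand-in for the weak language model, a threshold placed at the midpoint between the mean scores of the two classes, and the premise that machine-generated text is more predictable to the model than human text. All names (`WeakBigramModel`, `midpoint_threshold`, `predict`) and the toy samples are hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    """Split text into lowercase word bigrams."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


class WeakBigramModel:
    """Add-one smoothed word-bigram model (assumed stand-in for a weak LM)."""

    def __init__(self, corpus):
        self.pair_counts = Counter()
        self.first_counts = Counter()
        vocab = set()
        for text in corpus:
            vocab.update(text.lower().split())
            for first, second in bigrams(text):
                self.pair_counts[(first, second)] += 1
                self.first_counts[first] += 1
        self.vocab_size = max(len(vocab), 1)

    def score(self, text):
        """Mean log-probability per bigram; higher means more predictable."""
        pairs = bigrams(text)
        if not pairs:
            return float("-inf")
        total = 0.0
        for first, second in pairs:
            numerator = self.pair_counts[(first, second)] + 1
            denominator = self.first_counts[first] + self.vocab_size
            total += math.log(numerator / denominator)
        return total / len(pairs)


def midpoint_threshold(human_scores, machine_scores):
    """Place the threshold halfway between the two class means."""
    human_mean = sum(human_scores) / len(human_scores)
    machine_mean = sum(machine_scores) / len(machine_scores)
    return (human_mean + machine_mean) / 2


def predict(model, text, threshold):
    """Label text 'machine' when its score exceeds the threshold."""
    return "machine" if model.score(text) > threshold else "human"


# Toy data, for illustration only.
machine_samples = [
    "the model is a powerful tool",
    "the model is a useful tool",
    "the tool is a powerful model",
]
human_samples = [
    "my cat knocked coffee everywhere again",
    "honestly no idea why this works",
]

model = WeakBigramModel(machine_samples)
threshold = midpoint_threshold(
    [model.score(t) for t in human_samples],
    [model.score(t) for t in machine_samples],
)
print(predict(model, "the model is a useful tool", threshold))
print(predict(model, "my cat knocked coffee everywhere again", threshold))
```

The midpoint rule is only one simple way to derive a threshold from data; a real pipeline would likely tune it on held-out samples and combine several features rather than a single score.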
- Clone the repository
  git clone https://github.com/jaywyawhare/TextTuring
- Install the required packages
  pip install -r requirements.txt
- Generate the dataset
  python3 main.py --generate
- Decide the threshold
  python3 main.py --threshold
- Go through the Jupyter notebook
- Deploy the web app
  streamlit run app.py
- Go to the web app
- Arinjay Wyawhare - jaywyawhare
This project is licensed under the MIT License - see the LICENSE file for details.
- I extend my gratitude to the open-source NLP and machine learning communities for their invaluable contributions to the field.
No need to check whether my READMEs are written by me, because they aren't! 😉