gperdrizet/llm_detector

Malone

News

2024-08-17: Malone is temporarily offline so that compute resources can be dedicated to benchmarking and improvements to the classifier. Check out what is going on in the benchmarking and classifier notebooks on the classifier branch. If you would really like to try malone out, get in touch and I will fire it up for you.

2024-08-07: Malone was just named a Backdrop Build v5 Finalist! Check out the build page here! Let's gooooo!

2024-08-01: Backdrop build v5 launch video is up on YouTube. Congrats to all of the other Backdrop Build finishers!

2024-07-30: Malone is live in beta on Telegram; give it a try here. Note: some Firefox users have reported issues with the bot link page. This appears to be a Telegram issue, not a malone issue. You can also find malone by messaging '/start' to @the_malone_bot anywhere you use Telegram.

2024-07-08: llm_detector is officially part of the Backdrop Build v5 cohort under the tentative name 'malone' starting today. Check out the backdrop build page for updates.

Project description

malone

Malone is a synthetic text detection service available on Telegram Messenger. It is written in Python using HuggingFace, scikit-learn, XGBoost, Luigi and python-telegram-bot, supported by Flask, Celery, Redis & Docker, and served via Gunicorn and Nginx. Malone uses an in-house trained gradient boosting classifier to estimate the probability that a given text was generated by an LLM, working from a set of engineered features derived from the input text; for more details, see the feature engineering notebooks.
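The core idea can be sketched in a few lines. This is a toy illustration, not malone's actual pipeline: scikit-learn's GradientBoostingClassifier stands in for the production XGBoost model, and two random columns stand in for the real engineered features.

```python
# Minimal sketch of the approach: a gradient boosting classifier over
# numeric features engineered from text. The feature matrix here is toy
# data, not malone's real feature set.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)

# One row per text, one column per engineered feature
# (e.g. a perplexity-ratio score, a TF-IDF-based score).
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # 1 = synthetic, 0 = human

model = GradientBoostingClassifier(n_estimators=50, max_depth=3)
model.fit(X, y)

# predict_proba returns [P(human), P(synthetic)] for each sample;
# the two probabilities sum to 1.
proba = model.predict_proba(X[:1])[0]
print(proba)
```

Reporting `P(synthetic)` directly, rather than a hard yes/no label, is what lets users apply their own threshold when evaluating content.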

Table of Contents

  1. Features
  2. Where to find malone
  3. Usage
  4. Performance
  5. Demonstration/experimentation notebooks
  6. About the author
  7. Disclaimer

1. Features

  • Easily accessible - use it anywhere you can access Telegram: iOS or Android apps and any web browser.
  • Simple interface - no frills, just send the bot text and it will send back the probability that the text was machine generated.
  • Useful and accurate - provides a probability that text is synthetic, allowing users to make their own decisions when evaluating content. Maximum likelihood classification accuracy ~90% on held-out test data.
  • Model agnostic - malone is not trained to detect the output of a specific LLM. Instead, it uses a gradient boosting classifier and a set of numerical features derived from, and calibrated on, a large corpus of human and synthetic text samples from multiple LLMs.
  • No logs - no user data or message contents are ever persisted to disk.
  • Open source codebase - malone is an open source project. Clone it, fork it, extend it, modify it, host it yourself and use it the way you want to use it.
  • Free

2. Where to find malone

Malone is publicly available on Telegram. You can find malone on the Telegram bot page, or just message @the_malone_bot with '/start' to start using it.

There are also plans in the works to offer the bare API to interested parties. If that's you, see section 6 below.

3. Usage

To use malone you will need a Telegram account. Telegram is free to use and available as an app for iOS and Android. There is also a web version for desktop use.

Once you have a Telegram account, malone is simple to use. Send the bot any 'suspect' text and it will reply with the probability that the text in question was written by a human or generated by an LLM. On a smartphone, a good trick is to long-press the 'suspect' text and then share it to malone on Telegram via the context menu. Malone is never more than 2 taps away!

telegram app screenshot

Malone can run in two response modes: 'default' and 'verbose'. Default mode returns the probability associated with the most likely class as a percent (e.g. 75% chance a human wrote this). Verbose mode gives a little more detail about the feature values and prediction metrics. Set the mode by messaging '/set_mode verbose' or '/set_mode default'.

For best results, submitted text must be between 50 and 500 words.
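A minimal sketch of that input check, assuming a simple whitespace word count (the function name and exact validation logic are illustrative, not malone's actual code):

```python
# Hypothetical input check mirroring malone's 50-500 word guideline.
def within_word_limit(text: str, lo: int = 50, hi: int = 500) -> bool:
    """Return True if the text's whitespace-delimited word count
    falls inside the [lo, hi] range."""
    n_words = len(text.split())
    return lo <= n_words <= hi

print(within_word_limit("word " * 100))  # True: 100 words
print(within_word_limit("too short"))   # False: only 2 words
```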

4. Performance

Malone is ~90% accurate with a binary log loss of ~0.25 on held-out test data, depending on the model and feature engineering hyperparameters and the specific train/test split (see the example confusion matrix below). The misclassified examples are roughly evenly split between false negatives and false positives.
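For readers unfamiliar with how these metrics fit together, here is how accuracy, binary log loss, and a confusion matrix are computed from held-out labels and predicted probabilities. The numbers below are toy values, not malone's test results:

```python
# Toy example relating the headline metrics: accuracy, binary log
# loss, and a confusion matrix, all computed from true labels and
# predicted P(synthetic) on a held-out set.
from sklearn.metrics import accuracy_score, confusion_matrix, log_loss

y_true = [0, 0, 1, 1, 1, 0]                    # 1 = synthetic, 0 = human
p_synthetic = [0.1, 0.4, 0.8, 0.9, 0.35, 0.2]  # model's P(class 1)
y_pred = [int(p >= 0.5) for p in p_synthetic]  # hard labels at 0.5

print(accuracy_score(y_true, y_pred))          # fraction correct
print(log_loss(y_true, p_synthetic))           # penalizes confident misses
print(confusion_matrix(y_true, y_pred))        # rows: true, cols: predicted
```

Log loss complements accuracy: a model that is wrong with high confidence is penalized much more heavily than one that is wrong near the 0.5 decision boundary.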

XGBoost confusion matrix

For more details on the classifier training and performance see: XGBoost experimentation and XGBoost finalized.

5. Demonstration/experimentation notebooks

Most of the testing and benchmarking during the design phase of the project was done in Jupyter notebooks before being refactored into modules. These notebooks are the best way to understand the approach and the engineered features used to train the classifier.

  1. Human and synthetic text training data
  2. Perplexity ratio score
  3. TF-IDF score
  4. XGBoost classifier

6. About the author

My name is Dr. George Perdrizet. I am a biochemistry & molecular biology PhD seeking to move from academia into professional data science and/or machine learning engineering. This project was conceived from the scientific literature and built solo over the course of a few weeks, and I strongly believe that I have a ton to offer the right organization. If you or anyone you know is interested in an ex-researcher from the University of Chicago turned builder and data scientist, please reach out; I'd love to learn from and contribute to your project.

7. Disclaimer

Malone is an experimental research project meant for educational, informational and entertainment purposes only. Any predictions made are inherently probabilistic in nature and subject to stochastic errors. Text classifications, no matter how high or low the reported probability, should never be interpreted as proof of authorship or the lack thereof in regard to any text submitted for analysis. Decisions about the source or value of any texts are made by the user who considers all factors relevant to themselves and their purpose and takes full responsibility for their own judgments and any actions they may take as a result.