🚀 Awesome LLMSecOps

🔐 A curated list of awesome resources for LLMSecOps (Large Language Model Security Operations) 🧠

by @wearetyomsmnv

LLM safety is a huge body of knowledge that is important and relevant to society today. The purpose of this Awesome list is to provide the community with the necessary knowledge on how to build an LLM development process - safe, as well >as what threats may be encountered along the way. Everyone is welcome to contribute.

Important

This repository, unlike many existing repositories, emphasizes the practical implementation of security and does not provide a lot of references to arxiv in the description.

3 types of LLM architecture

Architecture risks

Risk	Description
Recursive Pollution	LLMs can produce incorrect output with high confidence. If such output is used in training data, it can cause future LLMs to be trained on polluted data, creating a feedback loop problem.
Data Debt	LLMs rely on massive datasets, often too large to thoroughly vet. This lack of transparency and control over data quality presents a significant risk.
Black Box Opacity	Many critical components of LLMs are hidden in a "black box" controlled by foundation model providers, making it difficult for users to manage and mitigate risks effectively.
Prompt Manipulation	Manipulating the input prompts can lead to unstable and unpredictable LLM behavior. This risk is similar to adversarial inputs in other ML systems.
Poison in the Data	Training data can be contaminated intentionally or unintentionally, leading to compromised model integrity. This is especially problematic given the size and scope of data used in LLMs.
Reproducibility Economics	The high cost of training LLMs limits reproducibility and independent verification, leading to a reliance on commercial entities and potentially unreviewed models.
Model Trustworthiness	The inherent stochastic nature of LLMs and their lack of true understanding can make their output unreliable. This raises questions about whether they should be trusted in critical applications.
Encoding Integrity	Data is often processed and re-represented in ways that can introduce bias and other issues. This is particularly challenging with LLMs due to their unsupervised learning nature.

From Berryville Institute of Machine Learning (BIML) paper

Vulnerabilities desctiption

by Giskard

Vulnerability	Description
Hallucination and Misinformation	These vulnerabilities often manifest themselves in the generation of fabricated content or the spread of false information, which can have far-reaching consequences such as disseminating misleading content or malicious narratives.
Harmful Content Generation	This vulnerability involves the creation of harmful or malicious content, including violence, hate speech, or misinformation with malicious intent, posing a threat to individuals or communities.
Prompt Injection	Users manipulating input prompts to bypass content filters or override model instructions can lead to the generation of inappropriate or biased content, circumventing intended safeguards.
Robustness	The lack of robustness in model outputs makes them sensitive to small perturbations, resulting in inconsistent or unpredictable responses that may cause confusion or undesired behavior.
Output Formatting	When model outputs do not align with specified format requirements, responses can be poorly structured or misformatted, failing to comply with the desired output format.
Information Disclosure	This vulnerability occurs when the model inadvertently reveals sensitive or private data about individuals, organizations, or entities, posing significant privacy risks and ethical concerns.
Stereotypes and Discrimination	If model's outputs are perpetuating biases, stereotypes, or discriminatory content, it leads to harmful societal consequences, undermining efforts to promote fairness, diversity, and inclusion.

LLMSecOps Life Cycle

🛠 Tools for scanning

Tool	Description	Stars
🔧 Garak	LLM vulnerability scanner
🔧 ps-fuzz 2	Make your GenAI Apps Safe & Secure 🚀 Test & harden your system prompt
🗺️ LLMmap	Tool for mapping LLM vulnerabilities
🛡️ Agentic Security	Security toolkit for AI agents
🧠 Mindgard CLI	Command-line interface for Mindgard security tools
🔒 LLM Confidentiality	Tool for ensuring confidentiality in LLMs
🔒 PyRIT	The Python Risk Identification Tool for generative AI (PyRIT) is an open access automation framework to empower security professionals and machine learning engineers to proactively find risks in their generative AI systems.

How to run garak
python -m pip install -U garak
Probe ChatGPT for encoding-based prompt injection (OSX/*nix) (replace example value with a real OpenAI API key)

Probes is a simple .py file with prompts for LLM

Examples
export OPENAI_API_KEY="sk-123XXXXXXXXXXXX"
python3 -m garak --model_type openai --model_name gpt-3.5-turbo --probes encoding
See if the Hugging Face version of GPT2 is vulnerable to DAN 11.0
python3 -m garak --model_type huggingface --model_name gpt2 --probes dan.Dan_11_0
More examples on Garak Tool instruction

🛡️Defense

Tool	Description	Stars
🛡️ PurpleLlama	Set of tools to assess and improve LLM security.
🛡️ Rebuff	API with built-in rules for identifying prompt injection and detecting data leakage through canary words.
🔒 LLM Guard	Self-hostable tool with multiple prompt and output scanners for various security issues.
🚧 NeMo Guardrails	Tool that protects against jailbreak and hallucinations with customizable rulesets.
👁️ Vigil	Offers dockerized and local setup options, using proprietary HuggingFace datasets for security detection.
🧰 LangKit	Provides functions for jailbreak detection, prompt injection, and sensitive information detection.
🛠️ GuardRails AI	Focuses on functionality, detects presence of secrets in responses.
🦸 Hyperion Alpha	Detects prompt injections and jailbreaks.	N/A
🛡️ LLM-Guard	Tool for securing LLM interactions.
🚨 Whistleblower	Tool for detecting and preventing LLM vulnerabilities.
🔍 Plexiglass	Security tool for LLM applications.
🔍 Prompt Injection defenses	Rules for protected LLM
🔍 LLM Data Protector	Tools for protected LLM in chatbots

Threat Modeling

Tool	Description
Secure LLM Deployment: Navigating and Mitigating Safety Risks	Research paper on LLM security [sorry, but is really cool]
ThreatModels	Repository for LLM threat models
Threat Modeling LLMs	AI Village resource on threat modeling for LLMs

Monitoring

Tool	Description
Langfuse	Open Source LLM Engineering Platform with security capabilities.

Watermarking

Tool	Description
MarkLLM	An Open-Source Toolkit for LLM Watermarking.

Jailbreaks

Resource	Description
JailbreakBench	Website dedicated to evaluating and analyzing jailbreak methods for language models
L1B3RT45	GitHub repository containing information and tools related to AI jailbreaking
llm-hacking-database	This repository contains various attack against Large Language Models
HaizeLabs jailbreak Database	This database contains jailbreaks for multimodal language models
Lakera PINT Benchmark	A benchmark for prompt injection detection systems.

PINT Benchmark scores (by lakera)

Name	PINT Score	Test Date
Lakera Guard	98.0964%	2024-06-12
protectai/deberta-v3-base-prompt-injection-v2	91.5706%	2024-06-12
Azure AI Prompt Shield for Documents	91.1914%	2024-04-05
Meta Prompt Guard	90.4496%	2024-07-26
protectai/deberta-v3-base-prompt-injection	88.6597%	2024-06-12
WhyLabs LangKit	80.0164%	2024-06-12
Azure AI Prompt Shield for User Prompts	77.504%	2024-04-05
Epivolis/Hyperion	62.6572%	2024-06-12
fmops/distilbert-prompt-injection	58.3508%	2024-06-12
deepset/deberta-v3-base-injection	57.7255%	2024-06-12
Myadav/setfit-prompt-injection-MiniLM-L3-v2	56.3973%	2024-06-12

Hallucinations Leaderboard

Model	Hallucination Rate	Factual Consistency Rate	Answer Rate	Average Summary Length (Words)
GPT-4o	1.5 %	98.5 %	100.0 %	77.8
Zhipu AI GLM-4-9B-Chat	1.6 %	98.4 %	100.0 %	58.1
GPT-4o-mini	1.7 %	98.3 %	100.0 %	76.3
GPT-4-Turbo	1.7 %	98.3 %	100.0 %	86.2
GPT-4	1.8 %	98.2 %	100.0 %	81.1
GPT-3.5-Turbo	1.9 %	98.1 %	99.6 %	84.1
Microsoft Orca-2-13b	2.5 %	97.5 %	100.0 %	66.2
Intel Neural-Chat-7B-v3-3	2.7 %	97.3 %	100.0 %	60.7
Snowflake-Arctic-Instruct	3.0 %	97.0 %	100.0 %	68.7
Microsoft Phi-3-mini-128k-instruct	3.1 %	96.9 %	100.0 %	60.1
01-AI Yi-1.5-34B-Chat	3.9 %	96.1 %	100.0 %	83.7
Llama-3.1-405B-Instruct	3.9 %	96.1 %	99.6 %	85.7
Microsoft Phi-3-mini-4k-instruct	4.0 %	96.0 %	100.0 %	86.8
Llama-3-70B-Chat-hf	4.1 %	95.9 %	99.2 %	68.5
Mistral-Large2	4.4 %	95.6 %	100.0 %	77.4
Mixtral-8x22B-Instruct-v0.1	4.7 %	95.3 %	99.9 %	92.0
Qwen2-72B-Instruct	4.9 %	95.1 %	100.0 %	100.1
Llama-3.1-70B-Instruct	5.0 %	95.0 %	100.0 %	79.6
01-AI Yi-1.5-9B-Chat	5.0 %	95.0 %	100.0 %	85.7
Llama-3.1-8B-Instruct	5.5 %	94.5 %	100.0 %	71.0
Llama-2-70B-Chat-hf	5.9 %	94.1 %	99.9 %	84.9
Google Gemini-1.5-flash	6.6 %	93.4 %	98.1 %	62.8
Microsoft phi-2	6.7 %	93.3 %	91.5 %	80.8
Google Gemma-2-2B-it	7.0 %	93.0 %	100.0 %	62.2
Llama-3-8B-Chat-hf	7.4 %	92.6 %	99.8 %	79.7
Google Gemini-Pro	7.7 %	92.3 %	98.4 %	89.5
CohereForAI c4ai-command-r-plus	7.8 %	92.2 %	100.0 %	71.2
01-AI Yi-1.5-6B-Chat	8.2 %	91.8 %	100.0 %	98.9
databricks dbrx-instruct	8.3 %	91.7 %	100.0 %	85.9
Anthropic Claude-3-5-sonnet	8.6 %	91.4 %	100.0 %	103.0
Mistral-7B-Instruct-v0.3	9.8 %	90.2 %	100.0 %	98.4
Anthropic Claude-3-opus	10.1 %	89.9 %	95.5 %	92.1
Google Gemma-2-9B-it	10.1 %	89.9 %	100.0 %	70.2
Llama-2-13B-Chat-hf	10.5 %	89.5 %	99.8 %	82.1
Llama-2-7B-Chat-hf	11.3 %	88.7 %	99.6 %	119.9
Microsoft WizardLM-2-8x22B	11.7 %	88.3 %	99.9 %	140.8
Amazon Titan-Express	13.5 %	86.5 %	99.5 %	98.4
Google PaLM-2	14.1 %	85.9 %	99.8 %	86.6
Google Gemma-7B-it	14.8 %	85.2 %	100.0 %	113.0
Cohere-Chat	15.4 %	84.6 %	98.0 %	74.4
Anthropic Claude-3-sonnet	16.3 %	83.7 %	100.0 %	108.5
Google Gemma-1.1-7B-it	17.0 %	83.0 %	100.0 %	64.3
Anthropic Claude-2	17.4 %	82.6 %	99.3 %	87.5
Google Flan-T5-large	18.3 %	81.7 %	99.3	20.9
Cohere	18.9 %	81.1 %	99.8 %	59.8
Mixtral-8x7B-Instruct-v0.1	20.1 %	79.9 %	99.9 %	90.7
Apple OpenELM-3B-Instruct	24.8 %	75.2 %	99.3 %	47.2
Google Gemma-1.1-2B-it	27.8 %	72.2 %	100.0 %	66.8
Google Gemini-1.5-Pro	28.1 %	71.9 %	89.3 %	82.1
TII falcon-7B-instruct	29.9 %	70.1 %	90.0 %	75.5

From this repo (update 5 aug)

This is a Safety Benchmark from stanford university

RAG Security

Resource	Description
Security Risks in RAG	Article on security risks in Retrieval-Augmented Generation (RAG)
How RAG Poisoning Made LLaMA3 Racist	Blog post about RAG poisoning and its effects on LLaMA3
Adversarial AI - RAG Attacks and Mitigations	GitHub repository on RAG attacks, mitigations, and defense strategies
PoisonedRAG	GitHub repository about poisoned RAG systems
ConfusedPilot: Compromising Enterprise Information Integrity and Confidentiality with Copilot for Microsoft 365	Article about RAG vulnerabilities

Agentic security

Tool	Description	Stars
invariant	A trace analysis tool for AI agents.
AgentBench	A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

PoC

Tool	Description	Stars
Visual Adversarial Examples	Jailbreaking Large Language Models with Visual Adversarial Examples
Weak-to-Strong Generalization	Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Image Hijacks	Repository for image-based hijacks of large language models
CipherChat	Secure communication tool for large language models
LLMs Finetuning Safety	Safety measures for fine-tuning large language models
Virtual Prompt Injection	Tool for virtual prompt injection in language models
FigStep	Jailbreaking Large Vision-language Models via Typographic Visual Prompts
stealing-part-lm-supplementary	Some code for "Stealing Part of a Production Language Model"
Hallucination-Attack	Attack to induce LLMs within hallucinations
llm-hallucination-survey	Reading list of hallucination in LLMs. Check out our new survey paper: "Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models"
LMSanitator	LMSanitator: Defending Large Language Models Against Stealthy Prompt Injection Attacks
Imperio	Imperio: Robust Prompt Engineering for Anchoring Large Language Models
Backdoor Attacks on Fine-tuned LLaMA	Backdoor Attacks on Fine-tuned LLaMA Models
CBA	Consciousness-Based Authentication for LLM Security
MuScleLoRA	A Framework for Multi-scenario Backdoor Fine-tuning of LLMs
BadActs	BadActs: Backdoor Attacks against Large Language Models via Activation Steering
TrojText	Trojan Attacks on Text Classifiers
AnyDoor	Create Arbitrary Backdoor Instances in Language Models
PromptWare	A Jailbroken GenAI Model Can Cause Real Harm: GenAI-powered Applications are Vulnerable to PromptWares

Study resource

Tool	Description
Gandalf	Interactive LLM security challenge game
Prompt Airlines	Platform for learning and practicing prompt engineering
PortSwigger LLM Attacks	Educational resource on WEB LLM security vulnerabilities and attacks
Invariant Labs CTF 2024	CTF. Your should hack llm agentic
DeepLearning.AI Red Teaming Course	Short course on red teaming LLM applications
Learn Prompting: Offensive Measures	Guide on offensive prompt engineering techniques
Application Security LLM Testing	Free LLM security testing
Salt Security Blog: ChatGPT Extensions Vulnerabilities	Article on security flaws in ChatGPT browser extensions
safeguarding-llms	TMLS 2024 Workshop: A Practitioner's Guide To Safeguarding Your LLM Applications
Damn Vulnerable LLM Agent	Intentionally vulnerable LLM agent for security testing and education
GPT Agents Arena	Platform for testing and evaluating LLM agents in various scenarios
AI Battle	Interactive game focusing on AI security challenges

📊 Community research articles

Title	Authors	Year
📄 Bypassing Meta's LLaMA Classifier: A Simple Jailbreak	Robust Intelligence	2024
📄 Vulnerabilities in LangChain Gen AI	Unit42	2024
📄 Detecting Prompt Injection: BERT-based Classifier	WithSecure Labs	2024
📄 Practical LLM Security: Takeaways From a Year in the Trenches	NVIDIA	2024

🎓 Tutorials

Resource	Description
📚 HADESS - Web LLM Attacks	Understanding how to carry out web attacks using LLM
📚 Red Teaming with LLMs	Practical methods for attacking AI systems
📚 Lakera LLM Security	Overview of attacks on LLM

📚 Books

📖 Title	🖋️ Author(s)	🔍 Description
The Developer's Playbook for Large Language Model Security	Steve Wilson	🛡️ Comprehensive guide for developers on securing LLMs
Generative AI Security: Theories and Practices (Future of Business and Finance)	Ken Huang, Yang Wang, Ben Goertzel, Yale Li, Sean Wright, Jyoti Ponnapalli	🔬 In-depth exploration of security theories, laws, terms and practices in Generative AI
Adversarial AI Attacks, Mitigations, and Defense Strategies: A cybersecurity professional's guide to AI attacks, threat modeling, and securing AI with MLSecOps	John Sotiropoulos	Practical examples of code for your best mlsecops pipeline

BLOGS

Blog
https://embracethered.com/blog/
🐦 https://twitter.com/llm_sec
🐦 https://twitter.com/LLM_Top10
🐦 https://twitter.com/aivillage_dc
🐦 https://twitter.com/elder_plinius/
https://hiddenlayer.com/
https://t.me/llmsecurity

DATA

Resource	Description
Safety and privacy with Large Language Models	GitHub repository on LLM safety and privacy
Jailbreak LLMs	Data for jailbreaking Large Language Models
ChatGPT System Prompt	Repository containing ChatGPT system prompts
Do Not Answer	Project related to LLM response control
ToxiGen	Microsoft dataset
SafetyPrompts	A Living Catalogue of Open Datasets for LLM Safety
llm-security-prompt-injection	This project investigates the security of large language models by performing binary classification of a set of input prompts to discover malicious prompts. Several approaches have been analyzed using classical ML algorithms, a trained LLM model, and a fine-tuned LLM model.

OPS

Resource	Description
https://sysdig.com/blog/llmjacking-stolen-cloud-credentials-used-in-new-ai-attack/	LLMJacking: Stolen Cloud Credentials Used in New AI Attack
https://huggingface.co/docs/hub/security	Hugging Face Hub Security Documentation
https://developer.nvidia.com/blog/secure-llm-tokenizers-to-maintain-application-integrity/	Secure LLM Tokenizers to Maintain Application Integrity
https://sightline.protectai.com/	Sightline by ProtectAI Check vulnerabilities on: • Nemo by Nvidia • Deep Lake • Fine-Tuner AI • Snorkel AI • Zen ML • Lamini AI • Comet • Titan ML • Deepset AI • Valohai For finding LLMops tools vulnerabilities

🏗 Frameworks

_{OWASP LLM TOP 10}
10 vulnerabilities for llm

_{LLM AI Cybersecurity & Governance Checklist 2}
Brief explanation

💡 Best Practices

OWASP LLMSVS

Large Language Model Security Verification Standard

Project Link

The primary aim of the OWASP LLMSVS Project is to provide an open security standard for systems which leverage artificial intelligence and Large Language Models.

The standard provides a basis for designing, building, and testing robust LLM backed applications, including:

Architectural concerns
Model lifecycle
Model training
Model operation and integration
Model storage and monitoring

🌐 Community

Platform	Details
OWASP SLACK	Channels: • #project-top10-for-llm • #ml-risk-top5 • #project-ai-community • #project-mlsec-top10 • #team-llm_ai-secgov • #team-llm-redteam • #team-llm-v2-brainstorm
Awesome LLM Security	GitHub repository
PWNAI	Telegram channel
AiSec_X_Feed	Telegram channel
LVE_Project	Official website
Lakera AI Security resource hub	Google Sheets document
llm-testing-findings	Templates with recomendation, cwe and other

Name	LLM Security Company	URL
Giskard	AI quality management system for ML models, focusing on vulnerabilities such as performance bias, hallucinations, and prompt injections.	https://www.giskard.ai/
Lakera	Lakera Guard enhances LLM application security and counters a wide range of AI cyber threats.	https://www.lakera.ai/
Lasso Security	Focuses on LLMs, offering security assessment, advanced threat modeling, and specialized training programs.	https://www.lasso.security/
LLM Guard	Designed to strengthen LLM security, offers sanitization, malicious language detection, data leak prevention, and prompt injection resilience.	https://llmguard.com
LLM Fuzzer	Open-source fuzzing framework specifically designed for LLMs, focusing on integration into applications via LLM APIs.	https://github.com/llmfuzzer
Prompt Security	Provides a security, data privacy, and safety approach across all aspects of generative AI, independent of specific LLMs.	https://www.prompt.security
Rebuff	Self-hardening prompt injection detector for AI applications, using a multi-layered protection mechanism.	https://github.com/rebuff
Robust Intelligence	Provides AI firewall and continuous testing and evaluation. Creators of the airisk.io database donated to MITRE.	https://www.robustintelligence.com/
WhyLabs	Protects LLMs from security threats, focusing on data leak prevention, prompt injection monitoring, and misinformation prevention.	https://www.whylabs.ai/

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🚀 Awesome LLMSecOps

by @wearetyomsmnv

3 types of LLM architecture

Architecture risks

Vulnerabilities desctiption

by Giskard

LLMSecOps Life Cycle

🛠 Tools for scanning

How to run garak

🛡️Defense

Threat Modeling

Monitoring

Watermarking

Jailbreaks

PINT Benchmark scores (by lakera)

Hallucinations Leaderboard

RAG Security

Agentic security

PoC

Study resource

📊 Community research articles

🎓 Tutorials

📚 Books

BLOGS

DATA

OPS

🏗 Frameworks

💡 Best Practices

OWASP LLMSVS

🌐 Community

About

Releases

Packages

wearetyomsmnv/Awesome-LLMSecOps

Folders and files

Latest commit

History

Repository files navigation

🚀 Awesome LLMSecOps

by @wearetyomsmnv

3 types of LLM architecture

Architecture risks

Vulnerabilities desctiption

by Giskard

LLMSecOps Life Cycle

🛠 Tools for scanning

How to run garak

🛡️Defense

Threat Modeling

Monitoring

Watermarking

Jailbreaks

PINT Benchmark scores (by lakera)

Hallucinations Leaderboard

RAG Security

Agentic security

PoC

Study resource

📊 Community research articles

🎓 Tutorials

📚 Books

BLOGS

DATA

OPS

🏗 Frameworks

💡 Best Practices

OWASP LLMSVS

🌐 Community

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages