Skip to content

LLM | Security | Operations in one github repo with good links and pictures.

Notifications You must be signed in to change notification settings

wearetyomsmnv/Awesome-LLMSecOps

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

38 Commits
Β 
Β 

Repository files navigation

GIPHY Animation

πŸš€ Awesome LLMSecOps

Awesome GitHub stars GitHub forks GitHub last commit

πŸ” A curated list of awesome resources for LLMSecOps (Large Language Model Security Operations) 🧠

by @wearetyomsmnv

Architecture | Vulnerabilities | Tools | Defense | Threat Modeling | Jailbreaks | RAG Security | PoC's | Study Resources | Books | Blogs | Datasets for Testing | OPS Security | Frameworks | Best Practices | Research | Tutorials | Companies | Community Resources

LLM safety is a huge body of knowledge that is important and relevant to society today. The purpose of this Awesome list is to provide the community with the necessary knowledge on how to build an LLM development process - safe, as well >as what threats may be encountered along the way. Everyone is welcome to contribute.

Important

This repository, unlike many existing repositories, emphasizes the practical implementation of security and does not provide a lot of references to arxiv in the description.


3 types of LLM architecture

Group3

Architecture risks

Risk Description
Recursive Pollution LLMs can produce incorrect output with high confidence. If such output is used in training data, it can cause future LLMs to be trained on polluted data, creating a feedback loop problem.
Data Debt LLMs rely on massive datasets, often too large to thoroughly vet. This lack of transparency and control over data quality presents a significant risk.
Black Box Opacity Many critical components of LLMs are hidden in a "black box" controlled by foundation model providers, making it difficult for users to manage and mitigate risks effectively.
Prompt Manipulation Manipulating the input prompts can lead to unstable and unpredictable LLM behavior. This risk is similar to adversarial inputs in other ML systems.
Poison in the Data Training data can be contaminated intentionally or unintentionally, leading to compromised model integrity. This is especially problematic given the size and scope of data used in LLMs.
Reproducibility Economics The high cost of training LLMs limits reproducibility and independent verification, leading to a reliance on commercial entities and potentially unreviewed models.
Model Trustworthiness The inherent stochastic nature of LLMs and their lack of true understanding can make their output unreliable. This raises questions about whether they should be trusted in critical applications.
Encoding Integrity Data is often processed and re-represented in ways that can introduce bias and other issues. This is particularly challenging with LLMs due to their unsupervised learning nature.

From Berryville Institute of Machine Learning (BIML) paper

Vulnerabilities desctiption

by Giskard

Vulnerability Description
Hallucination and Misinformation These vulnerabilities often manifest themselves in the generation of fabricated content or the spread of false information, which can have far-reaching consequences such as disseminating misleading content or malicious narratives.
Harmful Content Generation This vulnerability involves the creation of harmful or malicious content, including violence, hate speech, or misinformation with malicious intent, posing a threat to individuals or communities.
Prompt Injection Users manipulating input prompts to bypass content filters or override model instructions can lead to the generation of inappropriate or biased content, circumventing intended safeguards.
Robustness The lack of robustness in model outputs makes them sensitive to small perturbations, resulting in inconsistent or unpredictable responses that may cause confusion or undesired behavior.
Output Formatting When model outputs do not align with specified format requirements, responses can be poorly structured or misformatted, failing to comply with the desired output format.
Information Disclosure This vulnerability occurs when the model inadvertently reveals sensitive or private data about individuals, organizations, or entities, posing significant privacy risks and ethical concerns.
Stereotypes and Discrimination If model's outputs are perpetuating biases, stereotypes, or discriminatory content, it leads to harmful societal consequences, undermining efforts to promote fairness, diversity, and inclusion.

LLMSecOps Life Cycle

Group 2

πŸ›  Tools for scanning

Tool Description Stars
πŸ”§ Garak LLM vulnerability scanner GitHub stars
πŸ”§ ps-fuzz 2 Make your GenAI Apps Safe & Secure πŸš€ Test & harden your system prompt GitHub stars
πŸ—ΊοΈ LLMmap Tool for mapping LLM vulnerabilities GitHub stars
πŸ›‘οΈ Agentic Security Security toolkit for AI agents GitHub stars
🧠 Mindgard CLI Command-line interface for Mindgard security tools GitHub stars
πŸ”’ LLM Confidentiality Tool for ensuring confidentiality in LLMs GitHub stars
πŸ”’ PyRIT The Python Risk Identification Tool for generative AI (PyRIT) is an open access automation framework to empower security professionals and machine learning engineers to proactively find risks in their generative AI systems. GitHub stars

How to run garak

python -m pip install -U garak

Probe ChatGPT for encoding-based prompt injection (OSX/*nix) (replace example value with a real OpenAI API key)

Probes is a simple .py file with prompts for LLM

Examples

export OPENAI_API_KEY="sk-123XXXXXXXXXXXX"
python3 -m garak --model_type openai --model_name gpt-3.5-turbo --probes encoding

See if the Hugging Face version of GPT2 is vulnerable to DAN 11.0

python3 -m garak --model_type huggingface --model_name gpt2 --probes dan.Dan_11_0

More examples on Garak Tool instruction

πŸ›‘οΈDefense

Tool Description Stars
πŸ›‘οΈ PurpleLlama Set of tools to assess and improve LLM security. GitHub stars
πŸ›‘οΈ Rebuff API with built-in rules for identifying prompt injection and detecting data leakage through canary words. GitHub stars
πŸ”’ LLM Guard Self-hostable tool with multiple prompt and output scanners for various security issues. GitHub stars
🚧 NeMo Guardrails Tool that protects against jailbreak and hallucinations with customizable rulesets. GitHub stars
πŸ‘οΈ Vigil Offers dockerized and local setup options, using proprietary HuggingFace datasets for security detection. GitHub stars
🧰 LangKit Provides functions for jailbreak detection, prompt injection, and sensitive information detection. GitHub stars
πŸ› οΈ GuardRails AI Focuses on functionality, detects presence of secrets in responses. GitHub stars
🦸 Hyperion Alpha Detects prompt injections and jailbreaks. N/A
πŸ›‘οΈ LLM-Guard Tool for securing LLM interactions. GitHub stars
🚨 Whistleblower Tool for detecting and preventing LLM vulnerabilities. GitHub stars
πŸ” Plexiglass Security tool for LLM applications. GitHub stars
πŸ” Prompt Injection defenses Rules for protected LLM GitHub stars
πŸ” LLM Data Protector Tools for protected LLM in chatbots

Threat Modeling

Tool Description
Secure LLM Deployment: Navigating and Mitigating Safety Risks Research paper on LLM security [sorry, but is really cool]
ThreatModels Repository for LLM threat models
Threat Modeling LLMs AI Village resource on threat modeling for LLMs

image image

Monitoring

Tool Description
Langfuse Open Source LLM Engineering Platform with security capabilities.

Watermarking

Tool Description
MarkLLM An Open-Source Toolkit for LLM Watermarking.

Jailbreaks

Resource Description
JailbreakBench Website dedicated to evaluating and analyzing jailbreak methods for language models
L1B3RT45 GitHub repository containing information and tools related to AI jailbreaking
llm-hacking-database This repository contains various attack against Large Language Models
HaizeLabs jailbreak Database This database contains jailbreaks for multimodal language models
Lakera PINT Benchmark A benchmark for prompt injection detection systems.

PINT Benchmark scores (by lakera)

Name PINT Score Test Date
Lakera Guard 98.0964% 2024-06-12
protectai/deberta-v3-base-prompt-injection-v2 91.5706% 2024-06-12
Azure AI Prompt Shield for Documents 91.1914% 2024-04-05
Meta Prompt Guard 90.4496% 2024-07-26
protectai/deberta-v3-base-prompt-injection 88.6597% 2024-06-12
WhyLabs LangKit 80.0164% 2024-06-12
Azure AI Prompt Shield for User Prompts 77.504% 2024-04-05
Epivolis/Hyperion 62.6572% 2024-06-12
fmops/distilbert-prompt-injection 58.3508% 2024-06-12
deepset/deberta-v3-base-injection 57.7255% 2024-06-12
Myadav/setfit-prompt-injection-MiniLM-L3-v2 56.3973% 2024-06-12

Hallucinations Leaderboard

Model Hallucination Rate Factual Consistency Rate Answer Rate Average Summary Length (Words)
GPT-4o 1.5 % 98.5 % 100.0 % 77.8
Zhipu AI GLM-4-9B-Chat 1.6 % 98.4 % 100.0 % 58.1
GPT-4o-mini 1.7 % 98.3 % 100.0 % 76.3
GPT-4-Turbo 1.7 % 98.3 % 100.0 % 86.2
GPT-4 1.8 % 98.2 % 100.0 % 81.1
GPT-3.5-Turbo 1.9 % 98.1 % 99.6 % 84.1
Microsoft Orca-2-13b 2.5 % 97.5 % 100.0 % 66.2
Intel Neural-Chat-7B-v3-3 2.7 % 97.3 % 100.0 % 60.7
Snowflake-Arctic-Instruct 3.0 % 97.0 % 100.0 % 68.7
Microsoft Phi-3-mini-128k-instruct 3.1 % 96.9 % 100.0 % 60.1
01-AI Yi-1.5-34B-Chat 3.9 % 96.1 % 100.0 % 83.7
Llama-3.1-405B-Instruct 3.9 % 96.1 % 99.6 % 85.7
Microsoft Phi-3-mini-4k-instruct 4.0 % 96.0 % 100.0 % 86.8
Llama-3-70B-Chat-hf 4.1 % 95.9 % 99.2 % 68.5
Mistral-Large2 4.4 % 95.6 % 100.0 % 77.4
Mixtral-8x22B-Instruct-v0.1 4.7 % 95.3 % 99.9 % 92.0
Qwen2-72B-Instruct 4.9 % 95.1 % 100.0 % 100.1
Llama-3.1-70B-Instruct 5.0 % 95.0 % 100.0 % 79.6
01-AI Yi-1.5-9B-Chat 5.0 % 95.0 % 100.0 % 85.7
Llama-3.1-8B-Instruct 5.5 % 94.5 % 100.0 % 71.0
Llama-2-70B-Chat-hf 5.9 % 94.1 % 99.9 % 84.9
Google Gemini-1.5-flash 6.6 % 93.4 % 98.1 % 62.8
Microsoft phi-2 6.7 % 93.3 % 91.5 % 80.8
Google Gemma-2-2B-it 7.0 % 93.0 % 100.0 % 62.2
Llama-3-8B-Chat-hf 7.4 % 92.6 % 99.8 % 79.7
Google Gemini-Pro 7.7 % 92.3 % 98.4 % 89.5
CohereForAI c4ai-command-r-plus 7.8 % 92.2 % 100.0 % 71.2
01-AI Yi-1.5-6B-Chat 8.2 % 91.8 % 100.0 % 98.9
databricks dbrx-instruct 8.3 % 91.7 % 100.0 % 85.9
Anthropic Claude-3-5-sonnet 8.6 % 91.4 % 100.0 % 103.0
Mistral-7B-Instruct-v0.3 9.8 % 90.2 % 100.0 % 98.4
Anthropic Claude-3-opus 10.1 % 89.9 % 95.5 % 92.1
Google Gemma-2-9B-it 10.1 % 89.9 % 100.0 % 70.2
Llama-2-13B-Chat-hf 10.5 % 89.5 % 99.8 % 82.1
Llama-2-7B-Chat-hf 11.3 % 88.7 % 99.6 % 119.9
Microsoft WizardLM-2-8x22B 11.7 % 88.3 % 99.9 % 140.8
Amazon Titan-Express 13.5 % 86.5 % 99.5 % 98.4
Google PaLM-2 14.1 % 85.9 % 99.8 % 86.6
Google Gemma-7B-it 14.8 % 85.2 % 100.0 % 113.0
Cohere-Chat 15.4 % 84.6 % 98.0 % 74.4
Anthropic Claude-3-sonnet 16.3 % 83.7 % 100.0 % 108.5
Google Gemma-1.1-7B-it 17.0 % 83.0 % 100.0 % 64.3
Anthropic Claude-2 17.4 % 82.6 % 99.3 % 87.5
Google Flan-T5-large 18.3 % 81.7 % 99.3 20.9
Cohere 18.9 % 81.1 % 99.8 % 59.8
Mixtral-8x7B-Instruct-v0.1 20.1 % 79.9 % 99.9 % 90.7
Apple OpenELM-3B-Instruct 24.8 % 75.2 % 99.3 % 47.2
Google Gemma-1.1-2B-it 27.8 % 72.2 % 100.0 % 66.8
Google Gemini-1.5-Pro 28.1 % 71.9 % 89.3 % 82.1
TII falcon-7B-instruct 29.9 % 70.1 % 90.0 % 75.5

From this repo (update 5 aug)

image

This is a Safety Benchmark from stanford university


RAG Security

Resource Description
Security Risks in RAG Article on security risks in Retrieval-Augmented Generation (RAG)
How RAG Poisoning Made LLaMA3 Racist Blog post about RAG poisoning and its effects on LLaMA3
Adversarial AI - RAG Attacks and Mitigations GitHub repository on RAG attacks, mitigations, and defense strategies
PoisonedRAG GitHub repository about poisoned RAG systems
ConfusedPilot: Compromising Enterprise Information Integrity and Confidentiality with Copilot for Microsoft 365 Article about RAG vulnerabilities

image

Agentic security

Tool Description Stars
invariant A trace analysis tool for AI agents. GitHub stars
AgentBench A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) GitHub stars

PoC

Tool Description Stars
Visual Adversarial Examples Jailbreaking Large Language Models with Visual Adversarial Examples GitHub stars
Weak-to-Strong Generalization Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision GitHub stars
Image Hijacks Repository for image-based hijacks of large language models GitHub stars
CipherChat Secure communication tool for large language models GitHub stars
LLMs Finetuning Safety Safety measures for fine-tuning large language models GitHub stars
Virtual Prompt Injection Tool for virtual prompt injection in language models GitHub stars
FigStep Jailbreaking Large Vision-language Models via Typographic Visual Prompts GitHub stars
stealing-part-lm-supplementary Some code for "Stealing Part of a Production Language Model" GitHub stars
Hallucination-Attack Attack to induce LLMs within hallucinations GitHub stars
llm-hallucination-survey Reading list of hallucination in LLMs. Check out our new survey paper: "Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models" GitHub stars
LMSanitator LMSanitator: Defending Large Language Models Against Stealthy Prompt Injection Attacks GitHub stars
Imperio Imperio: Robust Prompt Engineering for Anchoring Large Language Models GitHub stars
Backdoor Attacks on Fine-tuned LLaMA Backdoor Attacks on Fine-tuned LLaMA Models GitHub stars
CBA Consciousness-Based Authentication for LLM Security GitHub stars
MuScleLoRA A Framework for Multi-scenario Backdoor Fine-tuning of LLMs GitHub stars
BadActs BadActs: Backdoor Attacks against Large Language Models via Activation Steering GitHub stars
TrojText Trojan Attacks on Text Classifiers GitHub stars
AnyDoor Create Arbitrary Backdoor Instances in Language Models GitHub stars
PromptWare A Jailbroken GenAI Model Can Cause Real Harm: GenAI-powered Applications are Vulnerable to PromptWares GitHub stars

Study resource

Tool Description
Gandalf Interactive LLM security challenge game
Prompt Airlines Platform for learning and practicing prompt engineering
PortSwigger LLM Attacks Educational resource on WEB LLM security vulnerabilities and attacks
Invariant Labs CTF 2024 CTF. Your should hack llm agentic
DeepLearning.AI Red Teaming Course Short course on red teaming LLM applications
Learn Prompting: Offensive Measures Guide on offensive prompt engineering techniques
Application Security LLM Testing Free LLM security testing
Salt Security Blog: ChatGPT Extensions Vulnerabilities Article on security flaws in ChatGPT browser extensions
safeguarding-llms TMLS 2024 Workshop: A Practitioner's Guide To Safeguarding Your LLM Applications
Damn Vulnerable LLM Agent Intentionally vulnerable LLM agent for security testing and education
GPT Agents Arena Platform for testing and evaluating LLM agents in various scenarios
AI Battle Interactive game focusing on AI security challenges

image

πŸ“Š Community research articles

Title Authors Year
πŸ“„ Bypassing Meta's LLaMA Classifier: A Simple Jailbreak Robust Intelligence 2024
πŸ“„ Vulnerabilities in LangChain Gen AI Unit42 2024
πŸ“„ Detecting Prompt Injection: BERT-based Classifier WithSecure Labs 2024
πŸ“„ Practical LLM Security: Takeaways From a Year in the Trenches NVIDIA 2024

πŸŽ“ Tutorials

Resource Description
πŸ“š HADESS - Web LLM Attacks Understanding how to carry out web attacks using LLM
πŸ“š Red Teaming with LLMs Practical methods for attacking AI systems
πŸ“š Lakera LLM Security Overview of attacks on LLM

πŸ“š Books

πŸ“– Title πŸ–‹οΈ Author(s) πŸ” Description
The Developer's Playbook for Large Language Model Security Steve Wilson πŸ›‘οΈ Comprehensive guide for developers on securing LLMs
Generative AI Security: Theories and Practices (Future of Business and Finance) Ken Huang, Yang Wang, Ben Goertzel, Yale Li, Sean Wright, Jyoti Ponnapalli πŸ”¬ In-depth exploration of security theories, laws, terms and practices in Generative AI
Adversarial AI Attacks, Mitigations, and Defense Strategies: A cybersecurity professional's guide to AI attacks, threat modeling, and securing AI with MLSecOps John Sotiropoulos Practical examples of code for your best mlsecops pipeline

BLOGS

Blog
https://embracethered.com/blog/
🐦 https://twitter.com/llm_sec
🐦 https://twitter.com/LLM_Top10
🐦 https://twitter.com/aivillage_dc
🐦 https://twitter.com/elder_plinius/
https://hiddenlayer.com/
https://t.me/llmsecurity

DATA

Resource Description
Safety and privacy with Large Language Models GitHub repository on LLM safety and privacy
Jailbreak LLMs Data for jailbreaking Large Language Models
ChatGPT System Prompt Repository containing ChatGPT system prompts
Do Not Answer Project related to LLM response control
ToxiGen Microsoft dataset
SafetyPrompts A Living Catalogue of Open Datasets for LLM Safety
llm-security-prompt-injection This project investigates the security of large language models by performing binary classification of a set of input prompts to discover malicious prompts. Several approaches have been analyzed using classical ML algorithms, a trained LLM model, and a fine-tuned LLM model.

OPS

Group 4

Resource Description
https://sysdig.com/blog/llmjacking-stolen-cloud-credentials-used-in-new-ai-attack/ LLMJacking: Stolen Cloud Credentials Used in New AI Attack
https://huggingface.co/docs/hub/security Hugging Face Hub Security Documentation
https://developer.nvidia.com/blog/secure-llm-tokenizers-to-maintain-application-integrity/ Secure LLM Tokenizers to Maintain Application Integrity
https://sightline.protectai.com/ Sightline by ProtectAI

Check vulnerabilities on:
β€’ Nemo by Nvidia
β€’ Deep Lake
β€’ Fine-Tuner AI
β€’ Snorkel AI
β€’ Zen ML
β€’ Lamini AI
β€’ Comet
β€’ Titan ML
β€’ Deepset AI
β€’ Valohai

For finding LLMops tools vulnerabilities

πŸ— Frameworks


OWASP LLM TOP 10

10 vulnerabilities for llm

LLM AI Cybersecurity & Governance Checklist 2

Brief explanation

πŸ’‘ Best Practices

OWASP LLMSVS

Large Language Model Security Verification Standard

Project Link

The primary aim of the OWASP LLMSVS Project is to provide an open security standard for systems which leverage artificial intelligence and Large Language Models.

The standard provides a basis for designing, building, and testing robust LLM backed applications, including:

  • Architectural concerns
  • Model lifecycle
  • Model training
  • Model operation and integration
  • Model storage and monitoring

🌐 Community

Platform Details
OWASP SLACK Channels:
β€’ #project-top10-for-llm
β€’ #ml-risk-top5
β€’ #project-ai-community
β€’ #project-mlsec-top10
β€’ #team-llm_ai-secgov
β€’ #team-llm-redteam
β€’ #team-llm-v2-brainstorm
Awesome LLM Security GitHub repository
PWNAI Telegram channel
AiSec_X_Feed Telegram channel
LVE_Project Official website
Lakera AI Security resource hub Google Sheets document
llm-testing-findings Templates with recomendation, cwe and other
Name LLM Security Company URL
Giskard AI quality management system for ML models, focusing on vulnerabilities such as performance bias, hallucinations, and prompt injections. https://www.giskard.ai/
Lakera Lakera Guard enhances LLM application security and counters a wide range of AI cyber threats. https://www.lakera.ai/
Lasso Security Focuses on LLMs, offering security assessment, advanced threat modeling, and specialized training programs. https://www.lasso.security/
LLM Guard Designed to strengthen LLM security, offers sanitization, malicious language detection, data leak prevention, and prompt injection resilience. https://llmguard.com
LLM Fuzzer Open-source fuzzing framework specifically designed for LLMs, focusing on integration into applications via LLM APIs. https://github.com/llmfuzzer
Prompt Security Provides a security, data privacy, and safety approach across all aspects of generative AI, independent of specific LLMs. https://www.prompt.security
Rebuff Self-hardening prompt injection detector for AI applications, using a multi-layered protection mechanism. https://github.com/rebuff
Robust Intelligence Provides AI firewall and continuous testing and evaluation. Creators of the airisk.io database donated to MITRE. https://www.robustintelligence.com/
WhyLabs Protects LLMs from security threats, focusing on data leak prevention, prompt injection monitoring, and misinformation prevention. https://www.whylabs.ai/

Releases

No releases published

Packages

No packages published