ml-safety

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022

machine-learning agi language-models ai-safety adversarial-attacks ai-alignment ml-safety gpt-3 large-language-models prompt-engineering chain-of-thought agi-alignment

Updated Feb 26, 2024
Python

hendrycks / ss-ood

Self-Supervised Learning for OOD Detection (NeurIPS 2019)

uncertainty robustness self-supervised self-supervised-learning out-of-distribution-detection ml-safety

Updated Apr 29, 2021
Python

hendrycks / ethics

Aligning AI With Shared Human Values (ICLR 2021)

ai-safety machine-ethics ml-safety ethical-ai gpt-3

Updated Apr 21, 2023
Python

hendrycks / imagenet-r

ImageNet-R(endition) and DeepAugment (ICCV 2021)

robustness domain-adaptation out-of-distribution domain-generalization ml-safety

Updated Jul 23, 2021
Python

jiachens / ModelNet40-C

Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296

benchmark computer-vision deep-learning pytorch regularization data-augmentation robustness point-cloud-processing ml-safety corruption-robustness

Updated Aug 26, 2023
Python

Giskard-AI / awesome-ai-safety

📚 A curated list of papers & technical articles on AI Quality & Safety

Updated Oct 13, 2023

hendrycks / anomaly-seg

The Combined Anomalous Object Segmentation (CAOS) Benchmark

segmentation anomaly-detection carla-simulator out-of-distribution-detection ml-safety

Updated Feb 2, 2023
Python

hendrycks / pre-training

Pre-Training Buys Better Robustness and Uncertainty Estimates (ICML 2019)

uncertainty calibration robustness adversarial-examples pretrained data-corruption out-of-distribution-detection ml-safety

Updated Mar 1, 2022
Python

YyzHarry / ME-Net

[ICML 2019] ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation

defense matrix-completion icml robustness adversarial-example adversarial-attacks icml-2019 ml-safety matrix-estimation

Updated Sep 2, 2024
Python

hendrycks / jiminy-cricket

Jiminy Cricket Environment (NeurIPS 2021)

machine-ethics ml-safety

Updated Feb 12, 2022
ZAP

yaodongyu / ProjNorm

Predicting Out-of-Distribution Error with the Projection Norm

ml-safety distribution-shift

Updated Jul 27, 2022
Python

moonwatcher-ai / moonwatcher

Evaluation & testing framework for computer vision models

computer-vision ai-safety ethical-artificial-intelligence ai-security mlops ml-safety ml-validation trustworthy-ai ml-testing

Updated Jun 20, 2024
Python

harsmac / MUFIACode

Code for the multiplicative filter attack (MUFIA), from the paper "Frequency-based vulnerability analysis of deep learning models against image corruptions".

machine-learning computer-vision pytorch dct-coefficients imagenet filters cifar10 frequency-analysis robustness cifar100 adversarial-machine-learning adversarial-examples adversarial-attacks domain-generalization ml-safety

Updated Jul 7, 2023
Python

ArianeDlns / adv-AI-project

This repository contains the project for the Advanced AI course @CentraleSupélec

ml-safety

Updated Apr 11, 2022
Jupyter Notebook

ml-safety

Here are 19 public repositories matching this topic...

Giskard-AI / giskard

hendrycks / robustness

hendrycks / natural-adv-examples

hendrycks / outlier-exposure

JohnSnowLabs / langtest

agencyenterprise / PromptInject

hendrycks / ss-ood

hendrycks / ethics

hendrycks / imagenet-r

jiachens / ModelNet40-C

Giskard-AI / awesome-ai-safety

hendrycks / anomaly-seg

hendrycks / pre-training

YyzHarry / ME-Net

hendrycks / jiminy-cricket

yaodongyu / ProjNorm

moonwatcher-ai / moonwatcher

harsmac / MUFIACode

ArianeDlns / adv-AI-project
