ictnlp/TruthX (Python, ★ 106, updated Mar 26, 2024)
Code for the ACL 2024 paper "TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space".
Topics: safety, llama, representation, language-model, mistral, explainable-ai, hallucination, baichuan, hallucinations, gpt-4, truthfulness, llm, llms, chatgpt, chatglm, llm-inference, llama2, llama3
thu-ml/MMTrustEval (Python, ★ 90, updated Sep 28, 2024)
A toolbox for benchmarking the trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Datasets and Benchmarks Track).
Topics: benchmark, privacy, toolbox, safety, multi-modal, fairness, robustness, claude, gpt-4, trustworthy-ai, truthfulness, mllm
OpenMOSS/Say-I-Dont-Know (Python, ★ 66, updated Feb 5, 2024)
[ICML 2024] Can AI Assistants Know What They Don't Know?
Topics: alignment, truthfulness, large-language-models
alexisrozhkov/llm-calib (Python, ★ 0, updated Jun 9, 2024)
Improving LLM truthfulness via confidence reporting.
Topics: alignment, truthfulness, llm, rlhf