Stars
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
Dataset, amendment for RST annotation guidelines, and code for analysis experiments for the paper "Rhetorical Strategies in the UN Security Council: Rhetorical Structure Theory and Conflicts".
Utility for behavioral and representational analyses of Language Models
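A summary of the knowledge that natural language processing (NLP) engineers need to build up, including interview questions, fundamentals, and engineering skills, to strengthen core competitiveness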
Latest data for the multilingual DISRPT discourse benchmark
Fast Discourse Parser to find latent Rhetorical STructure (RST) in text.
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
BLEURT is a metric for Natural Language Generation based on transfer learning.
Code to compute permutation and drop-column importances in Python scikit-learn models
ACL 2023 paper "A Critical Evaluation of Evaluations for Long-form Question Answering"
GUCorpling's DISRPT 2021 shared task submission
The official code repo for the paper "COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements" (https://arxiv.org/abs/2306.01985)
Adversarial Natural Language Inference Benchmark
Code and data for "A fine-grained comparison of pragmatic language understanding in humans and language models"
Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 quality dimensions: comprehensibility, repetition, grammar, a…
Official Implementation of ACL 2023 Paper: "Generating EDU Extracts for Plan-Guided Summary Re-Ranking"
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
Apps built using Inspired Cognition's Critique.
Dataset, metrics, and models for TACL 2023 paper MACSUM: Controllable Summarization with Mixed Attributes.
Resources for the "CTRLsum: Towards Generic Controllable Text Summarization" paper
Must-read papers, related blogs and API tools on the pre-training and tuning methods for ChatGPT.