Dialog Evaluation Paper List

We collect and classify evaluation methods for different dialog tasks, starting from 2012.

Tasks include:

  • Open-domain Dialog
  • Task-oriented Dialog
  • Dialog Summarization
  • Dialog Management
  • Dialog State Tracking
  • Dialog Policy
  • Knowledge-grounded Dialog
  • Conversational Search
  • Conversational Recommendation
  • Others

Modalities include:

  • Text-based Dialog
  • Speech-based Dialog
  • Visual-based Dialog
  • Multimodal Dialog

Survey

  1. Survey on evaluation methods for dialogue systems. Artificial Intelligence Review 2021
  2. Conversational Recommendation: Formulation, Methods, and Evaluation. SIGIR 2020
  3. A review of evaluation techniques for social dialogue systems. ISIAA@ICMI 2017
  4. A Comprehensive Assessment of Dialog Evaluation Metrics. CoRR 2021
  5. How to Evaluate Your Dialogue Models: A Review of Approaches. CoRR 2021

2022

  1. MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation. AAAI
  2. Towards Fair Evaluation of Dialogue State Tracking by Flexible Incorporation of Turn-level Performances. ACL
  3. What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation. ACL
  4. DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations. ACL
  5. Probing the Robustness of Trained Metrics for Conversational Dialogue Systems. ACL
  6. Mismatch between Multi-turn Dialogue and its Evaluation Metric in Dialogue State Tracking. ACL
  7. Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents. ConvAI@ACL
  8. Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric. ConvAI@ACL
  9. Doctor XAvIer: Explainable Diagnosis on Physician-Patient Dialogues and XAI Evaluation. BioNLP@ACL
  10. Open-Domain Dialog Evaluation Using Follow-Ups Likelihood. COLING
  11. Does GPT-3 Generate Empathetic Dialogues? A Novel In-Context Example Selection Method and Automatic Evaluation Metric for Empathetic Dialogue Generation. COLING
  12. SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation. COLING
  13. Integrating Pretrained Language Model for Dialogue Policy Evaluation. ICASSP
  14. A Dependency-Aware Utterances Permutation Strategy to Improve Conversational Evaluation. ECIR
  15. DialSummEval: Revisiting Summarization Evaluation for Dialogues. NAACL
  16. Explaining Dialogue Evaluation Metrics using Adversarial Behavioral Analysis. NAACL
  17. Long-term Control for Dialogue Generation: Methods and Evaluation. NAACL
  18. Generate, Evaluate, and Select: A Dialogue System with a Response Evaluator for Diversity-Aware Response Generation. NAACL-HLT (Student Research Workshop)
  19. MultiWOZ 2.4: A Multi-Domain Task-Oriented Dialogue Dataset with Essential Annotation Corrections to Improve State Tracking Evaluation. SIGDIAL
  20. A Systematic Evaluation of Response Selection for Open Domain Dialogue. SIGDIAL
  21. Dialogue Evaluation with Offline Reinforcement Learning. SIGDIAL
  22. Evaluating N-best Calibration of Natural Language Understanding for Dialogue Systems. SIGDIAL
  23. Evaluation of Off-the-shelf Speech Recognizers on Different Accents in a Dialogue Domain. LREC
  24. Evaluating the Effects of Embedding with Speaker Identity Information in Dialogue Summarization. LREC
  25. Design and Evaluation of the Corpus of Everyday Japanese Conversation. LREC
  26. Evaluating Gender Bias in Film Dialogue. NLDB
  27. Statistical and clinical utility of multimodal dialogue-based speech and facial metrics for Parkinson's disease assessment. INTERSPEECH
  28. Which Model is Best: Comparing Methods and Metrics for Automatic Laughter Detection in a Naturalistic Conversational Dataset. INTERSPEECH
  29. Evaluation of call centre conversations based on a high-level symbolic representation. INTERSPEECH
  30. Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark. TACL
  31. A Review of Evaluation Practices of Gesture Generation in Embodied Conversational Agents. IEEE Trans. Hum. Mach. Syst.
  32. Does Social Presence Increase Perceived Competence?: Evaluating Conversational Agents in Advice Giving Through a Video-Based Survey. Proc. ACM Hum. Comput. Interact.
  33. "I don't know what you mean by 'I am anxious'": A New Method for Evaluating Conversational Agent Responses to Standardized Mental Health Inputs for Anxiety and Depression. TIIS
  34. Ditch the Gold Standard: Re-evaluating Conversational Question Answering. ACL
  35. Evaluating the Cranfield Paradigm for Conversational Search Systems. ICTIR
  36. Evaluating Mixed-initiative Conversational Search Systems via User Simulation. WSDM
  37. FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows. CoRR
  38. Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges. CoRR
  39. MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue. CoRR
  40. Interactive Evaluation of Dialog Track at DSTC9. CoRR
  41. EnDex: Evaluation of Dialogue Engagingness at Scale. CoRR
  42. FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation. CoRR
  43. End-to-End Evaluation of a Spoken Dialogue System for Learning Basic Mathematics. CoRR
  44. Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems. CoRR
  45. CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog Evaluation. CoRR
  46. Analyzing and Evaluating Faithfulness in Dialogue Summarization. CoRR
  47. ED-FAITH: Evaluating Dialogue Summarization on Faithfulness. CoRR
  48. INFACT: An Online Human Evaluation Framework for Conversational Recommendation. CoRR
  49. Evaluation of Automated Speech Recognition Systems for Conversational Speech: A Linguistic Perspective. CoRR
  50. Evaluating Data-Driven Co-Speech Gestures of Embodied Conversational Agents through Real-Time Interaction. CoRR
  51. Evaluating Conversational Recommender Systems. CoRR

2021

  1. Conversation Graph: Data Augmentation, Training and Evaluation for Non-Deterministic Dialogue Management. TACL
  2. Meta-evaluation of Conversational Search Evaluation Metrics. TIS
  3. D-Score: Holistic Dialogue Evaluation Without Reference. TASLP
  4. How Am I Doing?: Evaluating Conversational Search Systems Offline. TIS
  5. Preserving Conversations with Contemporary Holocaust Witnesses: Evaluation of Interactions with a Digital 3D Testimony. CHI Extended Abstracts
  6. Heuristic Evaluation of Conversational Agents. CHI
  7. "How Robust R U?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations. ASRU
  8. POSSCORE: A Simple Yet Effective Evaluation of Conversational Search with Part of Speech Labelling. CIKM
  9. Evaluating Human-AI Hybrid Conversational Systems with Chatbot Message Suggestions. CIKM
  10. Enhancing the Open-Domain Dialogue Evaluation in Latent Space. ACL Findings
  11. RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models. ACL
  12. Towards a more Robust Evaluation for Conversational Question Answering. ACL
  13. Improving Automated Evaluation of Open Domain Dialog via Diverse Reference Augmentation. ACL Findings
  14. REAM♯: An Enhancement Approach to Reference-based Evaluation Metrics for Open-domain Dialog Generation. ACL Findings
  15. What Did You Refer to? Evaluating Co-References in Dialogue. ACL Findings
  16. RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems. ACL
  17. A Human-machine Collaborative Framework for Evaluating Malevolence in Dialogues. ACL
  18. LEGOEval: An Open-Source Toolkit for Dialogue System Evaluation via Crowdsourcing. ACL demo
  19. Towards Quantifiable Dialogue Coherence Evaluation. ACL
  20. DynaEval: Unifying Turn and Dialogue Level Evaluation. ACL
  21. Hierarchical Dependence-aware Evaluation Measures for Conversational Search. SIGIR
  22. The Interplay of Task Success and Dialogue Quality: An in-depth Evaluation in Task-Oriented Visual Dialogues. EACL
  23. Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach. EMNLP
  24. Q²: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering. EMNLP
  25. NDH-Full: Learning and Evaluating Navigational Agents on Full-Length Dialogue. EMNLP
  26. Building and Evaluating Open-Domain Dialogue Corpora with Clarifying Questions. EMNLP
  27. Large-Scale Quantitative Evaluation of Dialogue Agents' Response Strategies against Offensive Users. SIGDIAL
  28. How "open" are the conversations with open-domain chatbots? A proposal for Speech Event based evaluation. SIGDIAL
  29. Contrastive Response Pairs for Automatic Evaluation of Non-task-oriented Neural Conversational Models. SIGDIAL
  30. Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems. SIGIR
  31. Non-goal oriented dialogue agents: state of the art, dataset, and evaluation. Artif. Intell. Rev.
  32. An Evaluation of Chinese Human-Computer Dialogue Technology. Data Intell.
  33. CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers. ICLR
  34. WeChat AI's Submission for DSTC9 Interactive Dialogue Evaluation Track. CoRR
  35. On the Use of Linguistic Features for the Evaluation of Generative Dialogue Systems. CoRR
  36. Towards Quantifiable Dialogue Coherence Evaluation. CoRR
  37. Improving Computer Generated Dialog with Auxiliary Loss Functions and Custom Evaluation Metrics. CoRR
  38. Naturalness Evaluation of Natural Language Generation in Task-oriented Dialogues using BERT. CoRR
  39. Investigating the Impact of Pre-trained Language Models on Dialog Evaluation. CoRR
  40. Automatic Evaluation and Moderation of Open-domain Dialogue Systems. CoRR
  41. User Response and Sentiment Prediction for Automatic Dialogue Evaluation. CoRR
  42. Evaluate On-the-job Learning Dialogue Systems and a Case Study for Natural Language Understanding. CoRR
  43. Evaluating Predictive Uncertainty under Distributional Shift on Dialogue Dataset. CoRR
  44. Evaluating Pretrained Transformer Models for Entity Linking in Task-Oriented Dialog. CoRR
  45. A Conceptual Framework for Implicit Evaluation of Conversational Search Interfaces. CoRR
  46. An Automated Quality Evaluation Framework of Psychotherapy Conversations with Local Quality Estimates. CoRR
  47. Is my agent good enough? Evaluating Embodied Conversational Agents with Long and Short-term interactions. CoRR
  48. Evaluating Trust in the Context of Conversational Information Systems for new users of the Internet. CoRR

2020

  1. Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining. TACL
  2. PONE: A Novel Automatic Evaluation Metric for Open-domain Generative Dialogue Systems. TIS
  3. How to Evaluate Single-Round Dialogues Like Humans: An Information-Oriented Metric. TASLP
  4. Predictive Engagement: An Efficient Metric for Automatic Evaluation of Open-Domain Dialogue Systems. AAAI
  5. Studying the Effects of Cognitive Biases in Evaluation of Conversational Agents. CHI
  6. A Conversational Agent to Improve Response Quality in Course Evaluations. CHI Extended Abstracts
  7. Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation. ACL
  8. USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation. ACL
  9. Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation. ACL
  10. Can You Put it All Together: Evaluating Conversational Agents' Ability to Blend Skills. ACL
  11. Learning an Unreferenced Metric for Online Dialogue Evaluation. ACL
  12. Evaluating Dialogue Generation Systems via Response Selection. ACL
  13. Designing Precise and Robust Dialogue Response Evaluators. ACL
  14. uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems. ACL student
  15. ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems. ACL demo
  16. Voiceai Systems to NIST Sre19 Evaluation: Robust Speaker Recognition on Conversational Telephone Speech. ICASSP
  17. Semantic Diversity for Natural Language Understanding Evaluation in Dialog Systems. COLING Industry
  18. Evaluating Cross-Lingual Transfer Learning Approaches in Multilingual Conversational Agent Models. COLING (Industry)
  19. Language Model Transformers as Evaluators for Open-domain Dialogues. COLING
  20. Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems. COLING
  21. A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI. COLING
  22. Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems. EMNLP
  23. GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems. EMNLP
  24. Interactive Evaluation of Conversational Agents: Reflections on the Impact of Search Task Design. ICTIR
  25. Treating Dialogue Quality Evaluation as an Anomaly Detection Problem. LREC
  26. Evaluation of Off-the-shelf Speech Recognizers Across Diverse Dialogue Domains. LREC
  27. Evaluation of Argument Search Approaches in the Context of Argumentative Dialogue Systems. LREC
  28. Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols. SIGDIAL
  29. Unsupervised Evaluation of Interactive Dialog with DialoGPT. SIGDIAL
  30. Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation. SIGDIAL
  31. FinChat: Corpus and Evaluation Setup for Finnish Chat Conversations on Everyday Topics. INTERSPEECH
  32. Challenges in the Evaluation of Conversational Search Systems. Converse@KDD
  33. Evaluating Conversational Recommender Systems via User Simulation. KDD
  34. A Revised Generative Evaluation of Visual Dialogue. CoRR
  35. How To Evaluate Your Dialogue System: Probe Tasks as an Alternative for Token-level Evaluation Metrics. CoRR
  36. Turn-level Dialog Evaluation with Dialog-level Weak Signals for Bot-Human Hybrid Customer Service Systems. CoRR
  37. Submitting surveys via a conversational interface: an evaluation of user acceptance and approach effectiveness. CoRR
  38. An Evaluation Protocol for Generative Conversational Systems. CoRR

2019

  1. SSA: A More Humanized Automatic Evaluation Method for Open Dialogue Generation. IJCNN
  2. Re-Evaluating ADEM: A Deeper Look at Scoring Dialogue Responses. AAAI
  3. Probabilistic-Logic Bots for Efficient Evaluation of Business Rules Using Conversational Interfaces. AAAI
  4. Towards a Metric for Automated Conversational Dialogue System Evaluation and Improvement. INLG
  5. Importance of Search and Evaluation Strategies in Neural Dialogue Modeling. INLG
  6. Towards Best Experiment Design for Evaluating Dialogue System Output. INLG
  7. Towards Coherent and Engaging Spoken Dialog Response Generation Using Automatic Conversation Evaluators. INLG
  8. Are the Tools up to the Task? an Evaluation of Commercial Dialog Tools in Developing Conversational Enterprise-grade Dialog Systems. NAACL-HLT
  9. Evaluating and Enhancing the Robustness of Dialogue Systems: A Case Study on a Negotiation Agent. NAACL-HLT
  10. Evaluating Coherence in Dialogue Systems using Entailment. NAACL-HLT
  11. Evaluating and Enhancing the Robustness of Retrieval-Based Dialogue Systems with Adversarial Examples. NLPCC
  12. Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems. NeurIPS
  13. Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References. SIGDIAL
  14. A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents. SIGDIAL
  15. User Evaluation of a Multi-dimensional Statistical Dialogue System. SIGDIAL
  16. Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics. Comput. Speech Lang.
  17. MusicBot: Evaluating Critiquing-Based Music Recommenders with Conversational Interaction. CIKM
  18. Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings. CoRR
  19. Domain-Independent turn-level Dialogue Quality Evaluation via User Satisfaction Estimation. CoRR
  20. ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons. CoRR
  21. How to Evaluate the Next System: Automatic Dialogue Evaluation from the Perspective of Continual Learning. CoRR
  22. Evaluating Older Users' Experiences with Commercial Dialogue Systems: Implications for Future Design and Development. CoRR
  23. Short Text Conversation Based on Deep Neural Network and Analysis on Evaluation Measure. CoRR
  24. SIMMC: Situated Interactive Multi-Modal Conversational Data Collection And Evaluation Platform. CoRR
  25. Multi-domain Conversation Quality Evaluation via User Satisfaction Estimation. CoRR

2018

  1. RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems. AAAI
  2. Evaluation of Real-time Deep Learning Turn-taking Models for Multiple Dialogue Scenarios. ICMI
  3. One "Ruler" for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning. IJCAI
  4. Evaluating and Complementing Vision-to-Language Technology for People who are Blind with Conversational Crowdsourcing. IJCAI
  5. Adaboost with Auto-Evaluation for Conversational Models. IJCAI
  6. Towards a Structured Evaluation of Improv-bots: Improvisational Theatre as a Non-goal-driven Dialogue System. LaCATODA@IJCAI
  7. Expert Evaluation of a Spoken Dialogue System in a Clinical Operating Room. LREC
  8. EuroGames16: Evaluating Change Detection in Online Conversation. LREC
  9. LSDSCC: a Large Scale Domain-Specific Conversational Corpus for Response Generation with Diversity Oriented Evaluation Metrics. NAACL-HLT
  10. Empirical Evaluation of Character-Based Model on Neural Named-Entity Recognition in Indonesian Conversational Texts. NUT@EMNLP
  11. A Methodology for Evaluating Interaction Strategies of Task-Oriented Conversational Agents. SCAI@EMNLP
  12. Topic-based Evaluation for Conversational Bots. CoRR
  13. On Evaluating and Comparing Conversational Agents. CoRR

2017

  1. Adversarial evaluation for open-domain dialogue generation. SIGDIAL Conference
  2. Evaluating Natural Language Understanding Services for Conversational Question Answering Systems. SIGDIAL Conference
  3. Generating and Evaluating Summaries for Partial Email Threads: Conversational Bayesian Surprise and Silver Standards. SIGDIAL Conference
  4. Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses. ACL
  5. Evaluating Persuasion Strategies and Deep Reinforcement Learning methods for Negotiation Dialogue agents. EACL
  6. Sherlock: Experimental Evaluation of a Conversational Agent for Mobile Information Tasks. IEEE Trans. Hum. Mach. Syst.
  7. Adversarial Evaluation of Dialogue Models. CoRR
  8. The First Evaluation of Chinese Human-Computer Dialogue Technology. CoRR
  9. Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation. CoRR
  10. Evaluating Quality of Chatbots and Intelligent Conversational Agents. CoRR
  11. Perspectives for Evaluating Conversational AI. CoRR
  12. Evaluating Visual Conversational Agents via Cooperative Human-AI Games. CoRR

2016

  1. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation. EMNLP
  2. Evaluation Dataset (DT-Grade) and Word Weighting Approach towards Constructed Short Answers Assessment in Tutorial Dialogue Context. BEA@NAACL-HLT
  3. On the Evaluation of Dialogue Systems with Next Utterance Classification. SIGDIAL Conference
  4. The dialogue breakdown detection challenge: Task description, datasets, and evaluation metrics. LREC
  5. Automatic creation of scenarios for evaluating spoken dialogue systems via user-simulation. Knowl. Based Syst.
  6. Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems. ICLR (Poster)
  7. Interactive Topic Modeling for Exploring Asynchronous Online Conversations: Design and Evaluation of ConVisIT. TIS

2015

  1. Evaluation of Crowdsourced User Input Data for Spoken Dialog Systems. SIGDIAL Conference
  2. Evaluating Spoken Dialogue Processing for Time-Offset Interaction. SIGDIAL Conference
  3. Query Refinement Using Conversational Context: A Method and an Evaluation Resource. NLDB

2014

  1. Extrinsic Evaluation of Dialog State Tracking and Predictive Metrics for Dialog Policy Optimization. SIGDIAL Conference
  2. Evaluating a Spoken Dialogue System that Detects and Adapts to User Affective States. SIGDIAL Conference
  3. Evaluating coherence in open domain conversational systems. INTERSPEECH
  4. Modeling and evaluating dialog success in the LAST MINUTE corpus. LREC
  5. Japanese conversation corpus for training and evaluation of backchannel prediction model. LREC
  6. Network assisted rate adaptation for conversational video over LTE, concept and performance evaluation. CSWS@SIGCOMM
  7. Evaluation of a Conversation Management Toolkit for Multi Agent Programming. CoRR

2013

  1. Development and evaluation of spoken dialog systems with one or two agents. INTERSPEECH
  2. Affective evaluation of multimodal dialogue games for preschoolers using physiological signals. INTERSPEECH
  3. Evaluating spoken dialogue models under the interactive pattern recognition framework. INTERSPEECH
  4. Evaluating an adaptive dialog system for the public. INTERSPEECH
  5. How Was Your Day? Evaluating a Conversational Companion. TAC
  6. In-Context Evaluation of Unsupervised Dialogue Act Models for Tutorial Dialogue. SIGDIAL Conference
  7. Evaluation of Speech Dialog Strategies for Internet Applications in the Car. SIGDIAL Conference
  8. Evaluating State Representations for Reinforcement Learning of Turn-Taking Policies in Tutorial Dialogue. SIGDIAL Conference
  9. Evaluating a City Exploration Dialogue System with Integrated Question-Answering and Pedestrian Navigation. ACL
  10. Implementation and evaluation of a multimodal addressee identification mechanism for multiparty conversation systems. ICMI
  11. Iterative Development and Evaluation of a Social Conversational Agent. IJCNLP
  12. An Automatic Dialog Simulation Technique to Develop and Evaluate Interactive Conversational Agents. Appl. Artif. Intell

2012

  1. Practical Evaluation of Human and Synthesized Speech for Virtual Human Dialogue Systems. LREC
  2. Evaluation of Online Dialogue Policy Learning Techniques. LREC
  3. Resource Evaluation for Usable Speech Interfaces: Utilizing Human-Human Dialogue. LREC
  4. Evaluation of the KomParse Conversational Non-Player Characters in a Commercial Virtual World. LREC
  5. Evaluating expressive speech synthesis from audiobook corpora for conversational phrases. LREC
  6. Developing and evaluating an emergency scenario dialogue corpus. LREC
  7. Intrinsic and Extrinsic Evaluation of an Automatic User Disengagement Detector for an Uncertainty-Adaptive Spoken Dialogue System. HLT-NAACL
  8. Position Paper: Towards Standardized Metrics and Tools for Spoken and Multimodal Dialog System Evaluation. SDCTD@NAACL-HLT
  9. An End-to-End Evaluation of Two Situated Dialog Systems. SIGDIAL Conference
  10. Evaluating language understanding accuracy with respect to objective outcomes in a dialogue system. EACL
  11. Topic identification based extrinsic evaluation of summarization techniques applied to conversational speech. ICASSP
  12. Conversational evaluation of artificial bandwidth extension of telephone speech using a mobile handset. ICASSP
  13. Synthesis and evaluation of conversational characteristics in HMM-based speech synthesis. Speech Commun.
  14. Conversational Evaluation of Speech Bandwidth Extension Using a Mobile Handset. IEEE Signal Process. Lett.
  15. Designing generalisation evaluation function through human-machine dialogue. CoRR

Contact

If you have any questions about the repository, or would like to add work on dialog evaluation, feel free to open an issue or email Peiyuan Gong (pygongnlp@gmail.com).
