PubEvalAI is a comprehensive evaluation framework designed to enhance decision-making in the public procurement of conversational AI solutions. By focusing on critical aspects of performance, cost, and security, this framework aims to provide government agencies with robust tools for assessing and selecting AI products that meet their needs effectively.
Public procurement is the process by which government agencies acquire goods, services, or works from private suppliers through a structured method of needs assessment, bidding, evaluation, and contract award.
Procurement of conversational AI solutions can present several challenges:
- Cost Management: The total cost of ownership can be difficult to estimate, with hidden costs potentially arising.
- Scalability: Handling multiple simultaneous interactions efficiently is crucial for maintaining performance during peak usage times.
- Performance Measurement: Quantifying the performance of conversational AI solutions can be difficult.
- Data Privacy and Security Concerns: Ensuring compliance with data protection regulations and securing sensitive information is critical.
PubEvalAI addresses these challenges by providing a structured evaluation framework that focuses on key metrics to assess conversational AI solutions effectively.
Evaluation Metrics:
- Cost per Query: Measures the cost associated with each interaction to ensure cost-effectiveness.
- Concurrency: Assesses the system's ability to handle multiple simultaneous interactions without performance degradation.
- Latency: Evaluates response times to ensure timely and efficient interactions.
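As a rough illustration, the three metrics above could be computed from interaction logs along these lines. This is a minimal sketch: the `Interaction` record, field names, and log format are illustrative assumptions, not part of PubEvalAI.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    start_s: float   # request start time, in seconds
    end_s: float     # response completion time, in seconds
    cost_usd: float  # billed cost of this query

def cost_per_query(log):
    """Average cost across all logged interactions."""
    return sum(i.cost_usd for i in log) / len(log)

def mean_latency(log):
    """Average response time in seconds."""
    return sum(i.end_s - i.start_s for i in log) / len(log)

def peak_concurrency(log):
    """Maximum number of interactions in flight at once (sweep line)."""
    events = []
    for i in log:
        events.append((i.start_s, 1))   # interaction opens
        events.append((i.end_s, -1))    # interaction closes
    events.sort()  # ties close before they open, avoiding overcounting
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

log = [
    Interaction(0.0, 1.2, 0.004),
    Interaction(0.5, 2.0, 0.006),
    Interaction(1.8, 2.4, 0.002),
]
print(cost_per_query(log))    # average cost per interaction
print(mean_latency(log))      # average latency in seconds
print(peak_concurrency(log))  # peak simultaneous interactions
```

In practice these numbers would be measured under a load profile that mirrors the agency's expected peak usage, so the concurrency and latency figures reflect worst-case rather than idle conditions.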
Novel "MU" Metric helps in measuring how often the conversational AI correctly avoids answering irrelevant queries. Irrelevant queries are off-topic questions that are not related to the intended focus of the conversational AI solution.
Naming the Metric: In Robert M. Pirsig's 1974 novel Zen and the Art of Motorcycle Maintenance, "mu" is translated as "no thing", reflecting the concept of "un-asking the question".