Developing an Evaluation Framework for Conversational AI in Public Procurement: Enhancing Decision-Making in AI Product Selection

aditisinghh17/PubEvalAI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PubEvalAI: Enhancing Decision-Making in AI Product Selection

PubEvalAI is a comprehensive evaluation framework designed to enhance decision-making in the public procurement of conversational AI solutions. By focusing on critical aspects of performance, cost, and security, this framework aims to provide government agencies with robust tools for assessing and selecting AI products that meet their needs effectively.

Understanding Public Procurement

Public procurement is the process by which government agencies acquire goods, services, or works from private suppliers through a structured method of needs assessment, bidding, evaluation, and contract award.

Procurement of conversational AI solutions can present several challenges:

  1. Cost Management: The total cost of ownership can be difficult to estimate, with hidden costs potentially arising.
  2. Scalability: Handling multiple simultaneous interactions efficiently is crucial for maintaining performance during peak usage times.
  3. Performance Measurement: Quantifying the performance of conversational AI solutions objectively can be difficult.
  4. Data Privacy and Security Concerns: Ensuring compliance with data protection regulations and securing sensitive information is critical.

PubEvalAI addresses these challenges by providing a structured evaluation framework that focuses on key metrics to assess conversational AI solutions effectively.

Evaluation Metrics

  1. Cost per Query: Measures the cost associated with each interaction to ensure cost-effectiveness.
  2. Concurrency: Assesses the system's ability to handle multiple simultaneous interactions without performance degradation.
  3. Latency: Evaluates response times to ensure timely and efficient interactions.
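The three metrics above can be sketched from a simple query log. This is an illustrative implementation only, assuming each query records its token usage, a (start, end) timestamp pair, and a flat per-token price; the function names and log schema are assumptions, not part of the PubEvalAI codebase.

```python
from statistics import mean, quantiles

def cost_per_query(token_counts, price_per_1k_tokens):
    """Average cost of one interaction, given token usage per query."""
    return mean(t / 1000 * price_per_1k_tokens for t in token_counts)

def latency_p95(response_seconds):
    """95th-percentile response time across logged queries."""
    return quantiles(response_seconds, n=20)[-1]

def max_concurrency(intervals):
    """Peak number of overlapping (start, end) query intervals."""
    # Sweep-line: +1 at each start, -1 at each end; ends sort before
    # coincident starts, so back-to-back queries do not count as overlapping.
    events = sorted((t, d) for s, e in intervals for t, d in ((s, 1), (e, -1)))
    peak = load = 0
    for _, d in events:
        load += d
        peak = max(peak, load)
    return peak
```

For example, two queries costing 1,000 and 2,000 tokens at $0.01 per 1k tokens average out to $0.015 per query, and three queries with intervals (0, 2), (1, 3), (2, 4) peak at 2 concurrent requests.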

The novel "MU" metric measures how often the conversational AI correctly declines to answer irrelevant queries, i.e., off-topic questions that fall outside the intended focus of the solution.

Naming the Metric: In Robert M. Pirsig's 1974 novel Zen and the Art of Motorcycle Maintenance, "mu" is translated as "no thing", reflecting the concept of "un-asking the question".
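One plausible way to compute the MU metric is as the fraction of irrelevant queries the assistant correctly declined. The sketch below assumes each evaluation record carries hypothetical `is_irrelevant` and `declined` flags; this schema is an assumption for illustration, not a published PubEvalAI interface.

```python
def mu_score(results):
    """Fraction of irrelevant queries the AI correctly declined to answer.

    results: iterable of dicts with boolean 'is_irrelevant' (query was
    off-topic) and 'declined' (the AI refused rather than answering).
    Returns None if no irrelevant queries were asked (metric undefined).
    """
    irrelevant = [r for r in results if r["is_irrelevant"]]
    if not irrelevant:
        return None
    return sum(r["declined"] for r in irrelevant) / len(irrelevant)
```

For instance, if three off-topic questions were posed and the assistant declined two of them, the MU score is 2/3, regardless of how many on-topic questions were also in the run.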
