PubEvalAI is a comprehensive evaluation framework designed to enhance decision-making in the public procurement of conversational AI solutions. By focusing on critical aspects of performance, cost, and security, this framework aims to provide government agencies with robust tools for assessing and selecting AI products that meet their needs effectively.
Public procurement is the process by which government agencies acquire goods, services, or works from private suppliers through a structured method of needs assessment, bidding, evaluation, and contract award.
Procurement of conversational AI solutions can present several challenges:
- Cost Management: The total cost of ownership can be difficult to estimate, with hidden costs potentially arising.
- Scalability: Handling multiple simultaneous interactions efficiently is crucial for maintaining performance during peak usage times.
- Performance Measurement: Quantifying the performance of conversational AI solutions can be difficult.
- Data Privacy and Security Concerns: Ensuring compliance with data protection regulations and securing sensitive information is critical.
PubEvalAI addresses these challenges by providing a structured evaluation framework that focuses on key metrics to assess conversational AI solutions effectively.
Evaluation Metrics:
- Cost per Query: Measures the cost associated with each interaction to ensure cost-effectiveness.
- Concurrency: Assesses the system's ability to handle multiple simultaneous interactions without performance degradation.
- Latency: Evaluates response times to ensure timely and efficient interactions.
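As a rough illustration, the three metrics above could be computed from interaction logs along these lines. This is a minimal sketch: the `Interaction` record, field names, and log format are illustrative assumptions, not part of PubEvalAI.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    start_s: float   # request start time, in seconds
    end_s: float     # response completion time, in seconds
    cost_usd: float  # billed cost of this query

def cost_per_query(log):
    """Average cost across all logged interactions."""
    return sum(i.cost_usd for i in log) / len(log)

def mean_latency(log):
    """Average response time in seconds."""
    return sum(i.end_s - i.start_s for i in log) / len(log)

def peak_concurrency(log):
    """Maximum number of interactions in flight at once (sweep line)."""
    events = []
    for i in log:
        events.append((i.start_s, 1))   # interaction opens
        events.append((i.end_s, -1))    # interaction closes
    events.sort()  # ties close before they open, avoiding overcounting
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

log = [
    Interaction(0.0, 1.2, 0.004),
    Interaction(0.5, 2.0, 0.006),
    Interaction(1.8, 2.4, 0.002),
]
print(cost_per_query(log))    # average cost per interaction
print(mean_latency(log))      # average latency in seconds
print(peak_concurrency(log))  # peak simultaneous interactions
```

In practice these numbers would be measured under a load profile that mirrors the agency's expected peak usage, so the concurrency and latency figures reflect worst-case rather than idle conditions.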
Novel "MU" Metric helps in measuring how often the conversational AI correctly avoids answering irrelevant queries. Irrelevant queries are off-topic questions that are not related to the intended focus of the conversational AI solution.
Naming the Metric: In Robert M. Pirsig's 1974 novel Zen and the Art of Motorcycle Maintenance, "mu" is translated as "no thing", reflecting the concept of "un-asking the question".