Auto (annotation-free) evaluation of RAG #140
Conversation
Signed-off-by: aasavari <aasavari.dhananjay.kakne@intel.com>
Hi @lkk12014402, the new UT runs correctly locally. When I passed an absolute path for template_dir, it did not work because we load the environment using dotenv, so a relative path has to be provided to jinja2.
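For context, a minimal sketch of the kind of template loading being discussed, assuming hypothetical names (load_prompt_template, judge.jinja) that are not from this PR: jinja2's FileSystemLoader resolves a relative template_dir against the current working directory, which is why the choice of relative versus absolute path matters here.

```python
from jinja2 import Environment, FileSystemLoader

def load_prompt_template(template_dir: str, template_name: str):
    """Load a Jinja2 prompt template from template_dir.

    FileSystemLoader accepts both relative and absolute directories;
    a relative path is resolved against the current working directory.
    """
    env = Environment(loader=FileSystemLoader(template_dir))
    return env.get_template(template_name)
```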
@adkakne I feel that auto-eval is more like a metric, so it should be in the metrics folder.
Hi Minmin, thank you for your comments. I moved auto_eval to the metrics folder.
hi, kakne @adkakne what are the differences between original ragas. I think we can also use ragas when there are not ground truth answers? |
Hi Kaokao, thank you for your review and comment. I shared a detailed response over MS Teams. Short answer: we want to supplement ragas with an annotation-free feature, as ragas allows only 4 out of 10 metrics to run without ground truth (not counting the summarization score, as we don't offer it yet). Secondly, the ragas prompting technique does not work with every open-source LLM (I shared more details offline); AutoEval uses a different prompting technique, similar to the one used in the CRAG paper by Meta. Lastly, auto_eval makes 1 LLM call per example while ragas makes num_metrics calls per example, so the user can save on cost for proprietary LLMs, or improve efficiency when using an endpoint on Gaudi or local compute. Please let me know if you have any more feedback. Thank you.
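To make the single-call design concrete, here is a minimal sketch of grading all metrics in one judge prompt and parsing one reply, assuming a hypothetical metric list and JSON reply format that are illustrative only, not AutoEval's actual prompt:

```python
import json

# Hypothetical metric set for illustration; AutoEval's real metrics may differ.
METRICS = ["relevance", "faithfulness", "coherence", "fluency"]

def build_judge_prompt(question: str, context: str, answer: str) -> str:
    """Build ONE prompt asking the judge LLM to grade every metric at once,
    so each example costs a single LLM call instead of one call per metric."""
    metric_lines = "\n".join(f"- {m}" for m in METRICS)
    return (
        "You are an impartial judge. For each metric below, give a short "
        "reasoning and an integer score from 1 to 5.\n"
        f"Metrics:\n{metric_lines}\n\n"
        f"Question: {question}\nContext: {context}\nAnswer: {answer}\n\n"
        'Reply as JSON: {"<metric>": {"reasoning": str, "score": int}, ...}'
    )

def parse_judge_reply(reply: str) -> dict:
    """Parse the judge's JSON reply into {metric: score}, keeping the
    per-metric reasoning available in the raw reply for inspection."""
    graded = json.loads(reply)
    return {m: graded[m]["score"] for m in METRICS}
```

With this shape, the cost per example stays constant as metrics are added, which is the efficiency argument made above.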
Thanks for your explanations. It looks good to me. Thanks~
The metric name auto_eval is not as meaningful as ragas, so let us think about a proper name.
Closing this PR in order to limit the number of commits; I have opened a fresh PR.
Description
This PR contains code for auto evaluation of RAG and is meant to serve use cases where ground-truth answers are not available. We also allow users to bring their own metrics and their own data. Evaluation is performed using the LLM-as-a-judge technique, and the user can choose between OpenAI, an HF endpoint, and local hardware. Finally, the user can see the reasoning and score for each metric as generated by the LLM.
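A minimal sketch of the evaluation loop this implies, assuming a hypothetical auto_evaluate function and an injected llm_call callable (these names are illustrative, not the PR's actual API): the backend (OpenAI client, HF endpoint, or a local model) is abstracted behind one callable, and each example costs one judge call.

```python
from typing import Callable, Dict, List

def auto_evaluate(examples: List[Dict[str, str]],
                  llm_call: Callable[[str], str]) -> List[str]:
    """Annotation-free evaluation: one judge call per example.

    llm_call hides the backend choice (OpenAI, HF endpoint, or local
    hardware), so the loop itself never changes. Each returned string
    contains the judge's reasoning and score for that example.
    """
    results = []
    for ex in examples:
        prompt = (
            "Grade this RAG answer without a reference answer.\n"
            f"Question: {ex['question']}\nContext: {ex['context']}\n"
            f"Answer: {ex['answer']}\n"
            "Give your reasoning, then a score from 1 to 5."
        )
        results.append(llm_call(prompt))
    return results
```

Swapping backends then means swapping the callable, e.g. a lambda wrapping an OpenAI client versus one wrapping a local pipeline.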
Issues
n/a
Type of change
Dependencies
Python3 packages openai, datasets, python-dotenv and jsonlines are needed.
Tests
I added a unit test, tests/test_auto_eval.py, and ensured it passes locally in an environment built using tests/requirements.txt with Python 3.10. Furthermore, OpenAI models were tested with gpt-4o, gpt-4 and gpt-3.5-turbo-16k. Endpoint testing was performed with meta-llama/Meta-Llama-3-8B-Instruct and mistralai/Mixtral-8x7B-Instruct-v0.1 on Gaudi2 machines. Local testing was performed using mistralai/Mixtral-8x7B-Instruct-v0.1. For all tests, I used the explodinggradients/ragas-wikiqa dataset.