

Reproduce the results from the Figure 7 #6

Open
RealAntonVoronov opened this issue Mar 24, 2023 · 1 comment


@RealAntonVoronov

Hello. Thank you for the great work and for sharing the code to reproduce the results. Unfortunately, I wasn't able to find a way to run the experiments verifying that conceptual calibration increases the model's resilience to template changes. Can you help me? In the repo I couldn't find either the 15 prompt templates for 'sst2' listed in the paper's appendix or the code that runs that specific experiment.
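A template-robustness run of this kind amounts to rebuilding the same k-shot prompt under each template and re-evaluating. The sketch below is hypothetical and not taken from the repository: the helper name `build_prompt`, the `"\n\n"` separator between demonstrations, and the label words `positive`/`negative` are all assumptions. The four templates are the ones tried later in this thread.

```python
from typing import List, Tuple

# Templates tried later in this thread; the two "{}" slots are (input text, label word).
TEMPLATES = [
    "Review: {}\nSentiment: {}",
    "sentence: {}\nsentiment: {}",
    "input: {}\ntarget: {}",
    "Input: {}\nTarget: {}",
]


def build_prompt(template: str, demos: List[Tuple[str, str]], query: str,
                 sep: str = "\n\n") -> str:
    """Hypothetical helper: fill the template for each (text, label) demonstration,
    then append the query with an empty label slot so the model predicts the label next.
    The default separator "\n\n" is an assumption, not the repo's setting."""
    parts = [template.format(text, label) for text, label in demos]
    # Leave the label slot empty and strip the trailing space so the prompt ends at "...:".
    parts.append(template.format(query, "").rstrip())
    return sep.join(parts)


# Hypothetical 2-shot demonstrations with assumed label words.
demos = [("a great movie", "positive"), ("dull and slow", "negative")]
for template in TEMPLATES:
    print(repr(build_prompt(template, demos, "the plot was fun")))
```

Keeping the demonstrations and seeds fixed while swapping only the template isolates the effect of the template itself.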

@RealAntonVoronov

RealAntonVoronov commented Mar 24, 2023

Moreover, I tried manually changing the template for the 'sst2' dataset; here are the results I got for 2-shot with 3 seeds:

Vanilla template `Review: {}\nSentiment: {}`:

Original Accuracy   | Mean: 0.5321, Low: 0.5138, High: 0.5688, Std: 0.0259
Calibrated Accuracy | Mean: 0.8058, Low: 0.7569, High: 0.8773, Std: 0.0517

Template `sentence: {}\nsentiment: {}`:

Original Accuracy   | Mean: 0.5138, Low: 0.5103, High: 0.5183, Std: 0.0034
Calibrated Accuracy | Mean: 0.5719, Low: 0.5126, High: 0.6812, Std: 0.0774

Template `input: {}\ntarget: {}`:

Original Accuracy   | Mean: 0.5138, Low: 0.5103, High: 0.5183, Std: 0.0034
Calibrated Accuracy | Mean: 0.5719, Low: 0.5126, High: 0.6812, Std: 0.0774

Template `Input: {}\nTarget: {}`:

Original Accuracy   | Mean: 0.5233, Low: 0.5080, High: 0.5528, Std: 0.0208
Calibrated Accuracy | Mean: 0.5585, Low: 0.5080, High: 0.6583, Std: 0.0706
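With only three seeds, each summary line pins down all three per-seed values: the middle one is roughly 3 × Mean − Low − High. For the calibrated `sentence:` row that gives 3 × 0.5719 − 0.5126 − 0.6812 ≈ 0.5219, so two runs sit near 0.51 to 0.52 and a single seed at 0.68 lifts the mean. The sketch below reproduces the printed format under one assumption: that Std is the population standard deviation (divide by n, as `numpy.std` does by default). The function name `summarize` is hypothetical and the repo's script may compute it differently.

```python
import statistics
from typing import Sequence


def summarize(name: str, accs: Sequence[float]) -> str:
    """Hypothetical helper: format per-seed accuracies as 'Mean, Low, High, Std'.
    Std is the population standard deviation (divides by n), an assumption here."""
    return (f"{name:<19} | Mean: {statistics.mean(accs):.4f}, "
            f"Low: {min(accs):.4f}, High: {max(accs):.4f}, "
            f"Std: {statistics.pstdev(accs):.4f}")


# Per-seed values back-derived from the calibrated `sentence:` row above.
print(summarize("Calibrated Accuracy", [0.5126, 0.5219, 0.6812]))
```

This prints `Calibrated Accuracy | Mean: 0.5719, Low: 0.5126, High: 0.6812, Std: 0.0774`, matching the reported line, which suggests (but does not prove) that a population Std was used. With so few seeds, the spread between runs is as informative as the mean.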

What could be the reason for such unstable behaviour?
