
Lower F1 scores obtained using the fine-tuned FLAN-T5-XL and FLAN-T5-large models #1

GuptaSonam opened this issue Jul 31, 2023 · 1 comment

@GuptaSonam
Hi,
Thank you for providing the fine-tuned models in the repository. I used the inference_alpaca.py code to evaluate the FLAN-T5-XL and FLAN-T5-large models on the simulation dataset. However, the F1 scores that I am getting are lower than what is reported in the repository. Can you tell me if there is some setting that needs to be changed?

Following are the numbers that I am getting on running the inference:

FLAN-T5-large (reported) | 57.3 | 50.1 | 70.5
FLAN-T5-large (obtained) | 53 | 49 | 57
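
For reference, this is roughly how I am loading and querying the checkpoints (a minimal sketch using the Hugging Face transformers seq2seq API, not the repository's inference_alpaca.py; the checkpoint path and generation settings are placeholders):

```python
# Minimal sketch: load a fine-tuned FLAN-T5 checkpoint and classify one example.
# The checkpoint path below is a placeholder, not the actual released model name.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "path/to/fine-tuned-flan-t5-large"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

def classify(prompt: str) -> str:
    # Encode the prompt and generate the predicted label
    # (Attributable / Contradictory / Extrapolatory).
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=16)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```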

Thanks,
Sonam Gupta

@xiangyue9607 (Collaborator)

Hi Sonam,

Sorry for the late reply; I just saw the issue. It might be due to a mismatch between the prompt used for training and the one used for evaluation. Could you try the prompt in the following example?

"prompt": "As an Attribution Validator, your task is to verify whether a given context can support the given claim. A claim can be either a plain sentence or a question followed by its answer.Specifically, your response should clearly indicate the relationship: Attributable, Contradictory or Extrapolatory. A contradictory error occurs when you can infer that the answer contradicts the fact presented in the context, while an extrapolatory error means that you cannot infer the correctness of the answer based on the information provided in the context.\n\nClaim: "[Question] [Answer].\n\nContext: [Context]"

Let me know if this replicates the result :)

Thanks,
Xiang
