For CMATH, Table 2 of your technical report lists the prompting method as 6-shot. However, your open-source evaluation code appears to use 0-shot. In addition, the official CMATH paper uses 0-shot evaluation.
So I would like to ask which setting is correct. If 6-shot is correct, could you provide the demonstrations you used? Thanks a lot!
We evaluate the base models in the 6-shot setting and the instruct models in the zero-shot setting. The repo provides evaluation for the instruct models.
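To make the difference between the two settings concrete, here is a minimal sketch of how a zero-shot prompt and a 6-shot prompt might be assembled. The `Question:`/`Answer:` template, the `build_prompt` helper, and the placeholder demonstrations are all assumptions for illustration; the actual template and the six demonstrations used for the reported numbers are not given in this thread.

```python
from typing import Sequence, Tuple


def build_prompt(question: str, demos: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a prompt with len(demos) worked examples before the test question.

    With no demos this yields a zero-shot prompt; with six demos, a 6-shot prompt.
    The "Question:/Answer:" template is a hypothetical choice, not the repo's.
    """
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in demos]
    # The test question comes last, with the answer left open for the model.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Placeholder demonstrations only; replace with the real ones once known.
placeholder_demos = [
    (f"<demo question {i}>", f"<demo answer {i}>") for i in range(1, 7)
]

zero_shot = build_prompt("<test question>")                     # instruct-model setting
six_shot = build_prompt("<test question>", placeholder_demos)   # base-model setting
```

The only difference between the two settings in this sketch is the list of demonstrations prepended to the test question, which is why the exact six examples matter for reproducing the base-model results.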
Thanks for your reply! Could you please provide the 6 demonstrations used in your evaluation process? This would allow me to align my evaluation setup with yours. Thank you very much!