Performance Leaderboard Reproduction #273
Replies: 6 comments
-
Could you provide an evaluation script for reproducing the reported metrics on datasets such as VQAv2 and TextVQA? Thanks.
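While waiting for an official script, here is a minimal sketch of the standard VQA consensus accuracy used for VQAv2: each of the 10 human annotators is held out in turn, the prediction scores min(1, matches / 3) against the remaining 9 answers, and the scores are averaged. This is not the maintainers' evaluation code. The function name `vqa_accuracy` is hypothetical, and the official answer normalization (punctuation, articles, number words) is deliberately omitted here; only lowercasing and whitespace stripping are applied, so scores may differ slightly from the official evaluator.

```python
def vqa_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Consensus accuracy of one prediction against the human answers.

    Sketch only: the official VQA evaluator also normalizes punctuation,
    articles and number words; here we only lowercase and strip.
    """
    pred = prediction.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    scores = []
    for i in range(len(answers)):
        # Hold out annotator i and compare against the remaining answers.
        others = answers[:i] + answers[i + 1:]
        matches = sum(1 for a in others if a == pred)
        scores.append(min(1.0, matches / 3))
    return sum(scores) / len(scores)


# Three of ten annotators said "yes": 3 hold-outs score 2/3, 7 score 1.
print(vqa_accuracy("yes", ["yes"] * 3 + ["no"] * 7))
```

With three matching annotators the result is about 0.9; a prediction matched by four or more annotators scores 1.0. Dataset-level accuracy is then the mean over all questions.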
-
Thanks for your excellent work. However, I have some questions about the VQA benchmark experiments. Could you help me out?
-
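I cannot reproduce the metrics: with the original prompt, I get 73.4% ANLS on DocVQA. What prompt did you use? Could you provide the evaluation script? We would also like to compare against your results on other benchmarks in our paper.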
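Since ANLS numbers depend on how the metric is computed as well as on the prompt, it may help to check the scorer too. Below is a minimal sketch of ANLS as commonly defined for DocVQA: for each question, take the best score over the gold answers, where a pair scores 1 minus the normalized Levenshtein distance if that distance is below the threshold tau = 0.5, and 0 otherwise. This is an independent sketch, not the maintainers' script; the names `levenshtein` and `anls` are hypothetical, and the preprocessing (lowercasing and stripping only) is an assumption that may not match the official evaluator exactly.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def anls(predictions: list[str], gold_answers: list[list[str]],
         tau: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity over all questions (sketch)."""
    if not predictions:
        return 0.0
    total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        p = pred.strip().lower()  # assumed preprocessing
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            denom = max(len(p), len(g))
            nl = levenshtein(p, g) / denom if denom else 0.0
            # Distances at or above tau contribute nothing.
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        total += best
    return total / len(predictions)


print(anls(["hello", "helo", "abc"], [["hello"], ["hello"], ["xyz"]]))
```

In this toy call the three questions score 1.0, 0.8 (one edit out of five characters) and 0.0, so the average is about 0.6. Reporting the result as a percentage multiplies it by 100.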
-
Thank you for your work. Could you provide the evaluation script for the AITW dataset?
-
Hi, I tried to reproduce the AITW results with CogAgent. However, I only get around 34% overall performance when following the repo README instructions and the discussion in #259. Could you share more implementation details? For example, do you incorporate a summary of the historical actions, and if so, how does CogAgent produce this summary? Could you also share the evaluation script for this dataset?
-
My test results are as follows. Are these good or bad?
[2024-07-19 09:38:21,589] [INFO] [RANK 0] validation loss at the end of training for test data | loss: 0.000000E+00 | PPL: 1.000000E+00 acc 5.319865E-02 | acc_w/o_case 5.319865E-02 |
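To read a log line like this, it helps to decode the scientific notation. The sketch below extracts the metrics with a regular expression and checks them. It assumes (not confirmed by the log itself) that `acc` and `acc_w/o_case` are fractions in [0, 1] and that `PPL` is computed as the exponential of the mean loss, which is consistent with the logged values since exp(0) = 1. The function name `parse_metrics` is hypothetical; this decodes the numbers only and says nothing about how the training script computes them.

```python
import math
import re

LOG_LINE = ("validation loss at the end of training for test data | "
            "loss: 0.000000E+00 | PPL: 1.000000E+00 acc 5.319865E-02 | "
            "acc_w/o_case 5.319865E-02 |")


def parse_metrics(line: str) -> dict[str, float]:
    """Extract metric names and their scientific-notation values."""
    pattern = r"(loss|PPL|acc|acc_w/o_case):?\s+([0-9.]+E[+-][0-9]+)"
    return {name: float(value) for name, value in re.findall(pattern, line)}


metrics = parse_metrics(LOG_LINE)
# Assumption: PPL = exp(loss); a loss of 0 therefore gives a PPL of 1.
print(math.isclose(metrics["PPL"], math.exp(metrics["loss"])))
# Assumption: acc is a fraction, so 5.319865E-02 is about 5.32%.
print(f"acc = {metrics['acc'] * 100:.2f}%")
```

Under these assumptions, the logged accuracy is about 5.32%, with identical values for `acc` and `acc_w/o_case`.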
-
Any questions about the benchmark results reported in the paper can be raised here.