js-d/bongard

bongard

Evaluating Claude models on Bongard problems. The problems are drawn from the On-Line Encyclopedia of Bongard Problems (OEBP).

You can visualize results in the Streamlit app here.

Next steps

  • Investigate how well models describe abstract images, independent of the Bongard task. This, rather than the abstract reasoning itself, currently appears to be the bottleneck.
  • Add an LLM evaluation of model responses: give an LLM the solution and a model response, and ask it to determine whether the response is correct.
  • Evaluate more models than Haiku and Opus (including future releases targeting diagrams).
  • Do more prompt engineering.
  • Evaluate via classification rather than description. That is, given 5 left images and 5 right images, ask the model to place a new image in the correct group.
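Two of the steps above, LLM-based grading and classification-based evaluation, can be sketched without committing to any particular model API. The snippet below is a minimal illustration, not code from this repository: the prompt wording, the `CORRECT`/`INCORRECT` verdict format, the `ClassificationItem` type, and the `classify` callable are all hypothetical. The call to the model itself is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical judge prompt; the wording is an assumption, not the repo's prompt.
JUDGE_TEMPLATE = (
    "A Bongard problem has this reference solution:\n{solution}\n\n"
    "A model proposed this rule:\n{response}\n\n"
    "Does the proposed rule describe the same distinction as the reference? "
    "Answer with exactly one word: CORRECT or INCORRECT."
)


def build_judge_prompt(solution: str, response: str) -> str:
    """Fill the judge template with the reference solution and model response."""
    return JUDGE_TEMPLATE.format(solution=solution.strip(), response=response.strip())


def parse_verdict(judge_output: str) -> Optional[bool]:
    """Map the judge's reply to True/False, or None if it is unparseable."""
    text = judge_output.strip().upper()
    # Check INCORRECT first so a loose match on "CORRECT" cannot shadow it.
    if text.startswith("INCORRECT"):
        return False
    if text.startswith("CORRECT"):
        return True
    return None


@dataclass
class ClassificationItem:
    """One classification trial: example images per side plus a held-out query."""
    left: Sequence[str]   # paths to the left-group example images
    right: Sequence[str]  # paths to the right-group example images
    query: str            # path to the image the model must place
    answer: str           # ground truth: "left" or "right"


def classification_accuracy(
    items: Sequence[ClassificationItem],
    classify: Callable[[ClassificationItem], str],
) -> float:
    """Fraction of items where classify() returns the ground-truth side."""
    if not items:
        return 0.0
    correct = sum(
        1 for item in items if classify(item).strip().lower() == item.answer
    )
    return correct / len(items)
```

Here `classify` would wrap whatever model call is being evaluated and return `"left"` or `"right"`. Chance accuracy on a balanced set is 0.5, which gives the classification variant a clearer baseline than free-form descriptions.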

Resources

As far as I can tell, no one has evaluated modern multimodal models on this exact task, but there is some related work:

Resources on Bongard problems:
