Skip to content

LeanEuclid is a benchmark for autoformalization in the domain of Euclidean geometry, targeting the proof assistant Lean.

License

Notifications You must be signed in to change notification settings

loganrjmurphy/LeanEuclid

Repository files navigation

Autoformalizing Euclidean Geometry

LeanEuclid

Code for the paper:

Autoformalizing Euclidean Geometry
International Conference on Machine Learning (ICML), 2024
Logan Murphy*, Kaiyu Yang* (* equal contribution), Jialiang Sun, Zhaoyu Li, Anima Anandkumar, and Xujie Si

@inproceedings{murphy2024leaneuclid,
  title={Autoformalizing {Euclidean} Geometry},
  author={Murphy, Logan and Yang, Kaiyu and Sun, Jialiang and Li, Zhaoyu and Anandkumar, Anima and Si, Xujie},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2024}
}

GitHub license


Quick Links

  1. Requirements
  2. Building
  3. The Formal System E
  4. LeanEuclid
    1. Euclid's Elements
    2. UniGeo
  5. Evaluating Autoformalized Theorem Statements
  6. Experiments
  7. Acknowledgements

Requirements

  • A working setup of Lean 4, including elan and Lean's VSCode extension
  • Install the latest version of Z3 and CVC5 and make sure they can be accessed from the command line
  • Install Python dependencies: pip install smt-portfolio openai
  • Find out the location of smt-portfolio and make sure Lean's VSCode extension can also access it. For example, if which smt-portfolio outputs /Users/yangky/miniconda3/envs/lean/bin/smt-portfolio, you should set Server Env Paths in Lean's VSCode extension as below:
image

Building

Take the following steps to build the Lean project:

  1. Run lake script run check to check if the requirements are satisfied.
  2. Run lake exe cache get to download the mathlib cache
  3. Run lake build to compile the formal system E
  4. Open a file for Euclid's Elements in VS Code, e.g., Book/Prop01.lean. You should expect to see:

Elements Prop1

Optionally, you can:

  1. Run lake -R -Kenv=dev build SystemE:docs to build the documentation
  2. View the auto-generated documentation locally at ./lake/build/doc, e.g., using python -mhttp.server

If you have problems building the project, Dockerfile and scripts/build.sh may serve as useful references.

The Formal System E

E is a formal system introduced by Avigad et al., 2009 for faithfully formalizing the proofs in Euclid’s Elements, including the use of diagrammatic reasoning. It defines a language for expressing basic objects in 2D Euclidean geometry (points, lines, circles, etc.), as well as the form of theorems and proofs. We implement the formal system E in Lean and use SMT solvers to perform diagrammatic reasoning automatically and implicitly. Details of our implementation can be found here and in the auto-generated documentation.

LeanEuclid

LeanEuclid is a benchmark for testing autoformalization. It consists of 173 Euclidean geometry problems manually formalized into Lean. The theorems and proofs are in the format prescribed by the formal system E. Among the 173 problems, 48 are from Euclid's Elements, and 125 are adapted from the UniGeo dataset (Chen et al., 2022).

Euclid's Elements

We manually formalized the first book of Euclid's Elements, using an open-source LaTex version. The formalized theorems and proofs can be found here. Thanks to E and automatic diagrammatic reasoning, our formalized proofs are "faithful" in the sense that they correspond very closely to Euclid's original proofs. To our knowledge, this is the first time Euclid's Elements has been faithfully formalized in a proof assistant such as Lean, and our formalization has uncovered errors in Euclid's proofs (Appendix B of our paper).

For example, below is our formalization of Euclid's first proposition in Book I, which states you can construct an equilateral triangle given two distinct points:

image

Our formalized theorem and proof in Book/Prop01.lean:

theorem proposition_1 :
∀ (a b : Point) (AB : Line),
  distinctPointsOnLine a b AB →
    ∃ c : Point, |(c─a)| = |(a─b)| ∧ |(c─b)| = |(a─b)| := by
    euclid_intros
    euclid_apply circle_from_points a b as BCD
    euclid_apply circle_from_points b a as ACE
    euclid_apply intersection_circles BCD ACE as c
    euclid_apply point_on_circle_onlyif a b c BCD
    euclid_apply point_on_circle_onlyif b a c ACE
    use c
    euclid_finish

UniGeo

Problems from UniGeo are generally easier than those from Elements. Unlike Elements, UniGeo's problem text does not include complete information about the problem, so we manually provide missing diagrammatic details to the text. For example:

Diagram in UniGeo/Congruent/diagrams/1.png:

UniGeo Congruent Theorem 1

UniGeo's theorem statement in UniGeo/Congruent/texts/1.txt.

Given T U ≅ R S. R S ∥ T U. Complete the proof that △ R T U ≅ △ T R S.

UniGeo's proof in UniGeo/Congruent/proofs/1.txt:

RS ∥ TU
TU ≅ RS
∠RTU ≅ ∠SRT
RT ≅ RT
△RTU ≅ △TRS

Diagrammatic details we provide:

There is a quadrilateral RSUT with vertices R, S, U, and T. There is a diagonal line segment RT drawn inside the quadrilateral, connecting vertices R and T. 

Our formalized theorem and proof in UniGeo/Congruent/Thm01.lean:

theorem theorem_1 : ∀ (R S T U : Point) (RS ST RT TU RU : Line),
  formTriangle R S T RS ST RT ∧
  formTriangle R T U RT TU RU ∧
  S.opposingSides U RT ∧
  |(T─U)| = |(R─S)| ∧
  ¬ RS.intersectsLine TU →
  (△ R:T:U).congruent (△ T:R:S) :=
by
  euclid_intros
  euclid_apply Elements.Book1.proposition_29''' S U R T RS TU RT
  euclid_finish

Evaluating Autoformalized Theorem Statements

In addition to proof automation, our symbolic reasoning engine can be used to check the equivalence of autoformalized theorem statements against ground-truth theorem statements. We have built a wrapper E3 which performs two forms of equivalence checking:

  1. Standard equivalence checking, i.e., check if the two theorem statements are logically equivalent or not. It may also be that one is strictly stronger than the other.
  2. Approximate equivalence checking, where we try to quantify how close the two statements are to one another.

Typically, the latter is only done if the statements are not proven equivalent, but you'd like to see how "close" the prediction was to the ground truth. More details on E3 can be found here.

Experiments

In our paper, we used LeanEuclid to test state-of-the-art LLMs on theorem statements and proof autoformalization. We provide a Python wrapper which can be used to replicate our experiment procedure. More details and usage examples can be found here.

Acknowledgements

  • We use (our own fork of) lean-smt for running SMT solvers from Lean.
  • There are concurrent efforts on formalizing Euclidean geometry in Lean (e.g., EG and Hernandez-Espiet's work). To the best of our knowledge, LeanEuclid is unique in its use of external solvers to handle diagrammatic reasoning, which is typically performed explicitly. However, this comes at the expense of breaking soundness.

About

LeanEuclid is a benchmark for autoformalization in the domain of Euclidean geometry, targeting the proof assistant Lean.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published