This repository demonstrates how to use Docling to convert documents (PDF, HTML, etc.) to structured formats such as Markdown and JSON.
I mainly use Docling to extract tables.
- Clone the repository:
git clone https://github.ibm.com/Quentin-Lefevre/docling-testing.git
cd docling-testing
- Install dependencies:
pip install -r requirements.txt
Run examples in the examples/ folder to test different document types:
docling /copy/the/path/assets/maintenance-auto.pdf --to md --no-ocr
docling /copy/the/path/assets/meteo_montpellier.html --to md --ocr
Note that the --ocr option almost never works with PDF files.
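
If you have many files to convert, the two commands above can be generated from Python instead of typed by hand. This is only a minimal sketch: the helper names `build_docling_command` and `convert_all` are hypothetical (they are not part of Docling or of this repository), and only the CLI flags already shown in this README (--to, --ocr, --no-ocr) are used. The rule "no OCR for .pdf, OCR otherwise" is an assumption that mirrors the two example commands; it is not a Docling rule.

```python
from pathlib import Path
import shutil
import subprocess


def build_docling_command(source, to="md", ocr=None):
    """Return the argument list for one `docling` CLI call.

    Hypothetical helper. When `ocr` is None, OCR is disabled for .pdf
    files and enabled otherwise (assumption taken from the examples above).
    """
    source = Path(source)
    if ocr is None:
        ocr = source.suffix.lower() != ".pdf"
    return ["docling", str(source), "--to", to, "--ocr" if ocr else "--no-ocr"]


def convert_all(paths, to="md", dry_run=True):
    """Build one command per input file and print or run it."""
    commands = [build_docling_command(p, to=to) for p in paths]
    for cmd in commands:
        if dry_run or shutil.which("docling") is None:
            # Dry run, or the docling CLI is not on PATH: only show the command.
            print(" ".join(cmd))
        else:
            # Raises CalledProcessError if docling exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Same placeholder paths as in the commands above; replace them with yours.
    convert_all([
        "/copy/the/path/assets/maintenance-auto.pdf",
        "/copy/the/path/assets/meteo_montpellier.html",
    ])
```

By default nothing is executed (dry_run=True), so you can check the generated commands first. Pass dry_run=False to actually run them, and pass ocr=True or ocr=False to `build_docling_command` if the extension-based default does not suit a given file.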
Open the notebook notebook/guide.ipynb to see my tests and results.