feat: Add PDF Table Extract Tool #127

Open
wants to merge 1 commit into
base: main

sachinspanicker

PDF Table Extract Tool

Description

Adds a new PDFTableExtractTool that extracts tables from PDF documents and converts them to Markdown.

Features

  • Extract tables from PDF documents
  • Convert tables to markdown format
  • Handle multiple tables and large tables
  • Support both sync and async operations
  • Comprehensive error handling
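
As a rough illustration of the first two features, the sketch below renders extracted rows as a Markdown table. It is not the code from this PR: `rows_to_markdown` and `extract_tables_as_markdown` are hypothetical names, and the extraction step assumes PyMuPDF 1.23 or later, where `Page.find_tables()` is available.

```python
from typing import List, Optional, Sequence


def rows_to_markdown(rows: Sequence[Sequence[Optional[str]]]) -> str:
    """Render rows (first row = header) as a Markdown pipe table."""
    if not rows:
        return ""

    def clean(cell: Optional[str]) -> str:
        # Empty PDF cells may come back as None; newlines and pipes
        # inside a cell would break the table layout.
        text = "" if cell is None else str(cell)
        return text.replace("\n", " ").replace("|", "\\|").strip()

    width = max(len(row) for row in rows)
    # Pad short rows so every line has the same number of columns.
    padded = [[clean(c) for c in row] + [""] * (width - len(row)) for row in rows]
    header, body = padded[0], padded[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def extract_tables_as_markdown(pdf_path: str) -> List[str]:
    """Return one Markdown table per table found (assumes PyMuPDF >= 1.23)."""
    import fitz  # PyMuPDF; imported lazily so rows_to_markdown works without it

    tables: List[str] = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for table in page.find_tables().tables:
                tables.append(rows_to_markdown(table.extract()))
    return tables
```

Keeping the Markdown rendering separate from the PDF access means the formatting logic can be tested without a sample PDF.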

Implementation

  • Added PDFTableExtractTool class
  • Added comprehensive test suite
  • Added documentation with usage examples
  • Implemented proper error handling
  • Added type hints and docstrings
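
One way the sync and async entry points could share the same error handling is sketched below. The class name `TableExtractor`, its `run`/`arun` methods, and the injected `extract` callable are all hypothetical, chosen so the example runs without a PDF library; the class in this PR may be structured differently. `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import os
from typing import Callable, List


class TableExtractor:
    """Hypothetical wrapper: validates input, then delegates to an extract callable."""

    def __init__(self, extract: Callable[[str], List[str]]) -> None:
        # `extract` takes a PDF path and returns one Markdown string per table.
        self._extract = extract

    def run(self, pdf_path: str) -> str:
        """Synchronous entry point; returns Markdown or a readable error message."""
        if not pdf_path.lower().endswith(".pdf"):
            return f"Error: not a PDF file: {pdf_path}"
        if not os.path.isfile(pdf_path):
            return f"Error: file not found: {pdf_path}"
        try:
            tables = self._extract(pdf_path)
        except Exception as exc:  # e.g. a corrupt or encrypted PDF
            return f"Error: could not read {pdf_path}: {exc}"
        if not tables:
            return "No tables found."
        return "\n\n".join(tables)

    async def arun(self, pdf_path: str) -> str:
        """Async entry point; runs the blocking work in a worker thread."""
        return await asyncio.to_thread(self.run, pdf_path)
```

Returning an error string instead of raising is one possible design for a tool whose output is read by an agent; raising typed exceptions would be an equally reasonable choice.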

Dependencies

Added to pyproject.toml:

  • PyMuPDF
  • pandas
  • tabulate

Testing

All tests passing:

  • Basic functionality
  • Error handling
  • Edge cases
  • Async operations

Documentation

  • Added detailed README
  • Added usage examples
  • Added inline documentation

- Add PDFTableExtractTool for extracting tables from PDFs
- Convert extracted tables to markdown format
- Add comprehensive test suite
- Add documentation and usage examples
- Handle edge cases and error conditions
- Support both sync and async operations
@joaomdmoura
Collaborator

Looks good, but it's missing an `__init__` import, if you don't mind adding it :)
