This project demonstrates how to serve an OCR model using BentoML. The service accepts PDFs as input and returns the text they contain. It uses Microsoft's DiT with Meta's Detectron2 for image segmentation and EasyOCR for OCR.
The most convenient way to run this service is through containers, as the project relies on numerous external dependencies. We provide two pre-built containers optimized for CPU and GPU usage, respectively.
To run the service, you'll need a container engine such as Docker, Podman, etc. Quickly test the service by running the appropriate container:
# cpu
docker run -p 3000:3000 ghcr.io/bentoml/ocr-as-a-service:cpu
# gpu
docker run --gpus all -p 3000:3000 ghcr.io/bentoml/ocr-as-a-service:gpu
This project requires Python 3.8 or higher.
On macOS, install poppler, which pdf2image requires:
brew install poppler
On Linux distros, install pdftoppm and pdftocairo using your package manager, e.g. with apt:
sudo apt install poppler-utils
Building the Detectron2 wheel requires the python3-dev package. On Linux distros, run the following:
sudo apt install python3-dev
You may need to install a specific version of python3-dev, e.g. python3.10-dev for Python 3.10.
On macOS, the Python development package is installed by default.
Refer to the Detectron2 installation page for platform-specific instructions and further troubleshooting.
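The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of the repository: the function name check_prerequisites is hypothetical, and the header check assumes the Python development package places Python.h in the interpreter's default include directory.

```python
import shutil
import sys
import sysconfig
from pathlib import Path


def check_prerequisites(min_python=(3, 8)):
    """Return a mapping of prerequisite name -> whether it appears satisfied."""
    # Assumption: the development package installs Python.h into this directory.
    include_dir = sysconfig.get_paths()["include"]
    return {
        "python_version": sys.version_info >= min_python,
        # pdftoppm and pdftocairo are the poppler tools named above.
        "pdftoppm": shutil.which("pdftoppm") is not None,
        "pdftocairo": shutil.which("pdftocairo") is not None,
        "python_headers": (Path(include_dir) / "Python.h").is_file(),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

A MISSING entry only means the check could not find the item on this machine; it does not replace the Detectron2 installation page for troubleshooting.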
Once you have all prerequisites installed, clone the repository and install the dependencies:
git clone https://github.com/bentoml/OCR-as-a-Service.git && cd OCR-as-a-Service
pip install -r requirements/pypi.txt
# This depends on PyTorch, hence needs to be installed afterwards
pip install 'git+https://github.com/facebookresearch/detectron2.git'
To serve the model with BentoML:
bentoml serve
You can then open your browser at http://127.0.0.1:3000 and interact with the service through Swagger UI.
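The service may take a while to load its models before it accepts requests. Here is a minimal sketch of a readiness poll, assuming the server exposes BentoML's /readyz health endpoint on the same port; the helper names is_ready and wait_until_ready are hypothetical.

```python
import time
import urllib.request


def is_ready(base_url, timeout=2.0):
    """Return True if GET <base_url>/readyz answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/readyz", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def wait_until_ready(base_url, attempts=30, delay=2.0, probe=is_ready):
    """Poll the service until it reports ready or the attempts run out."""
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example usage (assumes the service listens on the default port):
# wait_until_ready("http://127.0.0.1:3000")
```

The probe parameter is there so the polling loop can be exercised without a running server.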
BentoML's default model serving method is through an HTTP server. In this section, we demonstrate various ways to interact with the service:
curl -X 'POST' \
'http://localhost:3000/image_to_text' \
-H 'accept: application/pdf' \
-H 'Content-Type: multipart/form-data' \
-F file=@path-to-pdf
Replace path-to-pdf with the path of the PDF you want to send to the service.
To send requests from Python, use bentoml.client.Client. See client.py for example code.
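The curl request shown earlier can also be reproduced with only the Python standard library. This is a sketch and not the repository's client.py: it does not use the BentoML client, and the helper names build_multipart and image_to_text (the latter named after the endpoint) are hypothetical. The form field name file and the endpoint path are taken from the curl command; the response is returned as raw text because its exact format is not described here.

```python
import uuid
import urllib.request
from pathlib import Path


def build_multipart(field, filename, payload, content_type="application/pdf"):
    """Encode one file as multipart/form-data; return (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def image_to_text(pdf_path, base_url="http://localhost:3000"):
    """POST a PDF to the image_to_text endpoint and return the raw response text."""
    path = Path(pdf_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/image_to_text",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example usage (requires a running service and a real file):
# print(image_to_text("path-to-pdf"))
```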
You can use Swagger UI to quickly explore the available endpoints of any BentoML service.
Move your project to production with BentoCloud, a platform for managing and deploying machine learning models.
Start by creating a BentoCloud account. Once you've signed up, log in to your BentoCloud account using the command:
bentoml cloud login --api-token <your-api-token> --endpoint <bento-cloud-endpoint>
Note: Replace <your-api-token> and <bento-cloud-endpoint> with your API token and the BentoCloud endpoint, respectively.
Next, build your BentoML service using the build command:
bentoml build
Then, push your freshly built Bento to BentoCloud using the push command:
bentoml push <name:version>
Lastly, deploy this application to BentoCloud with a single bentoml deployment create command, following the deployment instructions.
BentoML offers a number of options for deploying and hosting online ML services in production; learn more at Deploying a Bento.
BentoML has a thriving open source community where thousands of ML/AI practitioners are contributing to the project, helping other users and discussing the future of AI. 👉 Pop into our Slack community!