Simple tool to extract markdown text from images of scientific formulae.

apiraccini/formulae

Formulae 🤗📑

Intro

This is a simple tool that takes a PNG image (ideally a screenshot of a mathematical formula) and returns the corresponding Markdown text. The base model is Nougat, first proposed in the paper "Nougat: Neural Optical Understanding for Academic Documents" and accessible via Hugging Face Transformers.
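
For readers curious about what calling Nougat through Transformers can look like, here is a minimal sketch. It is not necessarily how this repository's `app.py` works: the function name `image_to_markdown`, the `facebook/nougat-base` checkpoint, and the `max_new_tokens` value are assumptions chosen for illustration. It needs `transformers`, `torch`, and `pillow` installed, and the final post-processing step may require extra packages such as `nltk` and `python-Levenshtein`.

```python
def image_to_markdown(image_path, checkpoint="facebook/nougat-base", max_new_tokens=512):
    """Return the Markdown text Nougat predicts for one image (illustrative sketch)."""
    # Heavy dependencies are imported lazily so that defining this function is cheap.
    from PIL import Image
    from transformers import NougatProcessor, VisionEncoderDecoderModel

    # The checkpoint name is an assumption; the repository may use a different one.
    processor = NougatProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

    # Nougat expects RGB input; the processor resizes and normalises the image.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # Generate token ids, forbidding the unknown token.
    outputs = model.generate(
        pixel_values,
        min_length=1,
        max_new_tokens=max_new_tokens,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
    )

    # Decode to text and apply Nougat's own clean-up of the generated markup.
    text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    return processor.post_process_generation(text, fix_markdown=False)
```

Loading the model on every call is slow; a real application would typically load the processor and model once and reuse them.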

Usage guide

First, clone the Git repository into a directory of your choice,

git clone https://github.com/apiraccini/formulae.git
cd formulae

then either set up the app from a terminal (the activation line below is for Windows; on Linux or macOS, use `source venv/bin/activate` instead)

python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
python app.py

or build a Docker image and run it in a container

docker build -t formulae_app .
docker run -p 7860:7860 formulae_app

and open the URL printed in the terminal in your browser. With the Docker command above, the app is published on port 7860 of the host.

Base utilities

You can also call the base utilities directly from Python code to process files. From a Python shell started inside the cloned repository, run

from utils.app_utils import process_formula, process_folder

text = process_formula('your_image_folder/your_image.png')
print(text)

process_folder('your_image_folder')
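
If you want more control over a batch than a single call gives you, a small wrapper along the following lines may help. This is only a sketch: `extract_folder` and `save_as_markdown` are hypothetical helpers that are not part of the repository, and the extractor is passed in as a parameter so that the code does not assume anything about what `process_folder` returns or where it writes its output.

```python
from pathlib import Path
from typing import Callable, Dict, List


def extract_folder(folder: str, extract: Callable[[str], str]) -> Dict[str, str]:
    """Apply `extract` to every .png file in `folder`; return {file name: text}."""
    results = {}
    for path in sorted(Path(folder).glob("*.png")):
        results[path.name] = extract(str(path))
    return results


def save_as_markdown(results: Dict[str, str], out_dir: str) -> List[Path]:
    """Write each extracted text to `<out_dir>/<image stem>.md`; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, text in results.items():
        target = out / (Path(name).stem + ".md")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written


# Inside the cloned repository, the extractor could be the utility shown above:
#   from utils.app_utils import process_formula
#   results = extract_folder('your_image_folder', process_formula)
#   save_as_markdown(results, 'your_output_folder')
```

Passing the extractor as an argument also makes the wrapper easy to try out with a stand-in function before loading the model.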
