Update #125

Open
wants to merge 14 commits into
base: master
4 changes: 4 additions & 0 deletions .flake8
@@ -0,0 +1,4 @@
[flake8]
max-line-length = 88
extend-ignore = E203
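
The two settings above are the ones usually paired with Black: 88 is Black's default line length, and E203 (whitespace before `:`) conflicts with how Black formats slices whose bounds are expressions. A minimal sketch of the kind of line affected; the variable names are made up for illustration and do not come from this repository:

```python
# Hypothetical values, for illustration only.
ham = list(range(10))
lower, upper, offset = 1, 4, 2

# Black puts spaces around ":" when the slice bounds are expressions.
# pycodestyle reports the space before ":" as E203, hence the ignore.
chunk = ham[lower + offset : upper + offset]
print(chunk)
```

Without the `extend-ignore` entry, flake8 would flag a line like this even though Black produced it.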

1 change: 1 addition & 0 deletions .gitignore
@@ -26,6 +26,7 @@ var/
*.egg-info/
.installed.cfg
*.egg
.mypy_cache/

# PyInstaller
# Usually these files are written by a python script from a template
12 changes: 12 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,12 @@
repos:
  - repo: https://github.com/psf/black
    rev: stable
    hooks:
      - id: black
        language_version: python3.7
  - repo: https://github.com/timothycrosley/isort
    rev: 4.3.21
    hooks:
      - id: isort
        language_version: python3.7

9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,9 @@
# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

[Unreleased]: https://github.com/Breta01/handwriting-ocr/releases
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2017 Břetislav Hájek
Copyright (c) 2020 Břetislav Hájek

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
51 changes: 51 additions & 0 deletions Makefile
@@ -0,0 +1,51 @@
.PHONY: help bootstrap data lint clean

SHELL=/bin/bash

VENV_NAME?=venv
VENV_BIN=$(shell pwd)/${VENV_NAME}/bin
VENV_ACTIVATE=source $(VENV_NAME)/bin/activate

PROJECT_DIR=handwriting_ocr

PYTHON=${VENV_NAME}/bin/python3

.DEFAULT: help
help:
	@echo "Make file commands:"
	@echo " make bootstrap"
	@echo " Prepare complete development environment"
	@echo " make data"
	@echo " Download and prepare data for training"
	@echo " make lint"
	@echo " Run pylint and flake8"
	@echo " make clean"
	@echo " Clean repository"

bootstrap:
	sudo xargs apt-get -y install < requirements-apt.txt
	python3.7 -m pip install pip
	python3.7 -m pip install virtualenv
	make venv
	${VENV_ACTIVATE}; pre-commit install

# Runs when the file changes
venv: $(VENV_NAME)/bin/activate
$(VENV_NAME)/bin/activate: setup.py requirements.txt requirements-dev.txt
	test -d $(VENV_NAME) || virtualenv -p python3.7 $(VENV_NAME)
	${PYTHON} -m pip install -U pip
	${PYTHON} -m pip install -e .[dev]
	touch $(VENV_NAME)/bin/activate

data:
	${PYTHON} ${PROJECT_DIR}/data/data_create_sets.py

lint: venv
# pylint supports pyproject.toml since version 2.5. Switch to the following command once updated:
# ${PYTHON} -m pylint src
# Note: "$$" is needed so that make passes a literal "$" to the shell.
	${PYTHON} -m pylint --extension-pkg-whitelist=cv2 --variable-rgx='[a-z_][a-z0-9_]{0,30}$$' --max-line-length=88 src
	${PYTHON} -m flake8 src

clean:
	find . -name '*.pyc' -exec rm --force {} +
	rm -rf $(VENV_NAME) *.eggs *.egg-info dist build docs/_build .cache
35 changes: 20 additions & 15 deletions README.md
@@ -1,4 +1,6 @@
# Handwriting OCR
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

The project aims to build software for recognizing handwritten text in photos (including Czech). It uses computer vision and machine learning and experiments with different approaches to the problem. It started as a school project, which I got the chance to present at Intel ISEF 2018.

<p align="center"><img src ="doc/imgs/poster.png?raw=true" height="340" alt="Sublime's custom image" /></p>
@@ -15,34 +17,37 @@ Main files combining all the steps are [OCR.ipynb](notebooks/OCR.ipynb) or [OCR-

## Getting Started
### 1. Clone the repository
```
```bash
git clone https://github.com/Breta01/handwriting-ocr.git
```
After downloading the repo, you have to download the datasets and models (for more info look into [data](data/) and [models](models/) folders).

### 2. Requirements
The project is created using Python 3.6 with Jupyter Notebook. I recommend using Anaconda. If you have it, you can run the installation as:
```bash
make bootstrap
```
conda create --name ocr-env --file environment.yml
conda activate ocr-env
The project uses Python 3.7 with Jupyter Notebook. I recommend using virtualenv. Running `make bootstrap` should install all necessary packages.

Main libraries (all required libraries are in [requirements.txt](requirements.txt) and [requirements-dev.txt](requirements-dev.txt)):
* Numpy
* Tensorflow
* OpenCV
* Pandas
* Matplotlib

### Activate and Run
```bash
source venv/bin/activate
```
Main libraries (all required libraries are in [environment.yml](environment.yml)):
* Numpy (1.13)
* Tensorflow (1.4)
* OpenCV (3.1)
* Pandas (0.21)
* Matplotlib (2.1)

### Run
Review comment (Owner, Author): Write better documentation on usage. Example sections: https://gist.github.com/jxson/1784669

With all required libraries installed and cloned repo, run `jupyter notebook` in the directory of the project. Then you can work on the particular notebook.
This command activates the virtualenv. You can then run `jupyter notebook` in the project directory and open the notebook you want to work on.
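
Once the virtualenv is active, a quick way to confirm that the main libraries listed in the README are available is to look them up without importing them. This is only a sketch: the helper name `find_missing` is hypothetical and not part of this repository, and `cv2` is assumed to be the import name for OpenCV.

```python
from importlib import util

# Import names for the main libraries listed in the README.
# "cv2" is the import name of OpenCV; the others match their package names.
MAIN_LIBRARIES = ["numpy", "tensorflow", "cv2", "pandas", "matplotlib"]


def find_missing(modules):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in modules if util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(MAIN_LIBRARIES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All main libraries are importable.")
```

If anything is reported missing, re-running `make bootstrap` (or `make venv`) is the likely fix.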

## Contributing
The best way to get involved is by creating [GitHub issues](https://github.com/Breta01/handwriting-ocr/issues) or solving one! If there aren't any issues, you can contact me directly by email.

## License
**MIT**
[MIT](./LICENSE)

## Support the project
If this project helped you, if you want to support quick answers to questions and issues, or if you just think it is an interesting project, please consider a small donation.

[![paypal](https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif)](https://paypal.me/bretahajek/2)
[![paypal](https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif)](https://paypal.me/bretahajek/5)
5 changes: 3 additions & 2 deletions data/README.md
@@ -9,9 +9,10 @@ After downloading these datasets, there are scripts in `src/data/` folder which

### Breta’s data (1)
*5000 images*
All data owned by [@Breta01](https://github.com/Breta01) are available on this link (distributed under the same license as this repository). The data should be placed either in `raw/breta/` or `processed/breta/` folder according to their location in archive from the link below. (I removed the Czech accents from words. If you want to use them, you have to recover them using CSV files containing: `word_without_accents, original_word` in UTF-8 encoding.)
All data owned by [@Breta01](https://github.com/Breta01) are available on this link (distributed under the same license as this repository). The data should be placed either in `raw/breta/` or `processed/breta/` folder accordingly (see links below). (I removed the Czech accents from words. If you want to use them, you have to recover them using CSV files containing: `word_without_accents, original_word` in UTF-8 encoding.)

<https://drive.google.com/file/d/0Bw95a8U_pp2aakE0emZraHpHczA/view?usp=sharing>
`raw/breta/`: <https://drive.google.com/file/d/1y6Kkcfk4DkEacdy34HJtwjPVa1ZhyBgg/view?usp=sharing>
`processed/breta/`: <https://drive.google.com/file/d/1p7tZWzK0yWZO35lipNZ_9wnfXRNIZOqj/view?usp=sharing>
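
The accent recovery mentioned above could be sketched as follows. This assumes each CSV row holds exactly the two columns `word_without_accents, original_word`, with no header row and one original per unaccented word; the function names and sample rows are made up for illustration and are not part of the repository.

```python
import csv
import io


def load_accent_map(csv_file):
    """Build {word_without_accents: original_word} from an open CSV file."""
    mapping = {}
    # skipinitialspace drops the space that follows each comma.
    for row in csv.reader(csv_file, skipinitialspace=True):
        if len(row) >= 2:
            # If a word appears more than once, the last row wins.
            mapping[row[0]] = row[1]
    return mapping


def restore_accents(word, mapping):
    """Return the accented original if known, else the word unchanged."""
    return mapping.get(word, word)


# Made-up sample rows standing in for a real CSV file.
sample = io.StringIO("prilis, příliš\nkun, kůň\n")
accent_map = load_accent_map(sample)
print(restore_accents("kun", accent_map))
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `load_accent_map`; `path` here stands for wherever the CSV was extracted.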

### IAM Handwriting Database (2)
*85000 images*
15 changes: 0 additions & 15 deletions data/characters/README.md

This file was deleted.

10 changes: 0 additions & 10 deletions data/raw/README.md

This file was deleted.

127 changes: 0 additions & 127 deletions environment.yml

This file was deleted.

File renamed without changes.
File renamed without changes.
3 changes: 3 additions & 0 deletions handwriting_ocr/data/data.py
@@ -0,0 +1,3 @@
# Copyright 2020 Břetislav Hájek <info@bretahajek.com>
# Licensed under the MIT License. See LICENSE for details.
"""Module for providing datasets for training and inference."""