Simpler install (#16)
* Cleanup tesseract files, no more need to install tess

* Update readme+add ico

* correction creation

* icon tweaking
chpoit authored May 21, 2021
1 parent 038e801 commit 85ad9cf
Showing 10 changed files with 243 additions and 118 deletions.
39 changes: 27 additions & 12 deletions README.md
@@ -14,20 +14,13 @@ It's called Utsushi's charm because I thought it would be funny to make a comple

## Requirements
- A computer (Windows)
- Linux and Mac might work too, but you won't be able to run the EXE and will have to run from source in a terminal window.
- Linux and Mac might work too, but you won't be able to run the EXE and will have to run from source in a terminal window. Refer to [Running from source](#Running-from-source)
- A USB cable to connect your Switch to transfer files
- The latest version of this project downloaded to your computer (Utsushis-Charm_**vx_x**.zip)
- You can find it [here](https://github.com/chpoit/utsushis-charm/releases/latest)
- **Google Tesseract** installed and in path
- A copy of version 4 is bundled with the release. Just run it, no extra packages needed
- Built by UB-Mannheim [License (Apache 2.0)](https://github.com/tesseract-ocr/tesseract/blob/master/LICENSE)
- Alternatively, download the same version here: [Installer here](https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v4.1.0.20190314.exe)
- Other versions available on the [UB-Mannheim Github](https://github.com/UB-Mannheim/tesseract/wiki) page
- Some knowledge of how to type things in the terminal
- AKA: Knowing how to type in a hacker box
- Being able to read


## Steps

0. Unequip all jewels. You will create "fake" charms otherwise.
@@ -156,19 +149,41 @@ In all seriousness, the work is done in a few broad steps:

Sometimes Windows will lock some files for a while, and there is nothing you can do about it other than wait.

# Running from source on a mac
# Running from source
Common requirements:
- Python3 installed and in path
- Set up a virtual environment (optional)
- Install pip packages (in the virtualenv, if you use one)

Normal instructions apply once the application starts.

## MacOS
- Requirements:
- have python3 and tesseract installed via brew (or some other way)
- set up a virtual environment
- `pip3 install virtualenv` to install virtualenv
- create a virtual env at the root of the repository `virtualenv -p python3 env`
- Virtual env on mac (optional): `virtualenv -p python3 env`
- Running:
- switch to the virtualenv `source env/bin/activate` (run at the root of the repository)
- set TESSDATA_PREFIX: `export TESSDATA_PREFIX=/usr/local/Cellar/tesseract/<version>/share/tessdata`
- install the project dependencies: `pip3 install .`
- run with `python3 main.py`
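The `<version>` placeholder in the `TESSDATA_PREFIX` step depends on which Tesseract release Homebrew installed. A minimal sketch of resolving it from Python, assuming the `/usr/local/Cellar` prefix used above (Apple Silicon Homebrew installs usually live under `/opt/homebrew/Cellar` instead) and the `Cellar/tesseract/<version>/share/tessdata` layout; `find_cellar_tessdata` is a hypothetical helper, not part of the project:

```python
import glob
import os
import re


def find_cellar_tessdata(cellar="/usr/local/Cellar"):
    """Return the tessdata directory of the newest Homebrew tesseract, or None.

    Hypothetical helper: assumes the Cellar/tesseract/<version>/share/tessdata layout.
    """
    pattern = os.path.join(cellar, "tesseract", "*", "share", "tessdata")
    candidates = [p for p in glob.glob(pattern) if os.path.isdir(p)]
    if not candidates:
        return None

    def version_key(path):
        # <version> is three components above "tessdata", e.g. "4.1.1" or "5.3.0_1";
        # compare its numeric parts so that "10.0.0" sorts above "5.3.0_1".
        version = path.split(os.sep)[-3]
        return tuple(int(part) for part in re.findall(r"\d+", version))

    return max(candidates, key=version_key)


if __name__ == "__main__":
    tessdata = find_cellar_tessdata()
    if tessdata:
        print(f"export TESSDATA_PREFIX={tessdata}")
    else:
        print("No Homebrew tesseract found under the given Cellar")
```

Picking the highest version number is only a heuristic; if several versions are installed, check which one Homebrew has linked.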

## Linux
- Requirements
- You will need to install Google Tesseract with your package manager of choice.
- Running
- `python3 main.py`
-

## Windows
- Requirements
- **Google Tesseract** installed and in path
- A copy of version 4 is bundled with the release. Just run it, no extra packages needed
- Alternatively, download the same installer here: [Installer here](https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v4.1.0.20190314.exe)
- Make sure it's in path
- Python3 installed and in path
- Running:
- `env\Scripts\activate` (if you use a virtual env)
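Because the app loads Tesseract as a shared library through `ctypes` (the `find_tesseract` function removed from `src/tesseract/Tesseract.py` in this commit starts with `ctypes.util.find_library`), one quick way to check the "in path" requirement is to ask `find_library` for the same names. On Windows, `find_library` searches the directories listed in `PATH` for a matching `.dll`, which is why the install directory needs to be on the path. A sketch; the name list is copied from that removed code and may not cover every Tesseract version:

```python
import ctypes.util

# Library names tried by find_tesseract() in this commit (Windows 4.x, Windows 3.02, others).
CANDIDATE_NAMES = ("libtesseract-4", "libtesseract302", "tesseract")


def locate_libtesseract(names=CANDIDATE_NAMES):
    """Return the first library name/path ctypes can resolve, or None."""
    for name in names:
        found = ctypes.util.find_library(name)
        if found:
            return found
    return None


if __name__ == "__main__":
    lib = locate_libtesseract()
    print(lib or "libtesseract not found: check that Tesseract is installed and on PATH")
```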

# Extra command line options
If you run from source, or call the executable from the terminal you can make use of the following flags/arguments to achieve different functionality

Binary file added media/icon.ico
Binary file not shown.
7 changes: 0 additions & 7 deletions scripts/build_release.bat
@@ -17,11 +17,4 @@ cmd /c ".\env\scripts\activate & python -m PyInstaller .\utsushis-charm.spec --o

copy %skill_corrections% "dist\%skill_corrections%"

if not exist %tesseract_name% (
    echo "Tesseract installer missing, downloading..."
    curl https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v4.1.0.20190314.exe -o %tesseract_name%
)

copy %tesseract_name% "dist\%tesseract_name%"

7z a -tzip %archive_name% ".\dist\*"
11 changes: 11 additions & 0 deletions skill_corrections.csv
@@ -901,3 +901,14 @@ slugs",slugger
SDSIS,spare
Free:,Free
Free.,Free
Aftack,Attack
Affnity,Affinity
Siding,Sliding
Wide—Range,Wide-Range
Artilery,Artillery
Ballstics,Ballistics
Fortity,Fortify
Eyo,Eye
Sholls,Shells
Recoll,Recoil
Consttution,Constitution
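Each row of `skill_corrections.csv` pairs a misread OCR token (left) with the intended word (right), e.g. `Affnity,Affinity`. The code that consumes the table is not part of this diff, so the sketch below simply assumes whole-token replacement; `apply_corrections` is a hypothetical name:

```python
def apply_corrections(text, corrections):
    """Replace each whitespace-separated token that has a known correction.

    Hypothetical helper: assumes corrections map whole OCR tokens to fixed tokens.
    """
    return " ".join(corrections.get(token, token) for token in text.split())


corrections = {"Aftack": "Attack", "Affnity": "Affinity", "Recoll": "Recoil"}
print(apply_corrections("Aftack Boost", corrections))  # Attack Boost
```

`str.split()` also collapses runs of whitespace, which is usually harmless for skill names.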
34 changes: 23 additions & 11 deletions src/charm_extraction.py
@@ -20,6 +20,7 @@
import json
import cv2
import os
from pathlib import Path
DEBUG = False


@@ -32,19 +33,30 @@
spell.load_dictionary(get_resource_path("skill_dict"), 0, 1)


known_corrections = {}
with open(get_resource_path('skill_corrections'), encoding='utf-8') as scf:
    for line in scf.readlines():
        line = line.strip()
        w, r = line.split(',')
        known_corrections[w] = r
def load_corrections(known_corrections=None):
    known_corrections = known_corrections or {}
    corrections_path = get_resource_path('skill_corrections')
    Path(corrections_path).touch() # if not exists
    with open(corrections_path, encoding='utf-8') as scf:
        for line in scf.readlines():
            line = line.strip()
            w, r = line.split(',')
            known_corrections[w] = r

    return known_corrections

all_skills = {}
with open(get_resource_path('skill_list')) as slf:
    for line in slf.readlines():
        skill_name = line.strip()
        all_skills[skill_name.lower()] = skill_name

def load_all_skills(all_skills=None):
    all_skills = all_skills or {}
    with open(get_resource_path('skill_list')) as slf:
        for line in slf.readlines():
            skill_name = line.strip()
            all_skills[skill_name.lower()] = skill_name
    return all_skills


known_corrections = load_corrections()
all_skills = load_all_skills()


def is_skill(skill_dict, skill_name):
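In `load_corrections`, `w, r = line.split(',')` raises `ValueError` on a blank line or on any row that does not contain exactly one comma, and it does not understand CSV quoting, so a field written as `"a, b"` would be split in two. A more tolerant sketch using the standard `csv` module; `load_corrections_csv` and its skip-malformed-rows policy are assumptions, not the project's code:

```python
import csv


def load_corrections_csv(path, known_corrections=None):
    """Load wrong,right pairs from a CSV file into a dict.

    Sketch only: blank lines are skipped, and rows that do not have exactly two
    fields are ignored instead of raising ValueError.
    """
    corrections = {} if known_corrections is None else known_corrections
    with open(path, encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle):
            if len(row) != 2:
                continue  # blank line (row == []) or malformed row
            wrong, right = (field.strip() for field in row)
            corrections[wrong] = right
    return corrections
```

Unlike the original `known_corrections or {}`, the `is None` check means a caller-supplied empty dict is filled in place rather than silently replaced.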
91 changes: 6 additions & 85 deletions src/manual_tesseract_bindings.py → src/tesseract/Tesseract.py
@@ -9,69 +9,11 @@
import shutil
import platform
import logging
from pathlib import Path
from .tesseract_utils import *
logger = logging.getLogger(__name__)


class TesseractError(Exception):
    pass


def find_tesseract():
    # TODO: Make this resilient to "change" (tesseract version), probably not necessary
    locations = [
        ctypes.util.find_library("libtesseract-4"), # win32
        ctypes.util.find_library("libtesseract302"), # win32 version 3.2
        ctypes.util.find_library("tesseract"), # others
    ]

    if platform.system() == "Windows":
        locations += [
            os.path.join(os.getenv("ProgramW6432"),
                         "Tesseract-OCR", "libtesseract-4.dll"),
            os.path.join(os.getenv('LOCALAPPDATA'),
                         "Tesseract-OCR", "libtesseract-4.dll"),
            os.path.join(os.getenv("ProgramFiles"),
                         "Tesseract-OCR", "libtesseract-4.dll"),
            os.path.join(os.getenv("programfiles(x86)"),
                         "Tesseract-OCR", "libtesseract-4.dll"),
        ]
    elif platform.system() == "Darwin": # MacOS
        locations += [
            # add potential environment paths here:
            # Example:
            # os.path.join(os.getenv("MACOS_ENV_NAME"), "Tesseract-OCR", "libtesseract-4.dll"),
        ]
    elif platform.system() == "Linux":
        locations += [
            # add potential environment paths here:
            # Example:
            # os.path.join(os.getenv("LINUX_ENV_NAME"), "Tesseract-OCR", "libtesseract-4.dll"),
        ]

    for potential in filter(lambda x: x, locations):
        if os.path.isfile(potential):
            logger.debug(f"Using tesseract at {potential}")
            return potential

    raise TesseractError(
        'Tesseract library was not found on your system. Please install it')


def set_tessdata():
    if 'TESSDATA_PREFIX' in os.environ:
        return
    path = find_tesseract()
    path = os.path.dirname(path)
    TESSDATA_PREFIX = os.path.join(path, 'tessdata')
    os.environ['TESSDATA_PREFIX'] = TESSDATA_PREFIX
    logger.debug(f"Set 'TESSDATA_PREFIX' to {TESSDATA_PREFIX}")


def get_datapath():
    if 'TESSDATA_PREFIX' not in os.environ:
        set_tessdata()
    return os.environ['TESSDATA_PREFIX']

from .TesseractError import TesseractError

class Tesseract(object):
    _lib = None
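The removed `find_tesseract` builds its Windows candidates with `os.path.join(os.getenv(...), ...)`, which raises `TypeError` when a variable is undefined (`ProgramW6432`, for instance, only exists on 64-bit Windows). This diff does not show where the lookup lives now; the new `from .tesseract_utils import *` suggests it moved there. A sketch of the same search that skips unset variables, reusing the variable and DLL names from the removed code; `windows_candidates` and `first_existing` are hypothetical helpers:

```python
import os

# Environment variables and DLL name taken from the removed find_tesseract().
WINDOWS_ENV_VARS = ("ProgramW6432", "LOCALAPPDATA", "ProgramFiles", "programfiles(x86)")
DLL_NAME = "libtesseract-4.dll"


def windows_candidates(environ=os.environ):
    """Yield Tesseract-OCR DLL paths for every env var that is actually set."""
    for var in WINDOWS_ENV_VARS:
        base = environ.get(var)
        if base:  # skip unset/empty variables instead of passing None to os.path.join
            yield os.path.join(base, "Tesseract-OCR", DLL_NAME)


def first_existing(paths):
    """Return the first path that is an existing file, or None."""
    return next((p for p in paths if p and os.path.isfile(p)), None)
```

`first_existing(windows_candidates())` returns the DLL path or `None`; passing a plain dict as `environ` makes the lookup easy to test off Windows.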
@@ -130,6 +72,8 @@ def __init__(self, language='eng', datapath=None, lib_path=None):
self.setup_lib(lib_path)
self._api = self._lib.TessBaseAPICreate()

download_language_data(language)

# required windows nonsense
encoded_lang = language.encode("utf-8")
datapath = get_datapath() if datapath is None else datapath
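`__init__` now calls `download_language_data(language)` from `tesseract_utils`, whose body is not part of this diff. One plausible shape for such a helper, sketched under assumptions: the trained data comes from the `tesseract-ocr/tessdata` repository on GitHub, and the file is only fetched when `<language>.traineddata` is missing from the data directory. `ensure_language_data` and `BASE_URL` are hypothetical:

```python
import os
import urllib.request

# Assumed source of trained data; the project's real URL and model version may differ.
BASE_URL = "https://github.com/tesseract-ocr/tessdata/raw/main/{lang}.traineddata"


def ensure_language_data(language, datapath, base_url=BASE_URL):
    """Return the path to <language>.traineddata, downloading it only if missing."""
    os.makedirs(datapath, exist_ok=True)
    target = os.path.join(datapath, f"{language}.traineddata")
    if os.path.isfile(target):
        return target  # already present: no network access
    tmp_path = target + ".part"
    urllib.request.urlretrieve(base_url.format(lang=language), tmp_path)
    os.replace(tmp_path, target)  # only expose the file once the download finished
    return target
```

Downloading to a `.part` file and renaming it with `os.replace` avoids leaving a truncated `.traineddata` behind that a later run would mistake for a complete file.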
@@ -182,36 +126,13 @@ def set_variable(self, key, val):
self._lib.TessBaseAPISetVariable(self._api, key, val)


def convert_to_grayscale(image_data):
    return cv2.cvtColor(image_data, cv2.COLOR_BGR2GRAY)


def process_image_with_tesseract(tesseract, image):
    whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890\'/-"

    height, width = image.shape[:2]
    if len(image.shape) == 2:
        depth = 1
    else:
        depth = image.shape[2]

    # Forcing obnoxious type conversion, probably some windows BS
    image = image.astype(np.uint8)

    tesseract.set_image(image.ctypes, width, height, depth)
    tesseract.set_variable("whitelist", whitelist)
    tesseract.set_resolution()
    text = tesseract.get_text()
    return text.strip()


if __name__ == '__main__':
    set_tessdata()
    PACKAGE_PARENT = '..'
    SCRIPT_DIR = os.path.dirname(os.path.realpath(
        os.path.join(os.getcwd(), os.path.expanduser(__file__))))
    sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT)))
    from src.utils import remove_non_skill_info, apply_trunc_threshold, get_skills, _trim_image_past_skill_name
    from ..utils import remove_non_skill_info, apply_trunc_threshold, get_skills, _trim_image_past_skill_name

    test_img = [
        "frames/frame0.png",
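The removed `process_image_with_tesseract` derives width, height and depth from the NumPy array and converts it to `uint8` before `set_image`. A hedged sketch of that geometry step as a standalone function; `image_geometry` is hypothetical, and it swaps `astype` for `np.ascontiguousarray` so the buffer behind `image.ctypes` is guaranteed to be row-major (a transposed view converted with `astype` keeps its column-major layout). The bytes-per-line value assumes rows are packed with no padding:

```python
import numpy as np


def image_geometry(image):
    """Return (contiguous uint8 image, width, height, bytes_per_pixel, bytes_per_line)."""
    image = np.ascontiguousarray(image, dtype=np.uint8)
    height, width = image.shape[:2]
    depth = 1 if image.ndim == 2 else image.shape[2]
    bytes_per_line = width * depth  # valid because the array is C-contiguous uint8
    return image, width, height, depth, bytes_per_line
```

The returned array, not the original, is what should be handed on, e.g. `tesseract.set_image(image.ctypes, width, height, depth)`; `bytes_per_line` is included for APIs such as Tesseract's C `TessBaseAPISetImage`, which takes it explicitly.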
3 changes: 3 additions & 0 deletions src/tesseract/TesseractError.py
@@ -0,0 +1,3 @@

class TesseractError(Exception):
    pass