Verbapie

Python code that builds an SQLite database for lemma search on the web over Greek documents encoded in EpiDoc XML.

Caution

This code is at a very early stage and is not ready for distribution. It works, at least for the developers. File an issue if you want to work with the developers to make it work for your corpus.

Requirements

A corpus of Greek texts conforming to the tei-epidoc.rng schema. Example:
…/myprojects$ git clone https://github.com/OpenGreekAndLatin/First1KGreek.git
A Python 3 installation, >= 3.6, < 3.10 (as of 2022-01)
ubuntu.21.10:…$ python3 -V
Python 3.9.7
The pip package installer
lxml, the Python wrapper for libxml2 and libxslt, used for XSLT transformations
ubuntu.21.10:…$ sudo pip3 install lxml
pie_extended, the lemmatizer from Thibault Clérice, with the Greek model. Installation takes a while and can end in dependency hell if some required packages are already installed in versions other than those pie expects. This sequence has worked (Cython allows scikit to recompile itself):
ubuntu.21.10:…$ sudo pip3 install Cython
ubuntu.21.10:…$ sudo pip3 install pie-extended
ubuntu.21.10:…$ pie-extended download grc
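
The requirements above can be checked from Python before starting a long lemmatization run. The following is a minimal sketch and not part of Verbapie: the helper names `python_supported` and `missing_modules` are hypothetical, and the import names `lxml` and `pie_extended` are assumed to correspond to the pip packages `lxml` and `pie-extended`.

```python
import importlib.util
import sys

# Version bounds taken from the requirements list (valid as of 2022-01).
MIN_PYTHON = (3, 6)
MAX_PYTHON_EXCLUSIVE = (3, 10)


def python_supported(version=None):
    """Return True if the (major, minor) version is >= 3.6 and < 3.10."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


def missing_modules(names=("lxml", "pie_extended")):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    print("Missing modules:", missing_modules() or "none")
```

An empty list from `missing_modules()` only means the packages are importable; it does not check that the Greek model has been downloaded.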

Usage

Not stable for now.

Optional: CUDA with Nvidia graphics cards

For faster lemmatization, if you have an Nvidia graphics card, you can put it to work (and not only for gaming). Install the latest Nvidia drivers and the CUDA toolkit to use the processors of your graphics card, and install the Python library.
ubuntu.21.10:~$ sudo apt install nvidia-cuda-toolkit

Installation for Windows

Install the Nvidia CUDA drivers
Install PyTorch 1.7.1. Lemmatization with papie 0.3.9 requires torch<=1.7.1,>=1.3.1; choose the torch build that matches your CUDA driver version
(The full lemmatization of the Iliad and the Odyssey takes about 5 minutes with CUDA on an RTX 3060 Ti and about 13 minutes without. This factor of about 2.6 stays roughly the same with a much larger corpus.)
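
The torch version constraint quoted above (torch<=1.7.1,>=1.3.1) can be checked with a few lines of Python. This is a minimal sketch with hypothetical helpers, `parse_version` and `torch_version_ok`; it compares only the leading numeric part of a version string and ignores suffixes such as `+cu110`. The optional part at the end assumes PyTorch is installed and uses `torch.cuda.is_available()` to report whether the GPU can be used.

```python
import re

# Bounds quoted from the requirement: torch<=1.7.1,>=1.3.1
MIN_TORCH = (1, 3, 1)
MAX_TORCH = (1, 7, 1)


def parse_version(text):
    """Turn '1.7.1+cu110' into (1, 7, 1), ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"Unrecognized version string: {text!r}")
    # A missing patch number (e.g. '1.7') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def torch_version_ok(text):
    """Return True if the version lies within the supported range."""
    return MIN_TORCH <= parse_version(text) <= MAX_TORCH


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        version = torch.__version__
        print("torch", version, "in range:", torch_version_ok(version))
        print("CUDA available:", torch.cuda.is_available())
```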

Install Python for Windows 10

A Python package usually assumes that you already have a working Python installation. If you do not, and you are on Windows, the system will not help you make good choices the way Linux does. Here are some hints that may save you time, at least as of 2022-01.

Install Python 3.8; don’t try to be newer than everyone else. Verbapy is a Digital Humanities library and depends on research libraries. Researchers are not paid to discover new bugs in new versions of Python. In the installer, tick "Add Python 3.8 to PATH" and pip NOW (this is much easier than fixing it afterwards).
Don’t try to install Python globally on Windows (e.g. ~~C:\Program Files\Python38~~). What is good practice for a Linux admin will lead you into "deps hell" on Windows.
Check these commands in your preferred console:
win10> python -V
Python 3.8.10
win10> where python
C:\Users\{YOU}\AppData\Local\Programs\Python\Python38\python.exe
Update pip (the python package installer)
win10> pip install --user --upgrade pip
(--user should not be required, but it sometimes seems to be)
You should now have a working Python. Try installing an important requirement:
win10> pip install lxml
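
The console checks above can also be made from inside Python. This is a minimal sketch with a hypothetical helper, `installed_under_program_files`, that flags an interpreter installed under a `Program Files` directory, the global layout the hints above advise against. It is a plain case-insensitive match on path components, nothing more.

```python
import sys
from pathlib import PureWindowsPath


def installed_under_program_files(executable):
    """Return True if a path component starts with 'Program Files'."""
    parts = [part.lower() for part in PureWindowsPath(executable).parts]
    return any(part.startswith("program files") for part in parts)


if __name__ == "__main__":
    # Roughly the information given by `python -V` and `where python`,
    # but only for the interpreter that is running this script.
    print("Python", sys.version.split()[0])
    print("Executable:", sys.executable)
    if installed_under_program_files(sys.executable):
        print("Warning: this Python is installed under Program Files")
```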
