Zemberek For Developers
- Java Development Kit 8 or higher. Either OpenJDK or Oracle JDK can be used.
- Maven. On Ubuntu, install it with:
sudo apt-get install maven
- Git
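A quick way to confirm the requirements above is a small shell check. This is only a sketch: the helper names check_tools and java_major are made up for illustration, and it assumes that javac -version prints a line of the form "javac 1.8.0_292" or "javac 11.0.2".

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 when all are found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Extract the major version from a Java version string:
# "1.8.0_292" gives 8, "11.0.2" gives 11, "17" gives 17.
java_major() {
  v=${1#1.}        # drop the legacy "1." prefix used up to Java 8
  v=${v%%[!0-9]*}  # keep only the leading digits
  echo "$v"
}

# javac (not just java) is checked because a full JDK is required.
if check_tools javac mvn git; then
  ver=$(javac -version 2>&1 | sed -n 's/^javac \([^ ]*\).*/\1/p' | head -n 1)
  major=$(java_major "$ver")
  if [ "${major:-0}" -ge 8 ]; then
    echo "JDK $major is new enough"
  else
    echo "JDK 8 or higher is required (found: ${ver:-unknown})"
  fi
else
  echo "Install the missing tools before continuing."
fi
```

The script only reports what it finds; it does not install anything.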
If you intend to work on the code and make changes, create a fork from the GitHub project page (see the top right corner of the page).
Use git to clone the code to your local computer. On Ubuntu:
git clone [your fork git link]
This will create a directory named zemberek-nlp. If you just want to explore the code, you can clone the main repository directly:
git clone https://github.com/ahmetaa/zemberek-nlp.git
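If you cloned a fork, it can be handy to keep a second remote pointing at the original repository so that later changes can be fetched. This is a common git convention, not something this page prescribes; the remote name upstream and the helper name add_upstream are assumptions made for this sketch.

```shell
# URL of the original repository, as given in the clone command above.
UPSTREAM_URL=https://github.com/ahmetaa/zemberek-nlp.git

# Add an "upstream" remote unless one is already configured.
# Must be run from inside the cloned fork.
add_upstream() {
  if git remote get-url upstream >/dev/null 2>&1; then
    echo "upstream already set: $(git remote get-url upstream)"
  else
    git remote add upstream "$UPSTREAM_URL"
    echo "upstream added: $UPSTREAM_URL"
  fi
}
```

After changing into the zemberek-nlp directory, calling add_upstream followed by git fetch upstream would make the original branches available locally.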
Here are the steps for opening the project in the IDE:
- Start the IDE.
- Select Open from the File menu and choose the [zemberek-nlp]/pom.xml file.
- Select "open as project" in the dialog.
- It will take some time to download the dependencies with Maven and to index the libraries. You may also need to point the IDE to the installed JDK directory. On Ubuntu, a system-wide JDK is usually located in the /usr/lib/jvm/[java-8-oracle or something else] directory.
Here is a screenshot of the project after initial opening:
From IDE
Once the project is open and the initial Maven downloads and indexing have finished, rebuilding the project from the IDE menu 'Build - Rebuild All' will fail with errors like:
Error:(25, 41) java: package zemberek.morphology.lexicon.proto does not exist
Error:(26, 54) java: package zemberek.morphology.lexicon.proto.LexiconProto does not exist
Error:(31, 48) java: package LexiconProto does not exist
This happens because some modules use code that is auto-generated from protocol buffers. Therefore, first run compile from the zemberek-nlp (root) module in the Maven window on the right.
Once that compilation has finished, you no longer need to compile from Maven and can use the Build and Run menu items instead.
From Console
Go to the zemberek-nlp directory and compile with:
mvn compile
From IDE
To build the jar files, just as with compile above, run package from the zemberek-nlp (root) module in the Maven window on the right.
From Console
mvn package
The module jar files end up in several places, but for convenience they are also copied to the dependencies directory. The zemberek-full.jar file, however, will not be there; look for it in the all/target directory.
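Locating the full jar after packaging can be scripted. The sketch below assumes it is run from the project root, or that a hypothetical ZEMBEREK_ROOT variable points there; find_full_jar is an illustrative helper name, not part of the project.

```shell
# Print the path of the full jar under <project-root>/all/target, if any.
# Prints nothing when the directory or the jar does not exist yet.
find_full_jar() {
  target_dir="${1:-.}/all/target"
  [ -d "$target_dir" ] || return 0
  # The glob would also match a versioned file name; that is an assumption,
  # since only the name zemberek-full.jar is documented.
  find "$target_dir" -maxdepth 1 -name 'zemberek-full*.jar' | head -n 1
}

jar=$(find_full_jar "${ZEMBEREK_ROOT:-.}")
if [ -n "$jar" ]; then
  echo "full jar: $jar"
else
  echo "full jar not found; run 'mvn package' first"
fi
```

The printed path can then be passed to cp to place the jar in a convenient directory.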
To test that it works, copy zemberek-full.jar to a convenient directory, go there in the console, and run:
java -jar zemberek-full.jar MorphologyConsole
If you need to change the internal dictionary files, the binary dictionary must be regenerated. To do this, open the DictionarySerializer class (in the IDE, use shift-shift or CTRL-N to find it). It is located in the [project-root]/morphology/src/main/java/zemberek/morphology/lexicon/ directory. Run the class by right-clicking in the editor and choosing Run. This will re-create the [project-root]/morphology/src/main/resources/tr/lexicon.bin file.
Zemberek uses the Google coding style (two-space indentation etc.). To adopt it in the IDE, go to Settings -> Code Style, select Scheme at the top of the settings page, and use the intellij-java-google-style.xml file in the [zemberek-nlp]/dev folder.