
Zemberek For Developers

Ahmet A. Akın edited this page Aug 20, 2018 · 8 revisions

Requirements

  • Java Development Kit 8 or higher. OpenJDK or Oracle JDK can be used.
  • Maven. On Ubuntu, install it with sudo apt-get install maven
  • Git
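
A quick way to confirm the three requirements are installed is to check that their commands are on the PATH. The following is a minimal sketch, not part of the project: the function name missing_tools is hypothetical, and it checks javac rather than java on the assumption that a full JDK (not only a runtime) is needed.

```shell
#!/bin/sh
# Print each requested tool that is not on PATH.
# Returns 0 if all tools were found, 1 otherwise.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# javac is checked instead of java so that a runtime-only install is reported.
if missing_tools javac mvn git; then
  echo "all required tools found"
fi
```

This only checks that the commands exist; it does not verify that the JDK is version 8 or higher.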

Download Code

If you intend to work on the code and make changes, fork the project on GitHub (the Fork button is in the top right corner of the project page).

Use Git to clone your fork to your local computer. On Ubuntu:

 git clone [your fork git link]

This will create a directory named zemberek-nlp. If you just want to explore the code, you can clone the main repository instead:

 git clone https://github.com/ahmetaa/zemberek-nlp.git

Open project with IntelliJ IDEA

Here are the steps for opening the project:

  • Start the IDE.
  • Select Open from the File menu and choose the [zemberek-nlp]/pom.xml file.
  • Select Open as Project in the dialog.
  • It will take some time to download the dependencies with Maven and index the libraries. You may also need to point the IDE to the installed JDK directory. On Ubuntu, a system-wide JDK usually resides in the /usr/lib/jvm/[java-8-oracle or something else] directory.
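
If the IDE asks for the JDK directory and you are unsure where it is, the location can usually be derived from the javac command. The sketch below assumes a Linux system where readlink -f is available (as on Ubuntu) and that javac lives in the JDK's bin directory; the function name jdk_home_of is hypothetical.

```shell
#!/bin/sh
# Given the resolved path of a javac binary, print the JDK home,
# which is assumed to be two directory levels above it (<jdk-home>/bin/javac).
jdk_home_of() {
  dirname "$(dirname "$1")"
}

# On Ubuntu, javac on PATH is usually a chain of symlinks;
# readlink -f follows the chain to the real file.
if command -v javac >/dev/null 2>&1; then
  jdk_home_of "$(readlink -f "$(command -v javac)")"
fi
```

The printed directory is the one to select in the IDE's JDK dialog.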

Here is a screenshot of the project after initial opening:

[Screenshot: Main IDE page]

Compile

From IDE

Once the download is finished, rebuilding the project from the IDE menu 'Build - Rebuild All' will fail with errors like:

 Error:(25, 41) java: package zemberek.morphology.lexicon.proto does not exist
 Error:(26, 54) java: package zemberek.morphology.lexicon.proto.LexiconProto does not exist
 Error:(31, 48) java: package LexiconProto does not exist 

This happens because some modules use code that is auto-generated from protocol buffers. Therefore, you first need to run compile on the zemberek-nlp (root) module in the Maven window on the right.

[Screenshot: IDE Maven Compile]

After this compilation finishes, you no longer need to compile from Maven; use the Build and Run menu items instead.

From Console

Go to the zemberek-nlp directory and compile with:

 mvn compile 
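
To check from the console whether the protocol buffer classes named in the IDE errors have been built, one option is to look for compiled LexiconProto classes under the modules' target directories. This sketch assumes Maven's default target/ output directory is in use and that the generated class compiles to a file whose name starts with LexiconProto; the function name has_generated_proto is hypothetical.

```shell
#!/bin/sh
# Succeeds if a compiled LexiconProto class file exists under any
# target/ directory below the given project root (default: current directory).
has_generated_proto() {
  root=${1:-.}
  found=$(find "$root" -path '*/target/*' -name 'LexiconProto*.class' 2>/dev/null | head -n 1)
  [ -n "$found" ]
}

# Example: run from the zemberek-nlp directory.
if has_generated_proto .; then
  echo "generated protocol buffer classes found"
else
  echo "no generated classes yet, run: mvn compile"
fi
```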

Generating jar files

From IDE

As above, run package on the zemberek-nlp root module from the Maven window on the right.

[Screenshot: IDE Maven Package]

From Console

 mvn package

Module jar files end up in several places, but for convenience they are also copied to the dependencies directory. The zemberek-full.jar is not copied there; look for it in the all/target directory.

To test that it works, copy zemberek-full.jar to a convenient directory, go there in a console, and run:

 java -jar zemberek-full.jar MorphologyConsole
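
The copy step can be scripted. The following is a minimal sketch that relies only on the all/target/zemberek-full.jar location mentioned above; the function name stage_full_jar and the example paths are hypothetical.

```shell
#!/bin/sh
# Copy zemberek-full.jar from <project-root>/all/target into a destination
# directory and print the path of the copy.
# Arguments: $1 = project root, $2 = destination directory.
# Fails with a message on stderr if the jar has not been built yet.
stage_full_jar() {
  src="$1/all/target/zemberek-full.jar"
  dest="$2"
  if [ ! -f "$src" ]; then
    echo "not found: $src (run 'mvn package' first)" >&2
    return 1
  fi
  mkdir -p "$dest" && cp "$src" "$dest/" && printf '%s\n' "$dest/zemberek-full.jar"
}

# Example usage (paths are placeholders):
#   stage_full_jar "$HOME/zemberek-nlp" "$HOME/zemberek-test" \
#     && cd "$HOME/zemberek-test" \
#     && java -jar zemberek-full.jar MorphologyConsole
```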

Changing Dictionaries

If you need to change the internal dictionary files, the binary dictionary must be regenerated. To do this, open the DictionarySerializer class (in the IDE, use Shift-Shift or Ctrl-N to find it). It is located in the [project-root]/morphology/src/main/java/zemberek/morphology/lexicon/ directory. Run the class by right-clicking in the editor and choosing Run. This re-creates the [project-root]/morphology/src/main/resources/tr/lexicon.bin file.
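
To confirm that the run actually rewrote lexicon.bin, its modification time can be compared against a marker file created beforehand. This is a sketch only: the function name lexicon_regenerated and the marker file are hypothetical, and the lexicon.bin path is the one given above.

```shell
#!/bin/sh
# Succeeds if lexicon.bin under the given project root exists and is newer
# than the marker file.
# Arguments: $1 = project root, $2 = marker file created before the run.
lexicon_regenerated() {
  bin="$1/morphology/src/main/resources/tr/lexicon.bin"
  [ -f "$bin" ] && [ -n "$(find "$bin" -newer "$2" 2>/dev/null)" ]
}

# Example usage, from the project root:
#   touch /tmp/before-serialize
#   ... run DictionarySerializer from the IDE ...
#   lexicon_regenerated . /tmp/before-serialize && echo "lexicon.bin updated"
```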

Changing Code Style

Zemberek uses the Google coding style (two-space indentation etc.). To adopt it, select Scheme at the top of the Settings -> Code Style page and use the intellij-java-google-style.xml file in the [zemberek-nlp]/dev folder.
