JoSimText

This system performs word sense induction from text. It is an implementation of the JoBimText approach in Scala and Spark, tuned for the induction of word senses (hence the "S" instead of "B" in the name, which also reflects the name of the project's initial developer, Johannes Simon). The original JoBimText implementation is written in Java/Pig and is more generic, as it assumes that "Jo"s (i.e. objects) and "Bim"s (i.e. features) can be any linguistic objects. This particular implementation is designed for modeling words and multiword expressions.

The system consists of several modules:

Term feature extraction
Term similarity (this repository): constructs a distributional thesaurus from word-feature frequencies.
Word sense induction

System requirements:

git
Java 1.8+
Apache Spark 2.2+

Installation of the tool:

Get the source code:

git clone https://github.com/uhh-lt/josimtext.git
cd josimtext

Build the tool:

make

Set the environment variable SPARK_HOME to the directory containing your Spark installation.
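
A minimal sketch of this step for a POSIX shell follows. The path /opt/spark is only an assumption for illustration; substitute the directory where Spark is actually unpacked on your machine. The check relies on bin/spark-submit, the standard launcher shipped with Spark distributions. Whether the run script calls it directly is not stated here.

```shell
# Assumption: Spark lives under /opt/spark; replace with your own path.
# An already exported SPARK_HOME is kept as-is.
export SPARK_HOME="${SPARK_HOME:-/opt/spark}"

# Warn (without aborting) if the directory does not look like a Spark installation.
if [ ! -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "warning: spark-submit not found under $SPARK_HOME" >&2
fi
```

Adding the export line to your shell profile keeps the variable available in new sessions.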

Run a command:

To see the list of available commands:

./run

To see the arguments of a particular command, e.g. WordSimFromTermContext:

./run WordSimFromTermContext --help

By default, the tool runs locally. To change the Spark and Hadoop parameters of the job (queue, number of executors, memory per job, and so on), modify the conf/env.sh file. A sample file for running jobs on a CDH YARN cluster is provided in conf/cdh.sh.
