TGist Taxonomy Creator

Simplistic and unfinished code that creates a taxonomy from a small corpus, where the corpus is represented by a list of terms together with the feature vectors and roles associated with those terms.

Dependencies:

The object exploration code by Dimitris Andreou (https://github.com/DimitrisAndreou/memory-measurer). Make sure that the following jars are in your classpath: dist/objectexplorer.jar, lib/guava-r09.jar and lib/jsr305.jar. This code is used for debugging purposes only and technically you will not need this code when you check out the master branch of this repository.

Data required

Creating a taxonomy requires a list of terms with technology scores, feature vectors for those terms and a list of terms with their roles. These three should be together in one directory and are expected to have the following names:

filename	description
classify.MaxEnt.out.s4.scores.sum.az	list of terms with technology scores
features.txt.gz	feature vectors for all terms
NB.IG50.test1.woc.9999.results.classes	list of terms with their roles

The first of those files is created by running the feature extraction and technology classifier over a corpus (see https://github.com/techknowledgist/tgist-features and https://github.com/techknowledgist/tgist-classifiers). The second can be created from the feature vectors produced by tgist-features using the extract_features.py script. The third is created with the domain roles code in https://github.com/techknowledgist/act; this file is allowed to be empty.
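Before creating a taxonomy it can help to confirm that the data directory is complete. The following is a minimal sketch, not part of this repository: the function name missing_inputs is hypothetical, and only the three file names come from the table above.

```shell
# Print the name of every required input file that is missing from the
# given data directory. The function name is hypothetical; the file names
# are the ones listed in the table above.
missing_inputs() {
    data_dir=$1
    for name in \
        classify.MaxEnt.out.s4.scores.sum.az \
        features.txt.gz \
        NB.IG50.test1.woc.9999.results.classes
    do
        # -f is true for any regular file, including an empty one, so an
        # empty roles file passes the check.
        [ -f "$data_dir/$name" ] || printf '%s\n' "$name"
    done
}
```

Calling missing_inputs with the path of a data directory prints nothing when all three files are present.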

Creating a taxonomy

The one-command way to create a new taxonomy from a directory with the needed input data is:

> java -jar dist/TGistTaxonomy.jar --create <TaxonomyLocation> <DataLocation>

Here, <DataLocation> is the directory with the three required files mentioned above and <TaxonomyLocation> is the location of the new taxonomy. If that path already exists, the program exits with a warning.

Using the --create option is a shorthand for four separate commands:

> java -jar dist/TGistTaxonomy.jar --init <TaxonomyLocation> <DataLocation>
> java -jar dist/TGistTaxonomy.jar --import <TaxonomyLocation>
> java -jar dist/TGistTaxonomy.jar --build-hierarchy <TaxonomyLocation>
> java -jar dist/TGistTaxonomy.jar --add-relations <TaxonomyLocation>

With --init the taxonomy is initialized, which boils down to creating a directory containing a single file named properties.txt. That file stores a short name for the taxonomy (the base name of the path where the taxonomy is created) and the location of the input data directory. With --import the data in the input directory are imported into the taxonomy directory. Only terms that meet a minimum technology score and a minimum frequency are added, and only the feature vectors and roles for those terms are kept, which reduces the size of the data significantly. Finally, --build-hierarchy builds the taxonomy's hierarchy and --add-relations adds relations between terms.
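Running the four steps separately is useful when one of them needs to be inspected or repeated. The sketch below chains them so that a failing step stops the sequence. The names TGIST_JAR, TGIST_RUN, tgist and create_stepwise are hypothetical, and the sketch assumes that the jar returns a nonzero exit status on failure, which this document does not state.

```shell
# Hypothetical helper names; only the four options and their order are
# taken from the commands above.
TGIST_JAR=${TGIST_JAR:-dist/TGistTaxonomy.jar}

tgist() {
    # TGIST_RUN is left unquoted on purpose so that the default value
    # "java -jar" splits into two words. Setting TGIST_RUN=echo prints
    # the commands instead of running them.
    ${TGIST_RUN:-java -jar} "$TGIST_JAR" "$@"
}

create_stepwise() {
    taxonomy=$1
    data=$2
    # Each step runs only if the previous one succeeded. This relies on
    # the assumption that the jar signals failure via its exit status.
    tgist --init "$taxonomy" "$data" &&
    tgist --import "$taxonomy" &&
    tgist --build-hierarchy "$taxonomy" &&
    tgist --add-relations "$taxonomy"
}
```

For a dry run, set TGIST_RUN=echo before calling create_stepwise with a taxonomy location and a data location.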

During the above processing the following files are created inside the taxonomy:

option	files created
--init	properties.txt
--import	technologies.txt, features.txt, act.txt
--build-hierarchy	hierarchy.txt
--add-relations	relations-cooc.txt, relations-term.txt
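The table above can also be read in reverse to see how far the processing of a taxonomy directory got. This is a hedged sketch with hypothetical function names; it assumes that each step always writes every file listed for it, which may not hold (for example for act.txt when the roles file is empty).

```shell
# Map each option to the files the table above lists for it.
stage_files() {
    case $1 in
        --init)            echo "properties.txt" ;;
        --import)          echo "technologies.txt features.txt act.txt" ;;
        --build-hierarchy) echo "hierarchy.txt" ;;
        --add-relations)   echo "relations-cooc.txt relations-term.txt" ;;
    esac
}

# Print every option whose files are all present in the taxonomy directory.
completed_stages() {
    taxonomy=$1
    for option in --init --import --build-hierarchy --add-relations; do
        complete=yes
        # The file list is split on whitespace on purpose.
        for name in $(stage_files "$option"); do
            [ -f "$taxonomy/$name" ] || complete=no
        done
        if [ "$complete" = yes ]; then
            printf '%s\n' "$option"
        fi
    done
}
```

Calling completed_stages with a taxonomy location prints one option per line for each step whose output files exist.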

Browsing a taxonomy

To browse a taxonomy do the following:

> java -jar dist/TGistTaxonomy.jar --browse <TaxonomyLocation>

You will get a splash screen and some limited functionality for navigating the taxonomy. Enter q followed by a return to exit the browser.

TODO: document this better once some minimal improvements are made.
