WN16S

WordNet dataset with semantic relations only

Motivation

WordNet recognizes two kinds of relations: lexical and semantic. Lexical relations hold between word forms (lemmas); semantic relations hold between word meanings (synsets).

I wanted a dataset with the lexical relations filtered out, so that synset embeddings can be built from the semantic relations of the WordNet graph alone.

Structure

The dataset folder contains several `.tsv` and `.txt` files. Their purpose is described in the table below.

| File name | Purpose | Notes |
| --- | --- | --- |
| `count_synsets.txt` | Contains the number of synsets. | |
| `count_relations.txt` | Contains the number of relations. | |
| `count_edges_all.txt` | Contains the total number of edges. | |
| `count_edges_*.tsv` | Contain the number of edges of type `*`. | |
| `synset_name_to_id.tsv` | Maps each synset name to a numeric id starting from 0. | Sorted on the first column. |
| `synset_id_to_name.tsv` | Maps each synset id to a synset name. | Sorted on the first column. |
| `relation_name_to_id.tsv` | Maps each relation name to a numeric id starting from 0. | Sorted on the first column. |
| `relation_id_to_name.tsv` | Maps each relation id to a relation name. | Sorted on the first column. |
| `edges_as_id_all.tsv` | Contains all the edges of WordNet's semantic subgraph as triples of ids (id synset 1, id relation, id synset 2). | Sorted on the second column. |
| `edges_as_id_*.tsv` | Contain only the edges of type `*`, as triples of ids. | Each file is sorted on the second column. |
| `edges_as_name_all.tsv` | Contains all the edges of WordNet's semantic subgraph as triples of names (name synset 1, name relation, name synset 2). | Sorted on the second column. |
| `edges_as_name_*.tsv` | Contain only the edges of type `*`, as triples of names. | Each file is sorted on the second column. |
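
The files described above can be read with the Python standard library alone. The following is a minimal sketch, assuming that the files are tab-separated, have no header row, and use the column order given in the table. The function names `load_name_to_id` and `load_id_edges` are illustrative and are not part of this project.

```python
from typing import Dict, List, Tuple


def load_name_to_id(path: str) -> Dict[str, int]:
    """Read a two-column TSV (name, id) into a dictionary.

    Assumes one mapping per line and no header row.
    """
    mapping: Dict[str, int] = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            name, idx = line.split("\t")
            mapping[name] = int(idx)
    return mapping


def load_id_edges(path: str) -> List[Tuple[int, int, int]]:
    """Read a three-column TSV of (synset id 1, relation id, synset id 2)."""
    edges: List[Tuple[int, int, int]] = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            head, relation, tail = line.split("\t")
            edges.append((int(head), int(relation), int(tail)))
    return edges
```

For example, `load_name_to_id("synset_name_to_id.tsv")` would return the synset vocabulary, and `load_id_edges("edges_as_id_all.tsv")` would return the full edge list as integer triples, ready to feed into an embedding model.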

Download

A compressed version of the dataset can be downloaded from the release page or by clicking here.

Source

The dataset is generated using nltk and is a subset of the WordNet dataset.
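
As an illustration of how semantic edges can be read from nltk, here is a minimal sketch. It is not the script used to build this dataset. It assumes synset objects that expose a `name()` method and one method per relation, as nltk's `Synset` does. The tuple `SEMANTIC_RELATIONS` is a small, assumed subset chosen for the example, and its entries are nltk method names, which may differ from the relation names used in the dataset files.

```python
from typing import Iterable, Iterator, Sequence, Tuple

# Assumption: a small illustrative subset of nltk's synset-level relation
# methods. The relations actually included in WN16S may differ.
SEMANTIC_RELATIONS: Sequence[str] = (
    "hypernyms",
    "hyponyms",
    "part_meronyms",
    "part_holonyms",
)


def semantic_edges(
    synsets: Iterable,
    relations: Sequence[str] = SEMANTIC_RELATIONS,
) -> Iterator[Tuple[str, str, str]]:
    """Yield (source name, relation, target name) for each semantic edge.

    Only synset-level methods are called, so lexical relations, which
    nltk attaches to lemmas, never appear in the output.
    """
    for synset in synsets:
        for relation in relations:
            # Each relation method returns a list of related synsets.
            for target in getattr(synset, relation)():
                yield synset.name(), relation, target.name()
```

With nltk and its WordNet corpus installed, passing `wn.all_synsets()` from `nltk.corpus.wordnet` to `semantic_edges` would iterate over the whole graph.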

License

All source code of this project is licensed under the MIT License - see the license file for details.
