WN16S



WordNet dataset with semantic relations only

Motivation

In WordNet, two kinds of relations are recognized: lexical and semantic. Lexical relations hold between word forms (lemmas), while semantic relations hold between word meanings (synsets).

I wanted a dataset with the lexical relations filtered out, so that synset embeddings could be built from the semantic relations of the WordNet graph alone.

Structure

The dataset folder contains several TSV and TXT files, whose contents are explained below.

| File name | Purpose | Notes |
|---|---|---|
| `count_synsets.txt` | Contains the number of synsets. | |
| `count_relations.txt` | Contains the number of relations. | |
| `count_edges_all.txt` | Contains the total number of edges. | |
| `count_edges_*.tsv` | Contain the number of edges of type `*` (one file per type). | |
| `synset_name_to_id.tsv` | Maps each synset name to a numeric id, starting from 0. | Sorted on the first column. |
| `synset_id_to_name.tsv` | Maps each synset id back to the synset name. | Sorted on the first column. |
| `relation_name_to_id.tsv` | Maps each relation name to a numeric id, starting from 0. | Sorted on the first column. |
| `relation_id_to_name.tsv` | Maps each relation id back to the relation name. | Sorted on the first column. |
| `edges_as_id_all.tsv` | All edges of the WordNet semantic subgraph, as id triples (synset 1 id, relation id, synset 2 id). | Sorted on the second column (relation). |
| `edges_as_id_*.tsv` | Only the edges of type `*`, as id triples. | Sorted on the second column (relation). |
| `edges_as_name_all.tsv` | All edges of the WordNet semantic subgraph, as name triples (synset 1 name, relation name, synset 2 name). | Sorted on the second column (relation). |
| `edges_as_name_*.tsv` | Only the edges of type `*`, as name triples. | Sorted on the second column (relation). |
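The files in the table can be loaded with a few lines of standard-library Python. The following is a minimal sketch, not code shipped with this project. It assumes that every file is UTF-8, tab-separated, and has no header row, and that ids are plain integers; the README does not state these details. All helper names (`read_id_to_name`, `read_edges_as_id`, `group_by_relation`, `edges_to_names`, `load_dataset`) are hypothetical. Because the edge files are sorted on the relation column, edges sharing a relation form contiguous runs that `itertools.groupby` can split into per-relation edge lists.

```python
from itertools import groupby
from operator import itemgetter
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[int, int, int]


def _rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield tab-split fields, skipping blank lines (assumed: no header row)."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            yield line.split("\t")


def read_id_to_name(lines: Iterable[str]) -> Dict[int, str]:
    """Parse a two-column `id<TAB>name` file such as synset_id_to_name.tsv."""
    return {int(fields[0]): fields[1] for fields in _rows(lines)}


def read_edges_as_id(lines: Iterable[str]) -> List[Triple]:
    """Parse `synset_1_id<TAB>relation_id<TAB>synset_2_id` triples."""
    return [(int(f[0]), int(f[1]), int(f[2])) for f in _rows(lines)]


def group_by_relation(edges: List[Triple]) -> Dict[int, List[Tuple[int, int]]]:
    """Map each relation id to its (synset_1_id, synset_2_id) pairs.

    The edge files are documented as sorted on the second column; sorting
    again is a cheap safeguard (Timsort is linear on already-sorted input)
    and guarantees groupby sees each relation id as one contiguous run.
    """
    ordered = sorted(edges, key=itemgetter(1))
    return {
        rel: [(a, b) for a, _, b in group]
        for rel, group in groupby(ordered, key=itemgetter(1))
    }


def edges_to_names(
    edges: List[Triple], synsets: Dict[int, str], relations: Dict[int, str]
) -> List[Tuple[str, str, str]]:
    """Translate id triples back into (synset, relation, synset) name triples."""
    return [(synsets[a], relations[r], synsets[b]) for a, r, b in edges]


def load_dataset(folder: Path):
    """Read the id-based files from the dataset folder (assumed UTF-8)."""
    def read(name: str) -> List[str]:
        with open(folder / name, encoding="utf-8") as f:
            return f.readlines()

    synsets = read_id_to_name(read("synset_id_to_name.tsv"))
    relations = read_id_to_name(read("relation_id_to_name.tsv"))
    edges = read_edges_as_id(read("edges_as_id_all.tsv"))
    return synsets, relations, edges


if __name__ == "__main__":
    # Hypothetical toy rows; ids and relation labels are placeholders.
    synsets = read_id_to_name(["0\tcarnivore.n.01\n", "1\tcanine.n.02\n", "2\tdog.n.01\n"])
    relations = read_id_to_name(["0\thypernym\n", "1\thyponym\n"])
    edges = read_edges_as_id(["2\t0\t1\n", "1\t0\t0\n", "0\t1\t1\n"])
    print(group_by_relation(edges))  # {0: [(2, 1), (1, 0)], 1: [(0, 1)]}
    print(edges_to_names(edges[:1], synsets, relations))
    # [('dog.n.01', 'hypernym', 'canine.n.02')]
```

The relation labels `hypernym` and `hyponym` in the toy rows are placeholders and may not match the names actually stored in `relation_id_to_name.tsv`. Keeping edges as integer ids and resolving names only when needed keeps memory usage low when the whole semantic subgraph is loaded for embedding training.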

Download

A compressed version of the dataset can be downloaded from the release page or by clicking here.

Source

The dataset was generated with NLTK and is a subset of the WordNet database.

License

All source code of this project is licensed under the MIT License; see the license file for details.