
Releases: minimalparts/nonce2vec

v2.0.2

02 Nov 19:10
d2393e3
Merge pull request #16 from akb89/develop

Updated wget URLs

v2.0.1

10 Jan 18:56
cb1d4be

Fixed a bug triggered by empty probabilities in the informativeness computation

v2.0.0

29 Jul 12:04
e5e42aa

This is the version accompanying the SRW 2019 paper "Towards Incremental Learning of Word Embeddings Using Context Informativeness" (Kabbach et al., 2019).

Abstract
In this paper, we investigate the task of learning word embeddings from very sparse data in an incremental, cognitively-plausible way. We focus on the notion of informativeness, that is, the idea that some content is more valuable to the learning process than other content. We further highlight the challenges of online learning and argue that previous systems fall short of implementing incrementality. Concretely, we incorporate informativeness in a previously proposed model of nonce learning, using it for context selection and learning rate modulation. We test our system on the task of learning new words from definitions, as well as on the task of learning new words from potentially uninformative contexts. We demonstrate that informativeness is crucial to obtaining state-of-the-art performance in a truly incremental setup.
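As a rough illustration of the two roles informativeness plays in this release, the sketch below filters context words by an informativeness score (context selection) and scales the update strength of the surviving words by that score (learning rate modulation). It is a toy stand-in for the actual gensim-based implementation in this repository; the function name, the `scores` dictionary and the `threshold` parameter are all hypothetical.

```python
import numpy as np

def informed_nonce_update(nonce_vec, context_words, background, scores,
                          threshold=0.5, base_lr=1.0):
    """Toy sketch of informativeness-driven nonce learning.

    background -- dict mapping known words to pre-trained numpy vectors
    scores     -- dict mapping context words to an informativeness score
                  in [0, 1] (how such scores are computed is the point of
                  the paper and is not reproduced here)
    """
    for word in context_words:
        score = scores.get(word, 0.0)
        if score < threshold or word not in background:
            continue  # context selection: skip uninformative or unknown words
        lr = base_lr * score  # learning rate modulation by informativeness
        nonce_vec += lr * (background[word] - nonce_vec)  # pull towards context
    return nonce_vec
```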

Citation

@inproceedings{kabbach-etal-2019-towards,
    title = "Towards Incremental Learning of Word Embeddings Using Context Informativeness",
    author = "Kabbach, Alexandre  and
      Gulordava, Kristina  and
      Herbelot, Aur{\'e}lie",
    booktitle = "Proceedings of the 57th Conference of the Association for Computational Linguistics: Student Research Workshop",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P19-2022",
    pages = "162--168"
}

v1.2.1

21 Sep 11:47
  • Added Zenodo DOI
  • Added info about code location
  • Fixed the Chimera dataset

Initial release of nonce2vec.

31 Dec 11:19

This is the repo accompanying the paper "High-risk learning: acquiring new word vectors from tiny data" (Herbelot & Baroni, 2017).

Abstract

Distributional semantics models are known to struggle with small data. It is generally accepted that in order to learn 'a good vector' for a word, a model must have sufficient examples of its usage. This contradicts the fact that humans can guess the meaning of a word from a few occurrences only. In this paper, we show that a neural language model such as Word2Vec only necessitates minor modifications to its standard architecture to learn new terms from tiny data, using background knowledge from a previously learnt semantic space. We test our model on word definitions and on a nonce task involving 2-6 sentences' worth of context, showing a large increase in performance over state-of-the-art models on the definitional task.
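To make the "minor modifications" concrete, here is a minimal, hypothetical sketch of the core idea in plain NumPy (the actual system modifies gensim's skip-gram training loop; all names and parameters below are illustrative only): the new word's vector is initialised as the sum of its context words' vectors in the pre-trained background space, then nudged towards those context vectors with a high, rapidly decaying learning rate while the background space itself stays frozen.

```python
import numpy as np

def learn_nonce(context_words, background, n_epochs=5, lr0=1.0, decay=0.5):
    """Toy sketch: learn a vector for an unseen word from tiny data.

    background -- dict mapping known words to pre-trained numpy vectors
                  (the previously learnt semantic space; never updated here)
    """
    ctx = [background[w] for w in context_words if w in background]
    if not ctx:
        raise ValueError("no known context words in the background space")
    nonce = np.sum(ctx, axis=0)  # initialise from the sum of context vectors
    lr = lr0
    for _ in range(n_epochs):
        for vec in ctx:
            nonce += lr * (vec - nonce)  # pull the nonce towards each context vector
        lr *= decay  # "high-risk": start with a large step size, decay quickly
    return nonce
```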

Citation

A. Herbelot and M. Baroni. 2017. High-risk learning: Acquiring new word vectors from tiny data. Proceedings of EMNLP 2017 (Conference on Empirical Methods in Natural Language Processing).