
Master's Thesis

Text Generation with BERT Embeddings, WordNet Senses and Frame Embeddings built on the FrameNet graph.

The objective of this thesis is to blend state-of-the-art neural architectures with the still largely untapped potential of symbolic knowledge bases, contributing to one of the biggest open problems in artificial intelligence within a specific subfield of natural language processing (NLP): commonsense reasoning in text generation.

Given a set of concepts (expressed as nouns and verbs), the goal is to generate a short sentence describing a scene that is plausible according to human commonsense knowledge. This can be seen as a special case of constrained text generation with two major challenges: generating sentences from an unordered set of keywords that may undergo morphological changes, and understanding the commonsense relations between the concepts in order to compose them appropriately.
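As a purely illustrative example (the concept set and the sentence below are hypothetical, not taken from the dataset), the task can be pictured as mapping an unordered bag of lemmas to a fluent sentence:

```python
# Illustrative only: a hypothetical concept-to-text example in the spirit of the task.
concepts = {"dog", "frisbee", "throw", "catch"}   # unordered set of nouns/verbs

# A plausible scene description covering every concept, with morphological
# changes allowed (e.g. "throw" -> "thrown", "catch" -> "catches").
generated = "A dog catches the frisbee thrown by its owner."
```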

Since most recent approaches to the problem make little use of symbolic knowledge resources, this work aims to take the best of both worlds (neural and symbolic) by working at their intersection. The recently proven capabilities of transformer models are leveraged in combination with word sense disambiguation and frame embeddings extracted from FrameNet, an English-language knowledge base built upon the theory of Frame Semantics.
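As a rough sketch of how frame information can be retrieved for a disambiguated word, the snippet below uses NLTK's FrameNet interface; this is an assumption for illustration, and the thesis pipeline may access FrameNet differently:

```python
# Minimal sketch of a FrameNet lookup, assuming NLTK with the framenet_v17 corpus.
import nltk
nltk.download("framenet_v17", quiet=True)
from nltk.corpus import framenet as fn

# Frames whose lexical units match the lemma "throw".
for frame in fn.frames_by_lemma(r"(?i)throw"):
    print(frame.ID, frame.name)

# Inspect one frame: its definition and frame elements are the kind of
# symbolic information a frame embedding can be built from.
cause_motion = fn.frame("Cause_motion")
print(cause_motion.definition[:120], "...")
print(sorted(cause_motion.FE.keys())[:5])
```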

The evaluation has been conducted on CommonGen, a dataset built precisely for this purpose. An extensive pre-processing phase, including disambiguation of nominal entities, was carried out before training the model for three main scenarios: raw text (as a baseline); disambiguated text; and disambiguated text with frame embeddings.
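The exact input encoding is not reproduced here; the snippet below only illustrates, with made-up tags and separators, how the three scenarios differ in the information attached to each concept:

```python
# Purely illustrative input formats for the three scenarios; sense identifiers,
# frame names and separators are assumptions, not the encoding used in the thesis.
concepts = ["dog", "throw", "frisbee", "catch"]

# 1) Raw text (baseline): the plain concept set.
raw = "dog throw frisbee catch"

# 2) Disambiguated text: each concept tagged with a WordNet sense (synset-style ids).
disambiguated = "dog.n.01 throw.v.01 frisbee.n.01 catch.v.01"

# 3) Disambiguated text + frame embeddings: each sense paired with the FrameNet
#    frame it evokes, whose embedding is fed to the model with the token embeddings.
framed = "dog.n.01|Animals throw.v.01|Cause_motion frisbee.n.01|Artifact catch.v.01|Getting"
```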

The approach has been evaluated on a manually built test set derived from the post-processed dataset, using the BLEU and ROUGE metrics. The proposed approach improves over the baseline, achieving promising results on both automatic metrics and suggesting further steps to refine the methodology.
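For reference, BLEU and ROUGE can be computed on a generated/reference pair along these lines (a minimal sketch assuming the nltk and rouge-score packages; the thesis may use different implementations or settings):

```python
# Minimal sketch of the automatic evaluation with BLEU and ROUGE.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "A dog catches the frisbee thrown by its owner."   # gold sentence (hypothetical)
candidate = "The dog catches a thrown frisbee."                # model output (hypothetical)

bleu = sentence_bleu(
    [reference.split()], candidate.split(),
    smoothing_function=SmoothingFunction().method1,
)

scorer = rouge_scorer.RougeScorer(["rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-2 F1: {rouge['rouge2'].fmeasure:.3f}  ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")
```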

Link to AMSLaurea