
text-style-transfer

   

Text style transfer converts a formal piece of text into an informal one, and vice versa. A major obstacle is the lack of parallel corpora needed to train such models to good task performance. In this project, we aim to improve the performance of a formality style transfer model using in-domain data augmentation methods, namely synonym replacement and round-trip translation. Both methods augment in-domain data, which avoids losing generality and preserves data quality. Synonym replacement substitutes selected words of a sentence, while round-trip translation translates a sentence into another language and back. For our baseline model, we use the GYAFC (Grammarly's Yahoo Answers Formality Corpus) corpus.
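
The repository's actual augmentation code is in the `algorithms` directory; a minimal sketch of the two methods, assuming NLTK's WordNet for synonym lookup and a hypothetical `translate(text, src, tgt)` callable standing in for whatever MT system performs the round trip, could look like this:

```python
import random
from nltk.corpus import wordnet  # requires nltk.download('wordnet')

def synonym_replacement(tokens, n=2):
    """Replace up to n words of a tokenized sentence with WordNet synonyms."""
    augmented = tokens[:]
    candidates = [i for i, tok in enumerate(tokens) if wordnet.synsets(tok)]
    random.shuffle(candidates)
    replaced = 0
    for i in candidates:
        synonyms = {lemma.name().replace('_', ' ')
                    for syn in wordnet.synsets(augmented[i])
                    for lemma in syn.lemmas()}
        synonyms.discard(augmented[i])
        if synonyms:
            augmented[i] = random.choice(sorted(synonyms))
            replaced += 1
        if replaced >= n:
            break
    return augmented

def round_trip_translation(sentence, translate, pivot='fr'):
    """Translate to a pivot language and back to paraphrase the sentence.

    `translate` is a hypothetical MT callable: translate(text, src, tgt).
    """
    pivoted = translate(sentence, src='en', tgt=pivot)
    return translate(pivoted, src=pivot, tgt='en')
```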
Our transfer model is built on a sequence-to-sequence (Seq2Seq) neural architecture with an attention mechanism. The language model leverages the likelihood that a sentence belongs to the target domain and predicts the next word. Furthermore, we explore three attention scoring functions (dot, general, and concat) and evaluate the augmented model against the baseline using BLEU score as our metric.
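
The dot, general, and concat variants follow Luong-style attention scoring. The sketch below illustrates the three variants; PyTorch and the class/parameter names are assumptions for illustration and may differ from the code in the `model` directory:

```python
import torch
import torch.nn as nn

class AttentionScore(nn.Module):
    """Luong-style attention scores between a decoder state and encoder states."""
    def __init__(self, hidden_size, method='dot'):
        super().__init__()
        self.method = method
        if method == 'general':
            self.W = nn.Linear(hidden_size, hidden_size, bias=False)
        elif method == 'concat':
            self.W = nn.Linear(2 * hidden_size, hidden_size, bias=False)
            self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, hidden), enc_states: (batch, src_len, hidden)
        if self.method == 'dot':
            scores = torch.bmm(enc_states, dec_state.unsqueeze(2)).squeeze(2)
        elif self.method == 'general':
            scores = torch.bmm(self.W(enc_states), dec_state.unsqueeze(2)).squeeze(2)
        else:  # concat
            expanded = dec_state.unsqueeze(1).expand_as(enc_states)
            concat = torch.cat([expanded, enc_states], dim=2)
            scores = self.v(torch.tanh(self.W(concat))).squeeze(2)
        # softmax over source positions gives the attention weights
        return torch.softmax(scores, dim=1)
```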

Major Contributions

  • Improved model performance with data augmentation methods
  • Explored formality style transfer datasets and models
  • Compared results of different attention scoring variants
  • Applied the attention mechanism to address the bottleneck problem of Seq2Seq
  • Wrote a short paper on our findings and experiments
  • Conducted baseline, augmentation, and ablation-study experiments

Directory

algorithms

  • tknizer: tokenizers for formal/informal and augmented formal/informal data
  • model: data preprocessing, encoder/decoder with attention, training

ppt

  • proposal
  • final_report
  • final_presentation (top 13 teams)

The GYAFC corpus is not included in this repository due to confidentiality restrictions.