
sagahansson/lt2316-ml-project


Instructions:

The steps marked with an asterisk (*) are optional, since the files they produce are already included in the main directory.

1. *

Download the data from this link. Put the kaggle.csv file in the main directory (the one containing clean_poems.py, among other files).

2. *

Run clean_poems.py. This creates three files:

  • kaggle_poems.p -- A nested list of cleaned poems, where each word/punctuation mark is a string.
  • flat_poems.p -- Same as kaggle_poems.p, except not a nested list, i.e. all poems are in one big list.
  • i2w.p -- A dictionary mapping an integer (key) to each unique word/punctuation mark (value) in the poems.
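The cleaning code itself isn't reproduced in this README. Below is a minimal Python sketch of how the three structures relate to each other. It makes two assumptions that may not match clean_poems.py: the tokenizer regex and the sorted-vocabulary indexing. It also skips reading kaggle.csv, because that file's column layout isn't described here. The raw poems are assumed to be already loaded as strings.

```python
import pickle
import re

# Assumed tokenizer: each run of word characters is one token, and every
# punctuation mark becomes a separate token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(poem):
    """Lowercase a poem and split it into word and punctuation tokens (assumed cleaning)."""
    return TOKEN_RE.findall(poem.lower())


def build_structures(raw_poems):
    """Build the nested list, flat list and index-to-word dictionary from raw poem strings."""
    # Nested list: one inner list of tokens per poem (analogous to kaggle_poems.p).
    nested = [tokenize(p) for p in raw_poems]
    # Flat list: every token of every poem in one list (analogous to flat_poems.p).
    flat = [tok for poem in nested for tok in poem]
    # Integer key -> unique token (analogous to i2w.p); sorting keeps indices reproducible.
    i2w = dict(enumerate(sorted(set(flat))))
    return nested, flat, i2w


def save_pickles(nested, flat, i2w):
    """Write the three structures to files with the names used in this README."""
    for name, obj in [("kaggle_poems.p", nested), ("flat_poems.p", flat), ("i2w.p", i2w)]:
        with open(name, "wb") as f:
            pickle.dump(obj, f)
```

For example, `build_structures(["Roses are red,", "Violets are blue."])` gives a nested list with one token list per poem, a flat list of 8 tokens, and an i2w dictionary with 7 entries. Any of the files can be read back with `pickle.load`, e.g. `with open("i2w.p", "rb") as f: i2w = pickle.load(f)`. The inverse mapping for encoding words as indices is `{w: i for i, w in i2w.items()}`.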

3.

Run train_models.py. This trains the model defined in poetrymodel.py with the hyperparameter settings specified in hp.csv (this will take a while). Each trained model is saved as a .pt file in ./models-outputs/.
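This README doesn't show how train_models.py reads hp.csv. The sketch below shows one possible approach: each hp.csv row becomes one configuration, and each configuration gets its own output path in ./models-outputs/. Everything here is hypothetical: the column names, the numeric conversion and the file-naming scheme. The PyTorch training step appears only as comments, where `PoetryModel` and `train` are placeholder names rather than the repository's actual API.

```python
import csv
import io
from pathlib import Path

OUTPUT_DIR = Path("./models-outputs")


def _parse(value):
    """Convert a CSV cell to int or float when possible, otherwise keep it unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            pass
    return value


def read_hyperparameters(csv_text):
    """Return one dict per row of hp.csv-style text, i.e. one dict per model to train."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{key: _parse(value) for key, value in row.items()} for row in reader]


def model_path(hp):
    """Hypothetical naming scheme: encode every hyperparameter in the .pt file name."""
    name = "_".join(f"{key}{value}" for key, value in sorted(hp.items()))
    return OUTPUT_DIR / f"{name}.pt"


# Outline of the training loop (PoetryModel and train are placeholder names):
# for hp in read_hyperparameters(Path("hp.csv").read_text()):
#     model = PoetryModel(**hp)
#     train(model, hp)
#     OUTPUT_DIR.mkdir(exist_ok=True)
#     torch.save(model.state_dict(), model_path(hp))  # assumption: only weights are saved
```

Encoding the hyperparameters in the file name is just one way to keep each saved .pt file traceable to its hp.csv row. A running index would work equally well if hp.csv is kept alongside the outputs.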

4.

Run the load_generate.ipynb notebook for each model. The notebook loads the model and uses it to generate poetry. (The seed words can be changed in the variable ws, and the top-k value in the variable topk.) The notebook creates a .txt file for each model containing the poetry generated by that model. It can also save each model's name and hyperparameters in hp_new.csv, to keep track of which models have been used to generate poetry.
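The notebook's generation code isn't shown here. To illustrate what a topk variable usually controls, here is a minimal plain-Python sketch of top-k sampling. At each step, only the k highest-scoring vocabulary entries are kept, their scores are renormalized with a softmax, and one entry is sampled. `next_logits` stands in for the trained model's forward pass and is a hypothetical interface. `i2w` is assumed to be the index-to-word dictionary from i2w.p.

```python
import math
import random


def top_k_sample(logits, k, rng=random):
    """Sample an index from the k highest-scoring entries of `logits` (top-k sampling)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Indices of the k largest scores.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept scores; subtracting the max avoids overflow in exp.
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


def generate(seed_words, next_logits, i2w, n_words, topk, rng=random):
    """Extend `seed_words` by `n_words` tokens.

    `next_logits(words)` is a hypothetical stand-in for the model: it returns one
    score per vocabulary index, given the words generated so far.
    """
    words = list(seed_words)
    for _ in range(n_words):
        idx = top_k_sample(next_logits(words), topk, rng)
        words.append(i2w[idx])
    return words
```

With topk=1 this reduces to greedy decoding, which always picks the single most likely word. Larger values give more varied but less predictable poems. Passing `random.Random(seed)` as `rng` makes a run reproducible.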