Extract & enrich data in Ibsen miscellaneous documents

Loke-git/Ibsen-Varia-Letters

Ibsen: Varia and Letters to CSV

Introduction

This script was created during a workshop lasting a few days, in collaboration with Ruth Sander. It reads the Ibsen varia (miscellaneous documents) files and extracts relevant data from them in conjunction with the person registry CSV. Where possible, the resulting dataset is enriched with names and new IDs.
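
To illustrate the general idea of extracting records from an XML file and enriching them from a person registry, here is a minimal sketch. It is not the code in varia.py. The layouts of the real varia and registry files are not described in this README, so every element, attribute, and column name below (`document`, `person`, `ref`, `id`, `name`) and all sample data are hypothetical placeholders.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample inputs. The real file layouts are not documented here,
# so all element, attribute, and column names are assumptions.
VARIA_XML = """<varia>
  <document id="V001">
    <title>Sample document A</title>
    <person ref="P1"/>
    <person ref="P9"/>
  </document>
  <document id="V002">
    <title>Sample document B</title>
    <person ref="P2"/>
  </document>
</varia>"""

REGISTRY_CSV = """id,name
P1,Person One
P2,Person Two
"""


def load_registry(text):
    """Map person ID -> name from CSV text with 'id' and 'name' columns."""
    return {row["id"]: row["name"] for row in csv.DictReader(io.StringIO(text))}


def extract_rows(xml_text, registry):
    """Yield one row per person reference, enriched with a name when known."""
    root = ET.fromstring(xml_text)
    for doc in root.iter("document"):
        title = doc.findtext("title", default="")
        for person in doc.iter("person"):
            ref = person.get("ref", "")
            yield {
                "document_id": doc.get("id", ""),
                "title": title,
                "person_id": ref,
                # Leave the name empty when the registry has no match.
                "person_name": registry.get(ref, ""),
            }


def rows_to_csv(rows):
    """Serialise the enriched rows as CSV text with a header line."""
    fieldnames = ["document_id", "title", "person_id", "person_name"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    registry = load_registry(REGISTRY_CSV)
    print(rows_to_csv(extract_rows(VARIA_XML, registry)))
```

References with no registry match (here the placeholder `P9`) keep an empty name rather than being dropped, which mirrors the "where possible" enrichment described above.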

Use

Provide a varia.xml and a navneregister.xml file. Run varia.py (or use the Jupyter Notebook file, if you prefer).

Note

The script does NOT automatically create new XML files to replace older ones. That replacement was done manually.

1.0.1

Fixed a critical bug that prevented the script from executing properly.

Created a clean .py script file with automatic installation of dependencies (which should work).
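
How varia.py installs its dependencies is not shown in this README. As a rough sketch of one common approach, a script can try to import a module and fall back to pip if the import fails. The helper name `ensure_installed` is hypothetical and is not taken from the script.

```python
import importlib
import subprocess
import sys


def ensure_installed(module_name, package_name=None):
    """Import module_name, installing it with pip first if it is missing.

    package_name is the name to pass to pip when it differs from the
    importable module name. This helper is an illustrative assumption.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Use the running interpreter's pip so the package lands in the
        # same environment the script is executing in.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", package_name or module_name]
        )
        importlib.invalidate_caches()
        return importlib.import_module(module_name)


if __name__ == "__main__":
    # A standard-library module is already available, so pip is never called.
    json_module = ensure_installed("json")
    print(json_module.__name__)
```

Installing packages at run time changes the user's environment, so this pattern is best reserved for small, self-contained scripts.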
