
This repository is part of an NLP course for humanities and cultural studies. The course uses historical newspapers as source material and applies NLP methods to them. NLP tasks covered: tokenization, lemmatization, TF-IDF, part-of-speech tagging, semantic search with transformers, article extraction and OCR post-correction with LLMs, named entity recognition (NER), and text classification.


ieg-dhr/NLP-Course4Humanities_2024


Created by Sarah Oberbichler ORCID

NLP Course for Digital Humanities and Cultural Studies

Welcome to the repository of the NLP course for Digital Methods in the Humanities.

About the Course

This course offers an introduction to Natural Language Processing (NLP) and its application in digital humanities. The course is part of the Master's program "Digital Methods for Humanities and Cultural Studies (DMGK)" in Mainz.

Course Website: https://ieg-dhr.github.io/NLP-Course4Humanities_2024/

Course Contents

The course covers the following topics:

  • Introduction to NLP, Jupyter Notebooks, and Python
  • Using spaCy for NLP tasks
  • Introduction: German Newspaper Portal and its API
  • Recent advances in NLP: Transformer models for semantic search and text similarity (Word Embeddings)
  • Recent advances in NLP: Large Language Models (LLMs) for Semantic Text Extraction (Article Segmentation) and Post-OCR Correction
  • Named Entity Recognition (NER) and Text Classification
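
To give a flavour of the kind of task the course covers, here is a minimal sketch of tokenization and TF-IDF weighting in plain Python. It is an illustration only, not the course's own code: the function names `tokenize` and `tf_idf`, the regex tokenizer, the `tf * log(N / df)` weighting variant, and the sample sentences are all assumptions made for this example. The course notebooks may use spaCy or other libraries instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a naive regex tokenizer)."""
    return re.findall(r"\w+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample snippets, not taken from the course datasets.
articles = [
    "Das Erdbeben zerstörte die Stadt",
    "Die Stadt feiert das Fest",
    "Das Fest beginnt am Sonntag",
]
scores = tf_idf(articles)
# Term with the highest weight in the first snippet.
top_term = max(scores[0], key=scores[0].get)
print(top_term)
```

Terms that occur in every snippet (here "das") receive a weight of zero, while terms specific to one snippet are ranked highest. This is why TF-IDF is a common first step for finding distinctive vocabulary in a newspaper collection.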

Repository Structure

  • index.html: Main page of the course
  • styles.css: CSS stylesheet for the course website
  • datasets/: Folder for course materials and resources

License
