Introductory workshop on Python and NLP for humanities

SCDH/python-nlp-workshop

Programming for humanities scholars II: NLP with Python

Objective:

This workshop aims to provide humanities scholars with advanced techniques to efficiently handle, model, transform, and visually present text and tabular data. The focus will be on user-friendly yet powerful tools and libraries to make modern data analysis and visualization methods accessible.

Prerequisites:

Participants should have basic knowledge of Python, including loops, conditions, functions, lists, and dictionaries, and should have written a simple word counter.
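As a rough benchmark for the expected level, a word counter that uses only a function, a loop, a condition, and a dictionary might look like this. It is one possible version, not a required solution.

```python
def count_words(text):
    """Count how often each word occurs, ignoring case and basic punctuation."""
    counts = {}
    for word in text.lower().split():
        word = word.strip(".,;:!?")  # drop punctuation at the word edges
        if word:                     # skip tokens that were only punctuation
            counts[word] = counts.get(word, 0) + 1
    return counts


sample = "The cat sat on the mat. The mat was flat."
print(count_words(sample))
```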

Workshop Format:

  • Duration: 8 hours (including breaks)
  • Format: Theoretical introductions interspersed with practical exercise sessions.
  • Tools: Jupyter Notebooks and Python (Pandas, Matplotlib, Seaborn), optionally Tableau for additional visualizations, and an introduction to NLP with spaCy and to sending API requests to GPT.

Schedule and Content

Part 1: Introduction to Jupyter Notebooks (1 hour)

  • Goals: Familiarize participants with the Jupyter Notebook environment, which will be used throughout the workshop.
  • Content:
    • Overview of Jupyter Notebooks: installation, launching, and basic navigation.
    • Understanding and creating cells (Markdown cells vs. code cells).
    • Running code and documenting work within notebooks.
    • Practical exercise: Creating a simple notebook with basic Python code and Markdown documentation.
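Notebooks are normally created through the Jupyter interface, but it can help to see what a notebook file actually contains. The sketch below writes a minimal .ipynb file by hand: it is a JSON document whose "cells" list holds Markdown cells and code cells. The file name and cell contents are hypothetical; the structure follows version 4 of the notebook format.

```python
import json

# A notebook is JSON: a list of cells plus some metadata.
notebook = {
    "cells": [
        {   # Markdown cell: formatted documentation text
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My first notebook\n", "This cell documents the code below."],
        },
        {   # Code cell: executable Python; outputs are filled in when it runs
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello, humanities!')"],
        },
    ],
    "metadata": {
        "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"},
    },
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Hypothetical file name; the result can be opened in Jupyter like any notebook.
with open("first_notebook.ipynb", "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)
```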

Part 2: Introduction to Data Modeling (1.5 hours)

  • Goals: Understanding data structures and data modeling concepts.
  • Content:
    • Brief review of Python data structures.
    • Introduction to further built-in data structures (sets, tuples, lists) and their use cases.
    • Fundamental data modeling concepts: entities and relationships.
    • Hands-on exercise: Model a small dataset relevant to the humanities.
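A small sketch of how these concepts can fit together in plain Python, assuming a hypothetical correspondence dataset: dataclasses model the entities (people and letters), attributes that point to other entities model relationships, a list holds the ordered records, a tuple serves as a dictionary key, and a set collects unique values. The letter records are illustrative, not taken from an edition.

```python
from dataclasses import dataclass


# Entity: a person with its own attributes.
@dataclass(frozen=True)  # frozen makes instances immutable and hashable
class Person:
    name: str
    born: int


# Entity: a letter; sender and recipient model relationships to Person.
@dataclass(frozen=True)
class Letter:
    sender: Person
    recipient: Person
    year: int


goethe = Person("Johann Wolfgang von Goethe", 1749)
schiller = Person("Friedrich Schiller", 1759)

# List: an ordered collection of (illustrative) letter records.
letters = [
    Letter(goethe, schiller, 1794),
    Letter(schiller, goethe, 1794),
    Letter(goethe, schiller, 1795),
]

# Tuple: an immutable (sender, recipient) pair used as a dictionary key.
pair_counts = {}
for letter in letters:
    pair = (letter.sender.name, letter.recipient.name)
    pair_counts[pair] = pair_counts.get(pair, 0) + 1

# Set: every person who appears in the correspondence, without duplicates.
people = {l.sender for l in letters} | {l.recipient for l in letters}

print(pair_counts)
print(sorted(p.name for p in people))
```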

Part 3: Data Handling and Transformation with Pandas (1.5 hours)

  • Goals: Learn techniques for data preparation and transformation.
  • Content:
    • Introduction to Pandas: DataFrames and Series.
    • Loading, inspecting, and cleaning data (e.g., text files, CSV).
    • Data transformation: filtering, sorting, grouping.
    • Practical exercise: Apply concepts to a real dataset (e.g., literary texts, historical records).
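The steps listed above (loading, inspecting, cleaning, filtering, sorting, grouping) can be sketched in a few lines of Pandas. The CSV is a small toy sample of well-known novels, inlined with io.StringIO so the example runs without a file; it is not the workshop dataset, and one row deliberately has stray spaces and another a missing year.

```python
import io

import pandas as pd

# Toy CSV content (illustrative only).
csv_text = """title,author,year
Pride and Prejudice,Jane Austen,1813
Sense and Sensibility,Jane Austen,1811
  Great Expectations ,Charles Dickens,1861
Oliver Twist,Charles Dickens,1838
Jane Eyre,Charlotte Brontë,
"""

# Loading: read_csv accepts a file-like object as well as a path.
df = pd.read_csv(io.StringIO(csv_text))

# Inspecting: first rows and column types
print(df.head())
print(df.dtypes)

# Cleaning: strip stray spaces, drop rows without a year, store years as integers
df = (
    df.assign(title=df["title"].str.strip())
      .dropna(subset=["year"])
      .astype({"year": int})
)

# Filtering and sorting: works published before 1850, in chronological order
early = df[df["year"] < 1850].sort_values("year")
print(early)

# Grouping: number of works per author
works_per_author = df.groupby("author").size()
print(works_per_author)
```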

Part 4: Introduction to Natural Language Processing (NLP) with spaCy (2 hours)

  • Goals: Basic understanding of NLP and practical use of the spaCy library.
  • Content:
    • Overview of NLP and its applications in the humanities.
    • Introduction to spaCy: installation and basic usage.
    • Practical exercise: Structure a text using spaCy's data structures (Doc, Token, Span).
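One way to structure a text with spaCy is to walk from the Doc to its sentences (Span objects) and from there to individual Token objects, turning each token into a row. This sketch assumes spaCy v3 and uses a blank English pipeline with the rule-based sentencizer, so no trained model needs to be downloaded; the sample text and the chosen token attributes are illustrative, not the workshop exercise itself.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")
# Rule-based component that sets sentence boundaries at punctuation.
nlp.add_pipe("sentencizer")

doc = nlp("Mary Shelley wrote Frankenstein. It was published in 1818.")

# Slicing a Doc gives a Span: here the first two tokens.
name = doc[0:2]
print(name.text)  # Mary Shelley

# Doc -> sentences (Span) -> tokens (Token) -> one row per token
rows = []
for sent_id, sent in enumerate(doc.sents):
    for token in sent:
        rows.append({
            "sentence": sent_id,
            "index": token.i,          # position within the whole Doc
            "text": token.text,
            "is_punct": token.is_punct,
            "like_num": token.like_num,
        })

for row in rows:
    print(row)
```

The list of dictionaries can be passed directly to pandas.DataFrame(rows) to continue with the techniques from Part 3.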

Part 5: Practical Usage of Natural Language Processing (NLP) with spaCy (2 hours)

  • Goals: Apply NLP techniques in practice with the spaCy library.
  • Content:
    • Overview of NLP and its applications in the humanities.
    • Common NLP tasks: tokenization, part-of-speech tagging, and named entity recognition.
    • Practical exercise: Extract named entities from a sample text and perform basic text analysis.
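A minimal sketch of the entity-extraction exercise with some basic counting on top. Note a deliberate substitution: a workshop setting would typically use a trained pipeline (for example en_core_web_sm, installed with python -m spacy download en_core_web_sm), which predicts entities statistically and also provides part-of-speech tags. To stay runnable without a download, this sketch swaps in spaCy's rule-based entity_ruler with hand-written, hypothetical patterns, so it only finds the phrases listed. It assumes spaCy v3.

```python
from collections import Counter

import spacy

# Blank English pipeline plus the rule-based EntityRuler (not a trained NER model).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
# Hypothetical patterns: exact phrases mapped to entity labels.
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Mary Shelley"},
    {"label": "PERSON", "pattern": "Percy Shelley"},
    {"label": "GPE", "pattern": "Geneva"},
    {"label": "WORK_OF_ART", "pattern": "Frankenstein"},
])

text = ("Mary Shelley began Frankenstein near Geneva in 1816. "
        "Percy Shelley edited the manuscript.")
doc = nlp(text)

# Named entities in order of appearance
entities = [(ent.text, ent.label_) for ent in doc.ents]
for ent_text, label in entities:
    print(f"{ent_text:<15} {label}")

# Basic text analysis: entity labels and word frequencies (alphabetic tokens only)
label_counts = Counter(label for _, label in entities)
word_counts = Counter(token.lower_ for token in doc if token.is_alpha)
print(label_counts.most_common())
print(word_counts.most_common(3))
```

With a trained pipeline, nlp = spacy.load("en_core_web_sm") would replace the blank pipeline and the ruler, and token.pos_ would then hold part-of-speech tags.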

Materials and Resources

  • Jupyter Notebooks for all practical exercises.
  • Datasets from the humanities, prepared for use in the workshop.
  • Access to online resources for further information and learning materials.

Notes for Participants:

  • Please bring a personal laptop with Python and the libraries mentioned above (Pandas, Matplotlib, Seaborn, spaCy) already installed. Installation guides for these tools will be provided before the workshop.
  • No prior experience with Pandas, visualization tools, or NLP libraries is needed.
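To confirm before the workshop that these libraries can be imported, a short check like the following may help. It is not part of the official installation guides; it only uses importlib from the standard library and checks the import names (spaCy is imported as spacy).

```python
import importlib.util

# Import names of the libraries listed above.
packages = ["pandas", "matplotlib", "seaborn", "spacy"]

# find_spec returns None when a module cannot be found on the current Python path.
status = {name: importlib.util.find_spec(name) is not None for name in packages}

for name, installed in status.items():
    print(f"{name:<12} {'OK' if installed else 'missing'}")
```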

By the end of the workshop, participants should be able to carry out data projects independently, from conception to visual presentation, and to analyze and interpret complex datasets using advanced Python tools.
