sfb833-a3/tueba-ddp

TüBa-D/DP release 5

Introduction

TüBa-D/DP is a machine-annotated dependency treebank of German. The goal of TüBa-D/DP is to offer high-quality syntactic annotations for a large amount of contemporary German text. The annotations follow the TüBa-D/Z annotation guidelines (Telljohann et al., 2006) as closely as possible. TüBa-D/DP currently consists of the following subcorpora:

| Subcorpus | Genre | Sentences | Tokens | Download |
|---|---|---|---|---|
| Europarl | Parliamentary proceedings | 2.2M | 55M | Download |
| Political speeches | Speeches held by officials | 619,152 | 12.8M | Download |
| taz (1986-2009) | Newspaper | 29.9M | 393.7M | Contact us |
| Wikipedia (2020) | Encyclopedia | 45.5M | 917.5M | Download |

Each subcorpus has the following annotation layers:

  • Part-of-speech tags
  • Inflectional morphology
  • Lemmas
  • Topological fields
  • Dependency relations

A description of the annotation format can be found in the stylebook.
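
As a rough illustration of how the five annotation layers fit together per token, here is a minimal Python sketch. The seven-column, tab-separated layout, the `Token` class, the `parse_sentence` helper, and all tag values in `EXAMPLE` are assumptions made for this example only; they are not taken from the corpus files. The stylebook remains the authoritative description of the actual format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    form: str    # surface form of the token
    lemma: str   # lemma layer
    pos: str     # part-of-speech tag
    morph: str   # inflectional morphology
    field: str   # topological field
    head: int    # 1-based index of the syntactic head (0 = root)
    deprel: str  # dependency relation to the head


def parse_sentence(block: str) -> List[Token]:
    """Parse one sentence in the hypothetical layout assumed here:
    one token per line, seven tab-separated columns."""
    tokens = []
    for line in block.strip().splitlines():
        form, lemma, pos, morph, field, head, deprel = line.split("\t")
        tokens.append(Token(form, lemma, pos, morph, field, int(head), deprel))
    return tokens


# Invented values for illustration; not copied from TüBa-D/DP.
EXAMPLE = (
    "Hunde\tHund\tNN\tnom|pl|masc\tVF\t2\tSUBJ\n"
    "bellen\tbellen\tVVFIN\t3|pl|pres|ind\tLK\t0\tROOT"
)

if __name__ == "__main__":
    for token in parse_sentence(EXAMPLE):
        print(token)
```

Keeping every layer on the token record means that a dependency relation can be looked up together with the lemma, tag, and topological field of both the dependent and its head.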

Licensing & availability

Questions

Feel free to ask any questions by creating an issue on GitHub.
