Statistics

These statistics were produced after analyzing ~50 Swedish Riksdagen open data documents out of 600k in total.

Swedish

Lexical categories

image

540k links in total

Number of sentences

image

Number of sentences with a langdetect score > 0.7

image

Number of sentences with at least one linked entity

Note: only one document was analyzed since the PR was merged

image

Rawtokens with a langdetect score > 0.7 (we accept > 0.4)

image
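
The two thresholds in the heading above (0.7 for the chart, 0.4 for acceptance) can be illustrated with a minimal sketch. It assumes each token already has a language-detection score between 0 and 1, for example from a library such as langdetect. The function names, bucket labels, and sample data are hypothetical and are not taken from the project code.

```python
from collections import Counter

# Thresholds taken from the heading above; everything else is illustrative.
ACCEPT_THRESHOLD = 0.4  # tokens scoring above this are accepted
REPORT_THRESHOLD = 0.7  # tokens scoring above this are counted in the chart


def bucket(score: float) -> str:
    """Classify a language-detection score into one of three buckets."""
    if score > REPORT_THRESHOLD:
        return "high"
    if score > ACCEPT_THRESHOLD:
        return "accepted"
    return "rejected"


def count_buckets(scored_tokens):
    """Count how many (token, score) pairs fall into each bucket."""
    return Counter(bucket(score) for _, score in scored_tokens)


# Made-up sample data, for illustration only.
sample = [("riksdagen", 0.93), ("och", 0.55), ("xx", 0.12), ("motion", 0.71)]
counts = count_buckets(sample)
```

Both comparisons are strict, matching the "> 0.7" and "> 0.4" wording, so a score of exactly 0.4 is rejected. Tokens in the "high" bucket also pass the acceptance threshold; the "accepted" bucket here only holds those scoring above 0.4 and at most 0.7.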

Garbage tokens

image

count_entities_in_sv_sentences_group_by_label.sql

image

Average entity count per document

Note: only one document was analyzed since the PR was merged

image

Entity count per document

Note: only one document was analyzed since the PR was merged

image

Database size

image

Source

Queries are found in the sql subfolder, see https://github.com/dpriskorn/riksdagen_sentences/tree/add_lookup_endpoint/sql