Authors: Neven Jovanović, Petar Soldo, Department of Classical Philology, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
A Sunoikisis Digital Classics Session, Summer 2019
Demonstrate how to use BaseX and XQuery to produce Anki spaced repetition vocabulary exercises from a set of morphologically annotated and lemmatized short texts in Greek.
Concentrate on reoccurring words, and on words which are very frequent in Greek (according to the Dickinson College Core Vocabulary list).
Produce three types of exercises:
- from the form to the lemma
- from the form to the grammatical description
- from words in the text to entries in the DC Greek Core vocabulary list (Croatian version, converted to XML)
The Greek texts, annotated in Arethusa (on Perseids), are in the data directory.
The Croatian translation of the Greek and Latin DC Core lists, converted to XML with some additional fields, is in grclatcore.
The BaseX scripts are in scripts.
- Create the main database sunGreek with linguistically annotated Greek texts: createDbGreek.xq
- Create a DCC Greek list (with Croatian translations) as a BaseX database grclatcore: createDbGrcLatCore.xq
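As a rough illustration of the database creation step, the sketch below builds a small demo database from an inline sample instead of the files in data. The database name sunGreekDemo, the document path demo.xml, and the element names treebank, sentence and word are assumptions made for this example; the actual createDbGreek.xq may work differently.

```xquery
(: Hypothetical demo database built from an inline sample. :)
(: Element names are assumed; only form, lemma and postag are taken from the text. :)
db:create(
  "sunGreekDemo",
  <treebank>
    <sentence id="1">
      <word id="1" form="καὶ" lemma="καί" postag="c--------"/>
      <word id="2" form="λέγει" lemma="λέγω" postag="v3spia---"/>
    </sentence>
  </treebank>,
  "demo.xml"
)
```

Once the database exists, a separate query such as `collection("sunGreekDemo")//word/@lemma` should return the lemma attributes.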
- For a given lemma, get a list of forms and POS tags in the collection: forLemmaGetFormPOStag.xq
- Create a list of lemmata: findLemma.xq
- Create a list of lemmata, order by frequency: findLemmaFrequency.xq
- Narrow the list to lemmata whose forms occur at least twice (and exclude punctuation): findLemmaFrequencyTwoPlus.xq
- Explore frequencies of linguistic annotations: getFrequenciesAttributes.xq (lemma, form, postag)
- For lemmata occurring at least twice (frequency f >= 2), get a list of occurring forms: fromLemmaToForms.xq
- For a pair of form and lemma, produce an Anki exercise: fromLemmaToAnki.xq
- Narrow to a specific number of occurrences: fromLemmaToAnkiNarrowNumber.xq
- Narrow to specific types of words (e. g. just inflected words: nouns, verbs, adjectives, pronouns): fromLemmaToAnkiNarrowMorphology.xq
A list of the codes and attributes that Arethusa uses for Greek is quite helpful here.
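The lemma-based steps above (count lemmata, keep those occurring at least twice, skip punctuation, pair each form with its lemma) could be sketched roughly as follows. This is not the code of the scripts themselves: it runs on an inline sample, and the element names treebank, sentence and word, as well as the convention that punctuation has a postag beginning with "u", are assumptions.

```xquery
(: Inline sample standing in for the sunGreek database (assumed structure). :)
let $sample :=
  <treebank>
    <sentence id="1">
      <word id="1" form="ὁ" lemma="ὁ" postag="l-s---mn-"/>
      <word id="2" form="ἄνθρωπος" lemma="ἄνθρωπος" postag="n-s---mn-"/>
      <word id="3" form="λέγει" lemma="λέγω" postag="v3spia---"/>
      <word id="4" form="." lemma="." postag="u--------"/>
    </sentence>
    <sentence id="2">
      <word id="1" form="τὸν" lemma="ὁ" postag="l-s---ma-"/>
      <word id="2" form="ἄνθρωπον" lemma="ἄνθρωπος" postag="n-s---ma-"/>
      <word id="3" form="ὁρῶ" lemma="ὁράω" postag="v1spia---"/>
      <word id="4" form="." lemma="." postag="u--------"/>
    </sentence>
  </treebank>
(: Skip punctuation, assumed to carry a postag starting with "u". :)
for $w in $sample//word[not(starts-with(@postag, "u"))]
group by $lemma := string($w/@lemma)
let $n := count($w)
(: Keep only lemmata that occur at least twice. :)
where $n ge 2
order by $n descending, $lemma
(: One Anki line per distinct form: question ; answer ; tag :)
for $form in distinct-values($w/@form)
return $form || " ; " || $lemma || " ; grmorf01"
```

With this sample, only ὁ and ἄνθρωπος pass the frequency filter, so each of their forms becomes one exercise line with the form as the question and the lemma as the answer.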
- Create a list of morphological descriptions (parts of speech, POS tags): findPOStag.xq
- Get frequency of morphological configurations: findPOStagFrequency.xq
- Select only POS tags for inflected forms, select frequent configurations (e. g. where f >= 14): findPOStagInflectedFrequency.xq
- For a set of POS tags, get forms, lemma, POS: retrievePOS.xq
- Produce Anki exercises asking for the lemma and morphological description of a given form: retrievePOSmapToWords.xq (with Arethusa / Alpheios morphological codes expanded)
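Expanding the morphological codes, as in the last step above, might look like the sketch below. It assumes the nine-position postag used in the Ancient Greek Dependency Treebank (position 1 part of speech, 3 number, 7 gender, 8 case) and covers only nominal forms; the maps and the function local:expand are hypothetical names for illustration, not taken from retrievePOSmapToWords.xq.

```xquery
(: Partial code tables for nominal forms; verbs would need further positions. :)
declare variable $pos := map { "n": "noun", "a": "adjective", "p": "pronoun", "l": "article" };
declare variable $number := map { "s": "singular", "p": "plural", "d": "dual" };
declare variable $gender := map { "m": "masculine", "f": "feminine", "n": "neuter" };
declare variable $case := map {
  "n": "nominative", "g": "genitive", "d": "dative", "a": "accusative", "v": "vocative"
};

(: Hypothetical helper: turn a postag such as "n-s---ma-" into words. :)
(: Codes missing from a map yield the empty sequence and are simply skipped. :)
declare function local:expand($postag as xs:string) as xs:string {
  string-join((
    $pos(substring($postag, 1, 1)),
    $number(substring($postag, 3, 1)),
    $gender(substring($postag, 7, 1)),
    $case(substring($postag, 8, 1))
  ), " ")
};

for $w in (
  <word form="ἄνθρωπον" lemma="ἄνθρωπος" postag="n-s---ma-"/>,
  <word form="λόγοις" lemma="λόγος" postag="n-p---md-"/>
)
(: Question: the form. Answer: lemma plus expanded description. :)
return $w/@form || " ; " || $w/@lemma || ", " || local:expand($w/@postag) || " ; grmorf01"
```

The first word yields a line whose answer reads "ἄνθρωπος, noun singular masculine accusative".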
- Get vocabulary of one text: vocabularyOneText.xq
- Find lemmata reoccurring in other texts: vocabularyRepeatedInOtherTexts.xq
- Prepare Anki exercises for such lemmata: vocabularyRepeatedInOtherTexts.xq
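A minimal sketch of finding lemmata that recur in other texts, assuming each text is a separate treebank element. The id attribute used to tell the texts apart is hypothetical; in a real BaseX database the documents would more likely be distinguished by their paths.

```xquery
(: Two inline sample texts (assumed structure; the id attribute is hypothetical). :)
let $texts := (
  <treebank id="text1">
    <sentence id="1">
      <word id="1" form="ὁ" lemma="ὁ" postag="l-s---mn-"/>
      <word id="2" form="ἄνθρωπος" lemma="ἄνθρωπος" postag="n-s---mn-"/>
      <word id="3" form="λέγει" lemma="λέγω" postag="v3spia---"/>
    </sentence>
  </treebank>,
  <treebank id="text2">
    <sentence id="1">
      <word id="1" form="τὸν" lemma="ὁ" postag="l-s---ma-"/>
      <word id="2" form="ἄνθρωπον" lemma="ἄνθρωπος" postag="n-s---ma-"/>
      <word id="3" form="ὁρῶ" lemma="ὁράω" postag="v1spia---"/>
    </sentence>
  </treebank>
)
let $current := $texts[@id = "text1"]
let $others := $texts[@id != "text1"]
(: For each lemma of the current text, look for words with that lemma elsewhere. :)
for $lemma in distinct-values($current//word/@lemma)
let $elsewhere := $others//word[@lemma = $lemma]
where exists($elsewhere)
order by $lemma
(: Report the lemma together with the forms it takes in the other texts. :)
return $lemma || " ; " || string-join(distinct-values($elsewhere/@form), ", ")
```

Here λέγω is dropped because it occurs only in the first text, while ὁ and ἄνθρωπος are reported with the forms found in the second.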
- Find all DCC lemmata occurring in our texts: findWordsInDCCore.xq
- Produce a set of Anki exercises for these lemmata: DCCoreToAnki.xq
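Matching the texts against the core list could be sketched as below. The structure of the grclatcore XML is not described here, so the element names list, entry, headword and translation are pure assumptions, as is the idea that the first token of a headword corresponds to the treebank lemma. Unicode normalization is applied on both sides because accented Greek vowels can be encoded in more than one way.

```xquery
(: Hypothetical core list structure; entries reuse the sample exercise lines. :)
let $core :=
  <list>
    <entry><headword>αὐτός αὐτή αὐτό</headword><translation>on, isti</translation></entry>
    <entry><headword>καί</headword><translation>i</translation></entry>
    <entry><headword>δέ</headword><translation>a</translation></entry>
  </list>
(: Inline sample text (assumed treebank structure). :)
let $text :=
  <treebank>
    <sentence id="1">
      <word id="1" form="καὶ" lemma="καί" postag="c--------"/>
      <word id="2" form="αὐτὸν" lemma="αὐτός" postag="p-s---ma-"/>
      <word id="3" form="ὁρῶ" lemma="ὁράω" postag="v1spia---"/>
    </sentence>
  </treebank>
let $lemmata := distinct-values($text//word/@lemma ! normalize-unicode(.))
for $e in $core/entry
(: Compare the first word of the headword with the lemmata found in the text. :)
where normalize-unicode(tokenize($e/headword)[1]) = $lemmata
return normalize-space($e/headword) || " ; " || $e/translation || " ; grmorf01"
```

With this sample, the entries for αὐτός and καί are turned into exercise lines, while δέ is skipped because it does not occur in the text.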
About the program: the Anki User Manual
Form of exercises to be imported into Anki (no field names necessary; the "tag" field can be omitted):
question ; answer ; tag
αὐτός αὐτή αὐτό ; on, isti ; grmorf01
καί ; i ; grmorf01
δέ ; a ; grmorf01
οὗτος αὕτη τοῦτο ; ovaj ; grmorf01
The results of BaseX scripts (...ToAnki) can be saved as text files (the extension is not important), edited in a text editor (recommended, but only for pedagogical reasons: to select what we want to teach and learn), and then imported into the Anki database (File / Import).
For better control, it is recommended to first add a new user to Anki (Add / Open on the welcome screen).
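Saving exercise lines as a text file can also be done from within BaseX. The sketch below uses the File Module functions file:temp-dir and file:write-text-lines; the file name anki-demo.txt is an arbitrary choice for this example, and the two lines are taken from the sample above.

```xquery
(: Exercise lines in the form question ; answer ; tag :)
let $lines := (
  "καί ; i ; grmorf01",
  "δέ ; a ; grmorf01"
)
(: Hypothetical target file in the system's temporary directory. :)
let $path := file:temp-dir() || "anki-demo.txt"
return (
  (: Writes one line per string, UTF-8 by default. :)
  file:write-text-lines($path, $lines),
  (: Return the path so that the file is easy to find afterwards. :)
  $path
)
```

The resulting file can then be edited and imported into Anki as described above.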