adding language: Esperanto #36

niru86 · 2021-12-30T10:33:12Z

I'm trying to adapt prosodic to Esperanto. Its stress is always paroxytonic, e.g. abelo (en. bee) [a.'be.lo], but in poetry there can be elision, in which case the word becomes oxytonic: abel'
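To make the rule concrete, here is a minimal sketch in plain Python. It is not part of prosodic: `stressed_vowel_index` is a hypothetical helper, and it assumes that every a, e, i, o, u is a syllable nucleus (ŭ and j are not), that a trailing apostrophe marks elision, and that monosyllables count as stressed.

```python
from typing import Optional

VOWELS = set("aeiou")  # assumption: each of these is one syllable nucleus; ŭ and j are not


def stressed_vowel_index(word: str) -> Optional[int]:
    """Return the 0-based position, counted among the vowels, of the stressed vowel.

    Hypothetical helper for illustration only; returns None if the word has no vowel.
    """
    elided = word.endswith(("'", "\u2019"))  # straight or typographic apostrophe
    n_vowels = sum(ch in VOWELS for ch in word.lower())
    if n_vowels == 0:
        return None
    if elided or n_vowels == 1:
        return n_vowels - 1  # oxytonic: stress falls on the last vowel
    return n_vowels - 2  # paroxytonic: stress falls on the second-to-last vowel


print(stressed_vowel_index("abelo"))  # 1, the "e" of a.'be.lo
print(stressed_vowel_index("abel'"))  # 1, still the "e", now in the final syllable
```

Counting vowels avoids needing a full syllabifier just to locate the stress.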

Esperanto spelling is as phonemic as Finnish, so I decided to use the orth feature, but I'm puzzled by LANG_stress.py because I don't understand its code :( Could you help me? I want to use prosodic for my MA research.
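Because the spelling is phonemic, the orthography-to-phoneme step can be a small lookup table. The sketch below is an illustration, not prosodic's API: `ESPERANTO_ORTH2PHON` and `to_ipa` are hypothetical names, and the IPA values are the usual broad transcriptions, with every letter not listed assumed to map to itself.

```python
import unicodedata

# Hypothetical table: only letters whose IPA symbol differs from the letter itself.
ESPERANTO_ORTH2PHON = {
    "c": "ts",
    "ĉ": "tʃ",
    "ĝ": "dʒ",
    "ĥ": "x",
    "ĵ": "ʒ",
    "ŝ": "ʃ",
    "ŭ": "w",
}


def to_ipa(text: str) -> str:
    """Map Esperanto orthography to a broad IPA string, letter by letter."""
    # NFC turns "c" + combining circumflex into the single letter "ĉ" before lookup.
    text = unicodedata.normalize("NFC", text).lower()
    # Non-letters (e.g. the elision apostrophe) are dropped.
    return "".join(ESPERANTO_ORTH2PHON.get(ch, ch) for ch in text if ch.isalpha())


print(to_ipa("ĉevalo"))  # tʃevalo
```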

quadrismegistus · 2024-08-01T09:35:14Z

Hi there, are you still interested in working on this? If so, I'd be happy to collaborate.

We can model an Esperanto parser on the FinnishLanguage object in finnish.py:

class FinnishLanguage(Language):
    pronunciation_dictionary_filename = os.path.join(PATH_DICTS,'en','english.tsv')
    lang = 'fi'
    cache_fn = 'finnish_wordtypes'

    @cache
    def get(self, token):
        token=token.strip()
        # the annotation exposes .syllables, .split_sylls (onset, nucleus, coda) and .stresses, all used below
        Annotation = make_annotation(token)
        syllables=[]
        wordbroken=False  # set when a character has no entry in orth2phon
        for ij in range(len(Annotation.syllables)):
            try:
                sylldat=Annotation.split_sylls[ij]
            except IndexError:
                sylldat=["","",""]

            # build the phonemic string for this syllable from its onset, nucleus and coda
            syllStr=""
            onsetStr=sylldat[0].strip().replace("'","").lower()
            nucleusStr=sylldat[1].strip().replace("'","").lower()
            codaStr=sylldat[2].strip().replace("'","").lower()

            for x in [onsetStr,nucleusStr,codaStr]:
                x=x.strip()
                if not x: continue
                if x not in orth2phon:
                    # no entry for the whole string: look it up character by character
                    for y in x:
                        y=y.strip()
                        if not y: continue
                        if y not in orth2phon:
                            wordbroken=True
                        else:
                            syllStr+="".join(orth2phon[y])
                else:
                    syllStr+="".join(orth2phon[x])
            syllables.append(syllStr)

        # one WordForm per stress pattern; the stress mark is prefixed to each syllable
        wordforms=[]
        sylls_text=list(Annotation.syllables)
        for stress in Annotation.stresses:
            sylls_ipa = [stress2stroke[stress[i]]+syllables[i] for i in range(len(syllables))]
            wf=WordForm(
                token,
                sylls_ipa=sylls_ipa,
                sylls_text=sylls_text,
            )
            wordforms.append(wf)
        wordtype = WordType(token, children=wordforms, lang=self.lang)
        return wordtype

All we need is a .get(token) method that can take an arbitrary word string and return a WordType object composed of the syllabified data (phonemes + orthography).

It should then work like this:

In [10]: from prosodic.langs.finnish import Finnish

In [11]: word = Finnish().get('kalevala')

In [12]: for syll in word.syllables:
    ...:     print(syll)
    ...: 
Syllable(ipa="'kɑ", num=1, txt='ka', is_stressed=True, is_heavy=False, is_strong=True, is_weak=False)
Syllable(ipa='le', num=2, txt='le', is_stressed=False, is_heavy=False, is_strong=False, is_weak=True)
Syllable(ipa='`vɑ', num=3, txt='va', is_stressed=True, is_heavy=False, is_strong=True, is_weak=False)
Syllable(ipa='lɑ', num=4, txt='la', is_stressed=False, is_heavy=False, is_strong=False, is_weak=True)

Let me know if you have thoughts. It's great that Esperanto is rule-based in its stress: seems doable to incorporate!
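As a starting point for discussion, here is a rough sketch of the syllabified, stress-marked text an Esperanto `.get(token)` would need to produce before building a WordType. It is plain Python and does not use prosodic: `syllabify` and `mark_stress` are hypothetical names, the boundary rule (one consonant goes to the next onset, the rest stay in the coda) is a simplifying assumption that would split "patro" as pat-ro, monosyllables are treated as stressed, and the `'` prefix simply mirrors the stress mark in the Finnish output above. Conversion to IPA is left out.

```python
VOWELS = "aeiou"  # assumption: ŭ and j never form a nucleus
APOSTROPHES = ("'", "\u2019")


def syllabify(word):
    """Split a word into syllables around its vowels (simplified rule)."""
    word = word.lower().rstrip("".join(APOSTROPHES))
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not nuclei:
        return [word] if word else []
    sylls, start = [], 0
    for cur, nxt in zip(nuclei, nuclei[1:]):
        # With consonants between two vowels, the last one opens the next syllable.
        boundary = nxt - 1 if nxt - cur > 1 else nxt
        sylls.append(word[start:boundary])
        start = boundary
    sylls.append(word[start:])
    return sylls


def mark_stress(word):
    """Return the syllables with "'" prefixed to the stressed one."""
    sylls = syllabify(word)
    if not any(ch in VOWELS for ch in word.lower()):
        return sylls  # nothing to stress, e.g. the elided article l'
    elided = word.endswith(APOSTROPHES)
    idx = len(sylls) - 1 if (elided or len(sylls) == 1) else len(sylls) - 2
    return [("'" if i == idx else "") + s for i, s in enumerate(sylls)]


print(mark_stress("abelo"))  # ['a', "'be", 'lo']
print(mark_stress("abel'"))  # ['a', "'bel"]
```

A real implementation would likely need a list of permitted onset clusters for the syllable boundaries, plus a decision on whether function words such as "la" should carry stress at all.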
