adding language: Esperanto #36

Open
niru86 opened this issue Dec 30, 2021 · 1 comment
niru86 commented Dec 30, 2021

I'm trying to adapt prosodic to Esperanto. Its stress is always paroxytonic, e.g. abelo (en. bee) [a.'be.lo], but in poetry the final vowel can be elided, and the word then becomes oxytonic: abel'

Esperanto spelling is as phonemic as Finnish, so I decided to use the orth feature, but I'm stuck on LANG_stress.py because I don't understand its code :( Could you help me? I want to use prosodic for my MA research.
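
For reference, the stress rule described above can be sketched as a small function. This is only an illustration and not prosodic code: the name `stressed_vowel_index` is hypothetical, a trailing apostrophe is assumed to mark elision, and treating monosyllables as stressed is an assumption (ŭ and j are counted as consonants, so only a, e, i, o, u form syllable nuclei).

```python
from typing import Optional

VOWELS = set("aeiou")  # ŭ and j are glides, not syllable nuclei


def stressed_vowel_index(word: str) -> Optional[int]:
    """Return the 0-based position (counting vowels only) of the stressed vowel.

    Regular words stress the penultimate vowel. A word ending in an
    apostrophe has lost its final vowel, so the stress falls on the last
    remaining vowel. Returns None if the word has no vowel (e.g. l').
    """
    w = word.strip().lower()
    n_vowels = sum(ch in VOWELS for ch in w)
    if n_vowels == 0:
        return None
    elided = w.endswith(("'", "\u2019"))  # straight or typographic apostrophe
    if elided or n_vowels == 1:
        return n_vowels - 1
    return n_vowels - 2


print(stressed_vowel_index("abelo"))  # second vowel: a.'be.lo
print(stressed_vowel_index("abel'"))  # still the second vowel, now word-final
```

The stress stays on the same vowel in both forms; only its position relative to the end of the word changes.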

quadrismegistus (Owner) commented

Hi there, are you still interested in working on this? If so, I'd be happy to collaborate.

We can model an Esperanto parser on the FinnishLanguage object in finnish.py:

class FinnishLanguage(Language):
    pronunciation_dictionary_filename = os.path.join(PATH_DICTS,'en','english.tsv')
    lang = 'fi'
    cache_fn = 'finnish_wordtypes'

    @cache
    def get(self, token):
        token=token.strip()
        # syllabify the token and compute its possible stress patterns
        Annotation = make_annotation(token)
        syllables=[]
        # set when a character has no phoneme mapping (not used further in this excerpt)
        wordbroken=False
        for ij in range(len(Annotation.syllables)):
            # each entry is an (onset, nucleus, coda) triple of orthographic strings
            try:
                sylldat=Annotation.split_sylls[ij]
            except IndexError:
                sylldat=["","",""]

            syllStr=""
            onsetStr=sylldat[0].strip().replace("'","").lower()
            nucleusStr=sylldat[1].strip().replace("'","").lower()
            codaStr=sylldat[2].strip().replace("'","").lower()

            # map orthography to phonemes: try the whole part first,
            # then fall back to character-by-character lookup
            for x in [onsetStr,nucleusStr,codaStr]:
                x=x.strip()
                if not x: continue
                if x not in orth2phon:
                    for y in x:
                        y=y.strip()
                        if not y: continue
                        if y not in orth2phon:
                            wordbroken=True
                        else:
                            syllStr+="".join(orth2phon[y])
                else:
                    syllStr+="".join(orth2phon[x])
            syllables.append(syllStr)

        # build one WordForm per possible stress pattern
        wordforms=[]
        sylls_text=[syll for syll in Annotation.syllables]
        for stress in Annotation.stresses:
            # prefix each syllable's phonemes with its stress mark
            sylls_ipa = [stress2stroke[stress[i]]+syllables[i] for i in range(len(syllables))]
            wf=WordForm(
                token,
                sylls_ipa=sylls_ipa,
                sylls_text=sylls_text,
            )
            wordforms.append(wf)
        wordtype = WordType(token, children=wordforms, lang=self.lang)
        return wordtype

All we need is a .get(token) method that can take an arbitrary word string and return a WordType object composed of the syllabified data (phonemes + orthography).

It then should work like this:

In [10]: from prosodic.langs.finnish import Finnish

In [11]: word = Finnish().get('kalevala')

In [12]: for syll in word.syllables:
    ...:     print(syll)
    ...: 
Syllable(ipa="'kɑ", num=1, txt='ka', is_stressed=True, is_heavy=False, is_strong=True, is_weak=False)
Syllable(ipa='le', num=2, txt='le', is_stressed=False, is_heavy=False, is_strong=False, is_weak=True)
Syllable(ipa='`vɑ', num=3, txt='va', is_stressed=True, is_heavy=False, is_strong=True, is_weak=False)
Syllable(ipa='lɑ', num=4, txt='la', is_stressed=False, is_heavy=False, is_strong=False, is_weak=True)

Let me know if you have thoughts. It's great that Esperanto is rule-based in its stress: seems doable to incorporate!
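
As a starting point, here is a rough, standalone sketch of the data an Esperanto `.get(token)` would need to produce: orthographic syllables plus stress-marked phoneme strings. It does not call prosodic at all. The names `syllabify`, `stress_marked_ipa` and `ORTH2IPA` are hypothetical, the syllabification rule (every vowel is a nucleus, and only the last consonant before a vowel starts the next syllable) is a simplifying assumption rather than a full account of Esperanto phonotactics, and using `'` as the primary-stress prefix is assumed from the Finnish output above.

```python
import unicodedata

VOWELS = set("aeiou")  # ŭ and j are treated as consonants (glides)

# Only the letters whose IPA symbol differs from the letter itself
ORTH2IPA = {"c": "t͡s", "ĉ": "t͡ʃ", "ĝ": "d͡ʒ", "ĥ": "x", "ĵ": "ʒ", "ŝ": "ʃ", "ŭ": "w"}


def syllabify(word):
    """Split a word into orthographic syllables (simplified rule)."""
    word = unicodedata.normalize("NFC", word.strip().lower())
    letters = [ch for ch in word if ch.isalpha()]  # drops the elision apostrophe
    nuclei = [i for i, ch in enumerate(letters) if ch in VOWELS]
    if not nuclei:
        return ["".join(letters)] if letters else []
    sylls, start = [], 0
    for this_v, next_v in zip(nuclei, nuclei[1:]):
        n_cons = next_v - this_v - 1
        # the last consonant before the next vowel becomes its onset
        cut = next_v - 1 if n_cons >= 1 else next_v
        sylls.append("".join(letters[start:cut]))
        start = cut
    sylls.append("".join(letters[start:]))
    return sylls


def stress_marked_ipa(word):
    """Return phoneme strings per syllable, with ' before the stressed one."""
    sylls = syllabify(word)
    has_vowel = any(ch in VOWELS for syll in sylls for ch in syll)
    elided = word.strip().endswith(("'", "\u2019"))
    if not has_vowel:
        stressed = -1  # e.g. l': nothing to stress
    elif elided or len(sylls) == 1:
        stressed = len(sylls) - 1  # oxytonic after elision
    else:
        stressed = len(sylls) - 2  # regular penultimate stress
    out = []
    for i, syll in enumerate(sylls):
        ipa = "".join(ORTH2IPA.get(ch, ch) for ch in syll)
        out.append(("'" if i == stressed else "") + ipa)
    return out


for w in ["abelo", "abel'"]:
    print(syllabify(w), stress_marked_ipa(w))
```

The two lists correspond roughly to the `sylls_text` and `sylls_ipa` arguments passed to `WordForm` in the Finnish code, so an Esperanto class could presumably build its `WordType` the same way once the syllabification is refined.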
