adding language: Esperanto #36
Hi there, are you still interested in working on this? I'd be happy to collaborate if you are. We could model the Esperanto parser on the Finnish one:

```python
import os

# Language, WordForm, WordType, cache, PATH_DICTS, make_annotation,
# orth2phon and stress2stroke are defined elsewhere in prosodic.
class FinnishLanguage(Language):
    pronunciation_dictionary_filename = os.path.join(PATH_DICTS, 'en', 'english.tsv')
    lang = 'fi'
    cache_fn = 'finnish_wordtypes'

    @cache
    def get(self, token):
        token = token.strip()
        Annotation = make_annotation(token)
        syllables = []
        wordbroken = False
        for ij in range(len(Annotation.syllables)):
            try:
                sylldat = Annotation.split_sylls[ij]
            except IndexError:
                sylldat = ["", "", ""]
            syllStr = ""
            onsetStr = sylldat[0].strip().replace("'", "").lower()
            nucleusStr = sylldat[1].strip().replace("'", "").lower()
            codaStr = sylldat[2].strip().replace("'", "").lower()
            # map onset, nucleus and coda to IPA, falling back to single letters
            for x in [onsetStr, nucleusStr, codaStr]:
                x = x.strip()
                if not x:
                    continue
                if x not in orth2phon:
                    for y in x:
                        y = y.strip()
                        if not y:
                            continue
                        if y not in orth2phon:
                            wordbroken = True  # letter with no IPA mapping
                        else:
                            syllStr += "".join(orth2phon[y])
                else:
                    syllStr += "".join(orth2phon[x])
            syllables.append(syllStr)
        wordforms = []
        sylls_text = list(Annotation.syllables)
        # one WordForm per stress pattern returned by the annotator
        for stress in Annotation.stresses:
            sylls_ipa = [stress2stroke[stress[i]] + syllables[i] for i in range(len(syllables))]
            wf = WordForm(
                token,
                sylls_ipa=sylls_ipa,
                sylls_text=sylls_text,
            )
            wordforms.append(wf)
        wordtype = WordType(token, children=wordforms, lang=self.lang)
        return wordtype
```

All we need is a … It should then work like this:

```
In [10]: from prosodic.langs.finnish import Finnish

In [11]: word = Finnish().get('kalevala')

In [12]: for syll in word.syllables:
    ...:     print(syll)
    ...:
Syllable(ipa="'kɑ", num=1, txt='ka', is_stressed=True, is_heavy=False, is_strong=True, is_weak=False)
Syllable(ipa='le', num=2, txt='le', is_stressed=False, is_heavy=False, is_strong=False, is_weak=True)
Syllable(ipa='`vɑ', num=3, txt='va', is_stressed=True, is_heavy=False, is_strong=True, is_weak=False)
Syllable(ipa='lɑ', num=4, txt='la', is_stressed=False, is_heavy=False, is_strong=False, is_weak=True)
```

Let me know if you have thoughts. It's great that Esperanto's stress is rule-based: it seems doable to incorporate!
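To port the Finnish class, Esperanto would presumably need its own syllable split in place of `make_annotation`. Below is a minimal standalone sketch; `syllabify` is a hypothetical helper, not part of prosodic. It assumes lowercase input in the standard alphabet (ĉ ĝ ĥ ĵ ŝ ŭ), treats only a, e, i, o, u as nuclei, and uses a simplified consonant-division rule. Since Esperanto stress depends only on the number of vowels, an imperfect consonant split would not move the stress.

```python
VOWELS = set("aeiou")          # syllable nuclei; j and ŭ count as consonants (glides)
SONORANTS = set("lrmnjŭ")      # not allowed as the first member of an obstruent+l/r onset

def syllabify(word: str) -> list[str]:
    """Split an Esperanto word into syllables (simplified rules).

    - two adjacent vowels (hiatus) are split between them;
    - an obstruent followed by l/r before a vowel starts the next syllable together;
    - otherwise only the last consonant of a cluster starts the next syllable.
    """
    letters = word.lower().replace("'", "")  # drop an elision apostrophe
    nuclei = [i for i, ch in enumerate(letters) if ch in VOWELS]
    if not nuclei:
        return [letters] if letters else []
    cuts = [0]
    for prev, nxt in zip(nuclei, nuclei[1:]):
        cluster = nxt - prev - 1  # number of consonants between the two vowels
        if cluster == 0:
            cut = nxt
        elif (cluster >= 2 and letters[nxt - 1] in "lr"
              and letters[nxt - 2] not in SONORANTS):
            cut = nxt - 2  # keep onsets like "br", "tr", "pl" together
        else:
            cut = nxt - 1
        cuts.append(cut)
    cuts.append(len(letters))
    return [letters[a:b] for a, b in zip(cuts, cuts[1:])]
```

For example, `syllabify("ĉambro")` gives `["ĉam", "bro"]` and `syllabify("abel'")` gives `["a", "bel"]`.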
I'm trying to adapt prosodic to Esperanto. Its stress is always paroxytonic (on the penultimate syllable), e.g. abelo (English "bee") [a.'be.lo]. In poetry, however, the final -o can be elided, and the word then becomes oxytonic: abel' [a.'bel].
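The stress rule described above fits in a few lines. A minimal sketch, assuming elision is written with a final ASCII apostrophe as in abel'; `stress_index`, `stressed_syllable_index` and `mark_stress` are hypothetical helpers, and the `'` prefix only imitates the stroke notation in the Finnish output, not prosodic's internal representation.

```python
VOWELS = set("aeiou")  # nuclei; j and ŭ are glides, so they do not add syllables

def stress_index(n_syllables: int, elided: bool = False) -> int:
    """0-based index of the stressed syllable in a word of n_syllables."""
    if n_syllables <= 1:
        return 0  # simplifying assumption: a lone syllable carries the stress
    # Normally the penultimate syllable. After elision of the final -o the
    # stress stays on the same syllable, which has now become the last one.
    return n_syllables - 1 if elided else n_syllables - 2

def stressed_syllable_index(word: str) -> int:
    """Count vowels to get the syllable count, then apply the stress rule."""
    elided = word.endswith("'")
    n_syllables = sum(ch in VOWELS for ch in word.lower())
    return stress_index(n_syllables, elided)

def mark_stress(syllables: list[str], elided: bool = False) -> list[str]:
    """Prefix the stressed syllable with ' (as in the Finnish output)."""
    idx = stress_index(len(syllables), elided)
    return ["'" + s if i == idx else s for i, s in enumerate(syllables)]
```

With this, `mark_stress(["a", "be", "lo"])` returns `["a", "'be", "lo"]` and `mark_stress(["a", "bel"], elided=True)` returns `["a", "'bel"]`.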
Esperanto is as phonemic as Finnish, so I decided to use the orth feature, but I'm puzzled by LANG_stress.py because I don't understand its code :( Could you help me? I want to use prosodic for my MA research.
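Because Esperanto spelling is phonemic (one letter, one sound), an orthography-to-IPA table can be a flat letter-to-IPA dictionary. A sketch, assuming prosodic only needs a mapping from orthographic strings to IPA, which is how the Finnish code above appears to use `orth2phon`; the names `ESPERANTO_ORTH2PHON`, `X_SYSTEM` and `to_ipa` are hypothetical. String values would also work with that code's `"".join(orth2phon[y])`, since joining a string's characters returns the same string.

```python
import unicodedata

# Hypothetical letter-to-IPA table for Esperanto (not shipped with prosodic).
ESPERANTO_ORTH2PHON = {
    "a": "a", "b": "b", "c": "t͡s", "ĉ": "t͡ʃ", "d": "d", "e": "e",
    "f": "f", "g": "ɡ", "ĝ": "d͡ʒ", "h": "h", "ĥ": "x", "i": "i",
    "j": "j", "ĵ": "ʒ", "k": "k", "l": "l", "m": "m", "n": "n",
    "o": "o", "p": "p", "r": "r", "s": "s", "ŝ": "ʃ", "t": "t",
    "u": "u", "ŭ": "u̯", "v": "v", "z": "z",  # ŭ: non-syllabic u
}

# Optional: the "x-system" spelling used when diacritics are unavailable.
X_SYSTEM = {"cx": "ĉ", "gx": "ĝ", "hx": "ĥ", "jx": "ĵ", "sx": "ŝ", "ux": "ŭ"}

def to_ipa(text: str) -> tuple[str, bool]:
    """Transcribe Esperanto text letter by letter.

    Returns (ipa, broken); broken is True if some letter has no entry,
    analogous to the wordbroken flag in the Finnish code.
    """
    # NFC turns "c" + combining circumflex into the single letter "ĉ".
    text = unicodedata.normalize("NFC", text.lower())
    for digraph, letter in X_SYSTEM.items():
        text = text.replace(digraph, letter)
    ipa, broken = [], False
    for ch in text:
        if ch in ESPERANTO_ORTH2PHON:
            ipa.append(ESPERANTO_ORTH2PHON[ch])
        elif ch.strip() and ch != "'":  # ignore whitespace and elision marks
            broken = True
    return "".join(ipa), broken
```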