Meeting200326.md


Meeting XX.3.2020

Topics

  • For pension purposes, Niko needs a document from Rogier stating that he works in the project after 1.6.2020 (same as last year)

  • Niko has set up the FSTs and CG so that we get both Komi and Russian analyses, and take the Russian one only when there is no Komi reading

  • Both the Komi and the Russian analyses now go through the Komi CG

    • This reduces the share of unknown forms from 15% to 10%
  • Some of the Saami part needs to be checked

    • Micha will do this.
  • A fairly large portion of the dialectal forms now come out correctly, and some others can be handled with regex-based rules (although those aren't very pretty)

  • Some questions go well beyond Niko's FST skills, e.g. вӧӧ : вӧлӧн, кайны : каа. The changes related to Iźva vowel lengthening in particular are so complex that fixing them is far from trivial. At this point we can also just say that there is a subset of problems we will solve later.

    • Micha will set up an experimental folder and write these twolc rules.
  • We have run out of CSC's billing units (this is normal and expected), so Micha, as the head of the CSC project, should apply for more; this should be just one click away

    • One reason we ran out was that Niko didn't know what he was doing with the servers, but what can we do, that's how we learn
  • Micha could also apply at the same time for us to get access to CSC's Allas service

    • The logic would be that our multimedia is in IDA storage, but CSC now recommends Allas for files that change often
    • I'm also collaborating a bit with the Oulu people, and they are now setting up their system the same way
  • Participation in DoReCo: we need to decide which files to send over. Niko has a list of potentially good files (mainly chosen so that there is one speaker from each region, maybe two from Ižma as it is so heavily populated).

  • Possible DoReCo filelist:

    • kpv_izva20140325-2-a – older male speaker, from a village, but worked on the tundra
    • kpv_izva20140330-1-b – young female speaker, from Ižma
    • kpv_izva20150402-7-b – older female speaker, grew up and lived on the tundra
    • kpv_izva20150411-1-b – older female speaker from Nenetsia, bilingual in Komi and Nenets
    • kpv_izva20150703-01-b – young female speaker from Tarko-Sale
    • kpv_izva20150703-03-b – young male speaker from Kola Peninsula
    • kpv_izva20150703-05-b – older female speaker from Siberia
    • kpv_izva20150705-02-b – young male speaker from Ižma
    • kpv_izva20150705-03-b – middle-aged male speaker from Ižma
    • kpv_izva20150707-01-b – middle-aged male speaker from Upper Ižma
    • kpv_izva20160619-03 – older male speaker from Kola Peninsula
    • kpv_izva20160620-06 – older female speaker from Kola Peninsula
  • SVN move to GitHub (Micha talked with Sjur)
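
The Komi-first, Russian-fallback logic from the FST topic above can be sketched roughly as follows. This is only an illustration, not the actual pipeline: the two analysers are stand-in dictionary lookups rather than real FSTs, the CG step is left out, and the function names, toy lexicon entries and tag strings are all hypothetical.

```python
from typing import Callable, Dict, List

# An analyser maps a token to a list of readings (empty list = unknown form).
Analyser = Callable[[str], List[str]]


def make_toy_analyser(lexicon: Dict[str, List[str]]) -> Analyser:
    """Build a stand-in analyser from a dict (hypothetical, replaces a real FST)."""
    def analyse(token: str) -> List[str]:
        return lexicon.get(token, [])
    return analyse


def analyse_with_fallback(token: str, komi: Analyser, russian: Analyser) -> List[str]:
    """Prefer Komi readings; use Russian readings only when Komi gives none."""
    readings = komi(token)
    return readings if readings else russian(token)


def unknown_rate(tokens: List[str], analyse: Analyser) -> float:
    """Share of tokens for which the analyser returns no reading."""
    if not tokens:
        return 0.0
    unknown = sum(1 for t in tokens if not analyse(t))
    return unknown / len(tokens)


# Toy data: the tag strings are made up for illustration.
komi = make_toy_analyser({"керка": ["керка+N+Sg+Nom"]})
russian = make_toy_analyser({"дом": ["дом+N+Sg+Nom"]})
tokens = ["керка", "дом", "xyz"]


def combined(token: str) -> List[str]:
    return analyse_with_fallback(token, komi, russian)


print(unknown_rate(tokens, komi))      # Komi analyser alone
print(unknown_rate(tokens, combined))  # Komi with Russian fallback
```

Comparing `unknown_rate` for the Komi analyser alone and for the combined one is the kind of measurement behind the drop in unknown forms mentioned above; the toy numbers here say nothing about the real corpus.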