GT Disambiguation
Niko Partanen edited this page Sep 26, 2017 · 6 revisions
This page describes how constraint grammar based disambiguation can be used.
If I understand correctly, the lookup output first has to be converted into the CG format with the cg-conv tool, after which vislcg3 does the disambiguation. The tools are documented here.
For this demonstration I have an example file called test.txt. It is a simple test; one could also come up with something that has the maximum amount of ambiguity:
бур
лун
дона
ёртъяс
да
тӧдсаяс
.
With the command:
cat test.txt | lookup src/analyser-gt-desc.xfst
We get:
***** LEXICON LOOK-UP *****
бур бур+A
бур бур+A+Attr
бур бур+A+Der/MWN+N+Sg+Nom
бур бур+A+Der/MWN+N+Sg+Acc
лун лун+N+Sg+Nom
лун лун+N+Sg+Acc
дона дон+N+Sg+Der/а+HabObjMod+A
дона дон+N+Sg+Der/а+HabObjMod+A+Attr
дона дон+N+Sg+Der/а+HabObjMod+A+Der/MWN+N+Sg+Nom
дона дон+N+Sg+Der/а+HabObjMod+A+Der/MWN+N+Sg+Acc
дона дон+A+Der/а+Adv
дона дона+A
дона дона+A+Attr
дона дона+A+Der/MWN+N+Sg+Nom
дона дона+A+Der/MWN+N+Sg+Acc
ёртъяс ёрт+N+Pl+Nom
ёртъяс ёрт+N+Pl+Acc
да да+CC
да да+CS
да да+Pcle
да да+Adv
тӧдсаяс тӧд+N+Sg+Der/са+IneMod+A+Der/MWN+N+Pl+Nom
тӧдсаяс тӧд+N+Sg+Der/са+IneMod+A+Der/MWN+N+Pl+Acc
тӧдсаяс тӧдса+N+Pl+Nom
тӧдсаяс тӧдса+N+Pl+Acc
тӧдсаяс тӧдса+A+Der/MWN+N+Pl+Nom
тӧдсаяс тӧдса+A+Der/MWN+N+Pl+Acc
. . +CLB
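To get a quick overview of how ambiguous each word form is before disambiguation, the lookup output can be summarised with a small helper. This is only a sketch: the function name count_analyses is made up for this page, and it assumes that each analysis line starts with the word form followed by whitespace, as in the listing above.

```shell
# Hypothetical helper (not part of the GT infrastructure): count how many
# analyses lookup printed for each word form.
# Assumption: every analysis line has the word form as its first
# whitespace-separated field; blank lines and the banner line are skipped.
count_analyses() {
  awk '
    NF < 2 || $1 == "*****" { next }          # skip blank lines and the banner
    !($1 in count) { order[++n] = $1 }        # remember first-seen order
    { count[$1]++ }
    END {
      for (i = 1; i <= n; i++)
        printf "%s\t%d\n", order[i], count[order[i]]
    }
  '
}
```

It could be used as `cat test.txt | lookup src/analyser-gt-desc.xfst | count_analyses`. Counts are per word form, so a form that occurs several times in the input has its analyses summed.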
By converting that output with cg-conv and piping it into vislcg3 together with the constraint grammar file:
cat test.txt | lookup src/analyser-gt-desc.xfst | cg-conv | vislcg3 -g src/syntax/disambiguation.cg3
We get:
"<бур>"
"бур" A
"бур" A Attr
"<лун>"
"лун" N Sg Nom
"<дона>"
"дона" A Attr
"дона" A
"<ёртъяс>"
"ёрт" N Pl Nom
"<да>"
"да" CC
"да" CS
"<тӧдсаяс>"
"тӧдса" N Pl Nom
"<.>"
"." <W:0>
Much better! Most of the ambiguity is gone: only бур, дона and да still have two readings each. The output is now in the CG format rather than the lookup format, but this should not be a problem.
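To see which cohorts are still ambiguous after disambiguation, the CG output can be filtered with a short awk script. This is a sketch under some assumptions: the name ambiguous_cohorts is hypothetical, cohort lines are assumed to look like "<form>" at the start of a line, and reading lines are assumed to be indented with a space or tab, as vislcg3 normally prints them.

```shell
# Hypothetical helper: print every cohort that still has more than one
# reading, followed by the number of readings.
# Assumptions: cohort lines have the form "<wordform>" with no indentation;
# reading lines begin with whitespace and then a quoted lemma.
ambiguous_cohorts() {
  awk '
    function report() {
      if (form != "" && n > 1) printf "%s\t%d\n", form, n
    }
    /^"<.*>"$/ { report(); form = $0; n = 0; next }   # a new cohort starts
    /^[ \t]+"/ { n++ }                                # one more reading
    END        { report() }
  '
}
```

Appending `| ambiguous_cohorts` to the vislcg3 command would then list only the cohorts that need more rules or manual inspection.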
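Alternatively, running text can be tokenised and analysed in one step with hfst-tokenize and then piped straight into vislcg3:
echo " Бун лун, дона ёртъяс да тӧдсаяс." \
| hfst-tokenize --giella-cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst \
| vislcg3 -g src/syntax/disambiguation.cg3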
This produces:
:
"<Бун>"
"Бун" N Prop Sem/Sur Sg Nom <W:0.0000000000>
:
"<лун>"
"лун" N Sg Nom <W:0.0000000000>
"<,>"
"," CLB <W:0.0000000000>
:
"<дона>"
"дона" A Attr <W:0.0000000000>
"дона" A <W:0.0000000000>
:
"<ёртъяс>"
"ёрт" N Pl Nom <W:0.0000000000>
:
"<да>"
"да" CS <W:0.0000000000>
"да" CC <W:0.0000000000>
:
"<тӧдсаяс>"
"тӧдса" N Pl Nom <W:0.0000000000>
"<.>"
"." CLB <W:0.0000000000>
:\n
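This approach also seems to resolve many of the problems related to parsing the output, since hfst-tokenize handles punctuation and capitalisation itself and writes the CG format directly, so the cg-conv step is not needed.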
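If the hfst-tokenize output needs to be compared with the cg-conv output shown earlier, the weight tags and the lines starting with a colon can be stripped. The sketch below uses a made-up function name, strip_weights, and assumes that the colon lines only carry the whitespace between tokens and that weights always have the form <W:number>.

```shell
# Hypothetical helper: make hfst-tokenize/vislcg3 output easier to compare
# with the cg-conv based output.
# Assumptions: lines starting with ":" only record whitespace between
# tokens, and weights always appear as a <W:...> tag made of digits and dots.
strip_weights() {
  sed -e '/^:/d' \
      -e 's/[[:space:]]*<W:[0-9.]*>//'
}
```

It would be appended to the pipeline as `| strip_weights`. Whether dropping the whitespace lines is acceptable depends on what the output is used for, since they are needed if the original text has to be reconstructed.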