GT Disambiguation

Niko Partanen edited this page Sep 26, 2017 · 6 revisions

This page describes how Constraint Grammar (CG) based disambiguation can be used.

If I understand correctly, the lookup output first has to be converted into the CG stream format with the cg-conv tool, after which vislcg3 does the disambiguation. The tools are documented here.

For this demonstration I use a small example file called test.txt, with one token per line. It is a simple test, and one could come up with an example containing more ambiguity. The file looks like this:

бур
лун
дона
ёртъяс
да
тӧдсаяс
.

With the command:

cat test.txt | lookup src/analyser-gt-desc.xfst

We get:

  *****  LEXICON LOOK-UP  *****

бур	бур+A
бур	бур+A+Attr
бур	бур+A+Der/MWN+N+Sg+Nom
бур	бур+A+Der/MWN+N+Sg+Acc

лун	лун+N+Sg+Nom
лун	лун+N+Sg+Acc

дона	дон+N+Sg+Der/а+HabObjMod+A
дона	дон+N+Sg+Der/а+HabObjMod+A+Attr
дона	дон+N+Sg+Der/а+HabObjMod+A+Der/MWN+N+Sg+Nom
дона	дон+N+Sg+Der/а+HabObjMod+A+Der/MWN+N+Sg+Acc
дона	дон+A+Der/а+Adv
дона	дона+A
дона	дона+A+Attr
дона	дона+A+Der/MWN+N+Sg+Nom
дона	дона+A+Der/MWN+N+Sg+Acc

ёртъяс	ёрт+N+Pl+Nom
ёртъяс	ёрт+N+Pl+Acc

да	да+CC
да	да+CS
да	да+Pcle
да	да+Adv

тӧдсаяс	тӧд+N+Sg+Der/са+IneMod+A+Der/MWN+N+Pl+Nom
тӧдсаяс	тӧд+N+Sg+Der/са+IneMod+A+Der/MWN+N+Pl+Acc
тӧдсаяс	тӧдса+N+Pl+Nom
тӧдсаяс	тӧдса+N+Pl+Acc
тӧдсаяс	тӧдса+A+Der/MWN+N+Pl+Nom
тӧдсаяс	тӧдса+A+Der/MWN+N+Pl+Acc

.	.	+CLB
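
To see how much ambiguity the analyser produces before disambiguation, the readings can be counted per word form. The following is only a minimal sketch: it assumes the tab-separated layout shown above, and the function name `group_readings` and the embedded sample are illustrative, not part of any Giella tool.

```python
# A few lines copied from the lookup output above (tab-separated: form, analysis).
LOOKUP_OUTPUT = "\n".join([
    "  *****  LEXICON LOOK-UP  *****",
    "",
    "бур\tбур+A",
    "бур\tбур+A+Attr",
    "бур\tбур+A+Der/MWN+N+Sg+Nom",
    "бур\tбур+A+Der/MWN+N+Sg+Acc",
    "",
    "лун\tлун+N+Sg+Nom",
    "лун\tлун+N+Sg+Acc",
    "",
    ".\t.\t+CLB",
])


def group_readings(text):
    """Map each word form to its list of analyses, in order of appearance."""
    readings = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank lines and the LEXICON LOOK-UP banner
        form, analysis = line.split("\t", 1)
        # Punctuation lines carry an extra tab between lemma and tag; drop it.
        readings.setdefault(form, []).append(analysis.replace("\t", ""))
    return readings


readings = group_readings(LOOKUP_OUTPUT)
for form, analyses in readings.items():
    print(form, len(analyses))
# бур 4
# лун 2
# . 1
```

Note that this groups by word form, so a form that occurs several times in a longer text would be merged under one key.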

Piping that through cg-conv into vislcg3, together with the constraint grammar file:

cat test.txt | lookup src/analyser-gt-desc.xfst | cg-conv | vislcg3 -g src/syntax/disambiguation.cg3

We get:

"<бур>"
	"бур" A
	"бур" A Attr
"<лун>"
	"лун" N Sg Nom
"<дона>"
	"дона" A Attr
	"дона" A
"<ёртъяс>"
	"ёрт" N Pl Nom
"<да>"
	"да" CC
	"да" CS
"<тӧдсаяс>"
	"тӧдса" N Pl Nom
"<.>"
	"." <W:0>

Much better! The output format is also somewhat different from the lookup output, but this should not be a problem.
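
The relation between the two formats can be illustrated with a small conversion sketch. This is not how cg-conv is implemented, and it does not try to reproduce every detail of its output (such as the weight tag on the punctuation reading above). It only assumes that each lookup line is `form<TAB>lemma+Tag+Tag`, and the function name `lookup_to_cg` is made up for this example.

```python
def lookup_to_cg(text):
    """Convert tab-separated lookup lines into CG-style cohorts (rough sketch)."""
    out = []
    current = None  # word form of the cohort currently being written
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current token's analyses
            continue
        if "\t" not in line:
            continue  # e.g. the LEXICON LOOK-UP banner
        form, analysis = line.split("\t", 1)
        if form != current:
            out.append('"<%s>"' % form)  # word form line: "<form>"
            current = form
        # The first "+"-separated field is the lemma, the rest are tags.
        lemma, *tags = analysis.replace("\t", "").split("+")
        out.append("\t" + " ".join(['"%s"' % lemma] + tags))
    return "\n".join(out)


sample = "лун\tлун+N+Sg+Nom\nлун\tлун+N+Sg+Acc\n\n.\t.\t+CLB\n"
cg = lookup_to_cg(sample)
print(cg)
```

Each word form becomes a `"<form>"` line, and each analysis becomes an indented reading with the lemma in quotes followed by space-separated tags.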

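Alternatively, a whole sentence can be tokenised and analysed with hfst-tokenize and piped directly into vislcg3:

echo " Бун лун, дона ёртъяс да тӧдсаяс." \
| hfst-tokenize --giella-cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst \
| vislcg3 -g src/syntax/disambiguation.cg3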

This produces:

: 
"<Бун>"
	"Бун" N Prop Sem/Sur Sg Nom <W:0.0000000000>
: 
"<лун>"
	"лун" N Sg Nom <W:0.0000000000>
"<,>"
	"," CLB <W:0.0000000000>
: 
"<дона>"
	"дона" A Attr <W:0.0000000000>
	"дона" A <W:0.0000000000>
: 
"<ёртъяс>"
	"ёрт" N Pl Nom <W:0.0000000000>
: 
"<да>"
	"да" CS <W:0.0000000000>
	"да" CC <W:0.0000000000>
: 
"<тӧдсаяс>"
	"тӧдса" N Pl Nom <W:0.0000000000>
"<.>"
	"." CLB <W:0.0000000000>
:\n

This approach also seems to resolve many of the problems related to parsing the output.
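
As a rough illustration of how such output could be parsed, the sketch below reads the stream into (word form, readings) pairs and lists the tokens that are still ambiguous. It rests on a few assumptions: that word forms look like `"<...>"`, that readings are indented lines starting with a quoted lemma, and that lines beginning with `:` only carry the material between tokens and can be ignored. The names `parse_cg`, `WORDFORM` and `READING` are invented for this example.

```python
import re

# Part of the output above, rebuilt line by line so the tabs are explicit.
SAMPLE = "\n".join([
    ": ",
    '"<дона>"',
    '\t"дона" A Attr <W:0.0000000000>',
    '\t"дона" A <W:0.0000000000>',
    ": ",
    '"<ёртъяс>"',
    '\t"ёрт" N Pl Nom <W:0.0000000000>',
    '"<.>"',
    '\t"." CLB <W:0.0000000000>',
    ":\\n",
])

WORDFORM = re.compile(r'^"<(.*)>"$')
READING = re.compile(r'^\s+"([^"]*)"\s*(.*)$')


def parse_cg(text):
    """Return a list of (word form, [(lemma, tags), ...]) tuples."""
    cohorts = []
    for line in text.splitlines():
        m = WORDFORM.match(line)
        if m:
            cohorts.append((m.group(1), []))
            continue
        m = READING.match(line)
        if m and cohorts:
            # Drop weight tags such as <W:0.0000000000>; keep the other tags.
            tags = [t for t in m.group(2).split() if not t.startswith("<W:")]
            cohorts[-1][1].append((m.group(1), tags))
        # Any other line (for example one starting with ":") is skipped.
    return cohorts


cohorts = parse_cg(SAMPLE)
ambiguous = [form for form, readings in cohorts if len(readings) > 1]
print(ambiguous)  # ['дона']
```

A token with more than one remaining reading is one that the disambiguation grammar has not fully resolved, so a list like this can point to where further rules may be needed.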
