Incorrect lemmas for verbs #39

Open
rhdunn opened this issue Nov 28, 2023 · 11 comments


@rhdunn

rhdunn commented Nov 28, 2023

ERROR: Sentence w01105055 token 18 -- VBD lemma 'based' does not match past-tense-verb applied to form 'based', expected 'base'
ERROR: Sentence n01032032 token 9 -- VBN lemma 'held' does not match lemma-exception applied to form 'held', expected 'hold'
ERROR: Sentence n01043005 token 7 -- VBN lemma 'called' does not match past-participle-verb applied to form 'called', expected 'call'
ERROR: Sentence n01071009 token 4 -- VBN lemma 'fired' does not match past-participle-verb applied to form 'fired', expected 'fire'
ERROR: Sentence n01077030 token 7 -- VBN lemma 'called' does not match past-participle-verb applied to form 'called', expected 'call'
ERROR: Sentence n01119012 token 17 -- VBN lemma 'made' does not match lemma-exception applied to form 'made', expected 'make'
ERROR: Sentence w01018101 token 18 -- VBN lemma 'aged' does not match past-participle-verb applied to form 'aged', expected 'age'
ERROR: Sentence w01085005 token 18 -- VBN lemma 'prepared' does not match past-participle-verb applied to form 'prepared', expected 'prepare'
ERROR: Sentence w01094066 token 8 -- VBN lemma 'sized' does not match past-participle-verb applied to form 'sized', expected 'size'
ERROR: Sentence w01130101 token 5 -- VBN lemma 'cowritten' does not match lemma-exception applied to form 'cowritten', expected 'cowrite'

Modals

ERROR: Sentence n01036020 token 2 -- MD lemma 'would' does not match lemma-exception applied to form ''d', expected 'will'
ERROR: Sentence n01080042 token 13 -- MD lemma 'would' does not match lemma-exception applied to form '’d', expected 'will'
ERROR: Sentence n01091017 token 14 -- MD lemma 'would' does not match lemma-exception applied to form '’d', expected 'will'
ERROR: Sentence n01121032 token 12 -- MD lemma 'would' does not match lemma-exception applied to form '’d', expected 'will'

UK vs US

In UK and Commonwealth English the lemma ends in "l" (enrol, appal), but in US English it ends in "ll" (enroll, appall):

ERROR: Sentence w01111021 token 7 -- VBD lemma 'enrol' does not match past-tense-verb applied to form 'enrolled', expected 'enroll'
ERROR: Sentence w01115023 token 2 -- VBD lemma 'enrol' does not match past-tense-verb applied to form 'enrolled', expected 'enroll'
ERROR: Sentence w01125037 token 3 -- VBN lemma 'appal' does not match past-participle-verb applied to form 'appalled', expected 'appall'

Note: My validator cannot yet distinguish UK from US English lemma spellings, so it reports the UK forms as errors. There may be other instances I haven't spotted in the validation output.
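One way a validator could tolerate both spellings is to accept either the doubled or the single final consonant when stripping "-ed". The following is only a sketch under that assumption; `doubled_consonant_stems` is a hypothetical helper, not part of any existing tool.

```python
# Hypothetical sketch: accept both UK and US stems for forms such as "enrolled".

def doubled_consonant_stems(form):
    """Return the stems to accept as lemmas for a regular '-ed' form."""
    form = form.lower()
    if not form.endswith("ed") or len(form) < 5:
        return set()
    stem = form[:-2]  # enrolled -> enroll
    if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
        # Doubled final consonant: accept the long and the short spelling.
        return {stem, stem[:-1]}  # enroll, enrol
    return {stem}
```

The form alone does not say which variant the treebank intends, so this accepts both rather than guessing. It also over-generates for verbs whose base form really ends in a double consonant (for "called" it would accept "cal"), so a real check would need a word list on top.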

@AngledLuffa
Contributor

EWT treats "enrolled" and "appalled" the same way; see UniversalDependencies/UD_English-EWT#480.

@AngledLuffa
Contributor

Regarding the modals, I'm not so sure about that. Both EWT and GUM use "would" as the lemma here.

@nschneid
Contributor

Re: lemmas of modal auxes, see UniversalDependencies/UD_English-EWT#450

@rhdunn
Author

rhdunn commented Nov 28, 2023

It looks like the linked EWT issue keeps the modal's own form as the lemma, rather than converting it to a base form as is done for other verbs. I'll update my validator to follow this.

@nschneid
Contributor

The question is whether we should annotate modal auxiliaries as having tense at all. If not, then "will" and "would" are morphologically unrelated words and it makes sense that their lemmas are different.

@rhdunn
Author

rhdunn commented Nov 29, 2023

If modals currently preserve their form, then the "wo" in "won't" needs the lemma "would", as does the "'d" in "he'd", etc.:

ERROR: Sentence n01123024 token 3 -- MD lemma 'will' does not match lemma-exception applied to form 'wo', expected 'would'
ERROR: Sentence n01123024 token 8 -- MD lemma 'will' does not match lemma-exception applied to form 'wo', expected 'would'
ERROR: Sentence n01150051 token 3 -- MD lemma 'will' does not match lemma-exception applied to form 'wo', expected 'would'

@dan-zeman
Member

Isn't "won't" a short form of "will not"?

@nschneid
Contributor

Yes: won't = will not, wouldn't = would not

@rhdunn
Author

rhdunn commented Nov 29, 2023

Ah yes, you are right!
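To summarise where the modal discussion ended up, here is a minimal sketch of a form-preserving lookup for MD tokens. The table and the name `modal_lemma` are hypothetical; the entries only cover the forms mentioned in this thread.

```python
# Hypothetical sketch of a form-preserving lemma table for MD-tagged tokens.
# It assumes the token has already been tagged MD, so a contracted "'d"
# is a modal and not a form of "had".

MODAL_LEMMAS = {
    "will": "will",
    "would": "would",
    "wo": "will",    # won't = will not
    "'d": "would",   # he'd, with a straight apostrophe
    "’d": "would",   # he’d, with a typographic apostrophe
}


def modal_lemma(form):
    """Return the expected lemma for a modal form, or None if it is not listed."""
    return MODAL_LEMMAS.get(form.lower())
```

With this table "will" and "would" keep separate lemmas, and the contracted forms map to whichever full modal they abbreviate.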

@AngledLuffa
Contributor

I've handled the English spelling variants and the incorrect verb lemmas. Is there anything else for this issue?

@AngledLuffa
Contributor

@rhdunn call this complete?

4 participants