Dictionary Validation — Beyond Synonyms: Hyphens and Suffixes and Layman Terms, oh my! #87

Open
EmanuelFaria opened this issue Sep 17, 2020 · 0 comments

@EmanuelFaria
Collaborator

@petermr et al.... In thinking about validating dictionaries, a bunch of questions popped into my head (right before bed, as usual):

1. How will AMI handle terms that are found just as often with and without hyphens?
Currently, our dictionaries include hyphenated and non-hyphenated terms as separate entries/rows. But now that we're talking about collecting synonyms in a new "field" in the same record, how do we treat these preferences/aberrations?

I imagine we could decide to "go with hyphens" as the default and hard-code AMI to handle the variants automatically by matching each hyphenated term three ways:

  • a) with the hyphen,
  • b) replacing the hyphen with a space, and finally
  • c) deleting the hyphen — essentially replacing it with "nospace".

This could work, but...
a) we'd have to be pretty confident we pasted in the default hyphen everywhere there could/should be one, and
b) we'd also need to ensure that all "hyphen-having chemical compounds" are treated accordingly... er, respectfully... like the lady or gentleman molecules worthy of respect they no doubt are.
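
To make the three-way treatment above concrete, here is a minimal Python sketch. It is not AMI code; `hyphen_variants` and `variant_pattern` are hypothetical names made up for illustration, and the whole-word, case-insensitive matching is an assumption on my part.

```python
import re


def hyphen_variants(term: str) -> list[str]:
    """Return the term (a) as written, (b) with hyphens replaced by
    spaces, and (c) with hyphens deleted."""
    variants = [term, term.replace("-", " "), term.replace("-", "")]
    # dict.fromkeys drops duplicates (e.g. for terms with no hyphen)
    # while keeping the original order.
    return list(dict.fromkeys(variants))


def variant_pattern(term: str) -> re.Pattern:
    """Compile one whole-word, case-insensitive regex covering all variants."""
    alternatives = "|".join(re.escape(v) for v in hyphen_variants(term))
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


if __name__ == "__main__":
    print(hyphen_variants("multi-word"))
    print(bool(variant_pattern("multi-word").search("a Multiword term")))
```

The sketch does nothing about caveat (b): chemical names, where a hyphen can be part of the name itself, would presumably need to be exempt from this expansion.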

As a side-note, I believe EUPMC's browser search treats quoted+hyphenated "multi-word terms" (kind of like that one right there) the same as their space-separated forms. That is, within quotes, it treats words separated by a hyphen the same as those separated by a space, but it treats terms with no space as different terms altogether. So the question is: W.W.A.D.? (What Would AMI Do?)
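
For comparison, the behaviour described above (hyphen equals space, no-space is a different term) could be modelled with a single normalisation key. This is only a sketch of the behaviour as I described it, not of how EUPMC actually implements search; `quoted_search_key` is a hypothetical name.

```python
def quoted_search_key(term: str) -> str:
    """Lowercase the term, turn hyphens into spaces, and collapse
    runs of whitespace, so 'multi-word' and 'multi word' share a key."""
    return " ".join(term.lower().replace("-", " ").split())


if __name__ == "__main__":
    for candidate in ("multi-word", "Multi word", "multiword"):
        print(candidate, "->", quoted_search_key(candidate))
```

Under this model the hyphenated and spaced forms collapse to the same key, while the closed-up form keeps its own key and would need its own synonym entry.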

2. Suffixes as synonyms?
Should we account for all possible word endings? What about plural versions? Will the addition of an "s" or "es" at the end of a term affect the results? (@petermr, if you want to code AMI to handle affixes automatically, I have a clean list of them ready to go for you! And, oh boy, it would feel great to know that the time I spent collecting, cleaning, and organizing them wasn't "wasted" on learning something new again.) 🤓
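
One way to picture automatic suffix handling is to let a term match with or without an optional ending. The sketch below is an assumption-laden illustration, not AMI behaviour: `SUFFIXES` is a two-item placeholder standing in for the real affix list, and `suffix_pattern` and `find_terms` are hypothetical names.

```python
import re

# Placeholder only: the real affix list would replace this tuple.
SUFFIXES = ("s", "es")


def suffix_pattern(term: str, suffixes=SUFFIXES) -> re.Pattern:
    """Match the term as a whole word, optionally followed by one suffix."""
    # Try longer suffixes first so "es" is preferred over "s".
    ordered = sorted(suffixes, key=len, reverse=True)
    endings = "|".join(re.escape(s) for s in ordered)
    return re.compile(rf"\b{re.escape(term)}(?:{endings})?\b", re.IGNORECASE)


def find_terms(text: str, term: str) -> list[str]:
    """Return every occurrence of the term or a suffixed form in the text."""
    return [m.group(0) for m in suffix_pattern(term).finditer(text)]


if __name__ == "__main__":
    print(find_terms("Essential oil and other oils; oily residue", "oil"))
```

Naive concatenation like this misses irregular plurals and stem changes (for example "berry" and "berries"), so it only answers the "s"/"es" part of the question.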

3. Layman's Terms/Names... synonyms or not?
Using plant names as an example, will we be treating plant common names (Cedar Leaf Oil) as synonyms for their botanical names (Thuja occidentalis)? If not, we'll also need to consider that some common names (fruits, for example) will be different among countries or regions ... and then there's the whole "many fruits vs. single fruits" issue ... and hyphenated fruits too, I suppose... 🤔🤯
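
If common names did end up as synonyms, one possible record shape keeps them grouped by region under the botanical name. This is a hypothetical structure for discussion, not the current AMI dictionary format; the `Entry` class, its field names, and the region codes are all assumptions. The example uses Solanum melongena, which is commonly called "eggplant" in the US and "aubergine" in the UK.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """Hypothetical dictionary record: one botanical name plus
    common names keyed by region code."""
    term: str
    common_names: dict[str, list[str]] = field(default_factory=dict)

    def search_terms(self, region: Optional[str] = None) -> list[str]:
        """Return the botanical name plus common names, either for
        one region or (when region is None) for every region."""
        names = [self.term]
        if region is None:
            for regional in self.common_names.values():
                names.extend(regional)
        else:
            names.extend(self.common_names.get(region, []))
        return names


if __name__ == "__main__":
    entry = Entry("Solanum melongena",
                  {"US": ["eggplant"], "GB": ["aubergine"]})
    print(entry.search_terms("GB"))
    print(entry.search_terms())
```

Keeping the region on each common name would let a search be widened or narrowed by locale without duplicating the botanical entry.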
