📕 Documentation: Dictionary.xml and DictionaryDescription.md of: eoActivityAgents #88

Open
EmanuelFaria opened this issue Feb 10, 2021 · 0 comments

I've assembled a list of 8850 "activity agents" (I don't know what else to call them) that will need to be normalized against either Wikidata or perhaps ChEBI.

I created this list by doing a GREP query on the almost 250,000 articles I pulled down with GetPapers last year. The query included up to four words before the term "agent" or "agents". Then with a LOT of cleaning, I trimmed the leading words and got this list down from about 50,000 to its present state (there were a lot of duplicates).
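For anyone who wants to reproduce the extraction step, here is a rough Python sketch of that kind of query. It is not the original GREP expression, which isn't recorded here. The pattern, the function name `extract_agent_phrases`, and the choice to fold "agents" into "agent" are all assumptions made for illustration.

```python
import re

# Assumed pattern: up to four whitespace-separated words (letters, digits,
# underscores, hyphens) followed by "agent" or "agents" as a whole word.
AGENT_PATTERN = re.compile(r"\b((?:[\w-]+\s+){0,4})(agents?)\b", re.IGNORECASE)


def extract_agent_phrases(text):
    """Return the sorted, de-duplicated candidate phrases found in text."""
    phrases = set()
    for match in AGENT_PATTERN.finditer(text):
        leading_words = match.group(1).split()
        # Assumption: lowercase everything and normalise the plural to "agent".
        phrases.add(" ".join(leading_words + ["agent"]).lower())
    return sorted(phrases)


sample = "It acts as a potent anti-oxidant agent in vitro."
print(extract_agent_phrases(sample))
```

The raw matches still carry leading filler words ("as a potent ..."), which is why the cleaning pass afterwards matters so much.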

All the articles I pulled had to do with various terms describing the two main themes: Plant Extracts (or essential oils, etc.) AND Activities (medicinal, pharmacological, phyto-medicinal, etc.) NOT (petrol, shale, "oil", ... nothing "animal feed-related"), etc.

I ran the cleanest getpapers queries I could. Overall, very few terms are out of the ballpark. Some of them have to do with what I consider "formulation" terms (excipients, adhesives, abrasives, etc.), but for the most part, these would be useful for any biomedical project, including Covid.

I did most of the work months ago, as a way to see what the literature had in it, and flex my growing GREP skills. But I pulled it out a couple of days ago and decided to do a bunch of find (junk/stop words) and replace them with , and it came out really nice. I wish I could have kept the discarded words separately (turns out scientists use a lot of puffery in their descriptions, just like marketers do!), but I couldn't think of a way of doing that that would have been practical or efficient.
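In hindsight, one possible way to keep the discarded words would be to do the stripping in a small script instead of find-and-replace. The sketch below is only an illustration: `JUNK_WORDS` is a made-up sample rather than the list actually used, and `strip_junk` and `tally_discarded` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical sample of junk/stop words; the real list was much longer.
JUNK_WORDS = {"a", "as", "the", "potent", "novel", "promising"}


def strip_junk(phrase, junk_words=JUNK_WORDS):
    """Split a phrase into the cleaned phrase and the list of dropped words."""
    kept, discarded = [], []
    for word in phrase.split():
        if word.lower() in junk_words:
            discarded.append(word)
        else:
            kept.append(word)
    return " ".join(kept), discarded


def tally_discarded(phrases, junk_words=JUNK_WORDS):
    """Clean every phrase and count how often each junk word was removed."""
    cleaned, counts = [], Counter()
    for phrase in phrases:
        kept, discarded = strip_junk(phrase, junk_words)
        cleaned.append(kept)
        counts.update(word.lower() for word in discarded)
    return cleaned, counts


cleaned, counts = tally_discarded(
    ["as a potent anti-oxidant agent", "novel chelating agent"]
)
print(cleaned)
print(counts.most_common())
```

Note that this drops junk words anywhere in the phrase, not only at the front, so it is a bit more aggressive than just trimming leading words.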

Anyhow, it's still useful even without further disambiguation or descriptions, but adding those would definitely make it more useful, especially if we could split them up into different dictionaries, for example, having to do with different pathways. But that's a different kettle of fish.

EDIT: Also, I ran some random tests by pulling out multi-word terms I'd never heard of, and putting them — in quotes — in EUPMC searches, and all of them had a decent number of hits.

EDIT 2: Plus, I never would otherwise have found so many different ways (synonym terms) to find things I'm actually interested in. For example:

  • anti-oxidant agent
  • anti-oxidants agent
  • anti-oxidation agent
  • anti-oxidative agent
  • anti-oxidative protecting agent
  • anti-oxidative stress agent
  • anti-oxidizing agent
  • anti-oxygenic agent

Who knew? 😀🎉
