I've assembled a list of 8,850 "activity agents" (I don't know what else to call them) that will need to be normalized against either Wikidata or perhaps ChEBI.
I created this list by running a grep query on the almost 250,000 articles I pulled down with getpapers last year. The query captured up to four words before the term "agent" or "agents". Then, with a LOT of cleaning, I trimmed the leading words and got the list down from about 50,000 entries to its present state (there were a lot of duplicates).
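For anyone who wants to reproduce the extraction step outside grep, here is a minimal Python sketch of the same idea, assuming plain-text article bodies. The pattern and the helper names (`extract_agent_phrases`, `deduplicate`) are hypothetical illustrations, not the exact grep expression used for this list.

```python
import re

# Hypothetical Python analogue of the grep step: up to four words
# (letters, digits, underscores, hyphens) immediately before "agent"/"agents".
AGENT_PATTERN = re.compile(r"\b((?:[\w-]+\s+){1,4})(agents?)\b", re.IGNORECASE)

def extract_agent_phrases(text):
    """Return every 'w1 ... wN agent(s)' phrase (N <= 4) found in text, lowercased."""
    phrases = []
    for match in AGENT_PATTERN.finditer(text):
        words = match.group(1).split() + [match.group(2)]
        phrases.append(" ".join(words).lower())
    return phrases

def deduplicate(phrases):
    """Drop exact duplicates while keeping first-seen order."""
    return list(dict.fromkeys(phrases))

if __name__ == "__main__":
    sample = "This plant is a potent anti-inflammatory agent. Such agents are common."
    print(deduplicate(extract_agent_phrases(sample)))
```

Like the original query, this over-captures leading words ("is a potent anti-inflammatory agent"), so a trimming pass is still needed afterwards.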
All the articles I pulled matched terms describing two main themes: Plant Extracts (or essential oils, etc.) AND Activities (medicinal, pharmacological, phyto-medicinal, etc.) NOT (petrol, shale, "oil", ... nothing "animal feed-related"), etc.
I ran the cleanest getpapers queries I could, and overall very few terms are out of the ballpark. Some of them are what I consider "formulation" terms (excipients, adhesives, abrasives, etc.), but for the most part these would be useful for any biomedical project, including Covid.
I did most of the work months ago, as a way to see what the literature had in it and to flex my growing grep skills. But I pulled it out a couple of days ago and decided to find junk/stop words and replace them with , and it came out really nice. I wish I could have kept the discarded words separately (it turns out scientists use a lot of puffery in their descriptions, just like marketers do!), but I couldn't think of a practical or efficient way of doing that.
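One way to keep the discarded words without much extra effort is to tally them while stripping them. The sketch below assumes the terms are already in a Python list; `STOP_WORDS` is a hypothetical puffery list for illustration only, since the list actually used here isn't given.

```python
from collections import Counter

# Hypothetical stop/"puffery" list for illustration; the list actually used is not given.
STOP_WORDS = {"a", "an", "the", "is", "such", "potent", "promising", "novel"}

def strip_stop_words(phrases, stop_words=STOP_WORDS):
    """Remove stop words from each phrase.

    Returns (cleaned_phrases, discarded_counts): the cleaned, de-duplicated
    phrases in first-seen order, plus a tally of every word that was removed.
    """
    cleaned = []
    discarded = Counter()
    for phrase in phrases:
        kept = []
        for word in phrase.split():
            if word.lower() in stop_words:
                discarded[word.lower()] += 1  # record the word instead of losing it
            else:
                kept.append(word)
        if kept:  # skip phrases that were nothing but stop words
            cleaned.append(" ".join(kept))
    return list(dict.fromkeys(cleaned)), discarded

if __name__ == "__main__":
    terms = ["is a potent anti-inflammatory agent", "a promising anti-inflammatory agent"]
    cleaned, discarded = strip_stop_words(terms)
    print(cleaned)
    print(discarded.most_common())
```

Note that this removes stop words anywhere in a phrase, not only leading ones, so the stop list has to be chosen with that in mind.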
Anyhow, it's still useful even without further disambiguation or descriptions, but adding those would definitely make it more useful, especially if we could split the terms into different dictionaries (for example, by pathway). But that's a different kettle of fish.
EDIT: I also ran some random tests by pulling out multi-word terms I'd never heard of and searching for them (in quotes) in EUPMC, and all of them had a decent number of hits.
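Spot checks like this could be scripted. A minimal sketch, assuming the public Europe PMC REST search service (`/europepmc/webservices/rest/search` with `query`, `format` and `pageSize` parameters, whose JSON response includes `hitCount`); the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Europe PMC REST search endpoint (assumed public web service).
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

def phrase_query_url(term):
    """Build a search URL for an exact (quoted) phrase, requesting JSON."""
    params = {"query": f'"{term}"', "format": "json", "pageSize": 1}
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"

def hit_count(term):
    """Fetch the number of hits for a quoted phrase (requires network access)."""
    with urllib.request.urlopen(phrase_query_url(term), timeout=30) as resp:
        return json.load(resp)["hitCount"]

if __name__ == "__main__":
    print(phrase_query_url("anti-oxygenic agent"))
```

`pageSize=1` keeps each response small, since only the hit count matters for a spot check.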
EDIT 2: Plus, I would never otherwise have found so many different ways (synonymous terms) of referring to things I'm actually interested in. For example:
anti-oxidant agent
anti-oxidants agent
anti-oxidation agent
anti-oxidative agent
anti-oxidative protecting agent
anti-oxidative stress agent
anti-oxidizing agent
anti-oxygenic agent
Who knew? 😀🎉
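Variant spellings like these could be roughly pre-grouped before normalizing against Wikidata or ChEBI. This sketch uses a crude heuristic chosen purely for illustration (drop hyphens, take the first word, truncate to a fixed prefix length); `prefix_len=8` is an arbitrary assumption, and the heuristic will both miss some synonyms and merge some unrelated terms.

```python
from collections import defaultdict

def variant_key(term, prefix_len=8):
    """Crude grouping key: drop hyphens, take the first word, truncate to prefix_len."""
    first_word = term.lower().replace("-", "").split()[0]
    return first_word[:prefix_len]

def group_variants(terms, prefix_len=8):
    """Bucket terms that share the same variant_key."""
    groups = defaultdict(list)
    for term in terms:
        groups[variant_key(term, prefix_len)].append(term)
    return dict(groups)

if __name__ == "__main__":
    terms = ["anti-oxidant agent", "anti-oxidants agent", "anti-oxidizing agent",
             "anti-oxygenic agent"]
    for key, members in group_variants(terms).items():
        print(key, members)
```

With the eight terms above, everything except "anti-oxygenic agent" lands under the key `antioxid`, so a human (or a proper ontology lookup) still has to make the final call.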