Releases: monarch-initiative/ontogpt
v0.3.3
This version includes a bugfix, an updated OpenAI model list, and more clearly delineated documents in YAML output. See more details below!
What's Changed
- Replace deprecated OpenAI models by @caufieldjh in #210
- Tidy up the CLI's write_extraction function by @caufieldjh in #213
- Fix for #215 by @caufieldjh in #216
Full Changelog: v0.3.2...v0.3.3
v0.3.2
This release primarily concerns bugfixes. Thanks to all users who have provided feedback!
What's Changed
- Add option to show prompt by @caufieldjh in #183
- Quick fix for type mismatch in embed command by @caufieldjh in #185
- Updates for pydantic 2 compatibility by @caufieldjh in #189
- Repairs for incorrect model specification by @caufieldjh in #192
- Add SPIRES logo by @caufieldjh in #196
- Improve intro based on more up-to-date main/docs/index.md by @nlharris in #194
- Fixed bug where named arguments did not match up in recurse function. by @cmungall in #198
- adding more prompts and example text to improve results by @diatomsRcool in #116
- Address errors in using gene requests cache by @caufieldjh in #203
- Make kgx tsv by @hrshdhgd in #149
- Fix #204 and the remainder of the HPOA evaluation by @caufieldjh in #207
- Run mypy through tox and address type check errors by @caufieldjh in #202
New Contributors
Full Changelog: v0.3.1...v0.3.2
v0.3.1
Highlights
Access to open models through the llm package
llm provides easy access to LLMs from OpenAI and beyond, including the GPT4All set of open models.
You may now specify one of these models by using the -m or --model option with most commands.
When calling a model for the first time, llm will download a local copy.
Example:
ontogpt extract -t mendelian_disease.MendelianDisease -i tests/input/cases/mendelian-disease-cmt2e.txt -m nous-hermes-13b
Or extract from PubMed abstracts:
ontogpt pubmed-annotate -t drug "propranolol mode of action" --model nous-hermes-13b --limit 5
Or generate clinical case report text:
ontogpt clinical-notes -d "patient with chronic muscle pain and hypoplastic toenails" --sections "Past Medical History" -m nous-hermes-13b
See the full list of model options with
ontogpt list-models
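For scripted use, the same invocations can be assembled from Python. The following is a minimal sketch and not part of OntoGPT itself: build_extract_command and run_extract are hypothetical helpers, and they use only the extract subcommand and the -t, -i, and -m options shown in the examples above. It assumes the ontogpt executable is on the PATH when the command is actually run.

```python
import shutil
import subprocess
from typing import List, Optional


def build_extract_command(
    template: str, input_path: str, model: Optional[str] = None
) -> List[str]:
    """Assemble an `ontogpt extract` call (hypothetical helper, not an OntoGPT API)."""
    command = ["ontogpt", "extract", "-t", template, "-i", input_path]
    if model is not None:
        # -m selects the model, as in the release-note examples.
        command += ["-m", model]
    return command


def run_extract(command: List[str]) -> Optional[subprocess.CompletedProcess]:
    """Run the command only if the CLI is installed; otherwise return None."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, capture_output=True, text=True, check=False)


command = build_extract_command(
    "mendelian_disease.MendelianDisease",
    "tests/input/cases/mendelian-disease-cmt2e.txt",
    model="nous-hermes-13b",
)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with templates or paths that contain spaces.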
Updated dependency requirements
OntoGPT should now be compatible with Pydantic versions both below 2 and at or above 2. Many of these changes happened upstream within the broader LinkML ecosystem.
What's Changed
- Implement llm api by @caufieldjh in #167
- Update documentation by @caufieldjh in #170
- Fix #174 - add missing CLI parameters by @caufieldjh in #175
- A fix for encountering missing PubmedData fields by @caufieldjh in #178
- Dependency updates for llm (and pydantic compatibility, in particular) by @caufieldjh in #182
Full Changelog: v0.3.0...v0.3.1
v0.3.0
Highlights
Generate-and-Extract Command
This release adds a new command, generate-extract, that composes two operations:
- generate a natural language description
- parse the NL description using SPIRES
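The composition of the two operations can be pictured with a small sketch. This is an illustration under stated assumptions, not OntoGPT's implementation: generate_extract, fake_generate, and fake_extract are hypothetical names, and the two stand-in functions replace the real LLM call and the SPIRES extraction with trivial string handling.

```python
from typing import Callable, Dict


def generate_extract(
    term: str,
    generate: Callable[[str], str],
    extract: Callable[[str], Dict[str, object]],
) -> Dict[str, object]:
    """Compose the two steps: term -> description -> structured object."""
    description = generate(term)  # step 1: natural language description
    return extract(description)   # step 2: parse the description


# Stand-ins for the LLM call and the SPIRES extraction (hypothetical).
def fake_generate(term: str) -> str:
    return f"{term} is a kind of epithelial cell."


def fake_extract(text: str) -> Dict[str, object]:
    subject, _, parent = text.rstrip(".").partition(" is a kind of ")
    return {"cell_type": subject, "parents": [parent]}


result = generate_extract("Acinar Cell Of Salivary Gland", fake_generate, fake_extract)
print(result)
```

Keeping the two steps as separate callables makes it clear that the extraction step sees only the generated text, never the original term.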
Cell Type Use Case
(This use case is based on a conversation with @dosumis)
For example, given a cell type such as Acinar Cell Of Salivary Gland, generate a description using GPT describing many aspects of the cell type, from its marker genes through to its function and the diseases it is implicated in.
After that, use the cell-type schema (https://w3id.org/ontogpt/cell_type) to extract this into structured form. As an optional next step, use linkml-owl to generate OWL TBox axioms.
Iterative generate-extract
The command can be executed in iterative mode: this traverses the extracted subtypes with each iteration, gradually building up an ontology that is entirely generated from the "latent knowledge" in the LLM.
Here is a screenshot of an ontology generated entirely using OntoGPT by traversing from "Interneuron" downwards:
There are many oddities about it: currently each iteration is independent, so it has no way of knowing whether it has already made a concept. Still, it is an interesting proof of principle. The ugly pct-encoded labels indicate cases where it couldn't match to an existing concept in CL or another ontology, and may represent KB gaps to be filled.
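The traversal idea can be sketched as a breadth-first walk over extracted subtypes. This is a toy illustration, not the command's actual code: iterate_generate_extract and the step callable are hypothetical, the subtype data is made up, and the seen set is an assumption added here to show how shared memory between iterations would avoid regenerating a concept (the release notes state that iterations are currently independent).

```python
from collections import deque
from typing import Callable, Dict, List


def iterate_generate_extract(
    root: str,
    step: Callable[[str], List[str]],
    max_iterations: int,
) -> Dict[str, List[str]]:
    """Breadth-first traversal: each iteration expands one term into subtypes."""
    ontology: Dict[str, List[str]] = {}
    queue = deque([root])
    seen = {root}  # assumption: memory shared across iterations
    for _ in range(max_iterations):
        if not queue:
            break
        term = queue.popleft()
        subtypes = step(term)  # stands in for one generate-extract call
        ontology[term] = subtypes
        for subtype in subtypes:
            if subtype not in seen:
                seen.add(subtype)
                queue.append(subtype)
    return ontology


# Made-up subtype data standing in for LLM output.
toy_subtypes = {
    "Interneuron": ["subtype A", "subtype B"],
    "subtype A": ["subtype A1", "subtype B"],
}
ontology = iterate_generate_extract(
    "Interneuron", lambda term: toy_subtypes.get(term, []), max_iterations=10
)
print(ontology)
```

The max_iterations bound matters because nothing else guarantees that a model stops proposing new subtypes.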
More thoughts here: cell type summaries
What's Changed
- Playing around: adding a phenotype extractor by @matentzn in #14
- add unit test to makefile by @cmungall in #16
- Linted and minor flake8 edits by @hrshdhgd in #15
- Add linter to workflow by @hrshdhgd in #17
- Improve dependencies, add a web optional by @vemonet in #21
- add recipe for test by @sierra-moxon in #23
- added pad krapow recipe by @justaddcoffee in #25
- Add recipe URL by @pkalita-lbl in #24
- Add Walforf Salad URL by @caufieldjh in #26
- Add rajma pulao to recipe-urls.csv by @turbomam in #27
- Adding gene set enrichment by @cmungall in #30
- enrichment by @cmungall in #31
- README updates; add project.Makefile by @caufieldjh in #32
- allow use of different models, entailing different API endpoints. extending enrichment comparison. by @cmungall in #34
- Add CITATION and version updater by @caufieldjh in #35
- Ingest and extract things from literature about inflammatory bowel disease by @justaddcoffee in #36
- eval enrich by @cmungall in #37
- Create dental-restoration-material-composite-polymer-1.txt by @wdduncan in #53
- Create dental-restoration-material-composite-resin-1.txt by @wdduncan in #52
- Create dental-restoration-material-ceramic-composite-1.txt by @wdduncan in #51
- Create dental-restoration-material-ceramic-composite-resin-1.txt by @wdduncan in #50
- Create dental-restoration-material-ceramic-composite-polymer-2.txt by @wdduncan in #49
- Create dental-restoration-material-ceramic-composite-polymer-1.txt by @wdduncan in #48
- Create dental-restoration-material-ceramic-composite-polymer-resin-2.txt by @wdduncan in #47
- Create dental-restoration-material-ceramic-composite-polymer-resin-1.txt by @wdduncan in #46
- Create dental-restoration-material-composite-2.txt by @wdduncan in #45
- Create dental-restoration-material-composite-1.txt by @wdduncan in #44
- Create dental-restoration-material-polymer-1.txt by @wdduncan in #42
- Create dental-restoration-material-resin-2.txt by @wdduncan in #41
- Create dental-restoration-material-ceramic-2.txt by @wdduncan in #39
- Create dental-restoration-material-ceramic-1.txt by @wdduncan in #38
- Create dental-restoration-material-resin-1.txt by @wdduncan in #40
- Create dental-restoration-material-polymer-2.txt by @wdduncan in #43
- similarity by @cmungall in #57
- Add option to provide path to input file by @caufieldjh in #56
- Bicluster enrichment by @realmarcin in #62
- Added command and code for computing euclidian distances between embeddings by @justaddcoffee in #58
- Flake8 fixes + lint by @hrshdhgd in #63
- enrichment changes by @cmungall in #65
- Missed parenthesis for random.SystemRamdom() by @hrshdhgd in #67
- Change citation updater in Makefile to get_version by @caufieldjh in #68
- Raise FileNotFoundError if filepath for extract is missing by @caufieldjh in #72
- Makefile uses all templates by @caufieldjh in #69
- interactive-mode by @cmungall in #71
- Added command to generate mock clinical notes by @justaddcoffee in #74
- Bump version of oaklib by @cmungall in #73
- msigdb hallmark gene sets by @realmarcin in #78
- use prompts for enrichment by @cmungall in #80
- fixing gene sets and updating analysis by @cmungall in #81
- Adding schema for ontology issues in github. refactor enrichment by @cmungall in #83
- p-value templates with edited end markers to run multiple independent… by @realmarcin in #82
- Add diagnostic_procedure template by @caufieldjh in #29
- Fix for web-ontogpt not working on new install by @caufieldjh in #85
- geneweaver format by @cmungall in #89
- re-ran notebook by @cmungall in #90
- re-ran notebooks for enrichGPT by @cmungall in #95
- Update documentation by @caufieldjh in #92
- Autogenerate docs by @caufieldjh in #98
- Fix for doc generation by @caufieldjh in #100
- Added streamlit app for spindoctor by @cmungall in #101
- Study class as tree root in environment_sample template by @sujaypatil96 in #104
- update sections in README by @sujaypatil96 in #105
- Fixed a bug where the 'skip_annotators' option was being ignored by @daikiad in #108
- first trait commits by @cmungall in #109
- first draft of biotic interaction template by @diatomsRcool in #107
- One more fix for biotic interaction template by @caufieldjh in #113
- Adding a GPT-based reasoner, for evaluation purposes. by @cmungall in #112
- more prompt language and adding ENVTHES ontology by @realmarcin in #118
- Add general framework for specifying models by name and source by @caufieldjh in #99
- Adding a MappingEngine by @cmungall in #121
- removing importlib dependency by @cmungall in #122
- very small typo fix by @PR0CK0 in https://github.c...
v0.2.11
Highlights
Generate-Extract
ontogpt generate-extract -m gpt-4 -t cell_type "Acinar Cell Of Salivary Gland"
This does two things:
- asks GPT to generate a summary of the cell type
- parses/extracts knowledge from that cell type
This resuscitates the original HALO idea. We could in principle directly generate an entire knowledge base in structured form from the latent GPT KB.
Example output:
extracted_object:
  cell_type: Acinar cell of a salivary gland
  parents:
    - CL:0000066
  subtypes:
    - CL:0000313
    - CL:0000319
  localizations:
    - UBERON:0001044
    - UBERON:0009842
  diseases:
    - AUTO:Sj%C3%B6gren%27s%20syndrome
    - MONDO:0021357
named_entities:
  - id: CL:0000066
    label: Epithelial cell
  - id: CL:0000313
    label: Serous cells
  - id: CL:0000319
    label: Mucous cells
  - id: UBERON:0001044
    label: Salivary gland
  - id: UBERON:0009842
    label: Acinus
  - id: AUTO:Sj%C3%B6gren%27s%20syndrome
    label: Sjögren's syndrome
  - id: MONDO:0021357
    label: Salivary gland tumors
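Identifiers with the AUTO: prefix in the output above carry a percent-encoded label for a term that was not matched to an existing ontology concept. A minimal sketch for turning such an identifier back into readable text, using only the Python standard library (decode_auto_id is a hypothetical helper name, not an OntoGPT function):

```python
from typing import Optional
from urllib.parse import unquote


def decode_auto_id(identifier: str) -> Optional[str]:
    """Return the readable label for an AUTO: identifier, else None."""
    prefix, _, local_part = identifier.partition(":")
    if prefix != "AUTO":
        return None  # grounded IDs such as CL: or MONDO: are left alone
    return unquote(local_part)  # percent-decoding, UTF-8 by default


print(decode_auto_id("AUTO:Sj%C3%B6gren%27s%20syndrome"))
print(decode_auto_id("MONDO:0021357"))
```

Filtering a result for identifiers where this function returns a label gives a quick list of candidate gaps to review.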
Cell Type Templates
This PR also demonstrates using subclasses for more refined subtypes
Compare the two:
ontogpt generate-extract -m gpt-4 -t cell_type "L2/3 Intratelencephalic Projecting Glutamatergic Neuron Of The Primary Motor Cortex"
ontogpt generate-extract -m gpt-4 -t cell_type.InterneuronDocument "L2/3 Intratelencephalic Projecting Glutamatergic Neuron Of The Primary Motor Cortex"
The first uses the generic base class. The second uses a subclass designed for interneurons, which has an extra slot for projection fields.
Example output:
extracted_object:
  cell_type: L2/3 Intratelencephalic Projecting Glutamatergic Neuron of the Primary
    Motor Cortex
  range: Not mentioned
  parents:
    - AUTO:excitatory%20neuron
  subtypes:
    - AUTO:Not%20mentioned
  localizations:
    - UBERON:0000956
    - UBERON:0001384
  genes:
    - AUTO:Not%20mentioned
  diseases:
    - MONDO:0005180
    - MONDO:0020128
  projects_to_or_from:
    - UBERON:0001893
named_entities:
  - id: UBERON:0001893
    label: telencephalon
  - id: AUTO:excitatory%20neuron
    label: excitatory neuron
  - id: AUTO:Not%20mentioned
    label: Not mentioned
  - id: UBERON:0000956
    label: cerebral cortex
  - id: UBERON:0001384
    label: primary motor cortex
  - id: MONDO:0005180
    label: Parkinson's disease
  - id: MONDO:0020128
    label: motor neuron disease
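The relationship between the generic class and the interneuron subclass can be pictured with plain Python dataclasses. This is only an analogy: the real templates are LinkML schemas, the class names CellTypeSketch and InterneuronSketch are hypothetical, and the base class lists just a few of the slots visible in the example outputs rather than the full schema.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class CellTypeSketch:
    """Simplified stand-in for the generic cell type class."""
    cell_type: str
    parents: List[str] = field(default_factory=list)
    subtypes: List[str] = field(default_factory=list)
    localizations: List[str] = field(default_factory=list)
    diseases: List[str] = field(default_factory=list)


@dataclass
class InterneuronSketch(CellTypeSketch):
    """Subclass that inherits every base slot and adds one more."""
    projects_to_or_from: List[str] = field(default_factory=list)


base_slots = {f.name for f in fields(CellTypeSketch)}
subclass_slots = {f.name for f in fields(InterneuronSketch)}
extra_slots = subclass_slots - base_slots
print(extra_slots)
```

Because the subclass inherits every base slot, the extractor can be asked about projections only where they make sense, without changing the generic template.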
What's Changed
Full Changelog: v0.2.10...v0.2.11
v0.2.10
What's Changed
- very small typo fix by @PR0CK0 in #124
- Init environmental metadata template by @caufieldjh in #117
- use latest rueaml. Avoids problems like this: monarch-initiative/talisman-paper#4 by @cmungall in #126
- Add a command 'pubmed-annotate' to retrieve PMIDs for a search term, then apply a template to all of them to extract info by @justaddcoffee in #127
- relaxing pinning by @cmungall in #129
- reasoner gpt changes by @cmungall in #128
- Retrieve remote models for local use and pass extract prompt to them by @caufieldjh in #123
- First pass at PhenoEngine by @cmungall in #130
- New PubMed eutil functions by @caufieldjh in #131
- Cleanup and documentation updates by @caufieldjh in #115
- Start of PR for IBD literature project by @justaddcoffee in #120
- Fixes for #139 by @caufieldjh in #142
- Add interface for HuggingFace Hub by @caufieldjh in #145
- Dependency updates by @caufieldjh in #151
New Contributors
Full Changelog: v0.2.9...v0.2.10
v0.2.9
v0.2.8
What's Changed
- first draft of biotic interaction template by @diatomsRcool in #107
- One more fix for biotic interaction template by @caufieldjh in #113
- Adding a GPT-based reasoner, for evaluation purposes. by @cmungall in #112
- more prompt language and adding ENVTHES ontology by @realmarcin in #118
- Add general framework for specifying models by name and source by @caufieldjh in #99
- Adding a MappingEngine by @cmungall in #121
New Contributors
- @diatomsRcool made their first contribution in #107
Full Changelog: v0.2.7...v0.2.8
v0.2.7
v0.2.6
What's Changed
- Added streamlit app for spindoctor by @cmungall in #101
- Study class as tree root in environment_sample template by @sujaypatil96 in #104
- update sections in README by @sujaypatil96 in #105
New Contributors
- @sujaypatil96 made their first contribution in #104
Full Changelog: v0.2.5...v0.2.6