This PR does two things:

- Adds a combined `generate-extract` command, fixes #158
- Adds cell type templates, fixes #159

## Generate-Extract

`ontogpt generate-extract -m gpt-4 -t cell_type "Acinar Cell Of Salivary Gland"`

This does two things:

1. asks GPT to generate a summary of the cell type
2. parses/extracts knowledge from that generated summary

This resuscitates the original HALO idea. We could in principle **directly generate an entire knowledgebase in structured form from the latent GPT KB**.

Example output:

```yaml
extracted_object:
  cell_type: Acinar cell of a salivary gland
  parents:
    - CL:0000066
  subtypes:
    - CL:0000313
    - CL:0000319
  localizations:
    - UBERON:0001044
    - UBERON:0009842
  diseases:
    - AUTO:Sj%C3%B6gren%27s%20syndrome
    - MONDO:0021357
named_entities:
  - id: CL:0000066
    label: Epithelial cell
  - id: CL:0000313
    label: Serous cells
  - id: CL:0000319
    label: Mucous cells
  - id: UBERON:0001044
    label: Salivary gland
  - id: UBERON:0009842
    label: Acinus
  - id: AUTO:Sj%C3%B6gren%27s%20syndrome
    label: Sjögren's syndrome
  - id: MONDO:0021357
    label: Salivary gland tumors
```

## Cell Type Templates

This PR also demonstrates using subclasses for more refined subtypes.

Compare the two:

1. `ontogpt generate-extract -m gpt-4 -t cell_type "L2/3 Intratelencephalic Projecting Glutamatergic Neuron Of The Primary Motor Cortex"`
2. `ontogpt generate-extract -m gpt-4 -t cell_type.InterneuronDocument "L2/3 Intratelencephalic Projecting Glutamatergic Neuron Of The Primary Motor Cortex"`

The first uses the generic base class. The second uses a subclass designed for interneurons, which has an extra slot for projection fields.

Example output:

```yaml
extracted_object:
  cell_type: L2/3 Intratelencephalic Projecting Glutamatergic Neuron of the Primary Motor Cortex
  range: Not mentioned
  parents:
    - AUTO:excitatory%20neuron
  subtypes:
    - AUTO:Not%20mentioned
  localizations:
    - UBERON:0000956
    - UBERON:0001384
  genes:
    - AUTO:Not%20mentioned
  diseases:
    - MONDO:0005180
    - MONDO:0020128
  projects_to_or_from:
    - UBERON:0001893
named_entities:
  - id: UBERON:0001893
    label: telencephalon
  - id: AUTO:excitatory%20neuron
    label: excitatory neuron
  - id: AUTO:Not%20mentioned
    label: Not mentioned
  - id: UBERON:0000956
    label: cerebral cortex
  - id: UBERON:0001384
    label: primary motor cortex
  - id: MONDO:0005180
    label: Parkinson's disease
  - id: MONDO:0020128
    label: motor neuron disease
```
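
To read output like this downstream, it can help to separate values grounded to an ontology ID from those that were not. The following is a minimal sketch, not part of this PR. It assumes the `AUTO:` prefix marks a value that was not matched to an ontology term and that `AUTO:Not%20mentioned` is a placeholder for an empty slot; both are inferred from the examples rather than stated here. The `summarize` helper is hypothetical, and the data is a trimmed copy of the second example written as the dict a YAML loader would return.

```python
from urllib.parse import unquote

# Trimmed copy of the InterneuronDocument example output, as a plain dict.
result = {
    "extracted_object": {
        "cell_type": "L2/3 Intratelencephalic Projecting Glutamatergic "
                     "Neuron of the Primary Motor Cortex",
        "parents": ["AUTO:excitatory%20neuron"],
        "subtypes": ["AUTO:Not%20mentioned"],
        "localizations": ["UBERON:0000956", "UBERON:0001384"],
        "diseases": ["MONDO:0005180", "MONDO:0020128"],
        "projects_to_or_from": ["UBERON:0001893"],
    },
    "named_entities": [
        {"id": "UBERON:0001893", "label": "telencephalon"},
        {"id": "AUTO:excitatory%20neuron", "label": "excitatory neuron"},
        {"id": "AUTO:Not%20mentioned", "label": "Not mentioned"},
        {"id": "UBERON:0000956", "label": "cerebral cortex"},
        {"id": "UBERON:0001384", "label": "primary motor cortex"},
        {"id": "MONDO:0005180", "label": "Parkinson's disease"},
        {"id": "MONDO:0020128", "label": "motor neuron disease"},
    ],
}


def summarize(result):
    """Hypothetical helper: split each list-valued slot into grounded
    (id, label) pairs and ungrounded free-text values."""
    labels = {e["id"]: e["label"] for e in result["named_entities"]}
    summary = {}
    for slot, values in result["extracted_object"].items():
        if not isinstance(values, list):
            continue  # scalar slots such as cell_type are skipped
        grounded, ungrounded = [], []
        for curie in values:
            prefix, _, local = curie.partition(":")
            if prefix == "AUTO":
                # Assumption: AUTO: means "no ontology match"; the local
                # part is the percent-encoded original text.
                text = unquote(local)
                if text.lower() != "not mentioned":
                    ungrounded.append(text)
            else:
                grounded.append((curie, labels.get(curie, "")))
        summary[slot] = {"grounded": grounded, "ungrounded": ungrounded}
    return summary


summary = summarize(result)
for slot, groups in summary.items():
    print(slot, groups)
```

Dropping the `Not mentioned` placeholders keeps empty slots from being counted as ungrounded terms, which matters if the goal is to measure how much of the generated text maps onto existing ontologies.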