From 6d9ceae6fcd9bd8d1a9aa81eec6a151f309eee9f Mon Sep 17 00:00:00 2001
From: caufieldjh
Date: Thu, 30 May 2024 13:52:49 -0400
Subject: [PATCH 1/4] Update pages on custom schemas and operation

---
 docs/custom.md    | 23 ++++++-----------------
 docs/operation.md | 21 +++------------------
 2 files changed, 9 insertions(+), 35 deletions(-)

diff --git a/docs/custom.md b/docs/custom.md
index 648c5b6f7..d9a151714 100644
--- a/docs/custom.md
+++ b/docs/custom.md
@@ -457,25 +457,14 @@ Note that everything in {curly brackets} is a template of some kind. The Jinja t

 See also: the [documentation page on OWL exports](owl_exports.md) and the [linkml-owl documentation](https://linkml.io/linkml-owl/).

-## Install a custom schema
+## Using a custom schema

-If you have installed OntoGPT directly from its GitHub repository, then you may install a custom schema like this:
+You may then use the schema like any other. Pass the path to your schema to the `--template/-t` option.

-1. Move the schema file to the `src/ontogpt/templates` directory.
-2. Run `make` from the root of the repository to generate Pydantic versions of the schema.
-
-If you have installed OntoGPT from `pip`, *or* if you can't use the `make` command, the process is similar, though it will depend on where the package is installed.
-
-1. Use the LinkML `gen-pydantic` tool to generate Pydantic classes. If your schema is named `alfred.yaml`, then run the following:
-
-   ```bash
-   gen-pydantic --pydantic_version 2 alfred.yaml > alfred.py
-   ```
-2. Move both the .yaml and the .py versions of your schema to the `templates` directory of wherever OntoGPT is installed. In a virtual environment named `temp` that may be something like `/temp/lib/python3.9/site-packages/ontogpt/templates`.
-You may then use the schema like any other.

 For example, if your schema is named `albatross.yaml`, then an extract command is:

 ```bash
-ontogpt extract -t albatross -i input.txt
+ontogpt extract -t albatross.yaml -i input.txt
 ```
+
+Running this (or any other command including your custom schema) will install it for future use with OntoGPT, so in subsequent commands it can be referred to by its name (e.g., `albatross`, without the file extension or a full filepath).
diff --git a/docs/operation.md b/docs/operation.md
index 76384713c..ce4010d22 100644
--- a/docs/operation.md
+++ b/docs/operation.md
@@ -32,7 +32,7 @@ Perhaps the squid crossed the coral reef for a variety of reasons:

 OntoGPT is intended to be used for information extraction. The following examples show how to accomplish this.

-### Strategy 1: Knowledge extraction using SPIRES
+### Knowledge extraction using SPIRES

 #### Working Mechanism

@@ -72,6 +72,8 @@ ontogpt extract -t gocam.GoCamAnnotations -i ~/path/to/abstract.txt

 Note: The value accepted by the `-t` / `--template` argument is the base name of one of the LinkML schema / data model which can be found in the [templates](src/ontogpt/templates/) folder.

+Or, if you create your own schema (see the page on [custom schemas](custom.md)), you may pass the path to the .yaml file.
+
 #### Output

 The output returned from the above command can be optionally redirected into an output file using the `-o` / `--output`.
@@ -117,20 +119,3 @@ ontogpt list-models
 ```

 When specifying a local model for the first time, it will be downloaded to your local system.
-
-## Strategy 2: Gene Enrichment using SPINDOCTOR
-
-Given a set of genes, OntoGPT can find similarities among them.
-
-Ex.:
-
-```bash
-ontogpt enrichment -U tests/input/genesets/sensory-ataxia.yaml
-```
-
-The default is to use ontological gene function synopses (via the Alliance API).
-
-* To use narrative/RefSeq summaries, use the `--no-ontological-synopses` flag
-* To run without any gene descriptions, use the `--no-annotations` flag
-
-This strategy does not currently support using local models.

From ca2cb05c8d5ec87e4775be60f3809e037cea64af Mon Sep 17 00:00:00 2001
From: caufieldjh
Date: Thu, 30 May 2024 13:55:27 -0400
Subject: [PATCH 2/4] Update docs index

---
 docs/index.md | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index 958c10e5b..b0e0fcbeb 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,18 +1,15 @@
 # Introduction

-_OntoGPT_ is a Python package for extracting structured information from text with large language models (LLMs), _instruction prompts_, and ontology-based grounding. It works well with OpenAI's GPT-3.5 and GPT-4 models as well as a selection of other LLMs. OntoGPT's output can be used for general-purpose natural language tasks (e.g., named entity recognition and relation extraction), summarization, knowledge base and knowledge graph construction, and more.
+_OntoGPT_ is a Python package for extracting structured information from text with large language models (LLMs), _instruction prompts_, and ontology-based grounding. It works well with OpenAI's GPT models as well as a selection of other LLMs. OntoGPT's output can be used for general-purpose natural language tasks (e.g., named entity recognition and relation extraction), summarization, knowledge base and knowledge graph construction, and more.

 ## Methods

-Two different strategies for knowledge extraction are currently implemented in OntoGPT:
+The primary extraction method currently implemented in OntoGPT is SPIRES:

-* SPIRES: *Structured Prompt Interrogation and Recursive Extraction of Semantics*
+* SPIRES: _Structured Prompt Interrogation and Recursive Extraction of Semantics_
   * A Zero-shot learning (ZSL) approach to extracting nested semantic structures from text
   * This approach takes two inputs - 1) LinkML schema 2) free text, and outputs knowledge in a structure conformant with the supplied schema in JSON, YAML, RDF or OWL formats
-  * Uses GPT-3.5-turbo, GPT-4, or one of a variety of open LLMs on your local machine
-* SPINDOCTOR: *Structured Prompt Interpolation of Narrative Descriptions Or Controlled Terms for Ontological Reporting*
-  * Summarizes gene set descriptions (pseudo gene-set enrichment)
-  * Uses GPT-3.5-turbo or GPT-4
+  * Uses OpenAI GPT models through their API, or one of a variety of LLMs on your local machine

 ## Quick Start

From f4784eeebfb41f57acf38f7806cc952add053da4 Mon Sep 17 00:00:00 2001
From: caufieldjh
Date: Thu, 30 May 2024 13:56:46 -0400
Subject: [PATCH 3/4] Update citation in docs

---
 docs/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/index.md b/docs/index.md
index b0e0fcbeb..14ead8c9e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -65,7 +65,7 @@ NOTE: We do not recommend hosting this webapp publicly without authentication.

 ## Citation

-SPIRES is described further in: Caufield JH, Hegde H, Emonet V, Harris NL, Joachimiak MP, Matentzoglu N, et al. Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning. arXiv publication:
+SPIRES is described further in: Caufield JH, Hegde H, Emonet V, Harris NL, Joachimiak MP, Matentzoglu N, et al. Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning. Bioinformatics. 2024;40. doi:[10.1093/bioinformatics/btae104](http://dx.doi.org/10.1093/bioinformatics/btae104)

 ## Contributing

From cc8ccf1586bfe452f04b5162cb5663a5baef3676 Mon Sep 17 00:00:00 2001
From: caufieldjh
Date: Thu, 30 May 2024 14:00:12 -0400
Subject: [PATCH 4/4] Update template docs re custom templates

---
 docs/functions.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/functions.md b/docs/functions.md
index 3661e6a45..79e700863 100644
--- a/docs/functions.md
+++ b/docs/functions.md
@@ -44,11 +44,11 @@ In the latter case, all .txt files will be assumed to be input, and the path wil

 Use the option `--template` to specify a template to use. This is a required parameter.

-Only the name is required, without any filename suffix.
+Only the name is required, without any filename suffix, unless you are using a custom schema for the first time. In that case, provide the path to the schema, including the .yaml file extension.

 To use the `gocam` template, for example, the parameter will be `--template gocam`

-This may be one of the templates included with OntoGPT or a custom template, but in the latter case, the schema, generated Pydantic classes, and any imported schemas should be present in the same location.
+Or, for a custom template, the parameter may be `--template custom.yaml`

 ### target-class
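
The template-resolution behavior documented across these patches — pass a path ending in `.yaml` the first time a custom schema is used, then refer to it by bare name afterward — can be sketched as follows. This is a hypothetical illustration: the helper name, templates directory, and logic are assumptions, not OntoGPT's actual implementation.

```python
from pathlib import Path

def resolve_template(value: str, templates_dir: Path) -> Path:
    """Resolve a --template/-t value to a schema file path (hypothetical sketch).

    A value ending in ".yaml" is treated as an explicit path to a custom
    schema (its first use); a bare name refers to a built-in or previously
    installed template in the templates directory.
    """
    if value.endswith(".yaml"):
        # First use of a custom schema: the caller passed an explicit path.
        return Path(value)
    # Subsequent uses: a bare name, no extension, looked up in templates_dir.
    return templates_dir / f"{value}.yaml"

# A bare name resolves inside the templates directory...
print(resolve_template("albatross", Path("/opt/ontogpt/templates")))
# ...while an explicit .yaml path is used as-is.
print(resolve_template("schemas/albatross.yaml", Path("/opt/ontogpt/templates")))
```

Under this reading, `-t albatross.yaml` and a later `-t albatross` point at the same installed schema, which matches the behavior the docs describe.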