Updates for docs (#389)
caufieldjh committed May 30, 2024
2 parents 8d189d1 + cc8ccf1 commit 3f40826
Showing 4 changed files with 16 additions and 45 deletions.
23 changes: 6 additions & 17 deletions docs/custom.md
@@ -457,25 +457,14 @@ Note that everything in {curly brackets} is a template of some kind. The Jinja t

See also: the [documentation page on OWL exports](owl_exports.md) and the [linkml-owl documentation](https://linkml.io/linkml-owl/).

## Install a custom schema
## Using a custom schema

If you have installed OntoGPT directly from its GitHub repository, then you may install a custom schema like this:
You may then use the schema like any other. Pass the path to your schema to the `--template/-t` option.

1. Move the schema file to the `src/ontogpt/templates` directory.
2. Run `make` from the root of the repository to generate Pydantic versions of the schema.

If you have installed OntoGPT from `pip`, *or* if you can't use the `make` command, the process is similar, though it will depend on where the package is installed.

1. Use the LinkML `gen-pydantic` tool to generate Pydantic classes. If your schema is named `alfred.yaml`, then run the following:

```bash
gen-pydantic --pydantic_version 2 alfred.yaml > alfred.py
```

2. Move both the .yaml and the .py versions of your schema to the `templates` directory of wherever OntoGPT is installed. In a virtual environment named `temp` that may be something like `/temp/lib/python3.9/site-packages/ontogpt/templates`.

You may then use the schema like any other. For example, if your schema is named `albatross.yaml`, then an extract command is:
For example, if your schema is named `albatross.yaml`, then an extract command is:

```bash
ontogpt extract -t albatross -i input.txt
ontogpt extract -t albatross.yaml -i input.txt
```

Running this (or any other command that includes your custom schema) will install it for future use with OntoGPT, so in subsequent commands it can be referred to by its name (e.g., `albatross`, without the file extension or a full filepath).
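The two-stage usage described above can be sketched as follows; `albatross.yaml` and `input.txt` stand in for your own schema and input file:

```bash
# First use: pass the path to the custom schema, including the .yaml extension.
# This also registers the schema with OntoGPT for later use.
ontogpt extract -t albatross.yaml -i input.txt

# Subsequent runs: refer to the schema by its base name alone.
ontogpt extract -t albatross -i input.txt
```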
4 changes: 2 additions & 2 deletions docs/functions.md
@@ -44,11 +44,11 @@ In the latter case, all .txt files will be assumed to be input, and the path wil

Use the option `--template` to specify a template to use. This is a required parameter.

Only the name is required, without any filename suffix.
Only the name is required, without any filename suffix, unless you are using a custom schema for the first time. In that case, provide the path to the schema, including the .yaml file extension.

To use the `gocam` template, for example, the parameter will be `--template gocam`

This may be one of the templates included with OntoGPT or a custom template, but in the latter case, the schema, generated Pydantic classes, and any imported schemas should be present in the same location.
Or, for a custom template, the parameter may be `--template custom.yaml`
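Putting the two cases side by side, a minimal sketch (the input file name is a placeholder):

```bash
# Built-in template: base name only, no filename suffix
ontogpt extract --template gocam -i input.txt

# Custom schema, first use: path including the .yaml extension
ontogpt extract --template custom.yaml -i input.txt
```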

### target-class

13 changes: 5 additions & 8 deletions docs/index.md
@@ -1,18 +1,15 @@
# Introduction

_OntoGPT_ is a Python package for extracting structured information from text with large language models (LLMs), _instruction prompts_, and ontology-based grounding. It works well with OpenAI's GPT-3.5 and GPT-4 models as well as a selection of other LLMs. OntoGPT's output can be used for general-purpose natural language tasks (e.g., named entity recognition and relation extraction), summarization, knowledge base and knowledge graph construction, and more.
_OntoGPT_ is a Python package for extracting structured information from text with large language models (LLMs), _instruction prompts_, and ontology-based grounding. It works well with OpenAI's GPT models as well as a selection of other LLMs. OntoGPT's output can be used for general-purpose natural language tasks (e.g., named entity recognition and relation extraction), summarization, knowledge base and knowledge graph construction, and more.

## Methods

Two different strategies for knowledge extraction are currently implemented in OntoGPT:
The primary extraction method currently implemented in OntoGPT is SPIRES:

* SPIRES: *Structured Prompt Interrogation and Recursive Extraction of Semantics*
* SPIRES: _Structured Prompt Interrogation and Recursive Extraction of Semantics_
* A Zero-shot learning (ZSL) approach to extracting nested semantic structures from text
* This approach takes two inputs: 1) a LinkML schema and 2) free text, and outputs knowledge in a structure conformant with the supplied schema, in JSON, YAML, RDF, or OWL format
* Uses GPT-3.5-turbo, GPT-4, or one of a variety of open LLMs on your local machine
* SPINDOCTOR: *Structured Prompt Interpolation of Narrative Descriptions Or Controlled Terms for Ontological Reporting*
* Summarizes gene set descriptions (pseudo gene-set enrichment)
* Uses GPT-3.5-turbo or GPT-4
* Uses OpenAI GPT models through their API, or one of a variety of LLMs on your local machine
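The SPIRES workflow described above can be sketched as follows; the abstract text is illustrative, and `gocam` is one of the bundled templates:

```bash
# Input 1: a LinkML schema (here, the bundled gocam template)
# Input 2: free text to extract from
echo "Wnt signaling activates beta-catenin in the nucleus." > abstract.txt

# Output: knowledge in a structure conformant with the schema
ontogpt extract -t gocam -i abstract.txt -o output.yaml
```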

## Quick Start

@@ -68,7 +65,7 @@ NOTE: We do not recommend hosting this webapp publicly without authentication.

## Citation

SPIRES is described further in: Caufield JH, Hegde H, Emonet V, Harris NL, Joachimiak MP, Matentzoglu N, et al. Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning. arXiv publication: <http://arxiv.org/abs/2304.02711>
SPIRES is described further in: Caufield JH, Hegde H, Emonet V, Harris NL, Joachimiak MP, Matentzoglu N, et al. Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning. Bioinformatics. 2024;40. doi:[10.1093/bioinformatics/btae104](http://dx.doi.org/10.1093/bioinformatics/btae104)

## Contributing

21 changes: 3 additions & 18 deletions docs/operation.md
@@ -32,7 +32,7 @@ Perhaps the squid crossed the coral reef for a variety of reasons:

OntoGPT is intended to be used for information extraction. The following examples show how to accomplish this.

### Strategy 1: Knowledge extraction using SPIRES
### Knowledge extraction using SPIRES

#### Working Mechanism

@@ -72,6 +72,8 @@ ontogpt extract -t gocam.GoCamAnnotations -i ~/path/to/abstract.txt

Note: The value accepted by the `-t` / `--template` argument is the base name of one of the LinkML schemas / data models, which can be found in the [templates](src/ontogpt/templates/) folder.

Or, if you create your own schema (see the page on [custom schemas](custom.md)), you may pass the path to the .yaml file.

#### Output

The output returned from the above command can be optionally redirected into an output file using the `-o` / `--output`.
@@ -117,20 +119,3 @@ ontogpt list-models
```

When specifying a local model for the first time, it will be downloaded to your local system.
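Assuming one of the names printed by `ontogpt list-models`, a local model can then be selected for extraction; `MODEL_NAME` below is a placeholder, and the `-m` / `--model` option is assumed here:

```bash
# List the available models, then substitute one of the printed names below
ontogpt list-models

# MODEL_NAME is a placeholder; a local model is downloaded on first use
ontogpt extract -t gocam -i abstract.txt -m MODEL_NAME
```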

## Strategy 2: Gene Enrichment using SPINDOCTOR

Given a set of genes, OntoGPT can find similarities among them.

Ex.:

```bash
ontogpt enrichment -U tests/input/genesets/sensory-ataxia.yaml
```

The default is to use ontological gene function synopses (via the Alliance API).

* To use narrative/RefSeq summaries, use the `--no-ontological-synopses` flag
* To run without any gene descriptions, use the `--no-annotations` flag

This strategy does not currently support using local models.
