Prep for v1.0.0 release (#418)
Update documentation and do general linting.
Also add a loading animation to the webapp.
caufieldjh committed Jul 30, 2024
2 parents 60e1c44 + d48b8ff commit b3f347d
Showing 13 changed files with 176 additions and 144 deletions.
60 changes: 51 additions & 9 deletions README.md
@@ -61,6 +61,48 @@ web-ontogpt

NOTE: We do not recommend hosting this webapp publicly without authentication.

## Model APIs

OntoGPT uses the `litellm` package (<https://litellm.vercel.app/>) to interface with LLMs.

This means most APIs are supported, including OpenAI, Azure, Anthropic, Mistral, Replicate, and beyond.

The model name to use can be found with the command `ontogpt list-models`; use the name in the first column with the `--model` option.

In most cases, this will require setting the API key for a particular service as above:

```bash
runoak set-apikey -e anthropic-key <your anthropic api key>
```

Some endpoints, such as OpenAI models through Azure, require setting additional details. These may be set similarly:

```bash
runoak set-apikey -e azure-key <your azure api key>
runoak set-apikey -e azure-base <your azure endpoint url>
runoak set-apikey -e azure-version <your azure api version, e.g. "2023-05-15">
```

These details may also be set as environment variables as follows:

```bash
export AZURE_API_KEY="my-azure-api-key"
export AZURE_API_BASE="https://example-endpoint.openai.azure.com"
export AZURE_API_VERSION="2023-05-15"
```
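Before running OntoGPT against Azure, it can help to confirm that all three variables are actually set in the current shell. The function below is a hypothetical convenience helper (it is not part of OntoGPT or `runoak`); it only inspects the environment variables named above and reports which ones are missing.

```bash
# Hypothetical helper (not part of OntoGPT): report any Azure settings
# missing from the environment. Returns non-zero if any are unset or empty.
check_azure_env() {
  local missing=0 var
  for var in AZURE_API_KEY AZURE_API_BASE AZURE_API_VERSION; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A call such as `check_azure_env && ontogpt extract ...` only starts the extraction once the variables are in place. Keys stored with `runoak set-apikey` are not visible to this check, since it looks only at environment variables.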

## Open Models

Open LLMs may be retrieved and run through the `ollama` package (<https://ollama.com/>).

You will need to install `ollama` (see the [GitHub repo](https://github.com/ollama/ollama)), and you may need to start it as a service with a command like `ollama serve` or `sudo systemctl start ollama`.

Then retrieve a model with `ollama pull <modelname>`, e.g., `ollama pull llama3`.

The model may then be used in OntoGPT by prefixing its name with `ollama/`, e.g., `ollama/llama3`, along with the `--model` option.

Some ollama models may not be listed in `ontogpt list-models`, but the full list of downloaded LLMs can be seen with the `ollama list` command.
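Because OntoGPT expects the `ollama/` prefix, a small filter can turn the output of `ollama list` into names ready for `--model`. This is a sketch that assumes the current `ollama list` layout: a header row followed by one model per line, with the name (including its tag, e.g. `llama3:latest`) in the first column. The function name is hypothetical.

```bash
# Hypothetical helper (not part of OntoGPT or ollama): read `ollama list`
# output on stdin and print each model as an OntoGPT --model value.
ollama_to_ontogpt_models() {
  # Skip the header row (NR > 1), ignore blank lines, and prefix
  # the first whitespace-separated column with "ollama/".
  awk 'NR > 1 && NF > 0 { print "ollama/" $1 }'
}
```

For example, `ollama list | ollama_to_ontogpt_models` prints one candidate per line; the tag is kept exactly as `ollama list` reports it.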

## Evaluations

OntoGPT's functions have been evaluated on test data. Please see the full documentation for details on these evaluations and how to reproduce them.
@@ -71,15 +113,15 @@ OntoGPT's functions have been evaluated on test data. Please see the full docume
## Tutorials and Presentations
- Presentation: "Staying grounded: assembling structured biological knowledge with help from large language models" - presented by Harry Caufield as part of the AgBioData Consortium webinar series (September 2023)
- [Slides](https://docs.google.com/presentation/d/1rMQVWaMju-ucYFif5nx4Xv3bNX2SVI_w89iBIT1bkV4/edit?usp=sharing)
- [Video](https://www.youtube.com/watch?v=z38lI6WyBsY)
- Presentation: "Transforming unstructured biomedical texts with large language models" - presented by Harry Caufield as part of the BOSC track at ISMB/ECCB 2023 (July 2023)
- [Slides](https://docs.google.com/presentation/d/1LsOTKi-rXYczL9vUTHB1NDkaEqdA9u3ZFC5ANa0x1VU/edit?usp=sharing)
- [Video](https://www.youtube.com/watch?v=a34Yjz5xPp4)
- Presentation: "OntoGPT: A framework for working with ontologies and large language models" - talk by Chris Mungall at Joint Food Ontology Workgroup (May 2023)
- [Slides](https://docs.google.com/presentation/d/1CosJJe8SqwyALyx85GWkw9eOT43B4HwDlAY2CmkmJgU/edit)
- [Video](https://www.youtube.com/watch?v=rt3wobA9hEs&t=1955s)
* Presentation: "Staying grounded: assembling structured biological knowledge with help from large language models" - presented by Harry Caufield as part of the AgBioData Consortium webinar series (September 2023)
* [Slides](https://docs.google.com/presentation/d/1rMQVWaMju-ucYFif5nx4Xv3bNX2SVI_w89iBIT1bkV4/edit?usp=sharing)
* [Video](https://www.youtube.com/watch?v=z38lI6WyBsY)
* Presentation: "Transforming unstructured biomedical texts with large language models" - presented by Harry Caufield as part of the BOSC track at ISMB/ECCB 2023 (July 2023)
* [Slides](https://docs.google.com/presentation/d/1LsOTKi-rXYczL9vUTHB1NDkaEqdA9u3ZFC5ANa0x1VU/edit?usp=sharing)
* [Video](https://www.youtube.com/watch?v=a34Yjz5xPp4)
* Presentation: "OntoGPT: A framework for working with ontologies and large language models" - talk by Chris Mungall at Joint Food Ontology Workgroup (May 2023)
* [Slides](https://docs.google.com/presentation/d/1CosJJe8SqwyALyx85GWkw9eOT43B4HwDlAY2CmkmJgU/edit)
* [Video](https://www.youtube.com/watch?v=rt3wobA9hEs&t=1955s)
## Citation
2 changes: 1 addition & 1 deletion docs/index.md
@@ -1,6 +1,6 @@
# Introduction

_OntoGPT_ is a Python package for extracting structured information from text with large language models (LLMs), _instruction prompts_, and ontology-based grounding. It works well with OpenAI's GPT models as well as a selection of other LLMs. OntoGPT's output can be used for general-purpose natural language tasks (e.g., named entity recognition and relation extraction), summarization, knowledge base and knowledge graph construction, and more.

## Methods

22 changes: 13 additions & 9 deletions docs/operation.md
@@ -36,7 +36,7 @@ OntoGPT is intended to be used for information extraction. The following example

#### Working Mechanism

1. You provide an arbitrary data model, describing the structure you want to extract text into. This can be nested (but see limitations below). The predefined [templates](src/ontogpt/templates/) may be used.
1. You provide an arbitrary data model, describing the structure you want to extract text into. This can be nested (but see limitations below). The predefined templates may be used.
2. Provide your preferred annotations for grounding `NamedEntity` fields
3. OntoGPT will:
* Generate a prompt
@@ -46,7 +46,7 @@ OntoGPT is intended to be used for information extraction. The following example

#### Input

Consider some text from one of the input files being used in the OntoGPT test suite. You can find the text file [here](tests/input/cases/gocam-betacat.txt). You can download the raw file from the GitHub link to that input text file, or copy its contents over into another file, say, `abstract.txt`. An excerpt:
Consider some text from one of the input files being used in the OntoGPT test suite. You can find the text file [here](https://github.com/monarch-initiative/ontogpt/blob/main/tests/input/cases/gocam-betacat.txt). You can download the raw file from the GitHub link to that input text file, or copy its contents over into another file, say, `abstract.txt`. An excerpt:

> The cGAS/STING-mediated DNA-sensing signaling pathway is crucial
for interferon (IFN) production and host antiviral
@@ -62,15 +62,17 @@ Consider some text from one of the input files being used in the OntoGPT test su
> ...
> ...
We can extract knowledge from the above text this into the [GO pathway datamodel](src/ontogpt/templates/gocam.yaml) by running the following command:
We can extract knowledge from the above text into the [GO pathway datamodel](https://github.com/monarch-initiative/ontogpt/blob/main/src/ontogpt/templates/gocam.yaml) by running the following command:

#### Command

```bash
ontogpt extract -t gocam.GoCamAnnotations -i ~/path/to/abstract.txt
```
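The command above reads a local copy of the example text. One way to fetch it is to rewrite the GitHub page link into its raw-content form and download it with `curl`. This sketch assumes GitHub's usual `raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>` layout; the helper name is hypothetical.

```bash
# Hypothetical helper: convert a GitHub "blob" page URL into the matching
# raw.githubusercontent.com URL, which serves the file contents directly.
github_blob_to_raw() {
  # https://github.com/OWNER/REPO/blob/BRANCH/PATH
  #   -> https://raw.githubusercontent.com/OWNER/REPO/BRANCH/PATH
  printf '%s\n' "$1" |
    sed -E 's#^https://github\.com/([^/]+)/([^/]+)/blob/#https://raw.githubusercontent.com/\1/\2/#'
}
```

For example: `curl -fsSL "$(github_blob_to_raw https://github.com/monarch-initiative/ontogpt/blob/main/tests/input/cases/gocam-betacat.txt)" -o abstract.txt`.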

Note: The value accepted by the `-t` / `--template` argument is the base name of one of the LinkML schema / data model which can be found in the [templates](src/ontogpt/templates/) folder.
Note: The value accepted by the `-t` / `--template` argument is the base name of one of the LinkML schemas / data models available to OntoGPT.

Use the command `ontogpt list-templates` to see all templates. Use the name in the first column with the `--template` option.

Or, if you create your own schema (see the page on [custom schemas](custom.md)), you may pass the path to the .yaml file.
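To run the same template over several abstracts, the single `extract` command can be wrapped in a loop. This is a sketch under assumptions: it relies on OntoGPT's `-o` / `--output` option to write one YAML file per input, and the `ONTOGPT` variable and `extract_all` function are hypothetical, added so the loop can be dry-run (e.g. `ONTOGPT=echo`) before spending any API calls.

```bash
# Hypothetical helper: run one template over every .txt file in a directory,
# writing results to OUTDIR/<name>.yaml. Set ONTOGPT=echo to print the
# commands instead of running them.
extract_all() {
  local template="$1" indir="$2" outdir="$3" f name
  mkdir -p "$outdir"
  for f in "$indir"/*.txt; do
    [ -e "$f" ] || continue            # no matches: the glob stays literal
    name="$(basename "$f" .txt)"
    "${ONTOGPT:-ontogpt}" extract -t "$template" -i "$f" -o "$outdir/$name.yaml"
  done
}
```

For example, `extract_all gocam.GoCamAnnotations ~/abstracts ~/results` would produce one output file per abstract.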

@@ -104,18 +106,20 @@ gene_functions:

#### Local Models

To use a local model, specify it with the `-m` or `--model` option.
To use a local model, download it through `ollama` (see the setup page for more details: <https://monarch-initiative.github.io/ontogpt/setup/>).

Then specify it with the `-m` or `--model` option.

Example:

```bash
ontogpt extract -t drug -i ~/path/to/abstract.txt -m nous-hermes-13b
ontogpt extract -t drug -i ~/path/to/abstract.txt -m ollama/llama3
```

See the list of all available models with this command:
See the list of all downloaded models with this command:

```bash
ontogpt list-models
ollama list
```

When specifying a local model for the first time, it will be downloaded to your local system.
Note that models can and will vary in performance and larger models will not always perform more accurately or more efficiently.
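Since a local model has to be pulled through `ollama` before OntoGPT can use it, a small guard can check for the model and pull it only when it is missing. This is a sketch, not part of OntoGPT: it assumes the current `ollama list` layout (a header row, then one model per line with the name in the first column) and also matches an untagged name against its `:latest` form. The `OLLAMA` override variable is hypothetical and exists only to allow dry runs with a stub.

```bash
# Hypothetical helper: make sure a model is available to ollama before
# passing it to OntoGPT. Accepts "llama3" or "ollama/llama3".
ensure_ollama_model() {
  local model="${1#ollama/}"         # strip the OntoGPT prefix if present
  local ollama="${OLLAMA:-ollama}"   # override (e.g. with a stub) for dry runs
  # Compare against the NAME column of `ollama list`; an untagged name
  # is also matched against its ":latest" form.
  if "$ollama" list | awk 'NR > 1 { print $1 }' \
      | grep -xF -e "$model" -e "$model:latest" >/dev/null; then
    echo "found $model"
  else
    "$ollama" pull "$model"
  fi
}
```

For example: `ensure_ollama_model ollama/llama3 && ontogpt extract -t drug -i ~/path/to/abstract.txt -m ollama/llama3`.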
2 changes: 2 additions & 0 deletions docs/setup.md
@@ -97,3 +97,5 @@ This may require a command like `ollama serve` or `sudo systemctl start ollama`.
Then retrieve a model with `ollama pull <modelname>`, e.g., `ollama pull llama3`.

The model may then be used in OntoGPT by prefixing its name with `ollama/`, e.g., `ollama/llama3`, along with the `--model` option.

See the list of all downloaded LLMs with the `ollama list` command.
4 changes: 4 additions & 0 deletions mkdocs.yml
@@ -37,6 +37,7 @@ nav:
- Troubleshooting: troubleshooting.md
- SPIRES Templates: # Note these are autogenerated
- Core Upper Level Schema: core/index.md
- Alzheimer's Disease: alzrd/index.md
- Biochemical Reactions: reaction/index.md
- Biological Processes: biological_process/index.md
- Biotic Interactions: biotic_interaction/index.md
@@ -49,6 +50,7 @@ nav:
- Drugs and Mechanisms: drug/index.md
- EMAPA (Mouse Developmental Anatomy Ontology): emapa_simple/index.md
- Environmental Samples: environmental_sample/index.md
- Error Analysis: error_analysis/index.md
- Figures: figure/index.md
- GO Terms: go_terms/index.md
- GO Terms (Entities Only): go_simple/index.md
@@ -64,7 +66,9 @@ nav:
- NMDC Schema Data: nmdc_schema_data/index.md
- Ontology Classes: ontology_class/index.md
- Ontology Issues: ontology_issue/index.md
- Ontology Usage: onto_usage/index.md
- Recipes: recipe/index.md
- STORMS Checklist: storms/index.md
- Traits of a Taxon: traits/index.md
- Treatments: treatment/index.md
- HALO Schema: halo/index.md