Commit
Showing 14 changed files with 128 additions and 29 deletions.
@@ -1,8 +1,27 @@

# Accessing GEO data through PEPhub

Moreover, users can download all projects as a `tar` file from the GEO namespace using the link available on the GEO namespace page. PEPhub doesn't store the actual files in its database. Because of this, if you want to download files, there are two options:

The Gene Expression Omnibus (GEO) is a major source of biological sample data and metadata. However, accessing the metadata from GEO is challenging. PEPhub now provides API-oriented access to processed tabular metadata from GEO.

- Use links to the files that are stored in the project sample table.
- Use geofetch on a local machine to download these files.
Example: `geofetch -i GSE95654 --processed`, where `--processed` indicates that you want to download processed data, not SRA. More information about geofetch can be found on the official website [GEOfetch](https://geofetch.databio.org/en/latest/).

## Finding GEO data on PEPhub

There are several ways to find GEO metadata on PEPhub:

1. You can browse or search GEO repositories from the [GEO namespace](https://pephub.databio.org/geo).
2. You can use the main PEPhub search interface.
3. You can also use the URL directly, which takes the form `https://pephub.databio.org/geo/{gse_accession}` (with `gse` in lowercase). For example: <https://pephub.databio.org/geo/gse211892>
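
As a minimal sketch of the direct-URL option above, the following Python helper builds the PEPhub page URL for a GSE accession and lowercases it as the URL form requires. The function name `geo_pep_url` and the accession format check (`gse` followed by digits) are illustrative assumptions, not part of PEPhub.

```python
import re

# Base URL of the GEO namespace on PEPhub (taken from the text above).
PEPHUB_GEO_BASE = "https://pephub.databio.org/geo"


def geo_pep_url(gse_accession: str) -> str:
    """Return the PEPhub URL for a GEO series accession (hypothetical helper)."""
    accession = gse_accession.strip().lower()  # the URL expects lowercase `gse`
    # Assumption: a series accession is `gse` followed by one or more digits.
    if not re.fullmatch(r"gse\d+", accession):
        raise ValueError(f"Not a GSE accession: {gse_accession!r}")
    return f"{PEPHUB_GEO_BASE}/{accession}"


print(geo_pep_url("GSE211892"))  # → https://pephub.databio.org/geo/gse211892
```

Normalizing the case inside the helper means callers can pass accessions exactly as GEO prints them.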

## Always up-to-date

PEPhub runs a weekly update that keeps its GEO namespace in sync with GEO, so the metadata you get from PEPhub reflects the most recent weekly sync. You can think of PEPhub as a convenient mirror of GEO metadata. We use [geofetch](../../geofetch/README.md) to download any updated files and process them into a more compact PEP sample table, which we then store in PEPhub.

## Download all processed data from GEO

If you want to do a metadata analysis project that uses *all* the metadata from GEO, we also provide a tar archive. Just find the *Download* link on the [GEO namespace page](https://pephub.databio.org/geo). This archive contains the processed PEPs of all GEO projects.

If you are looking for the *raw* GEO metadata (not already processed into a PEP), then PEPhub can't help directly; we process the data into PEPs and discard the raw files, which are large. For most use cases, the processed PEP is a more convenient form. If you really need the raw SOFT files, there are two options:

- Use the links stored in the project sample table to download the data directly.
- Use geofetch yourself on a local machine to download these files. Example: `geofetch -i GSE95654 --processed`, where `--processed` indicates that you want to download processed data, not SRA. More information can be found in the [geofetch documentation](../../geofetch/README.md).
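
If you want to script the geofetch option, a minimal Python sketch might assemble the same command shown in the example above. The helper name `geofetch_command` is hypothetical; only the `-i` and `--processed` flags come from the example, and the sketch only builds the command rather than running it.

```python
def geofetch_command(accession: str, processed: bool = True) -> list[str]:
    """Build the geofetch argument list from the example above (hypothetical helper)."""
    cmd = ["geofetch", "-i", accession]
    if processed:
        # `--processed` requests processed data rather than SRA data.
        cmd.append("--processed")
    return cmd


cmd = geofetch_command("GSE95654")
print(" ".join(cmd))  # → geofetch -i GSE95654 --processed
```

Assuming geofetch is installed locally, the resulting list could be passed to `subprocess.run(cmd, check=True)` to start the download.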

@@ -0,0 +1,5 @@
# PEPhub's semantic search

PEPhub's main search box (accessible from the [home page](https://pephub.databio.org/)) provides a powerful semantic search.

When a user provides a natural-language search query, PEPhub encodes the query in real time using the same sentence transformer that is used to encode the PEPs, then queries the Qdrant API to retrieve the most semantically similar PEP vectors. Qdrant identifies similar PEPs by calculating nearest neighbors in vector space. PEPhub then returns the results to the client with their associated description and registry path.

This semantic approach provides several advantages. First, the system returns results with similar meaning whether or not they include the terms of the original query. Second, it is tolerant of misspellings and is not limited to any ontology or taxonomy. Finally, because each PEP is represented as a vector, we can use high-speed nearest-neighbor algorithms to identify relevant PEPs, making the search very fast. This method scales to millions of PEPs, and the speed is limited mainly by network speeds. Users may also tune results with limits, offsets, and relevance score cutoffs.
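
To make the nearest-neighbor idea concrete, here is a toy sketch in plain Python. It is not PEPhub's implementation and does not call Qdrant: the three-dimensional vectors, the registry paths, and the `search` function are made up for illustration, and a brute-force cosine-similarity ranking stands in for the vector database. It does show how a limit, an offset, and a score cutoff shape the results.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec, index, limit=10, offset=0, score_cutoff=None):
    """Rank PEP vectors by similarity to the query (brute-force stand-in)."""
    scored = [(cosine(query_vec, vec), path) for path, vec in index.items()]
    if score_cutoff is not None:
        scored = [hit for hit in scored if hit[0] >= score_cutoff]
    scored.sort(key=lambda hit: hit[0], reverse=True)  # most similar first
    return scored[offset:offset + limit]


# Hypothetical index: registry path -> embedding of the PEP description.
index = {
    "demo/pep_a:default": [0.9, 0.1, 0.0],
    "demo/pep_b:default": [0.1, 0.9, 0.0],
    "demo/pep_c:default": [0.8, 0.2, 0.1],
}

# In a real system the query vector would come from the sentence transformer.
hits = search([1.0, 0.0, 0.0], index, limit=2, score_cutoff=0.5)
print([path for _, path in hits])  # → ['demo/pep_a:default', 'demo/pep_c:default']
```

A production vector database replaces the brute-force loop with an approximate nearest-neighbor index, which is what keeps the search fast at millions of vectors.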

@@ -0,0 +1,31 @@
# How to validate sample metadata

PEPhub validates sample metadata with [eido](../../eido/README.md). Schemas can be added and edited on PEPhub directly.

Validating against a schema is particularly useful before running a pipeline: it tells you whether a PEP is compatible with that pipeline and highlights any errors in the PEP structure.
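
To illustrate the kind of check a schema performs, here is a hand-written Python stand-in. It does not use eido or its schema format; the function `validate_samples`, the attribute names, and the error wording are assumptions made for this sketch. It only checks that every sample has a non-empty value for each required attribute.

```python
def validate_samples(samples: list[dict], required: list[str]) -> list[str]:
    """Return one message per missing required attribute (illustrative only)."""
    errors = []
    for index, sample in enumerate(samples):
        name = sample.get("sample_name", "?")
        for attr in required:
            value = sample.get(attr)
            # Treat absent, None, and blank values as missing.
            if value is None or str(value).strip() == "":
                errors.append(f"sample {index} ({name}): missing '{attr}'")
    return errors


# Hypothetical sample table rows and pipeline requirements.
samples = [
    {"sample_name": "s1", "genome": "hg38", "read1": "s1_R1.fastq.gz"},
    {"sample_name": "s2", "genome": "", "read1": "s2_R1.fastq.gz"},
]
errors = validate_samples(samples, required=["sample_name", "genome", "read1"])
print(errors)  # → ["sample 1 (s2): missing 'genome'"]
```

A real schema can express much more than required attributes (types, allowed values, required files), which is why PEPhub delegates validation to eido.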

There are two ways to use the interface to validate PEPs: from the main PEP interface, or from the universal validator.

## Validating a PEP from the main PEP interface

If you're editing a PEP, it's convenient to validate it from the same interface. First, assign a schema to the PEP; validation will then happen automatically whenever you save the project.

### Assign a schema to a PEP

From the main table view, use the *Edit* menu to access the properties for a PEP:

![alt text](../img/menu-edit.png)

In this interface, you can select a schema for this PEP.

### Validating

Once a schema is assigned, you'll see the validation results:

![alt text](../img/validation-notice.png)

If you click on this notice, you'll see more detailed information about what in the table is causing the validation to fail. This allows you to validate metadata in real time as you work on the table.

## Using the universal validator

Alternatively, for a more flexible approach, you can use the [Universal Validator](https://pephub.databio.org/validate). This provides a two-step interface: first you provide a PEP, either by selecting one from PEPhub or by uploading it; then you provide a schema, which can be selected from PEPhub, uploaded, or pasted.

@@ -0,0 +1,30 @@
# How to version control metadata with PEPhub

PEPhub supports table versioning through two features: 1) history; and 2) tags.

## History

PEPhub automatically records a history of your files whenever changes are made. Any time you click "save", an entry is added to your history. You can view the history of table edits by selecting the `History` option from the `More` menu.

![alt text](../img/menu-history.png)

Selecting this option will bring up the *History Interface*, which provides buttons that let you view or delete entries from your history table. If you choose the `View` button for an entry, it will show you the PEP at that point in history. It also opens a new interface where you can click `Restore` to overwrite your current PEP with the historical version you are viewing, or `Download` the table as it was at that point in history.

![alt text](../img/history-interface.png)

In PEPhub, old versions are kept automatically, and they are referenced by date. PEPhub does not automatically assign version numbers or other identifiers; the only way to identify the old versions is by timestamp.

### History retention policy

**Old versions of sample tables are kept for 30 days.** Once a history entry is more than 30 days old, it will be automatically purged. If you want to keep an old version for longer, then you will need to manually tag the version, thereby forking it into a new repository.
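
The retention rule can be expressed as a small date calculation. The sketch below is only an illustration of the 30-day policy stated above; the function name `is_expired` and the example timestamps are hypothetical, and PEPhub's own purge logic may differ in detail (for example, in when the purge job runs).

```python
from datetime import datetime, timedelta, timezone

# Retention window taken from the policy above.
RETENTION = timedelta(days=30)


def is_expired(entry_time: datetime, now: datetime) -> bool:
    """True if a history entry is more than 30 days old at time `now`."""
    return now - entry_time > RETENTION


now = datetime(2024, 3, 31, tzinfo=timezone.utc)
recent = datetime(2024, 3, 15, tzinfo=timezone.utc)  # 16 days old
old = datetime(2024, 2, 15, tzinfo=timezone.utc)     # 45 days old

print(is_expired(recent, now))  # → False
print(is_expired(old, now))     # → True
```

An entry you still need as it approaches the 30-day mark is a good candidate for tagging.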

## Tags

The other versioning feature offered by PEPhub is tags. PEPhub tags are unique identifiers of repositories. Every repository has a tag. By default, the tag is simply *default*. The registry path of each PEP takes the form:

```
{namespace}/{repository}:{tag}
```

For example, `nsheff/my_new_pep:v1` would be the `my_new_pep` repository in my user namespace (`nsheff`), and `v1` is the tag. You can use tags to version your own PEPs. When you're ready to declare a version, just fork the current PEP into a new PEP and name the version tag accordingly.
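
As a rough sketch of how the registry path format could be handled in a script, the following Python function splits a path into its three parts. The names `RegistryPath` and `parse_registry_path` are hypothetical, and falling back to the `default` tag when no tag is given is an assumption based on the default described above, not documented parsing behavior.

```python
from typing import NamedTuple


class RegistryPath(NamedTuple):
    namespace: str
    repository: str
    tag: str


def parse_registry_path(path: str, default_tag: str = "default") -> RegistryPath:
    """Split `{namespace}/{repository}:{tag}` into its parts (illustrative only)."""
    name_part, _, tag = path.partition(":")
    namespace, slash, repository = name_part.partition("/")
    if not slash or not namespace or not repository or "/" in repository:
        raise ValueError(f"Expected namespace/repository[:tag], got {path!r}")
    # Assumption: a missing tag means the repository's `default` tag.
    return RegistryPath(namespace, repository, tag or default_tag)


print(parse_registry_path("nsheff/my_new_pep:v1"))
# → RegistryPath(namespace='nsheff', repository='my_new_pep', tag='v1')
```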

@@ -1,4 +1,21 @@
# How to use PEPhub views

*Documentation pending*

## What are views?

Large tables (*e.g.* >5,000 rows) can be unwieldy in PEPhub, and it can be hard to find the elements you're looking for. To address this, PEPhub provides the *Views* feature. A view is a way to look at a subset of a large table (basically, a filtered table).
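
Conceptually, a view behaves like a saved filter over the sample table. The Python sketch below illustrates that idea only; the `make_view` function, the row layout, and the attribute names are hypothetical and do not reflect how PEPhub stores views. The result is returned as a tuple to echo that a view is a read-only subset.

```python
from typing import Callable


def make_view(rows: list[dict], predicate: Callable[[dict], bool]) -> tuple[dict, ...]:
    """Return the rows matching `predicate` as an immutable sequence."""
    return tuple(row for row in rows if predicate(row))


# Hypothetical sample table.
table = [
    {"sample_name": "s1", "organism": "human"},
    {"sample_name": "s2", "organism": "mouse"},
    {"sample_name": "s3", "organism": "human"},
]

human_view = make_view(table, lambda row: row["organism"] == "human")
print([row["sample_name"] for row in human_view])  # → ['s1', 's3']
```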

## How to create a view

To create a new view, click the *Down Arrow* to open the filter menu and set up a filter. The table will then display only the matching subset of rows.

![alt text](../img/menu-filter.png)

Then, use the *View Settings* menu (the gear icon next to the view selector) to open the Views interface.

![alt text](../img/select-view.png)

From there, you can save the view. You can then select it at any time from the views menu.

## Read-only limitation

Views are currently read-only; you will not be able to make edits to the table while viewing a subset. We hope to remove this restriction in the future.