Merge pull request #47 from ncihtan/phase2_chgs
Phase2 chgs
ecerami authored Oct 17, 2024
2 parents ece5a61 + a8d85f5 commit 2a0fdcc
Showing 17 changed files with 120 additions and 94 deletions.
Binary file modified .DS_Store
Binary file not shown.
5 changes: 1 addition & 4 deletions addtnl_info/tool_protocol.md
@@ -2,14 +2,11 @@
order: 1000
---

# Tool and Protocol Curation
# Submitting Tools

Computational tools developed or used to support HTAN research projects can be added to the HTAN tool catalog by filling out the tool curation form available on [HTAN's Synapse Wiki page](https://www.synapse.org/#!Synapse:syn17022193/wiki/584990).


Information regarding how protocols are developed/shared is also available on [HTAN's Synapse Wiki page](https://www.synapse.org/#!Synapse:syn17022193/wiki/584990).


!!! Note

The HTAN Synapse Wiki page is restricted to HTAN members. Please contact htandcc@ds.dfci.harvard.edu if you are a member of HTAN and need access to the wiki.
67 changes: 0 additions & 67 deletions data_model/identifiers.md
@@ -60,70 +60,3 @@ If you will be creating HTAN identifiers for a HTAN Center or Trans Network Proj
## ID to ID linkages

Note that the explicit linking of participants to biospecimens to assays is not encoded in the HTAN Identifier. Rather, the linking is encoded in explicit metadata elements (see [Relationship Model](relationships.md)).

## Creating HTAN Identifiers

The following are step by step instructions for HTAN Centers and Trans Network Projects (TNPs) to create and manage HTAN identifiers. HTAN identifiers should be created for all entities (participants, biospecimens and data-files) within individual research projects.

### Step 1: Determine your HTAN Center ID

Please see [HTAN Centers](../overview/centers.md) to determine your HTAN Center ID. If the data are part of a Trans Network Project (TNP), use the HTAN Center ID assigned to the TNP.

### Step 2: Assign HTAN Identifiers for all Research Participants.

Create a unique HTAN Identifier for each research participant in the following format:

`<participant_id>`::= `<htan_center_id>_integer`

e.g. HTA3_1

Each HTAN Center/TNP controls its own namespace and therefore owns all identifiers that begin with its prefix. The integer value following `<htan_center_id>` is determined by the HTAN Center/TNP.

!!! Participant IDs do not need to be consecutive

HTAN Centers/TNPs may choose to use integer blocks to assign groups. For example, CHOP may have four clinical sites, and may wish to reserve HTA4_1 to HTA4_1000 for all patients from site 1, and HTA4_1001 to HTA4_2000 for all patients from site 2. These blocks are entirely up to the research project and not managed by the DCC. The assigned integers in a set of identifiers need not be consecutive.
!!!

!!!
[Leading zeros](https://en.wikipedia.org/wiki/Leading_zero) (e.g. HTA3_01) should **not** be used in the ID.
!!!

#### Step 2b [optional]: If needed, assign HTAN identifiers for external controls

Each external control participant, if present, in your atlas must also have a unique HTAN Identifier. These identifiers are meant only for participants without precancerous or cancerous lesions, and therefore explicitly indicate lack of HTAN-relevant clinical data within the identifier itself. These participant identifiers look like:

`<participant_id>` ::= `<htan_center_id>_EXTinteger`

For example, if you are part of the Duke research center and you have three external control research participants, you will need to create three HTAN Identifiers:

HTA6_EXT1\
HTA6_EXT2\
HTA6_EXT3

As with regular research participants, the HTAN Center/TNP controls their own namespace, and therefore owns all identifiers that begin with the prefix e.g. HTA6_EXT. The integer value following HTA6_EXT is determined entirely by the HTAN Center/TNP.

### Step 3: Assign HTAN Identifiers for all HTAN Biospecimen and Data Files

Derivative entities include anything derived from a research participant, including biospecimens such as samples, tissue blocks, slides, aliquots, analytes, and data files that result from assaying those biospecimens. Each derivative entity in your atlas must also have a unique HTAN Identifier. These identifiers look like:

`<derivative_entity_id>` ::= `<participant_id>_integer`

Analogous to research participant IDs, the unique integer value following `<participant_id>` is determined entirely by the source HTAN Center/TNP. The ID must not have [leading zeros](https://en.wikipedia.org/wiki/Leading_zero).

!!! Special Case Identifiers
If a single data file is derived from multiple participants, the file identifier can contain a wildcard string, e.g. ‘0000’, after the HTAN center identifier. For example:

HTA4_0000_1\
HTA4_0000_2\
HTA4_0000_3

If a data file is derived from an external control participant, the biospecimen and file identifiers will contain the string ‘EXT’ before the external control participant integer (see Step 2b, above). For example:

HTA6_EXT1_1\
HTA4_EXT2_2\
HTA4_EXT3_3
!!!

### Step 4: Keep Track of all Metadata Associated with Entities

Complex relationships among entities can emerge in any research study. For example, one or more samples may be collected from a research participant at multiple times, and each of those samples may be processed through a variety of analytic workflows. It is recommended that each HTAN Center/TNP maintain its own mechanism for storing annotations of entities and the relationships among them; for example, many atlases already have LIMS or spreadsheet-based systems in place.
2 changes: 1 addition & 1 deletion data_submission/Data_Deidentification.md
@@ -1,5 +1,5 @@
---
order: 998
order: 997
---

# Data De-identification
2 changes: 1 addition & 1 deletion data_submission/Data_Liaisons.md
@@ -1,5 +1,5 @@
---
order: 997
order: 998
---

# Data Liaisons
2 changes: 1 addition & 1 deletion data_submission/Information_New_Centers.md
@@ -1,5 +1,5 @@
---
order: 999
order: 1000
---

# Information for New HTAN Centers
2 changes: 1 addition & 1 deletion data_submission/checklist.md
@@ -1,5 +1,5 @@
---
order: 1001
order: 999
---

# HTAN Checklist for Acceptance of Data
4 changes: 2 additions & 2 deletions data_submission/clin_biospec_assay.md
@@ -4,14 +4,14 @@ order: 994

# Submitting Assay Data and Metadata

As stated in [Data Submission Overview](../data_submission/overview.md), data submission involves two key steps:
As stated in [Data Submission Introduction](../data_submission/overview.md), data submission involves two key steps:
1. Uploading assay data files to Synapse; and
2. Completing and validating metadata using the Data Curator App (DCA).

!!! Once assay data files are submitted to Synapse, the files will have entityIDs (e.g. syn12345670) assigned to them. These can then be prepopulated into the manifests on the DCA. For this reason, assay files should be submitted before generating the associated manifests.
!!!

This page provides details regarding those steps.
This page provides details regarding those steps. Please note that the manual currently reflects the data submission process used in HTAN Phase 1. Changes may be implemented for HTAN Phase 2.

![HTAN Data Submission Process](../img/Data_submission.svg)

73 changes: 73 additions & 0 deletions data_submission/creating_ids.md
@@ -0,0 +1,73 @@
---
order: 997
---

# Creating HTAN Identifiers

The following are step by step instructions for HTAN Centers and Trans Network Projects (TNPs) to create and manage HTAN identifiers. HTAN identifiers should be created for all entities (participants, biospecimens and data-files) within individual research projects.

## Step 1: Determine your HTAN Center ID

Please see [HTAN Centers](../overview/centers.md) to determine your HTAN Center ID. If the data are part of a Trans Network Project (TNP), use the HTAN Center ID assigned to the TNP.

## Step 2: Assign HTAN Identifiers for all Research Participants.

Create a unique HTAN Identifier for each research participant in the following format:

`<participant_id>`::= `<htan_center_id>_integer`

e.g. HTA3_1

Each HTAN Center/TNP controls its own namespace and therefore owns all identifiers that begin with its prefix. The integer value following `<htan_center_id>` is determined by the HTAN Center/TNP.

!!! Participant IDs do not need to be consecutive

HTAN Centers/TNPs may choose to use integer blocks to assign groups. For example, CHOP may have four clinical sites, and may wish to reserve HTA4_1 to HTA4_1000 for all patients from site 1, and HTA4_1001 to HTA4_2000 for all patients from site 2. These blocks are entirely up to the research project and not managed by the DCC. The assigned integers in a set of identifiers need not be consecutive.
!!!

!!!
[Leading zeros](https://en.wikipedia.org/wiki/Leading_zero) (e.g. HTA3_01) should **not** be used in the ID.
!!!
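
The participant ID format above can be checked mechanically. The following is a minimal sketch, not an official HTAN validator. It assumes that center IDs consist of `HTA` followed by digits (inferred from examples such as HTA3 and HTA200) and that the participant integer is positive; the function name `is_valid_participant_id` is hypothetical.

```python
import re

# Assumption: center IDs look like "HTA" + digits (e.g. HTA3, HTA200).
# Assumption: the participant integer is positive and, per the rule above,
# has no leading zeros.
PARTICIPANT_ID = re.compile(r"HTA[1-9][0-9]*_[1-9][0-9]*")


def is_valid_participant_id(identifier: str) -> bool:
    """Return True if the string matches <htan_center_id>_integer."""
    return PARTICIPANT_ID.fullmatch(identifier) is not None
```

Under these assumptions, `HTA3_1` and `HTA4_1001` are accepted, while `HTA3_01` is rejected because of the leading zero.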

## Step 2b [optional]: If needed, assign HTAN identifiers for external controls

Each external control participant, if present, in your atlas must also have a unique HTAN Identifier. These identifiers are meant only for participants without precancerous or cancerous lesions, and therefore explicitly indicate lack of HTAN-relevant clinical data within the identifier itself. These participant identifiers look like:

`<participant_id>` ::= `<htan_center_id>_EXTinteger`

For example, if you are part of the Duke research center and you have three external control research participants, you will need to create three HTAN Identifiers:

HTA6_EXT1\
HTA6_EXT2\
HTA6_EXT3

As with regular research participants, the HTAN Center/TNP controls their own namespace, and therefore owns all identifiers that begin with the prefix e.g. HTA6_EXT. The integer value following HTA6_EXT is determined entirely by the HTAN Center/TNP.

## Step 3: Assign HTAN Identifiers for all HTAN Biospecimen and Data Files

Derivative entities include anything derived from a research participant, including biospecimens such as samples, tissue blocks, slides, aliquots, analytes, and data files that result from assaying those biospecimens. Each derivative entity in your atlas must also have a unique HTAN Identifier. These identifiers look like:

`<derivative_entity_id>` ::= `<participant_id>_integer`

Analogous to research participant IDs, the unique integer value following `<participant_id>` is determined entirely by the source HTAN Center/TNP. The ID must not have [leading zeros](https://en.wikipedia.org/wiki/Leading_zero).

!!! Special Case Identifiers
If a single data file is derived from multiple participants, the file identifier can contain a wildcard string, e.g. ‘0000’, after the HTAN center identifier. For example:

HTA4_0000_1\
HTA4_0000_2\
HTA4_0000_3

If a data file is derived from an external control participant, the biospecimen and file identifiers will contain the string ‘EXT’ before the external control participant integer (see Step 2b, above). For example:

HTA6_EXT1_1\
HTA4_EXT2_2\
HTA4_EXT3_3
!!!
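
The identifier forms described in Steps 2, 2b and 3, including the special cases, can be told apart with a small classifier. This is an illustrative sketch only: the function name `classify_htan_id` and the category labels are hypothetical, the center ID pattern (`HTA` followed by digits) is inferred from the examples in this manual, and the wildcard is assumed to be exactly `0000`, although the text gives that value only as an example.

```python
import re
from typing import Optional

# Assumptions (not taken from a formal HTAN specification):
#   - center IDs are "HTA" followed by digits without leading zeros
#   - integers are positive and have no leading zeros
#   - the multi-participant wildcard is exactly "0000"
_CENTER = r"HTA[1-9][0-9]*"
_INT = r"[1-9][0-9]*"

_PATTERNS = [
    ("participant", re.compile(rf"{_CENTER}_{_INT}")),
    ("external_control_participant", re.compile(rf"{_CENTER}_EXT{_INT}")),
    ("derivative", re.compile(rf"{_CENTER}_{_INT}_{_INT}")),
    ("external_control_derivative", re.compile(rf"{_CENTER}_EXT{_INT}_{_INT}")),
    ("multi_participant_file", re.compile(rf"{_CENTER}_0000_{_INT}")),
]


def classify_htan_id(identifier: str) -> Optional[str]:
    """Return a label for the identifier form, or None if no form matches."""
    for kind, pattern in _PATTERNS:
        if pattern.fullmatch(identifier):
            return kind
    return None
```

A center could run a check like this over its ID list before submission to catch leading zeros or malformed prefixes early. It does not replace validation performed by the DCC.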

## Step 4: Keep Track of all Metadata Associated with Entities

Complex relationships among entities can emerge in any research study. For example, one or more samples may be collected from a research participant at multiple times, and each of those samples may be processed through a variety of analytic workflows. It is recommended that each HTAN Center/TNP maintain its own mechanism for storing annotations of entities and the relationships among them; for example, many atlases already have LIMS or spreadsheet-based systems in place.



19 changes: 19 additions & 0 deletions data_submission/dashboard.md
@@ -0,0 +1,19 @@
---
order: 992
---

# HTAN Dashboard

The HTAN Data Coordinating Center (DCC) hosts the [HTAN Dashboard](https://hdash.website-us-east-1.linodeobjects.com/) to help centers track submitted data, data completeness and data submission errors.

The main page of the dashboard provides an overview of submitted assay data and metadata, as well as the total number of submission errors. The Atlas links on the main page provide additional details for each Atlas. The Synapse project links will take you directly to the Atlas' project in Synapse if you have Synapse access to the project.

![HTAN Dashboard Main Page](../img/hdash_main_page.svg)

On the Atlas-specific pages, there are several tables and visuals to help you assess the type of data available and any metadata validation errors. Examples include an expandable metadata validation errors table, metadata submission matrices and a summary of available longitudinal data. Please see the figures below for examples.

![Metadata Validation Errors Table](../img/hdash_metadata_validation.png)

![Clinical Data Matrix, Tier 1 and 2 Clinical Data](../img/hdash_clindata_1_2_matrix.png)

![Longitudinal Data](../img/hdash_longitudinal_data.png)
8 changes: 4 additions & 4 deletions data_submission/overview.md
@@ -1,9 +1,9 @@
---
order: 1000
order: 1001
---

# Data Submission Overview
Only HTAN Centers and Associate Members can submit data to the HTAN Network's repositories. The Data Submission Section of this Manual is intended as a guide for HTAN Centers and Associate Members.
# Introduction
Only HTAN Centers and Associate Members can submit data to the HTAN Network's repositories. The Data Submission Section of this Manual is intended as a guide for HTAN Centers and Associate Members. Please note that the manual currently reflects the data submission process used in HTAN Phase 1. Changes may be implemented for HTAN Phase 2.

:exclamation: *Prior to submitting data, all data must be de-identified. Please see [Data De-identification](../data_submission/Data_Deidentification.md) for more information.*

@@ -18,4 +18,4 @@ Data Submission involves two key steps:

Specific details regarding data submission and the DCA are included in later sections of this manual. Please contact your [Data Liaison](../data_submission/Data_Liaisons.md) if you have any questions or issues. Please also keep your data liaison informed of any data submissions.

The current status of data uploads (refreshed every 4 hours) is available on the [HTAN Dashboard](http://hdash.website-us-east-1.linodeobjects.com/index.html).
The current status of data uploads (refreshed every 4 hours) is available on the [HTAN Dashboard](dashboard.md).
Binary file added img/hdash_clindata_1_2_matrix.png
Binary file added img/hdash_longitudinal_data.png
1 change: 1 addition & 0 deletions img/hdash_main_page.svg
Binary file added img/hdash_metadata_validation.png
28 changes: 15 additions & 13 deletions overview/centers.md
@@ -4,21 +4,23 @@ order: 999

# HTAN Centers

HTAN currently consists of ten research centers, and two pilot projects. There are also multiple trans-network projects, referred to as TNPs. Each research center or TNP Project is identified with a unique HTAN prefix.
HTAN is currently in Phase 2, which includes five pre-cancer atlases and five tumor atlases.

Phase 1 of HTAN included two pilot projects, five pre-cancer atlases, five tumor atlases, and multiple trans-network projects, referred to as TNPs.

## Phase 2 Centers
| Prefix | Contact Institution or Project Name | Atlas Type | Area of Focus |
| ------ | --------------------------------------- | ---------------- | --------------------------------- |
| HTA200 | University of California San Francisco | Pre-Cancer Atlas | Skin Cancer |
| HTA201 | Oregon Health & Science University (OHSU) | Pre-Cancer Atlas | Pancreatic Cancer |
| HTA202 | California Institute of Technology (CalTech) | Pre-Cancer Atlas | Low Grade Glioma |
| HTA203 | MD Anderson (MDA)| Pre-Cancer Atlas | Gastric Cancer |
| HTA204 | Dana-Farber Cancer Institute (DFCI) | Pre-Cancer Atlas | Myeloma |
| HTA205 | Children's Hospital of Los Angeles | Tumor Atlas | Pediatric Cancers |
| HTA206 | Washington University in St. Louis | Tumor Atlas | Prostate Cancer |
| HTA207 | Vanderbilt University | Tumor Atlas | Colorectal Cancer |
| HTA208 | MD Anderson (MDA) | Tumor Atlas | Ovarian Cancer |
| HTA209 | Yale University | Tumor Atlas | Lymphoma |
| Prefix | Contact Institution or Project Name | Project Number | Atlas Type | Area of Focus |
| ------ | --------------------------------------- | ---------------|---------------- | --------------------------------- |
| HTA200 | University of California San Francisco | [1U01CA294536-01](https://reporter.nih.gov/search/dC4XUlx4NUCtn4cO72jXMg/project-details/10995082) | Pre-Cancer Atlas | Skin Cancer |
| HTA201 | Oregon Health & Science University | [1U01CA294548-01](https://reporter.nih.gov/search/dC4XUlx4NUCtn4cO72jXMg/project-details/10995215) | Pre-Cancer Atlas | Pancreatic Cancer |
| HTA202 | California Institute of Technology | [1U01CA294551-01](https://reporter.nih.gov/search/dC4XUlx4NUCtn4cO72jXMg/project-details/10995229) | Pre-Cancer Atlas | Low Grade Glioma |
| HTA203 | MD Anderson| [U01CA294518-01](https://reporter.nih.gov/search/dC4XUlx4NUCtn4cO72jXMg/project-details/10994921) | Pre-Cancer Atlas | Gastric Cancer |
| HTA204 | Dana-Farber Cancer Institute | [1U01CA294507-01](https://reporter.nih.gov/search/dC4XUlx4NUCtn4cO72jXMg/project-details/10994712) | Pre-Cancer Atlas | Myeloma |
| HTA205 | Children's Hospital of Los Angeles | [1U01CA294552-01](https://reporter.nih.gov/search/dC4XUlx4NUCtn4cO72jXMg/project-details/10995230) | Tumor Atlas | Pediatric Cancers |
| HTA206 | Washington University in St. Louis | [1U01CA294532-01](https://reporter.nih.gov/search/dC4XUlx4NUCtn4cO72jXMg/project-details/10995034) | Tumor Atlas | Prostate Cancer |
| HTA207 | Vanderbilt University | [1U01CA294527-01](https://reporter.nih.gov/search/dC4XUlx4NUCtn4cO72jXMg/project-details/10994992) | Tumor Atlas |Colorectal Cancer |
| HTA208 | MD Anderson | [1U01CA294459-01](https://reporter.nih.gov/search/dC4XUlx4NUCtn4cO72jXMg/project-details/10994265) | Tumor Atlas | Ovarian Cancer |
| HTA209 | Yale University | [1U01CA294514-01](https://reporter.nih.gov/search/dC4XUlx4NUCtn4cO72jXMg/project-details/10994872) | Tumor Atlas |Lymphoma |

## Phase 1 Centers

1 change: 1 addition & 0 deletions readme.md
@@ -14,6 +14,7 @@ If you have feedback for this manual, including broken links or incorrect inform

| Date | Comment | Changes summary |
|------------|--------------------------|-----------------|
| 2024-10-17 | | Added Phase 2 centers and data liaisons, HTAN Dashboard Page |
| 2024-10-04 | | Added FAQ, Governance, HTAN Usage Statistics |
| 2024-07-12 | | Updates to Data Access to reflect CDS/CGC changes |
| 2024-04-01 | Third version of manual | Simplified Data Model section; added "Submitting Data" and "Additional Information" Sections |