SomaLogic-Data

The ADAT files in this repository give existing and prospective SomaLogic customers, and anyone curious about the SomaScan data deliverable, example data files for preparing analyses before receiving SomaScan data. The data in these files are not intended for biological analysis or as a source of metrics for SomaScan data in general.


Files

  • example_data.adat
  • example_data_v4.1_plasma.adat
  • example_data_v5.0_plasma.adat

Installation

The example ADAT files in this repository can be retrieved in one of two ways:

  1. Cloning the repository to your local machine

  2. Using wget to retrieve individual ADAT files from the repository; see examples below

# Retrieve just the 5k (v4.0) ADAT
wget https://github.com/SomaLogic/SomaLogic-Data/raw/main/example_data.adat

# Retrieve just the 7k (v4.1) ADAT
wget https://github.com/SomaLogic/SomaLogic-Data/raw/main/example_data_v4.1_plasma.adat

# Retrieve just the 11k (v5.0) ADAT
wget https://github.com/SomaLogic/SomaLogic-Data/raw/main/example_data_v5.0_plasma.adat

SomaScan Versions

| Commercial name | Menu version | Size | Example file |
|---|---|---|---|
| 5k | v4.0 | 5284 | example_data.adat |
| 7k | v4.1 | 7596 | example_data_v4.1_plasma.adat |
| 11k | v5.0 | 11083 | example_data_v5.0_plasma.adat |
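
If it helps when preparing an analysis, the version table above can be kept as a small lookup. This is only a sketch: the function name `menu_for_size` is hypothetical, and it assumes the number of analytes in your file matches one of the listed sizes exactly, which may not hold for every ADAT.

```python
# Menu sizes copied from the table above: size -> (commercial name, menu version).
MENU_BY_SIZE = {
    5284: ("5k", "v4.0"),
    7596: ("7k", "v4.1"),
    11083: ("11k", "v5.0"),
}


def menu_for_size(n_analytes):
    """Return (commercial name, menu version), or None for an unlisted size."""
    return MENU_BY_SIZE.get(n_analytes)


print(menu_for_size(7596))
print(menu_for_size(1234))
```

An unlisted size returns `None` rather than a guess, so a mismatch is visible to the caller.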

ADAT File Format

The ADAT file format is a SomaLogic-specific, tab-delimited text file designed to store SomaScan study data. This format is intended to be flexible and self-describing. The fields in these example files may differ from the fields in the *.adat file for your study. However, all *.adat files consist of four main sections arranged in the following order:

  • HEADER - Study-level information about the SomaScan experiment and how the data was processed.
  • COL_DATA - Field names and type associated with the SOMAmer reagents (columns).
  • ROW_DATA - Field names and type associated with sample information (rows).
  • TABLE_BEGIN - This section contains the experimental data organized into a data matrix of SOMAmer reagents (columns) by samples (rows). SomaScan measurements are in relative fluorescence units (RFU). The data block directly above the measurement matrix describes the SOMAmer reagents, and the data block to its left contains sample-specific (e.g. clinical) information.
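
As a rough illustration of the four-section layout, the sketch below groups the lines of ADAT-like text by section. It assumes each section starts on a line whose first tab-delimited cell is the section name prefixed with a caret (e.g. `^HEADER`); that marker convention, the function name `split_adat_sections`, and the toy input are assumptions for illustration and should be checked against your own file. It is not a replacement for a real ADAT parser.

```python
SECTION_NAMES = ("HEADER", "COL_DATA", "ROW_DATA", "TABLE_BEGIN")


def split_adat_sections(text, marker="^"):
    """Group the lines of an ADAT-like text by section.

    Assumes (not confirmed by this README) that a section starts on a line
    whose first tab-delimited cell is `marker` followed by the section name.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        first_cell = line.split("\t", 1)[0].strip()
        name = first_cell[len(marker):]
        if first_cell.startswith(marker) and name in SECTION_NAMES:
            current = name
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


# Toy input, invented for illustration only.
toy = "\n".join([
    "^HEADER",
    "!Title\tExample",
    "^COL_DATA",
    "!Name\tSeqId\tTarget",
    "^ROW_DATA",
    "!Name\tPlateId\tSampleId",
    "^TABLE_BEGIN",
    "\t\tSeqId\t2182-54_1",
])
sections = split_adat_sections(toy)
print(list(sections))
```

Lines that appear before the first recognized marker are ignored, and the section order is whatever order the markers appear in the input.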

Example File Description

The first file, example_data.adat, contains a SomaScan v4.0 study from a set of human samples. The RFU measurements and other identifiers have been altered to protect personally identifiable information (PII) while retaining as much of the underlying biological signal as possible. There are 192 total EDTA-plasma samples from four (4) plate runs, broken down into the following types:

  • 170 clinical samples
  • 10 calibrators (replicate controls for combining data across runs)
  • 6 QC samples (replicate controls used to assess run quality)
  • 6 Buffer samples (no protein controls)

The second file, example_data_v4.1_plasma.adat, contains a SomaScan v4.1 study from the same set of human samples. RFU measurements have been altered to protect PII in this file as well. It contains 163 EDTA-plasma clinical samples, plus controls, from four (4) 96-well plate runs, broken down as follows:

  • 163 clinical samples
  • 20 calibrators
  • 12 QC samples
  • 12 Buffer samples

The third file, example_data_v5.0_plasma.adat, contains a SomaScan v5.0 study from the same set of human samples. RFU measurements have been altered to protect PII in this file as well. It contains 163 EDTA-plasma clinical samples, plus controls, from twelve (12) 96-well plate runs, broken down as follows:

  • 163 clinical samples
  • 60 calibrators
  • 36 QC samples
  • 36 Buffer samples
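
The per-type counts listed above can be summed to get the number of rows to expect in each file. The sketch below does only that arithmetic; it assumes the bullet lists are complete and non-overlapping breakdowns, and the dictionary keys for the sample types are labels chosen here for illustration.

```python
# Per-type sample counts as listed in this README for each example file.
counts = {
    "example_data.adat": {
        "clinical": 170, "calibrator": 10, "qc": 6, "buffer": 6,
    },
    "example_data_v4.1_plasma.adat": {
        "clinical": 163, "calibrator": 20, "qc": 12, "buffer": 12,
    },
    "example_data_v5.0_plasma.adat": {
        "clinical": 163, "calibrator": 60, "qc": 36, "buffer": 36,
    },
}

# Total rows per file under the assumption stated above.
totals = {name: sum(by_type.values()) for name, by_type in counts.items()}

for name, total in totals.items():
    print(f"{name}: {total} samples in total")
```

Comparing these totals with the row count of a loaded file is a quick sanity check that no rows were dropped on import.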

Data Processing

The standard data normalization procedure for EDTA-plasma samples was applied to all three (3) datasets.

Sample and Analyte Annotations

In a standard SomaLogic ADAT, the block of information that sits directly above the measurement data (the RFU data matrix) is the column metadata, which contains detailed information and annotations about the analytes, SeqIds, and their targets. See the sections below for the available fields and their descriptions.

Analyte Annotations

Information describing the analytes is found above the data matrix in a standard SomaLogic ADAT. This information may consist of any or all of the following:

| Field | Description | Example |
|---|---|---|
| SeqId | SomaLogic sequence identifier | 2182-54_1 |
| SeqidVersion | Version of SOMAmer sequence | 2 |
| SomaId | Target identifier, of the form SLnnnnnn (8 characters in length) | SL000318 |
| TargetFullName | Target name curated for consistency with UniProt name | Complement C4b |
| Target | SomaLogic Target Name | C4b |
| UniProt | UniProt identifier(s) | P0C0L4 P0C0L5 |
| EntrezGeneID | Entrez Gene Identifier(s) | 720 721 |
| EntrezGeneSymbol | Entrez Gene Symbol names | C4A C4B |
| Organism | Protein Source Organism | Human |
| Units | Relative Fluorescence Units | RFU |
| Type | SOMAmer target type | Protein |
| Dilution | Dilution mix assignment | 0.01% |
| PlateScale_Reference | PlateScale reference value | 1378.85 |
| CalReference | Calibration sample reference value | 1378.85 |
| medNormRef_ReferenceRFU | Median normalization reference value | 490.342 |
| Cal_V4_<YY>_<SSS>_<PPP> | Calibration scale factor (for given Year_Study_Plate) | 0.64 |
| ColCheck | QC acceptance criteria across all plates/sets | PASS |
| QcReference_<LLLLL> | QC sample reference value (for given QC lot) | PASS |
| CalQcRatio_V4_<YY>_<SSS>_<PPP> | Post calibration median QC ratio to reference (for given Year_Study_Plate) | 1.04 |
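
Several analyte annotation fields embed the year, study, and plate in their names (Cal_V4_<YY>_<SSS>_<PPP> and CalQcRatio_V4_<YY>_<SSS>_<PPP>). A minimal sketch of pulling those parts out with a regular expression follows. The digit widths (two, three, three) are inferred from the placeholder lengths, the example name `Cal_V4_18_004_001` is illustrative rather than taken from a file, and `parse_plate_field` is a hypothetical helper.

```python
import re

# Matches names shaped like Cal_V4_<YY>_<SSS>_<PPP> or
# CalQcRatio_V4_<YY>_<SSS>_<PPP>. The digit widths are an assumption
# based on the placeholder lengths in the table above.
PLATE_FIELD = re.compile(
    r"^(?P<kind>Cal|CalQcRatio)_V4_"
    r"(?P<year>\d{2})_(?P<study>\d{3})_(?P<plate>\d{3})$"
)


def parse_plate_field(name):
    """Return the kind/year/study/plate parts of a field name, or None."""
    match = PLATE_FIELD.match(name)
    return match.groupdict() if match else None


print(parse_plate_field("Cal_V4_18_004_001"))
print(parse_plate_field("CalReference"))
```

Field names that do not follow the pattern, such as CalReference, return `None` instead of raising, so the helper can be applied to every column annotation name in turn.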

Sample Annotations

Information describing the samples is typically found to the left of the data matrix in a standard SomaLogic ADAT. This information may consist of clinical information provided by the client, or run-specific diagnostic information included for assay quality control. Below are some examples of what may be present in this section:

| Field | Description | Examples |
|---|---|---|
| PlateId | Plate identifier | V4-18-004_001, V4-18-004_002 |
| ScannerID | Scanner used to analyze slide | SG12064173, SG14374437 |
| PlatePosition | Location on 96 well plate (A1-H12) | A1, H12 |
| SlideId | Agilent slide barcode | 258495800001 |
| Subarray | Agilent subarray (1 – 8) | 1, 8 |
| SampleId | 1st form is Subject Identifier, 2nd form (calibrators, buffers) | 2031 |
| SampleType | 1st form for clinical samples (Sample), 2nd form as above | Sample, QC, Calibrator, Buffer |
| PercentDilution | Highest concentration of the SOMAmer dilution groups | 20 |
| SampleMatrix | Sample matrix | Plasma-PPT |
| Barcode | 1D Barcode of aliquot | S622225 |
| Barcode2d | 2D Barcode of aliquot | 9876543210 |
| SampleNotes | Assay team sample observation | Cloudy, Low sample volume, Reddish |
| SampleDescription | Supplemental sample information | Plasma QC 1 |
| AssayNotes | Assay team run observation | Beads aspirated, Leak/Hole, Smear |
| TimePoint | Sample time point | Baseline |
| ExtIdentifier | Primary key for Subarray | EXID40000000032037 |
| SsfExtId | Primary key for sample | EID102733 |
| SampleGroup | Sample group | A, B |
| SiteId | Collection site | SomaLogic |
| TubeUniqueID | Unique tube identifier | 2031 |
| CLI | Cohort definition identifier | CLI6006F001 |
| HybControlNormScale | Hybridization control scale factor | 0.948304 |
| RowCheck | Normalization acceptance criteria for all row scale factors | PASS, FLAG |
| NormScale_0_5 | Median signal normalization scale factor (0.5% mix) | 1.02718 |
| NormScale_0_005 | Median signal normalization scale factor (0.005% mix) | 1.119754 |
| NormScale_20 | Median signal normalization scale factor (20% mix) | 0.996148 |
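
A common first step with the sample annotations is to tally rows by SampleType and separate clinical samples from controls. The sketch below does this on a small tab-delimited table using only the Python standard library. The rows are invented for illustration (only the field names and the PASS/FLAG values come from the table above), and whether to exclude rows with RowCheck of FLAG is an analysis decision, not a recommendation made by this README.

```python
import csv
import io
from collections import Counter

# Toy sample-annotation rows, invented for illustration only.
toy_rows = (
    "PlateId\tSampleId\tSampleType\tRowCheck\n"
    "V4-18-004_001\t2031\tSample\tPASS\n"
    "V4-18-004_001\t2032\tSample\tFLAG\n"
    "V4-18-004_001\tqc-example\tQC\tPASS\n"
    "V4-18-004_002\tbuffer-example\tBuffer\tPASS\n"
)

rows = list(csv.DictReader(io.StringIO(toy_rows), delimiter="\t"))

# Tally rows by sample type.
by_type = Counter(row["SampleType"] for row in rows)

# Keep clinical samples whose row-level check passed.
passing_clinical = [
    row["SampleId"]
    for row in rows
    if row["SampleType"] == "Sample" and row["RowCheck"] == "PASS"
]

print(dict(by_type))
print(passing_clinical)
```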

Parsers and Programmatic Tools for ADAT Files


SomaLogic-Data was developed by the Bioinformatics Dept. at SomaLogic Operating Co., Inc.