---
title: ""
output: github_document
---

```{r, setup, include=FALSE}
library(tidyverse)
library(knitr)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-",
  echo = FALSE
)

associations <- read_csv("df_all_annotated.csv")
```

## Purpose
This Shiny application lets users quickly and interactively review genetic associations with Inflammatory Bowel Disease (IBD) previously identified by the Cedars-Sinai F. Widjaja Foundation Inflammatory Bowel and Immunobiology Research Institute.

## Functionality
### Search
Users can search associations by **Gene** (e.g. NOD2), **Position** (e.g. Chr. 16, between 50731050 and 50766987), **SNP** (e.g. rs5743293 or imm_2_204278337), or **File** (an uploaded list of either genes or SNPs). A position search matches on the position of the SNP, not on the position of the Gene.
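As a rough illustration of the position search, the sketch below keeps the markers whose own base-pair position falls inside a chromosome window. The `markers` table, its `SNP`, `CHR`, and `BP` columns, and the `search_by_position` helper are hypothetical names chosen for this example and are not taken from the application's code.

```r
# Hypothetical marker table; the column names (SNP, CHR, BP) are assumptions.
markers <- data.frame(
  SNP = c("snp_a", "snp_b", "snp_c"),
  CHR = c(16, 16, 2),
  BP  = c(50740000, 50800000, 12345),
  stringsAsFactors = FALSE
)

# Keep markers on chromosome `chr` whose SNP position lies in [start, end].
search_by_position <- function(df, chr, start, end) {
  df[df$CHR == chr & df$BP >= start & df$BP <= end, , drop = FALSE]
}

hits <- search_by_position(markers, chr = 16, start = 50731050, end = 50766987)
```

Only the SNP coordinate is compared here, so a marker annotated to a gene that overlaps the window is still excluded if the SNP itself lies outside it.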

Users can search for several Genes or SNPs at once by selecting multiple Genes or SNPs, respectively. A large list of Genes or SNPs can be copied and pasted into the search box, but it must be in comma-separated format (e.g. IL1,IL2,IL3,IL4,IL5,IL6,IL7,IL8,IL9,IL10,IL11). Because the marker list is large, your SNP or gene might not be loaded into memory. Typing your SNP out manually will ensure it is correctly visualized.
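The comma-separated format can be illustrated with a small parsing sketch. `parse_query` is a hypothetical helper, not a function from the application, and whether the application itself tolerates stray spaces or empty entries is not stated here.

```r
# Split a pasted, comma-separated query into a clean character vector.
# `parse_query` is a hypothetical helper, not part of the application.
parse_query <- function(x) {
  terms <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  # Drop empty entries and repeated terms.
  unique(terms[nzchar(terms)])
}

genes <- parse_query("IL1,IL2, IL3,,IL10")
```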

If you upload a list of Genes or SNPs, the file **must** consist of a single column of Genes or SNPs in a standard format. The column can be named or unnamed.

### Filtering
Users can filter their query to include only SNP associations that reach user-defined thresholds. Users can filter by:

1. P Value
2. Minor Allele Frequency (in the future)
3. SNP Location (e.g. intron, coding, complex...)
4. Overall Missingness (in the future)
5. Overall Hardy-Weinberg Assumptions (in the future)

Once a table of results is returned, users can sub-filter the displayed table to narrow the results further.
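A minimal sketch of the two filters that are currently available (P value and SNP location) might look like the following. The `results` table, its `SNP`, `P`, and `FUNC` columns, and the `filter_associations` helper are assumptions made for illustration only.

```r
# Hypothetical results table; the column names (SNP, P, FUNC) are assumptions.
results <- data.frame(
  SNP  = c("snp_a", "snp_b", "snp_c", "snp_d"),
  P    = c(1e-8, 0.03, 4e-6, NA),
  FUNC = c("intron", "coding", "UNK", "intron"),
  stringsAsFactors = FALSE
)

# Keep rows with P at or below `max_p` and a functional location in `locations`.
# Rows with a missing P value are dropped.
filter_associations <- function(df, max_p, locations) {
  keep <- !is.na(df$P) & df$P <= max_p & df$FUNC %in% locations
  df[keep, , drop = FALSE]
}

filtered <- filter_associations(results, max_p = 1e-5,
                                locations = c("intron", "coding"))
```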

### Download Data
Users who want to download their query will find a **Download** button at the bottom of every table. The downloaded data reflect the filters applied in the query, but the "sub-filters" are not applied.

## Data 
In total, `r length(unique(associations$PHENOTYPE))` unique phenotypes generated by `r length(unique(associations$Analyst))` analysts have been deposited in this repository as of `r Sys.Date()`. Please contact the analyst listed if you have any additional questions about the significance of their result.

When permutation was performed, the permuted P value replaces the queryable P value, and the original (non-permuted) P value is displayed next to the permuted value on the Gene-Pheno-SNP tab.
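The replacement rule can be sketched as follows, assuming hypothetical columns `P` (original P value) and `P_PERM` (permuted P value, `NA` when no permutation was run). These names are illustrative and may not match the columns in the application's data.

```r
# Hypothetical columns: P is the original P value, P_PERM the permuted one
# (NA when no permutation was run).
assoc <- data.frame(
  SNP    = c("snp_a", "snp_b"),
  P      = c(1e-6, 0.002),
  P_PERM = c(NA, 0.01),
  stringsAsFactors = FALSE
)

# Query on the permuted P value when present, otherwise on the original.
assoc$P_QUERY <- ifelse(is.na(assoc$P_PERM), assoc$P, assoc$P_PERM)
```

Keeping the original `P` column alongside `P_QUERY` is what allows both values to be shown side by side.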

Markers have been annotated with cis-eQTL data from the Cedars-Sinai Small Bowel 139 dataset. If the relevant locus is also a cis-eQTL in the small bowel dataset, the relevant eGENE, betas, and P values are also listed next to the marker on the Geno-Pheno-SNP tab. Markers are designated as priority if the eQTL is positioned inside the eGENE or is flanked by the eGENE. eQTLs corresponding to insertions and deletions are not included in the application.

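```{r echo = FALSE, warning=FALSE, message=FALSE}
# `associations` is already loaded in the setup chunk, so it is not re-read here.
# List one row per distinct submitted analysis.
associations %>%
  distinct(Analyst, PHENOTYPE, Population, Notes, Year) %>%
  arrange(Analyst, PHENOTYPE, Population, Notes, Year) %>%
  kable()
```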

### Guidance for Analysts Submitting Association Data
The backbone of the application is the set of genetic association tests performed by scientists at the Cedars-Sinai Inflammatory Bowel and Immunobiology Research Institute. To have an association stored in this application, provide the following information in a CSV file for each association performed:

1. Illumina Chip ID
2. P Value
3. Odds Ratio or B Value 
4. Confidence Intervals (if performed)
5. The number of patients in the association (total number in association, both cases and controls)
6. A1

In addition to the above, the analyst will need to submit information on:

1. The name of the analyst who performed the association
2. The year the analysis was run
3. The sample of patients the association was performed on (e.g. Disease Sub-Type (CD or UC), Race, Ethnicity, Jewish, other unique criteria of your analysis)
4. The phenotype tested
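
Before submitting, an analyst could check that a file carries the required per-marker fields. The sketch below is only an illustration: the header names in `required_cols` are hypothetical, since the exact column names expected by the application are not specified here, and confidence intervals are left out because they are optional.

```r
# Hypothetical column names for a submission file; the application's actual
# expected headers are not specified in this document.
required_cols <- c("CHIP_ID", "P", "EFFECT", "N", "A1")

# Return the required columns that are absent from a submission data frame.
missing_columns <- function(df, required = required_cols) {
  setdiff(required, names(df))
}

# Example submission that lacks the A1 column.
submission <- data.frame(
  CHIP_ID = "snp_a", P = 0.001, EFFECT = 1.4, N = 2500,
  stringsAsFactors = FALSE
)
absent <- missing_columns(submission)
```

An empty result from `missing_columns()` would indicate that every required field is present.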


### Annotations
All annotation data were produced by Alka Potdar, PhD, and correspond to hg19 genomic locations. The following ANNOVAR databases were used: dbSNP (avsnp147), UCSC knownGene (Gene and Functional location in the application), and NCBI RefGene.

Hardy-Weinberg, minor allele frequency, and missingness calculations will be performed globally and might not be applicable to every sub-population studied.

Markers without a known functional location have been assigned UNK. Markers without an assigned chromosome have been assigned Chr 999. Markers without an assigned SNP base-pair location have been assigned location 0 on the corresponding chromosome.
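
The placeholder convention above can be sketched in a few lines. The `annot` table and its `SNP`, `CHR`, `BP`, and `FUNC` columns are hypothetical names used only for this example.

```r
# Hypothetical annotation table with gaps; the column names are assumptions.
annot <- data.frame(
  SNP  = c("snp_a", "snp_b"),
  CHR  = c(16, NA),
  BP   = c(NA_real_, NA_real_),
  FUNC = c("intron", NA),
  stringsAsFactors = FALSE
)

# Fill gaps with the placeholder values described above.
annot$FUNC[is.na(annot$FUNC)] <- "UNK"
annot$CHR[is.na(annot$CHR)]   <- 999
annot$BP[is.na(annot$BP)]     <- 0
```

Because of these placeholders, a position search over a real chromosome window would not be expected to return markers on Chr 999 or at location 0.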

## Created By
The Cedars-Sinai F. Widjaja Foundation Inflammatory Bowel and Immunobiology Research Institute Translational Genomics Group.