UCSC Analysis

Brief description of the scripts used to prepare the RDS from UCSC

The RDS from UCSC were filtered according to the following steps:

  • excluded projects that did not contain data from both sexes (at least 2-3 samples per sex); as a result, the Velmeshev 4-10 years group was excluded, and the Eze and Nowakowski datasets were integrated
  • for each dataset, kept only the cell types that had samples from both sexes
  • lastly, cell types with fewer than 100 cells per project and sex were not analyzed
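
The last two rules can be illustrated with a small shell sketch. It is not taken from the repository scripts: the metadata layout (tab-separated columns cell, cell_type, sex, with sex coded as F/M) and the lowered threshold are assumptions made for this toy example.

```shell
#!/usr/bin/env bash
# Toy metadata table (hypothetical columns: cell ID, cell type, sex).
meta=$(mktemp)
{
  printf 'cell\tcell_type\tsex\n'
  for i in 1 2 3; do printf 'a%s\tNeuron\tF\n' "$i"; done
  for i in 1 2 3; do printf 'b%s\tNeuron\tM\n' "$i"; done
  for i in 1 2 3; do printf 'c%s\tMicroglia\tF\n' "$i"; done
  printf 'd1\tMicroglia\tM\n'
} > "$meta"

min_cells=3   # the README uses 100; lowered here so the toy data stays small

# Count cells per (cell type, sex), then keep the cell types
# where both sexes reach the threshold.
kept=$(awk -F'\t' -v min="$min_cells" '
  NR > 1 { n[$2 FS $3]++; types[$2] = 1 }
  END {
    for (t in types)
      if (n[t FS "F"] >= min && n[t FS "M"] >= min) print t
  }' "$meta" | sort)

echo "$kept"
rm -f "$meta"
```

In this toy table only Neuron passes, since Microglia has a single male cell.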

Bash scripts

Bash command to filter the expression matrix

Before running the bash script described below (split_big_ExprMtx) on the Velmeshev dataset, we removed all rows containing only 0s, i.e. genes with no expression in any cell, with the following command:

gunzip -c exprMatrix_Velmeshev_2022.tsv.gz | awk '{s=0; for (i=2;i<=NF;i++) s+=$i; if (s!=0)print}' | gzip > exprMtx_filt_Velmeshev_2022.tsv.gz

The following script does not maintain the header; however, the header can easily be retrieved from the metadata in R, because the columns of the expression matrix are in the same order as the rows of the metadata cell column.
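
To illustrate what the zero-row filter does, the sketch below applies the same awk expression to a toy matrix (gene and cell names are made up). Because awk converts non-numeric fields to 0 when summing, a header row whose cell names do not start with a digit is dropped together with the all-zero genes. The second command is a variant, not part of the original workflow, that passes the first line through unchanged.

```shell
#!/usr/bin/env bash
# Toy expression matrix: header row, then one gene per row.
toy=$(mktemp)
printf 'gene\tcell1\tcell2\tcell3\n'  > "$toy"
printf 'GENE_A\t0\t2\t0\n'           >> "$toy"
printf 'GENE_B\t0\t0\t0\n'           >> "$toy"
printf 'GENE_C\t1\t0\t3\n'           >> "$toy"

# Same filter as the command above: sum columns 2..NF, print rows with a non-zero sum.
filtered=$(awk '{s=0; for (i=2;i<=NF;i++) s+=$i; if (s!=0) print}' "$toy")

# Variant (assumption, not in the original workflow): keep line 1 as the header.
filtered_with_header=$(awk 'NR==1 {print; next}
  {s=0; for (i=2;i<=NF;i++) s+=$i; if (s!=0) print}' "$toy")

printf '%s\n' "$filtered"
rm -f "$toy"
```

Here GENE_B and the header are removed by the first command, while the variant keeps the header and removes only GENE_B.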

split_big_ExprMtx

Arguments of the script

The script takes the following arguments:

  • -i - input expression matrix to be split in multiple subsets (extract different columns)
  • -b - input directory containing the indexes of the columns to be extracted for each subset (separate .TXT files, each listing the indexes on one line, separated by commas; see the object split_indexes in Velmeshev_rds.R)
  • -o - the output directory where to save the split expression matrices

Example

An example of how to run the script can be found below:

./split_big_ExprMtx.sh -i exprMtx_filt_Velmeshev_2022.tsv.gz -b Velmeshev_split -o Velmeshev_outs
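
The script itself is not reproduced here. As a hedged sketch of the kind of column extraction described above, and not the actual content of split_big_ExprMtx.sh, cut can take a comma-separated index list directly. The file names, the toy matrix and the assumption that indexes are 1-based column numbers are all hypothetical.

```shell
#!/usr/bin/env bash
work=$(mktemp -d)

# Toy headerless matrix: gene name in column 1, then four cell columns.
printf 'GENE_A\t5\t0\t2\t1\nGENE_C\t1\t4\t0\t3\n' | gzip > "$work/toy.tsv.gz"

# Hypothetical index file: one line, comma-separated, 1-based column numbers.
printf '1,2,4\n' > "$work/subset1.txt"

# Extract the listed columns from the compressed matrix.
subset=$(gunzip -c "$work/toy.tsv.gz" | cut -f "$(cat "$work/subset1.txt")")

printf '%s\n' "$subset"
rm -rf "$work"
```

Note that cut always writes fields in their original file order, whatever the order of the list, so the header retrieved from the metadata has to follow the same ascending order.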

Notes

  • If no arguments are given, the script may raise an error or overwrite files
  • The script may raise errors if the indexes are not sorted in ascending order
  • In our case, we had to split the indexes of three subsets into two parts because the list of columns was too long
  • For 3 subsets (0_1_years_1, 2nd_trimester_2, 3rd_trimester_2), the expression matrices were obtained in R instead of bash, since the script was still raising errors. We acknowledge that this is a limitation of the bash script; if this step becomes routine in the workflow, improvements to the script will be tested
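
As a small sketch of how an index list could be prepared to avoid the sorting and length issues mentioned in the notes, the commands below sort a comma-separated list numerically and cut it into two halves. The index values are made up, and this is only one possible approach, not the procedure used in Velmeshev_rds.R.

```shell
#!/usr/bin/env bash
# Hypothetical unsorted index list, in the one-line comma-separated layout.
indexes='7,2,10,4,9,3'

# One index per line, numeric sort, then back to a single comma-separated line.
sorted=$(printf '%s\n' "$indexes" | tr ',' '\n' | sort -n | paste -sd, -)

# Split the sorted list into two halves (the first half gets the extra item if odd).
n=$(printf '%s\n' "$sorted" | tr ',' '\n' | wc -l)
half=$(( (n + 1) / 2 ))
first=$(printf '%s\n' "$sorted" | cut -d, -f "1-$half")
second=$(printf '%s\n' "$sorted" | cut -d, -f "$((half + 1))-")

printf '%s\n%s\n' "$first" "$second"
```

Each half can then be saved as its own index file, and the two resulting matrices joined column-wise afterwards.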

Dataset sources for second trimester integration

From the UCSC Cell Browser, we used the following datasets for the second trimester integration:

  1. Nowakowski et al. 2017 - Spatiotemporal Gene Expression Trajectories Reveal Developmental Hierarchies of the Human Cortex (paper, UCSC dataset)
  2. Eze et al. 2021 - Heterogeneity of Human Neuroepithelial Cells and Early Radial Glia (paper, UCSC dataset)

Both datasets contained fetal samples from females and males. However, since the studies were likely not designed with a sex comparison in mind, we found few age-matched female and male samples within the same dataset. Therefore, we decided to integrate the two datasets and group the samples by gestational trimester instead of by gestational week. This strategy also allowed for better comparison with the results from the Velmeshev analysis. The scripts for the RDS and integration are found above.

Brief description of the DEGs scripts

Workflow

The scripts should be run in the following order:

  1. UCSC_metadata_parsing.R
  2. Eze_Nowa_rds.R BEFORE Eze_Nowa_integration.R
  3. Velmeshev_split.R
  4. split_big_ExprMtx.sh
  5. Velmeshev_furu_rds.R, Velmeshev_rds.R
  6. UCSC_integration_2nd_trimester_all.R
  7. UCSC_Velmeshev_all_ages_furu.R
  8. all_scripts_Velmeshev_2nd_trim.R, all_scripts_Velmeshev_3rd_trim.R, all_scripts_Velmeshev_0_1_years.R, all_scripts_Velmeshev_1_2_years.R, all_scripts_Velmeshev_2_4_years.R, all_scripts_Velmeshev_10_20_years.R, all_scripts_Velmeshev_Adult.R, DEGs_Eze_Nowa.R - in no particular order
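
The order above could be captured in a small driver script. The sketch below is only an illustration: it assumes that every R script can be run non-interactively with Rscript and without arguments, which the repository does not state, and it defaults to a dry run that prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Dry run by default: print each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@" || { echo "Step failed: $*" >&2; exit 1; }
  fi
}

main() {
  run Rscript UCSC_metadata_parsing.R
  run Rscript Eze_Nowa_rds.R
  run Rscript Eze_Nowa_integration.R
  run Rscript Velmeshev_split.R
  run ./split_big_ExprMtx.sh -i exprMtx_filt_Velmeshev_2022.tsv.gz -b Velmeshev_split -o Velmeshev_outs
  run Rscript Velmeshev_furu_rds.R
  run Rscript Velmeshev_rds.R
  run Rscript UCSC_integration_2nd_trimester_all.R
  run Rscript UCSC_Velmeshev_all_ages_furu.R
  # Step 8: these scripts can run in any order.
  for s in all_scripts_Velmeshev_2nd_trim.R all_scripts_Velmeshev_3rd_trim.R \
           all_scripts_Velmeshev_0_1_years.R all_scripts_Velmeshev_1_2_years.R \
           all_scripts_Velmeshev_2_4_years.R all_scripts_Velmeshev_10_20_years.R \
           all_scripts_Velmeshev_Adult.R DEGs_Eze_Nowa.R; do
    run Rscript "$s"
  done
}

main
```

Setting DRY_RUN=0 would execute the steps and stop at the first failure.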