PseudoBulk with condition-aware replicates #1260
-
Add a new column to your cellColData that combines sample and condition.
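A minimal sketch of the idea in plain Python, not ArchR code: it builds a combined label per cell and checks that no label spans more than one condition. The cell records, the `SampleCondition` column name, and both helper functions are hypothetical illustrations. In ArchR the equivalent column would live in cellColData and be referenced by name when building pseudobulk replicates.

```python
from collections import defaultdict

# Hypothetical per-cell metadata; sample S2 is a library holding both conditions.
cells = [
    {"cell": "S1#c1", "Sample": "S1", "condition": "disease"},
    {"cell": "S1#c2", "Sample": "S1", "condition": "disease"},
    {"cell": "S2#c1", "Sample": "S2", "condition": "disease"},
    {"cell": "S2#c2", "Sample": "S2", "condition": "healthy"},
    {"cell": "S3#c1", "Sample": "S3", "condition": "healthy"},
]


def add_sample_condition(cells, new_key="SampleCondition", sep="_"):
    """Return copies of the records with a combined sample/condition label."""
    return [
        dict(c, **{new_key: f"{c['Sample']}{sep}{c['condition']}"})
        for c in cells
    ]


def mixed_labels(cells, label_key, condition_key="condition"):
    """Map each label that covers more than one condition to those conditions."""
    seen = defaultdict(set)
    for c in cells:
        seen[c[label_key]].add(c[condition_key])
    return {label: conds for label, conds in seen.items() if len(conds) > 1}


labelled = add_sample_condition(cells)

# Grouping by the original sample label mixes conditions for S2 ...
mixed_by_sample = mixed_labels(labelled, "Sample")
# ... while the combined label keeps every group within a single condition.
mixed_by_label = mixed_labels(labelled, "SampleCondition")
```

The same check can be run on any candidate label column (for example a demultiplexed subject ID) before using it to define replicates.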
-
Hi @rcorces, I was hoping to explain my question better. In the pseudobulking procedure you highlight as a flow chart in chapter 9.1,
Please let me know if my understanding is correct. Thanks very much!
-
What values of minCells and minReplicates do you recommend to capture biological variability for a cluster with more than 50k cells and 50 samples? In your example in section 9.1.1, sample C is left out since there were enough samples to make 5 replicates. Is it truly desirable to leave one sample out? And when there are more than 50 samples per cluster, should one make at least 40 replicates? Also, how are the pseudobulk replicates used in downstream analysis? This is not clear from the documentation. Are they used only for peak calling per replicate per cluster, and to build the reproducible peak set based on the presence of a peak in each replicate? Or are these pseudobulk replicates used for other analyses as well? If peaks are called per pseudobulk replicate, it is desirable to have as many cells as possible per replicate (maxCells) and to include all the samples (maxReplicates) if possible, correct? Thanks for your time and help.
-
Hi @rcorces, I am having some difficulty understanding the concepts underlying pseudobulk replicates. Let's say I make two pseudobulk replicates per cluster (one for all cases and the other for the control sample). In this case, I had to specify sampleLabels = "condition" in addGroupCoverages. For the iterative peak merging procedure, the documentation says: "It is important to note that ArchR uses a normalized metric of significance for peaks to compare the significance of peaks called across different samples. This is because the reported MACS2 significance is proportional to the sequencing depth so peak significance is not immediately comparable across samples." So in this case, would ArchR still consider per-sample sequencing depth (from the Sample column), or would it consider per-condition sequencing depth per cluster, since I have changed the sampleLabels parameter?
-
Hi @rcorces,
Pseudobulk replicates can be created in a sample-aware manner, which is great. However, it seems that in certain cases, where a sample may not contribute enough cells to meet minCells, a pseudobulk is created across different samples. This can be problematic if cells from condition A (disease) are mixed with cells from condition B (healthy). This can be a common scenario, especially when groupBy="SubClusters", where sub-clusters may not have sufficient cells contributing from every sample. Is it possible to do this analysis in a sample- and condition-aware manner? Or if you think it would not matter, please do let me know.
In addition, for my data, arrow files were created on a per-sample basis, where each sample corresponds to a library, and each sample has 2 multiplexed subjects that were later demultiplexed and assigned their identity in the metadata. Is it possible to specify the metadata column corresponding to the demultiplexed subjects for subject-aware pseudobulking? Can I change the 'Sample' column to my subject IDs for this? Or would I need to change the rows of the metadata/ArchR.arrow files (since arrow files were created on a per-sample basis and not a per-subject basis)?
Thanks for your help. Would really appreciate your input on this.
Anjali