scregclust takes too long time #5

Pang-Ka · 2024-11-21T11:18:32Z

Dear authors,
I ran scregclust on a dataset with 20541 features and 19840 samples using the following code:
fit <- scregclust(z, genesymbols, is_regulator, penalization = seq(0.1, 0.5, 0.05), n_modules = 10L, n_cycles = 50L, noise_threshold = 0.05)

It took about 18 hours to complete one cycle for a single penalization value. Is there any way to speed this up?
Many thanks!

cyianor · 2024-11-21T20:25:36Z

Hi,
How many tentative regulators are you considering, i.e. what is sum(is_regulator)?

Also, what kind of computer are you running this on?

Pang-Ka · 2024-11-25T01:37:14Z

Hi,
I found that sum(is_regulator) is 1. Is that right?
My code is as follows:
z <- GetAssayData(sc, layer = "scale.data")
dim(z) #[1] 20527 19840
out <- scregclust_format(z, mode = "TF")
is_regulator <- out$is_regulator

I ran the code on my MacBook Pro (Apple M1 Pro, macOS 14.6.1).

idacharlottalarsson · 2024-11-25T16:40:07Z

Hi,

Adding to Felix's answer above, I would strongly recommend including a feature selection step before running scregclust, e.g. keeping only the top XX most variable genes in your gene expression matrix. For single-cell data generated on a UMI-based platform, e.g. 10X Genomics, I usually normalize my data using sctransform, which by default returns the 3000 most variable genes (this can of course be changed to return more or fewer genes). Including all 20000+ genes is usually unnecessary, since most of them have zero variance and are therefore uninformative (as you can already see in your first run, where more than 16000 genes were placed in the noise cluster). Hopefully this will speed things up.

Let us know how it goes!
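
The feature selection step suggested above could be sketched in base R roughly as follows. This is only an illustration, not the maintainers' recommended workflow (they mention sctransform). It assumes `expr` is a dense genes-by-cells matrix of normalized, not yet scaled, expression values, since ranking by variance is uninformative once every gene has been scaled to unit variance. The names `select_top_variable` and `n_top` are hypothetical and not part of scregclust or Seurat.

```r
# Sketch only: keep the n_top genes with the highest variance across cells.
# `select_top_variable` and `n_top` are hypothetical names (assumptions).
select_top_variable <- function(expr, n_top = 2000L) {
  stopifnot(is.matrix(expr), !is.null(rownames(expr)))
  # Per-gene variance across cells (rows are genes, columns are cells)
  gene_var <- apply(expr, 1, var)
  # Drop genes with zero or undefined variance
  keep <- which(!is.na(gene_var) & gene_var > 0)
  # Rank the remaining genes by decreasing variance and keep at most n_top
  ranked <- keep[order(gene_var[keep], decreasing = TRUE)]
  top <- ranked[seq_len(min(n_top, length(ranked)))]
  expr[top, , drop = FALSE]
}

# Toy example: 50 genes, 20 cells, the first 10 genes are constant
set.seed(1)
toy <- matrix(rnorm(50 * 20), nrow = 50,
              dimnames = list(paste0("gene", 1:50), NULL))
toy[1:10, ] <- 0
toy_small <- select_top_variable(toy, n_top = 15L)
dim(toy_small)
```

The reduced matrix keeps its row names, so the gene symbols passed on to later steps can be taken directly from `rownames(toy_small)`.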

Pang-Ka · 2024-11-26T02:20:13Z

Thank you for your suggestions. I tried running scregclust with the top 2000 variable genes, but I still got an error:
out <- scregclust_format(z, mode = "TF")
genesymbols <- VariableFeatures(sc)
sample_assignment <- out$sample_assignment
is_regulator <- out$is_regulator
mouseTF <- read.table('/Volumes/Data/allTFs_mm.txt')
ix <- which(genesymbols %in% mouseTF$V1)
is_regulator[ix] <- 1
fit <- scregclust(z, genesymbols, is_regulator, penalization = seq(0.1, 0.5, 0.05), n_modules = 10L, n_cycles = 50L, noise_threshold = 0.05)
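
The error message itself is not shown in the thread, so the following is only one thing that may be worth checking, not a confirmed diagnosis. In the code above, `genesymbols` comes from `VariableFeatures(sc)` while `z` and `is_regulator` come from other objects, so the three inputs may differ in length or order; they presumably need to describe the same rows in the same order. A minimal base R sketch of deriving all three from one subsetted matrix, using toy stand-ins (`z_toy`, `variable_genes`, and `tf_list` are hypothetical):

```r
# Sketch only: derive all three inputs from the same subsetted matrix
# so that they stay aligned. All objects here are toy stand-ins (assumptions).
z_toy <- matrix(rnorm(6 * 4), nrow = 6,
                dimnames = list(c("Sox2", "Actb", "Pax6", "Gapdh", "Olig2", "Tubb3"), NULL))
variable_genes <- c("Pax6", "Tubb3", "Sox2", "Gapdh")  # stand-in for the selected genes
tf_list <- c("Sox2", "Pax6", "Olig2")                   # stand-in for a TF list

# Subset rows to the selected genes that are actually present in the matrix
z_sub <- z_toy[intersect(variable_genes, rownames(z_toy)), , drop = FALSE]

# Gene symbols and the regulator indicator both come from the rows of z_sub
genesymbols <- rownames(z_sub)
is_regulator <- as.integer(genesymbols %in% tf_list)

# All three inputs now have matching length and order
stopifnot(length(genesymbols) == nrow(z_sub),
          length(is_regulator) == nrow(z_sub))
sum(is_regulator)
```

Here `sum(is_regulator)` counts the candidate regulators among the retained genes, which is the quantity asked about earlier in the thread.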
