TMT data analysis with DEqMS - with multiple set of TMT data sets #9

Open
seongminlab opened this issue Nov 2, 2022 · 3 comments

@seongminlab

Dear DEqMS developer team,

How can I run DEqMS with multiple TMT data sets?

I have 55 samples, multiplexed with TMT-11plex across 5 sets.

In this case, how can I run DEqMS?

Thank you

@yafeng
Owner

yafeng commented Nov 3, 2022

Hi! To analyze data from multiple TMT sets with DEqMS, you first need to know what the experimental design is.
There are two different situations.

  1. Multiple TMT sets without internal standards.
    In this case, combine the samples from all TMT sets into one large protein intensity matrix that includes all samples (remember to log2-transform it before the next steps).
    At the design matrix step, add an extra factor:
design = model.matrix(~0+group+TMTset)

"group" is the factor that tells which group each sample belongs to, such as "ctrl" or "treated".
"TMTset" is the factor that tells which TMT set each sample was analyzed in, such as "set1", "set2", ..., "set5".

The "TMTset" factor is meant to account for batch differences between TMT sets. However, this relies on a reasonable experimental design: if each group is confined to its own TMT set, the group effect and the set effect cannot be separated.

Alternatively, you can use another approach such as ComBat (from the sva package) to remove batch differences between TMT sets, instead of the linear regression described above.
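As a rough illustration of the design matrix step, here is a minimal base R sketch. The sample layout (two TMT sets with two "ctrl" and two "treated" samples each) is made up for the example, and the second part shows one way to spot a design in which group and TMT set are fully confounded.

```r
# Hypothetical layout: 2 TMT sets, each holding 2 ctrl and 2 treated samples.
samples <- data.frame(
  sample = paste0("s", 1:8),
  group  = factor(rep(c("ctrl", "treated"), times = 4)),
  TMTset = factor(rep(c("set1", "set2"), each = 4))
)

# ~0 + group gives one column per group; TMTset adds the batch columns
# (the first set acts as the reference level and gets no column).
design <- model.matrix(~ 0 + group + TMTset, data = samples)
colnames(design)

# Confounded layout (assumed for illustration): all ctrl in set1, all treated in set2.
confounded <- data.frame(
  group  = factor(rep(c("ctrl", "treated"), each = 4)),
  TMTset = factor(rep(c("set1", "set2"), each = 4))
)
bad <- model.matrix(~ 0 + group + TMTset, data = confounded)

# A rank below the number of columns means some effects are not estimable.
qr(bad)$rank < ncol(bad)
```

With a balanced layout like the first one, the design matrix has full column rank, so the group effect can be estimated after adjusting for the set.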

  2. Multiple TMT sets with internal standards.
    Internal standards are aliquots of a pooled sample; they are normally used to account for variation between TMT sets.
    In this case, first calculate protein ratios using the intensity of the internal standards as the denominator.
    Then combine the protein ratio matrices from the different TMT sets and log2-transform them.
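The ratio step for a single TMT set could look roughly like the base R sketch below. The intensities, the channel names and the helper `to_log2_ratio` are all hypothetical, and the sketch assumes a single internal standard channel per set (with several, their row mean could serve as the denominator).

```r
# Hypothetical intensities for one TMT set: 3 proteins x 3 channels,
# where the "pool" column is the internal standard channel.
set1 <- matrix(
  c(200, 400, 100,
    100, 100, 100,
     50, 200, 100),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("P1", "P2", "P3"), c("ctrl_1", "treated_1", "pool"))
)

# Hypothetical helper: divide every channel by the internal standard,
# drop the standard itself, and log2-transform the ratios.
to_log2_ratio <- function(intensity, ref_col) {
  ratios <- intensity / intensity[, ref_col]  # row-wise division by the pool value
  log2(ratios[, colnames(ratios) != ref_col, drop = FALSE])
}

log_ratio_set1 <- to_log2_ratio(set1, "pool")
log_ratio_set1
```

The same helper would be applied to each TMT set before the per-set matrices are joined by protein ID.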

The next step is to make a PSM count table for the proteins. This is the same for both situations above.
Extract the PSM counts from each TMT set, and use the minimum count across the sets as the PSM count of each protein.
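A minimal sketch of taking the minimum PSM count across sets, in base R. The counts are made up, and treating a protein that is missing from a set as having 0 PSMs there is an assumption of this example rather than something DEqMS prescribes.

```r
# Hypothetical per-set PSM counts, one named vector per TMT set.
psm_counts <- list(
  set1 = c(P1 = 12, P2 = 3, P3 = 7),
  set2 = c(P1 = 9,  P2 = 5, P3 = 1),
  set3 = c(P1 = 15, P3 = 4)           # P2 was not identified in set3
)

proteins <- Reduce(union, lapply(psm_counts, names))

# Align every set to the same protein order; a protein absent from a set
# is given a count of 0 (assumption of this sketch).
count_matrix <- sapply(psm_counts, function(x) {
  aligned <- x[proteins]
  aligned[is.na(aligned)] <- 0
  unname(aligned)
})
rownames(count_matrix) <- proteins

# Minimum PSM count per protein across all TMT sets.
min_count <- apply(count_matrix, 1, min)
min_count
```

The resulting vector is indexed by protein ID, so it can be reordered to match the row names of the fitted model before being assigned as the count.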

@seongminlab
Author

Hi!

Yes, we are using the second experimental design!

I now have intensity tables normalized against the pooled samples (generally called IRS normalization).

But in this case, can I not use DEqMS with my PSM tables (psm.tsv from FragPipe, 5 files, one per experimental set, or the PSM table results from PD)?

The other case is the one called "Differential protein expression analysis with DEqMS using a protein table" in the DEqMS tutorial.

There, a single PSM count table is imported:
```
library(matrixStats)  # provides rowMins()

# PSM count per protein: the minimum across the count columns
psm.count.table = data.frame(count = rowMins(
  as.matrix(df.prot[,count_columns])), row.names = df.prot$Protein.accession)
fit3$count = psm.count.table[rownames(fit3$coefficients),"count"]
fit4 = spectraCounteBayes(fit3)
```
Here fit3$count is the PSM count table, right? It is the input to the function spectraCounteBayes().

But I have 5 PSM count tables, one per TMT set.

You mentioned:

"Extract PSM count from different TMT sets, and use the minimum counts of different sets to assign it as the PSM count of each protein."

Does this mean I should assign the minimum PSM count across the TMT sets to fit3$count?
Is that OK? Some proteins have 0 PSM counts in some TMT sets.

And won't these PSM counts cause problems for the statistical analysis?

Thank you!!

@yafeng
Owner

yafeng commented Nov 3, 2022

If you have five PSM tables from five TMT sets, you can generate a protein table for each set separately, following the tutorial section "DEqMS analysis using a PSM table":
https://bioconductor.org/packages/release/bioc/vignettes/DEqMS/inst/doc/DEqMS-package-vignette.html#deqms-analysis-using-a-psm-table-isobaric-labelled-data

First log2-transform the PSM tables, then use medianSummary to get a protein matrix for each TMT set separately.

dat.gene.nm = medianSummary(dat.psm.log, group_col = 2, ref_col = c(4, 5) )

"group_col" refers to the column containing the protein IDs.
"ref_col" refers to the column(s) of your internal standards.
The PSM table "dat.psm.log" should be organized as "Sequence", "Proteins", "reporter intensity 1", "reporter intensity 2", ...

After you get the protein matrix for each TMT set, combine them into one matrix.
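One possible way to combine the per-set protein matrices, sketched in base R. The matrices here are small made-up stand-ins for the per-set output, and keeping only the proteins quantified in every set is an assumption of this example (keeping the union and filling gaps with NA is another option).

```r
# Hypothetical per-set log2 ratio matrices (proteins x samples).
set_matrices <- list(
  set1 = matrix(c(1, 0, -1, 2, 0, 1), nrow = 3,
                dimnames = list(c("P1", "P2", "P3"),
                                c("set1_ctrl", "set1_treated"))),
  set2 = matrix(c(1, 1, 0, 2), nrow = 2,
                dimnames = list(c("P1", "P3"),
                                c("set2_ctrl", "set2_treated")))
)

# Proteins present in every TMT set.
shared <- Reduce(intersect, lapply(set_matrices, rownames))

# Subset each matrix to the shared proteins (same row order) and bind the columns.
combined <- do.call(
  cbind,
  lapply(set_matrices, function(m) m[shared, , drop = FALSE])
)
combined
```

Giving the sample columns set-specific names, as above, keeps them unique after binding and makes it easier to build the matching group and TMTset factors.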

"this means import minimum PSM counts for each TMT sets to fit3$count ?
is it OK? because some proteins were 0 PSM counts in some sets of TMT."

If the minimum PSM count is 0, you can add a pseudocount of 1:
fit3$count = fit3$count +1
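To see why a zero count is awkward, here is a small base R sketch. It assumes, as I understand the DEqMS method, that the variance is modelled against the logarithm of the PSM count; the counts themselves are made up.

```r
# Hypothetical minimum PSM counts, including one protein with 0 in some set.
min_count <- c(P1 = 9, P2 = 0, P3 = 1)

# On the log scale a zero count is not usable.
log2(min_count)

# Adding a pseudocount of 1 to every protein keeps all values finite
# and preserves the ordering of the counts.
shifted <- min_count + 1
log2(shifted)
```

Note that the pseudocount is added to all proteins, not only to those with a zero count, so the relative ordering of the counts is unchanged.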
