---
title: "Clustering of 1000 latest projects"
author: "Jose A. Dianes"
date: "20 November 2014"
output: html_document
---
```{r, echo=FALSE, message=FALSE}
library(prider)
library(prideRcompare)
```
We can cluster lists of either `ProteinDetail` or `ProjectSummary` instances. The
clustering itself is always done on protein details, so the `ProjectSummary` method
first retrieves them through the PRIDE Archive web service and then calls the
`ProteinDetail` method. To be a good citizen of that service, the recommended usage
is to go through `ProteinDetail`: first obtain the list of protein details for
each project, then cluster the resulting list of lists.
```{r, cache=TRUE}
projects.1000 <- get.list.ProjectSummary(1000)
projects.1000.protein.details.100 <- lapply(projects.1000, function(x) {get.list.ProteinDetail(accession(x), 100)})
clusters.1000.100 <- cluster.ProteinDetails(projects.1000.protein.details.100)
```
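To illustrate what clustering on protein details can look like, the sketch below builds a project-by-protein presence matrix from toy data and clusters it with base R only. The toy accessions, the Jaccard (binary) distance, and the average linkage are all assumptions made for this example; `cluster.ProteinDetails` may well use a different distance or linkage.

```r
# Toy data (hypothetical): protein accessions identified in each project
toy.projects <- list(
  PXD_A = c("P1", "P2", "P3"),
  PXD_B = c("P1", "P2", "P4"),
  PXD_C = c("P7", "P8"),
  PXD_D = c("P7", "P8", "P9")
)

# Presence/absence matrix: one row per project, one column per protein
all.proteins <- sort(unique(unlist(toy.projects)))
presence <- t(sapply(toy.projects, function(p) as.integer(all.proteins %in% p)))
colnames(presence) <- all.proteins

# Jaccard distance between projects (the "binary" method of dist),
# followed by agglomerative clustering with average linkage
toy.dist <- dist(presence, method = "binary")
toy.clusters <- hclust(toy.dist, method = "average")
```

Projects that share most of their proteins end up close together in the tree, which is the intuition behind comparing projects by their protein lists.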
That gives us a hierarchical clustering object (as generated by `hclust`) that
we can cut into a fixed number of clusters (e.g. 5) using:
```{r}
projects.1000.accessions <- sapply(projects.1000, accession)
cutree.labels(clusters.1000.100, 5, projects.1000.accessions)
```
Or just plot:
```{r}
plot(clusters.1000.100, sapply(projects.1000, accession), main="Clustering of latest 1000 projects")
```
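To make the tree-cutting step concrete, here is a minimal sketch using only base R on toy data. It assumes that `cutree.labels` behaves roughly like `stats::cutree` with the supplied labels attached to the result; the toy points and the choice of three groups are purely illustrative.

```r
# Toy hierarchical clustering of five labelled points on a line
toy.points <- c(a = 1, b = 2, c = 10, d = 11, e = 30)
toy.hc <- hclust(dist(toy.points), method = "average")

# Cut the tree into 3 groups; the result is a named integer vector
# mapping each label to its group number
toy.groups <- cutree(toy.hc, k = 3)

# List the members of each group
toy.members <- split(names(toy.groups), toy.groups)
```

With real data the names would be project accessions, so each element of the split lists the projects that fall into one cluster.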