---
title: "Clustering of 1000 latest projects"
author: "Jose A. Dianes"
date: "20 November 2014"
output: html_document
---
```{r, echo=FALSE, message=FALSE}
# Load the PRIDE Archive client and the project comparison helpers.
library(prider)
library(prideRcompare)
```

We can cluster lists of `ProteinDetail` and lists of `ProjectSummary` instances.
In both cases the clustering is ultimately done on the protein details, so the
latter method relies on the former after retrieving the protein details through
the PRIDE Archive web service. To be a good citizen of that service, the
recommended usage is through `ProteinDetail`: first obtain the list of protein
detail lists, one per project, and then cluster it.

```{r, cache=TRUE}
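# Fetch the summaries of the 1000 latest projects.
projects.1000 <- get.list.ProjectSummary(1000)
# For each project, fetch up to 100 protein details by project accession.
projects.1000.protein.details.100 <- lapply(projects.1000, function(x) {
  get.list.ProteinDetail(accession(x), 100)
})
# Cluster the projects based on their protein details.
clusters.1000.100 <- cluster.ProteinDetails(projects.1000.protein.details.100)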
```
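To make the idea of clustering projects by their protein lists more concrete, here is a minimal base R sketch. It assumes that projects are compared by the protein accessions they share, using a Jaccard distance followed by average-linkage `hclust`; this is an illustrative assumption, and `cluster.ProteinDetails` may well use a different distance or linkage. The project and protein identifiers are hypothetical placeholders, not real PRIDE accessions.

```r
# Hypothetical input: one character vector of protein accessions per project.
project.proteins <- list(
  PXD_A = c("P1", "P2", "P3"),
  PXD_B = c("P2", "P3", "P4"),
  PXD_C = c("P7", "P8"),
  PXD_D = c("P7", "P8", "P9")
)

# Build a binary project x protein presence matrix.
all.proteins <- sort(unique(unlist(project.proteins)))
presence <- t(sapply(project.proteins,
                     function(p) as.integer(all.proteins %in% p)))
colnames(presence) <- all.proteins

# dist(method = "binary") gives the Jaccard distance between rows:
# the share of proteins present in only one of the two projects,
# among the proteins present in at least one of them.
project.dist <- dist(presence, method = "binary")

# Agglomerative hierarchical clustering (assumed linkage: average).
toy.clusters <- hclust(project.dist, method = "average")
```

In this toy input, `PXD_A` and `PXD_B` share two of their four distinct proteins, so they end up much closer to each other than to `PXD_C` or `PXD_D`, with which they share nothing.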

That gives us a hierarchical clustering object (as generated by `hclust`) that
we can cut into a fixed number of clusters (e.g. 5) using:

```{r}
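# Use the project accessions as labels for the clustered items.
projects.1000.accessions <- sapply(projects.1000, accession)
# Cut the tree into 5 clusters and list the accessions in each one.
cutree.labels(clusters.1000.100, 5, projects.1000.accessions)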
```
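For readers unfamiliar with cutting a dendrogram, the following base R sketch shows the underlying step with `cutree` on toy data. It is only an assumption that `cutree.labels` does something similar, namely cut the tree and report the labels that fall into each cluster; the data and the `project1` to `project6` labels are hypothetical.

```r
# Toy data: six one-dimensional points in two well-separated groups.
toy.points <- matrix(c(0, 0.1, 0.2, 5, 5.1, 5.2), ncol = 1,
                     dimnames = list(paste0("project", 1:6), "x"))
toy.hc <- hclust(dist(toy.points), method = "average")

# cutree returns one integer cluster id per observation, in input order,
# named after the labels stored in the hclust object.
membership <- cutree(toy.hc, k = 2)

# Group the labels by cluster id to see which items ended up together.
labels.by.cluster <- split(names(membership), membership)
```

The result is a list with one character vector of labels per cluster, which is a convenient shape for inspecting which items were grouped together.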

Or just plot:

```{r}
# Draw the dendrogram, labelling each leaf with its project accession.
plot(clusters.1000.100, labels = sapply(projects.1000, accession),
     main = "Clustering of latest 1000 projects")
```