Using sparse or distance matrices as the input data yields unexpected clustering #59
Comments
Hello, unfortunately the sparse matrix support is really rudimentary at the moment. The main problem you are running into is that the sparse matrix format needs to be symmetric, not upper or lower triangular.

I don't know if this matches your real data, but in the example you provide, the distance matrix isn't sparse in the way uwot is expecting, so just to be clear: the idea is that the distance matrix contains a small number (at least

Of course that sparse format still shouldn't require that the sparse matrix be symmetric: that's a complete waste of memory, and the lack of a check as to whether the supplied sparse matrix is triangular (and whether it's upper or lower) is a testament to how little thought I have given this code path. Let's leave this issue open as a bug, which I will endeavor to fix.

Anyway, if you can drop the uninteresting distances, then you may be able to proceed with

I think there is also a separate problem with the example you provided, which is the large number of tied distances, and which may be confusing matters. If you set

Another way forward is if you can do the nearest neighbor search on the sparse matrix outside of uwot. If you can find the k-nearest neighbors of each point (or an approximation), you can then provide the indices of the neighbors and their distances as a list of two (dense) N x k matrices, as described at https://github.com/jlmelville/uwot#nearest-neighbor-data-format. For example, row 1 of the index matrix

Hope some of this helps.
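The nearest-neighbor route mentioned above can be sketched in base R. This is only an illustration under stated assumptions: it uses a small dense, symmetric toy distance matrix `D` modeled on the example in this issue (a real dataset would need a sparse-aware or approximate search), `k = 4` is an arbitrary choice, and the list element names `idx` and `dist` follow the format described in the uwot README linked above, where each point counts as its own first neighbor.

```r
# Toy symmetric distance matrix: points 1, 2 and 6 are close together (0.01),
# every other pair is at distance 1.
n <- 10
D <- matrix(1, nrow = n, ncol = n)
D[2, 1] <- D[1, 2] <- 0.01
D[6, 1] <- D[1, 6] <- 0.01
D[6, 2] <- D[2, 6] <- 0.01
diag(D) <- 0

k <- 4  # number of neighbors per point, counting the point itself

# For each row, take the indices and values of the k smallest distances.
# The self-distance of 0 is the unique minimum of every row here, so each
# point comes out as its own first neighbor.
idx  <- t(apply(D, 1, function(d) order(d)[seq_len(k)]))
dist <- t(apply(D, 1, function(d) sort(d)[seq_len(k)]))

# Both are dense N x k matrices.
nn <- list(idx = idx, dist = dist)

# Not run: passing the precomputed neighbors to uwot. Whether X may be NULL
# depends on the installed uwot version, so treat this call as an assumption.
# u <- uwot::umap(X = NULL, nn_method = nn)
```

With this toy data most of the kept distances are tied at 1, so which "far" neighbors end up in each row is decided by index order alone, which is the tied-distance problem noted above.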
If installing from GitHub, there is now support for passing a lower or upper triangular sparse distance matrix to

Before hitting 1.0, this should probably be moved to being a supported input for
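For an installed version without that triangular support, the symmetric form mentioned earlier in this thread can be built by mirroring the stored triangle. The following is a minimal sketch, assuming the Matrix package and the lower-triangular example matrix from this issue; the name `X_sym` is made up for illustration, and which uwot argument should receive the result is not shown because it depends on the version in use.

```r
library(Matrix)

# Lower-triangular distances with a zero diagonal, as in the example
# from this issue.
X <- matrix(1, nrow = 10, ncol = 10)
X[2, 1] <- 0.01
X[6, 1] <- 0.01
X[6, 2] <- 0.01
X[upper.tri(X)] <- 0
diag(X) <- 0
X <- as(X, "CsparseMatrix")  # column-compressed sparse storage

# Mirror the lower triangle into the upper one. The diagonal is all zeros,
# so adding the transpose does not double any entry.
X_sym <- X + t(X)

isSymmetric(X_sym)
```

This only fixes the triangular layout. The matrix is still not sparse in the sense described above, since every pair of points has a stored distance.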
Hello, James.
First of all, thank you very much for your awesome implementation. It is the only one to date (in R, at least) that could handle sparse matrices as an input. I have a pre-calculated (sparse) matrix of distances that I want to use as the input to the `uwot::umap` function.

The function doesn't seem to accept the classical matrix of pairwise distances between a set of vectors, neither in sparse form nor in the form of a `dist` class object. The first is very relevant to me, as I have a huge dataset that I want to embed into 2d, and it just so happens that I was able to calculate its distance matrix.

Consider an example: a pre-calculated sparse distance matrix of 10 binary vectors (hence the range from 0 to 1):

`X <- matrix(data = 1, nrow = 10, ncol = 10); X[2,1] <- 0.01; X[6,1] <- 0.01; X[6,2] <- 0.01; X[upper.tri(X)] <- 0; X <- as(X, "dgCMatrix"); diag(X) <- 0`

When I pass it to the umap function with `uwot::umap(X, 2)`, I get the error message

I can pass `X` as a distance matrix with `u <- uwot::umap(as.dist(X), 2)` and then plot the resulting embeddings:

`plot(x = u[,1], y = u[,2]); text(x = u[,1], y = u[,2]+0.1, labels = c(1:10))`

But they are clearly off. From the planted elements of the input matrix, we would expect observations 1, 2 and 6 to be clustered closely together, while everything else is scattered uniformly on the 2d plane:

What is the format of the input distance matrix that the function expects?
Please don't tell me that by sparse matrix you meant the output of knn clustering of observations...