Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validating distances against reference implementations #245

Open
14 of 19 tasks
sdmccabe opened this issue Aug 23, 2019 · 10 comments
Open
14 of 19 tasks

Validating distances against reference implementations #245

sdmccabe opened this issue Aug 23, 2019 · 10 comments
Milestone

Comments

@sdmccabe
Copy link
Collaborator

sdmccabe commented Aug 23, 2019

For each distance we should check that either (i) netrd is the only public implementation of the distance, or (ii) that netrd's implementation of the distance produces similar outputs given the same inputs. We've done this for a bunch of them already, typically when originally implementing the distance, but we should make this process more explicit so that we don't accidentally overlook one.

  • Communicability JSD
  • Degree Divergence
  • Deltacon
  • Distributional NBD
  • DK-series distance
  • Frobenius
  • Hamming
  • Hamming-Ipsen-Mikhailov
  • Ipsen-Mikhailov
  • Jaccard
  • Laplacian Spectral
  • NBD
  • NetLSD
  • NetSimile
  • Onion Divergence
  • Polynomial Dissimilarity
  • Portrait Divergence
  • Quantum JSD
  • Resistance Perturbation

I'll start by checking off the ones I think are novel; I'm reasonably certain that there are more we know are validated.

@sdmccabe
Copy link
Collaborator Author

There are differences in output between our distance and the reference implementation of Portrait Divergence, but the differences are consistently small (the largest I've seen is 0.005, and it's usually more like 0.001). I'll keep investigating but I'd guess it's nothing.

@sdmccabe
Copy link
Collaborator Author

We should bump the PyPI version after finishing this.

@sdmccabe
Copy link
Collaborator Author

@leotrs I've checked off NBD because I assume the implementations are the same.

@sdmccabe
Copy link
Collaborator Author

HIM is producing different outputs from the R NetworkDistance implementation for RGGs (N=200, p=0.26, using the edgelists from the graphwend repo); will need to investigate further.

@leotrs
Copy link
Collaborator

leotrs commented Aug 26, 2019

@leotrs I've checked off NBD because I assume the implementations are the same.

At this point I wouldn't be surprised if netrd's implementation is more updated than mine. However, you can forget about NBD as I am the maintainer of the other one. If the outputs from the two different repos are different, then probably netrd's are correct...

@leotrs
Copy link
Collaborator

leotrs commented Aug 26, 2019

For NetSimile, I found this and this. Haven't compared them yet tho.

@sdmccabe
Copy link
Collaborator Author

NetSimile is a frustrating one since there isn't a reference implementation in the sense of author's code, so we're assuming the other independent implementations are correct. When I was debugging some NetSimile issues back in the spring I remember comparing the outputs to those from the netcomp library; I don't know if anything has changed since but I believe they were producing similar or identical outputs.

@leotrs
Copy link
Collaborator

leotrs commented Aug 26, 2019

We could use it as a touchstone only then. As long as we're in their ballpark, we're good.

sdmccabe added a commit that referenced this issue Sep 23, 2019
Harrison pointed out in a comment on our paper that our Hamming implementation
has an implicit $N^2$ instead of $N(N-1)$ normalization, so it's wrong for
graphs without selfloops. This corrects that, similar to #242.

A couple of notes:

1. I think this could be a little cleaner; the fact that `np.triu_indices()` et
al return 2-tuples cramped my style a bit.
2. The fact that this and #242 exist raise concern that this normalization issue
may be present elsewhere. Perhaps we should open a checklist issue, like we have
for #245?
3. I have not applied the same correction to `HammingIpsenMikhailov`, on the
grounds that: (i) it's sufficiently different from regular `Hamming` to consider
separately, and (ii) it probably deserves a more thorough cleanup.
@leotrs
Copy link
Collaborator

leotrs commented Oct 9, 2019

Frobenius and Jaccard depend on row ordering, yes?

Unrelatedly, they both seem to be simple enough that we can just check them off?

@sdmccabe
Copy link
Collaborator Author

sdmccabe commented Oct 9, 2019

They should depend on row ordering, @jkbren would be able to confirm from his experiments.

They're probably simple enough to check off, but simplicity can be deceiving; see the issues we had with Jaccard before in #180.

@sdmccabe sdmccabe added this to the 1.0 milestone Oct 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants