Validating distances against reference implementations #245
Comments
There are differences in output between our distance and the reference implementation of Portrait Divergence, but the differences are consistently small (the largest I've seen is 0.005, and it's usually more like 0.001). I'll keep investigating but I'd guess it's nothing.
We should bump the PyPI version after finishing this.
@leotrs I've checked off NBD because I assume the implementations are the same.
HIM is producing different outputs from the R |
At this point I wouldn't be surprised if |
NetSimile is a frustrating one since there isn't a reference implementation in the sense of author's code, so we're assuming the other independent implementations are correct. When I was debugging some NetSimile issues back in the spring I remember comparing the outputs to those from the |
We could use it as a touchstone only then. As long as we're in their ballpark, we're good.
Harrison pointed out in a comment on our paper that our Hamming implementation has an implicit $N^2$ normalization instead of $N(N-1)$, so it's wrong for graphs without self-loops. This corrects that, similar to #242. A couple of notes:

1. I think this could be a little cleaner; the fact that `np.triu_indices()` et al. return 2-tuples cramped my style a bit.
2. The fact that this and #242 exist raises concern that this normalization issue may be present elsewhere. Perhaps we should open a checklist issue, like we have for #245?
3. I have not applied the same correction to `HammingIpsenMikhailov`, on the grounds that: (i) it's sufficiently different from regular `Hamming` to consider separately, and (ii) it probably deserves a more thorough cleanup.
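To make the normalization point concrete, here is a minimal sketch in plain Python. It is not the code from the PR: `hamming_distance` and its `self_loops` flag are hypothetical names, and the graphs are small hand-written adjacency matrices. It compares a complete graph with an empty graph on three nodes, where the two graphs disagree on every allowed node pair.

```python
def hamming_distance(A1, A2, self_loops=False):
    """Fraction of allowed ordered node pairs on which two graphs disagree.

    A1, A2: square 0/1 adjacency matrices (lists of lists) of equal size.
    Hypothetical helper for illustration only; not netrd's implementation.
    """
    n = len(A1)
    # Count differing entries, skipping the diagonal when self-loops are excluded.
    mismatches = sum(
        A1[i][j] != A2[i][j]
        for i in range(n)
        for j in range(n)
        if self_loops or i != j
    )
    # N^2 possible entries with self-loops, N(N-1) without.
    possible = n * n if self_loops else n * (n - 1)
    return mismatches / possible


# Complete graph on 3 nodes versus the empty graph, neither with self-loops.
K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
E3 = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

# Implicit N^2 normalization: the diagonal is counted as "possible" edges.
n2_normalized = sum(K3[i][j] != E3[i][j] for i in range(3) for j in range(3)) / 3**2
# N(N-1) normalization: only off-diagonal pairs are counted.
corrected = hamming_distance(K3, E3)

print(n2_normalized, corrected)
```

With the $N^2$ denominator, two graphs that differ on every possible edge come out at 6/9 rather than 1, which is the sense in which the old normalization is wrong for graphs without self-loops.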
Frobenius and Jaccard depend on row ordering, yes? Unrelatedly, they both seem to be simple enough that we can just check them off?
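A quick way to probe the ordering question is to compare a graph with a relabelled copy of itself. The sketch below assumes the usual label-aligned definitions (Frobenius norm of the difference of adjacency matrices, and Jaccard distance between undirected edge sets); the helper names are hypothetical and this is not netrd's code.

```python
import math


def relabel(A, perm):
    """Return the adjacency matrix of the same graph with node i renamed perm[i]."""
    n = len(A)
    B = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            B[perm[i]][perm[j]] = A[i][j]
    return B


def frobenius(A1, A2):
    """Frobenius norm of A1 - A2, comparing entries position by position."""
    n = len(A1)
    return math.sqrt(sum((A1[i][j] - A2[i][j]) ** 2 for i in range(n) for j in range(n)))


def jaccard(A1, A2):
    """Jaccard distance between the undirected edge sets of two graphs."""
    n = len(A1)
    e1 = {(i, j) for i in range(n) for j in range(i + 1, n) if A1[i][j]}
    e2 = {(i, j) for i in range(n) for j in range(i + 1, n) if A2[i][j]}
    union = e1 | e2
    return 1 - len(e1 & e2) / len(union) if union else 0.0


# Path graph 0-1-2, and the same path with nodes 0 and 1 swapped.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
swapped = relabel(path, [1, 0, 2])

print(frobenius(path, swapped), jaccard(path, swapped))
```

Under these assumed definitions both distances are nonzero for two isomorphic graphs, so any comparison against another implementation needs both sides to see the nodes in the same order.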
For each distance we should check that either (i) `netrd` is the only public implementation of the distance, or (ii) that `netrd`'s implementation of the distance produces similar outputs given the same inputs. We've done this for a bunch of them already, typically when originally implementing the distance, but we should make this process more explicit so that we don't accidentally overlook one.

I'll start by checking off the ones I think are novel; I'm reasonably certain that there are more we know are validated.
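The check in (ii) could be made explicit with a small comparison harness along these lines. This is only a sketch: `validate`, the tolerance of `1e-2`, and the two stand-in distance functions are all assumptions for illustration, and in practice `ours` would wrap a `netrd` distance and `reference` the external implementation.

```python
def validate(ours, reference, inputs, tol=1e-2):
    """Compare two distance implementations on the same inputs.

    ours, reference: callables taking the same arguments and returning a float.
    inputs: iterable of argument tuples, e.g. pairs of graphs.
    tol: largest absolute disagreement we accept (an assumed value, not a project rule).
    Returns (passed, worst_difference).
    """
    worst = max(abs(ours(*args) - reference(*args)) for args in inputs)
    return worst <= tol, worst


# Stand-ins so the sketch runs on its own; replace with real implementations.
def our_distance(a, b):
    return abs(a - b) / 10


def reference_distance(a, b):
    # Pretend the reference differs from ours by a small constant offset.
    return abs(a - b) / 10 + 0.001


test_inputs = [(0, 1), (2, 5), (3, 3)]
passed, worst = validate(our_distance, reference_distance, test_inputs)
print(passed, worst)
```

Reporting the worst-case difference alongside the pass/fail flag keeps small but systematic discrepancies visible instead of hiding them behind the tolerance.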