Add example notebook #709
I'd love to see two things:
d73cd9d to 1bc82d6
Adding a notebook pre-commit hook, also extending the existing ruff lint to work for notebooks, seems to be the only thing that's left here.
Edit: The ruff pre-commit hook already includes notebooks.
Does the notebook have large datasets/arrays or benchmarks? If so, we should remove those or convert them to be smaller for demonstration purposes, with deterministic data. |
Can we run benchmarks on large data without storing it in the notebook? |
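One way to keep large benchmark inputs out of the notebook file is to generate them at run time from a fixed seed, so nothing is stored and every run sees the same data. A minimal sketch using only the standard library; `random_coo` and its parameters are hypothetical names for illustration, not part of `sparse`:

```python
import random

def random_coo(shape, density, seed=0):
    """Return sorted (row, col, value) triples for a random sparse matrix."""
    rng = random.Random(seed)  # private, seeded generator: same seed, same data
    rows, cols = shape
    nnz = round(rows * cols * density)
    # Sample distinct flat positions, then unflatten each to (row, col).
    flat = rng.sample(range(rows * cols), nnz)
    return sorted((p // cols, p % cols, rng.random()) for p in flat)

a = random_coo((100, 5), density=0.08, seed=42)
b = random_coo((100, 5), density=0.08, seed=42)
```

Both calls produce identical data, so a benchmark cell can rebuild its input on every run instead of embedding it in the notebook.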
also these notebooks look great Mateusz! Thanks! |
There are several in the |
6efb792 to 871e965
871e965 to 655d7f8
Hmm. Seems like the notebook is flaky on CI. It failed in only one of the last two commits, with no changes. |
it looks like a very solid failure though, that csr is not meant to be (1, 0) ordered. |
Well, |
Ok, so we have this code snippet from the notebook that uses Finch backend:
X = sparse.random((100, 5), density=0.08) # creates COO random matrix
X = sparse.asarray(X, format="csc") # converts to CSC format
X_X = sparse.permute_dims(X, (1, 0)) @ X # for me locally it densifies as the result is: SwizzleArray(Tensor(Dense{Int64}(Dense{Int64}(Element{0.0, Float64, Int64}...
X_X.get_order() # it gives (1, 0) order so I can only convert to CSR.
X_X = sparse.asarray(X_X, format="csr") # move back from dense to CSR format
So it looks like in the CI the result order of
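For background on why a (1, 0) order and CSR go together: the arrays that store a matrix in CSC form are the same arrays that store its transpose in CSR form. The sketch below illustrates that general fact in plain Python; `to_csc` and `csr_to_dense` are hypothetical helpers, and this is not a description of how Finch lays out tensors internally.

```python
def to_csc(dense):
    """Compress a dense list-of-rows matrix column by column."""
    n_rows, n_cols = len(dense), len(dense[0])
    indptr, indices, data = [0], [], []
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                indices.append(i)      # row index of each stored value
                data.append(dense[i][j])
        indptr.append(len(indices))    # marks the end of column j
    return indptr, indices, data

def csr_to_dense(indptr, indices, data, n_cols):
    """Expand CSR arrays (compressed row by row) back to a dense matrix."""
    out = []
    for r in range(len(indptr) - 1):
        row = [0] * n_cols
        for k in range(indptr[r], indptr[r + 1]):
            row[indices[k]] = data[k]
        out.append(row)
    return out

X = [[1, 0, 2],
     [0, 0, 3]]
csc = to_csc(X)
# Reading the same three arrays as CSR yields the transpose of X (3 x 2).
Xt = csr_to_dense(*csc, n_cols=len(X))
```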
Is this a finch version issue? |
While the indeterminism here is definitely an issue, one other issue is that |
We use the latest |
@willow-ahrens Could it be that the output order depends on the data? |
It is possible to set the finch version we use with an environment variable. I am wondering whether Mateusz has an env var set locally? |
Also, no, the output order and format should be reproducible. As far as I can tell, it always fails on CI so far? |
@willow-ahrens It's nondeterministic as to whether it passes: passing, failing, diff.
Could the difference between these commits cause the problem? |
Nope, it was just re-enabling a lint and doesn't affect running Python code. |
I can't reproduce locally (it always fails for me). I also get sparse(sparse format, I really do think something may also be up with our finch versions. |
0afaa8b to 7e74dce
Hmm... now it passed with csr... I couldn't reproduce it locally on macOS or on a remote Linux machine.
But I think once we merge finch-tensor/finch-tensor-python#75 we don't need to worry about it anymore, as any format will be convertible to csc/csr etc.
I'll work towards reproducing this nondeterminism. Maybe there's a similar kernel that behaves differently. Could we start by running in verbose mode?
Would y’all be okay if I push a few test commits?
On Jun 24, 2024, at 5:25 PM, Mateusz Sokół wrote:
Hmm... now it passed with csr... I couldn't reproduce it locally on macos or remote linux machine.
But I think once we merge finch-tensor/finch-tensor-python#75 we don't need to worry about it anymore as any format will be convertible to csc/csr etc.
Sure! |
I've just sent a collaborator invitation, if you accept it you should be able to push to Mateusz's branch. |
I think stdout is being suppressed. Also, it's failing now saying tensordot is not defined. How are notebooks being tested, and can we get them to print stdout? |
The issue was that you defined
without calling
In CI, notebooks are tested by executing the whole notebook: if the notebook passes, it's a ✅ without any output; if there's an error, it reports the stack trace and points to the cell that failed.
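The pass/fail behaviour described above can be sketched with the standard library alone: load the `.ipynb` JSON, run the code cells in order in one shared namespace, and report the first cell that raises. This is only an illustration of the idea, not the tool the CI uses; `run_notebook` is a hypothetical helper, and it ignores IPython magics and kernels.

```python
import json
import traceback

def run_notebook(nb_json):
    """Execute code cells in order; return (ok, failing_cell_index, trace)."""
    nb = json.loads(nb_json)
    namespace = {}  # shared across cells, like a live kernel session
    for index, cell in enumerate(nb["cells"]):
        if cell["cell_type"] != "code":
            continue
        source = "".join(cell["source"])  # source may be a list of lines
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception:
            # Stop at the first failure and say which cell it was.
            return False, index, traceback.format_exc()
    return True, None, ""

demo = json.dumps({"cells": [
    {"cell_type": "code", "source": ["x = 1\n", "y = x + 1\n"]},
    {"cell_type": "markdown", "source": ["# notes"]},
    {"cell_type": "code", "source": ["tensordot(x, y)\n"]},  # undefined name
]})
ok, failed_cell, trace = run_notebook(demo)
```

Here the third cell fails with a `NameError`, much like the "tensordot is not defined" failure mentioned in the thread, and the returned index identifies the offending cell.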
@hameerabbasi I'm not sure about using |
Yes let's do that for now. I'll look for alternatives. |
👍 Let me squash git history and hide that CI job for now. |
@hameerabbasi Ok, I figured out what the issue was. It was an out-of-memory error, so I commented out the last test configurations in MTTKRP (1000x1000x1000). Let me clean the notebook up and squash. I think we can stay with
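For a sense of scale: if a 1000x1000x1000 operand or intermediate were ever materialised densely in `float64`, it alone would need about 8 GB. The thread does not say which allocation failed, so this is back-of-the-envelope arithmetic under that assumption, and `dense_nbytes` is a hypothetical helper:

```python
import math

def dense_nbytes(shape, itemsize=8):
    """Bytes needed to store a dense array of `shape` (float64 by default)."""
    return math.prod(shape) * itemsize

big = dense_nbytes((1000, 1000, 1000))   # 8_000_000_000 bytes
small = dense_nbytes((100, 100, 100))    # 8_000_000 bytes
print(f"{big / 1e9:.0f} GB vs {small / 1e6:.0f} MB")
```

Each tenfold increase in side length multiplies the dense footprint by a thousand, which is why only the largest configuration is a plausible candidate for exhausting a CI runner's memory.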
7d528bb to 58921c0
Thanks for all the work and investigations, @mtsokol!
@willow-ahrens You can force pull (I squashed commits) the branch and the notebook should run and print the execution plan in I released a new |
wait, I'm not sure why the oom on mttkrp is related to the nondeterministic transposition order of X |
Would you prefer that be resolved in this PR? AFAICT, the code here has no issues related to that. We can, of course, file an issue to track it if you prefer. |
I'm happy to resolve in another PR! I'd like to understand though whether we have an explanation for that behavior, and whether it is still observed here. Is it explained by the OOM, for example? Was it just the finch-tensor version all along? Perhaps I can try to reproduce the problem with a simple test in finch-tensor. |
I was confused because it didn't seem like we found the original source of nondeterminism, and I wanted to check whether y'all felt like you had found it. |
@willow-ahrens They're unrelated: There's an issue where This fixed calling |
feel free to merge though, you're right that this can go through. |
@mtsokol thanks! Yes, I think dropfills should probably support transposition like copyto does. I'll add a PR. |
Thank you! My two comments in finch-tensor/Finch.jl#609 describe these two issues. |
Also, it's good to know that the nondeterminism was related to the Finch version. I'll add a more direct test for this to finch-tensor so we can see if it ever fails again nondeterministically in CI. |
Merging, thanks for the follow-ups @willow-ahrens and @mtsokol! |
Hi @willow-ahrens @hameerabbasi,
Here I add a short example notebook with benchmarks+plots for the latest alpha version released.