
Add example notebook #709

Merged
merged 2 commits into main, Jun 26, 2024
Conversation

mtsokol
Collaborator

@mtsokol mtsokol commented Jun 21, 2024

Hi @willow-ahrens @hameerabbasi,

Here I add a short example notebook with benchmarks+plots for the latest alpha version released.

@mtsokol mtsokol self-assigned this Jun 21, 2024
@hameerabbasi
Collaborator

I'd love to see two things:

  1. Testing that a notebook can be executed (without checking the output). We can use e.g. nbmake for this. This is so the notebook doesn't go obsolete, and should be done in this PR.
  2. We can, as a follow-up, include notebooks in documentation. Maybe a task for the upcoming intern.
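Since nbmake ships as a pytest plugin, the check suggested in point 1 can be wired into CI with a single step. A minimal sketch of such a GitHub Actions step (the step name and the examples/ path are illustrative, not the repository's actual workflow):

```yaml
# Hypothetical CI step: execute every notebook under examples/ and fail
# the build if any cell raises, without comparing cell outputs.
- name: Test notebooks
  run: |
    pip install nbmake
    pytest --nbmake examples/
```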

Collaborator

@hameerabbasi hameerabbasi left a comment

Adding a notebook pre-commit hook, also extending the existing ruff lint to work for notebooks seems to be the only thing that's left here.

Edit: The ruff pre-commit hook already includes notebooks.

(review thread on ci/test_notebooks.sh — outdated, resolved)
@hameerabbasi
Collaborator

Does the notebook have large datasets/arrays or benchmarks? If so, we should remove those or convert them to be smaller for demonstration purposes, with deterministic data.
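On the deterministic-data point, seeding the generator is enough to make the demo inputs reproducible across runs. A minimal sketch using plain NumPy (pydata/sparse's `sparse.random` accepts a `random_state` argument for the same purpose; the shape and density here mirror the notebook's example):

```python
import numpy as np

# Build deterministic COO-style data: same seed -> same coords and values.
rng = np.random.default_rng(42)
shape, density = (100, 5), 0.08
nnz = int(shape[0] * shape[1] * density)  # 40 stored entries

coords = np.stack([rng.integers(0, s, size=nnz) for s in shape])
data = rng.random(nnz)

# A second generator with the same seed reproduces the data exactly.
rng2 = np.random.default_rng(42)
coords2 = np.stack([rng2.integers(0, s, size=nnz) for s in shape])
assert np.array_equal(coords, coords2)
```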

@willow-ahrens
Collaborator

Can we run benchmarks on large data without storing it in the notebook?

@willow-ahrens
Collaborator

also these notebooks look great Mateusz! Thanks!

@hameerabbasi
Collaborator

> Can we run benchmarks on large data without storing it in the notebook?

There are several in the examples/ folder, yes. Maybe we need a way to read the dimension length from an external env var so CI load is lower.
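The env-var idea can be as simple as reading the size with a fallback default, so CI exports a small value while local runs keep the full size. A sketch (the variable name `SPARSE_DEMO_SIDE` is made up for illustration):

```python
import os

# Read the benchmark size from the environment, falling back to the
# full size used for local runs; CI can export a much smaller value.
SIDE = int(os.environ.get("SPARSE_DEMO_SIDE", "1000"))
shape = (SIDE, SIDE)
print(f"benchmarking with shape {shape}")
```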

@hameerabbasi
Collaborator

Hmm. Seems like the notebook is flaky on CI. It failed in only one of the last two commits, with no changes.

@willow-ahrens
Collaborator

it looks like a very solid failure though, that csr is not meant to be (1, 0) ordered.

@hameerabbasi
Collaborator

> it looks like a very solid failure though, that csr is not meant to be (1, 0) ordered.

Well, asarray should convert to the right format if necessary.

@mtsokol
Collaborator Author

mtsokol commented Jun 24, 2024

Ok, so we have this code snippet from the notebook that uses the Finch backend:

```python
X = sparse.random((100, 5), density=0.08)  # creates a random COO matrix
X = sparse.asarray(X, format="csc")  # converts to CSC format
X_X = sparse.permute_dims(X, (1, 0)) @ X  # for me locally it densifies, as the result is: SwizzleArray(Tensor(Dense{Int64}(Dense{Int64}(Element{0.0, Float64, Int64}...

X_X.get_order()  # gives (1, 0) order, so I can only convert to CSR

X_X = sparse.asarray(X_X, format="csr")  # move back from dense to CSR format
```

So it looks like in the CI the result order of `permute_dims(X, (1, 0)) @ X` is sometimes `(0, 1)`, as that's the only explanation for this failure.

@willow-ahrens
Collaborator

Is this a finch version issue?

@hameerabbasi
Collaborator

> Ok, so we have this code snippet from the notebook that uses the Finch backend: […] So it looks like in the CI the result order of `permute_dims(X, (1, 0)) @ X` is sometimes `(0, 1)`, as that's the only explanation for this failure.

While the indeterminism here is definitely an issue, one other issue is that asarray should convert the format if necessary, similar to the order= or dtype= kwargs.

@hameerabbasi
Collaborator

> Is this a finch version issue?

We use the latest finch-tensor version, but I'm unsure whether that's behind, as it pins Finch.jl.

@hameerabbasi
Collaborator

@willow-ahrens Could it be that the output order depends on the data? sparse.random would produce different data/coordinates each time.

@willow-ahrens
Collaborator

It is possible to set the finch version we use with an environment variable. I am wondering whether Mateusz has an env var set locally?

@willow-ahrens
Collaborator

willow-ahrens commented Jun 24, 2024

Also, no, the output order and format should be reproducible. As far as I can tell, it always fails on CI so far?

@hameerabbasi
Collaborator

hameerabbasi commented Jun 24, 2024

@willow-ahrens It's indeterministic as to whether it passes: passing, failing, diff.

@willow-ahrens
Collaborator

Could the difference between these commits cause the problem?

@hameerabbasi
Collaborator

> Could the difference between these commits cause the problem?

Nope, it was just re-enabling a lint; it doesn't affect running Python code.

@willow-ahrens
Collaborator

I can't reproduce the flakiness locally (it always fails for me). I also get a sparse(sparse) format; I really do think something may also be up with our finch versions.

@mtsokol
Collaborator Author

mtsokol commented Jun 24, 2024

Hmm... now it passed with csr... I couldn't reproduce it locally on macOS or on a remote Linux machine.

But I think once we merge finch-tensor/finch-tensor-python#75 we don't need to worry about it anymore as any format will be convertible to csc/csr etc.

@willow-ahrens
Collaborator

I'll work towards reproducing this nondeterminism. Maybe there's a similar kernel that behaves differently. Could we start by running in verbose mode?

@willow-ahrens
Collaborator

willow-ahrens commented Jun 24, 2024 via email

@mtsokol
Collaborator Author

mtsokol commented Jun 24, 2024

> Would y’all be okay if I push a few test commits?

Sure!

@hameerabbasi
Collaborator

> Would y’all be okay if I push a few test commits?

I've just sent a collaborator invitation, if you accept it you should be able to push to Mateusz's branch.

@willow-ahrens
Collaborator

I think stdout is being suppressed. Also, it's failing now saying tensordot is not defined. How are notebooks being tested, and can we get them to print stdout?

@mtsokol
Collaborator Author

mtsokol commented Jun 25, 2024

> I think stdout is being suppressed. Also, it's failing now saying tensordot is not defined. How are notebooks being tested, and can we get them to print stdout?

The issue was that you defined X as a lazy variable and used it in sparse.compute(...), but then also in:

```python
b_hat = (inverted @ sparse.permute_dims(X, (1, 0))) @ y
```

without calling lazy on inverted or y, or calling compute. I updated it by defining X_lazy separately. Now you can execute the cell to get the execution plan.

The tensordot error comes from the fact that the arguments mix lazy and eager tensors here.

In the CI, notebooks are tested by executing the whole notebook: if the notebook passes, it's a ✅ without output; if there's an error, it reports the stack trace and points to the cell that failed.
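The lazy/eager mixing failure can be illustrated with a toy wrapper (this is not the finch-tensor API, just a sketch of the general rule: every operand in a lazy expression must itself be lifted into the lazy world):

```python
# Toy lazy wrapper: records an expression tree instead of computing.
class Lazy:
    def __init__(self, expr):
        self.expr = expr

    def __matmul__(self, other):
        # Combining a lazy operand with an eager one has no well-defined
        # meaning until everything is lifted into the lazy world.
        if not isinstance(other, Lazy):
            raise TypeError("cannot mix lazy and eager operands")
        return Lazy(("matmul", self.expr, other.expr))

X_lazy = Lazy("X")
plan = X_lazy @ Lazy("y")  # fine: both operands are lazy
try:
    X_lazy @ "y"  # eager operand leaks into a lazy expression
except TypeError as exc:
    print(exc)
```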

@mtsokol
Collaborator Author

mtsokol commented Jun 25, 2024

@hameerabbasi I'm not sure about using nbmake. Right now it fails with an internal nbmake error (https://github.com/pydata/sparse/actions/runs/9661787940/job/26650246800?pr=709) after upgrading finch-tensor to the latest patch (the version that passes everywhere else). I tried to debug it but didn't find anything informative. Is there an nbmake alternative? Maybe we could merge this notebook without running it in the CI.

@hameerabbasi
Collaborator

> @hameerabbasi I'm not sure about using nbmake. […] Maybe we could merge this notebook without running it in the CI.

Yes let's do that for now. I'll look for alternatives.

@mtsokol
Collaborator Author

mtsokol commented Jun 25, 2024

👍 Let me squash git history and hide that CI job for now.

@mtsokol
Collaborator Author

mtsokol commented Jun 25, 2024

@hameerabbasi Ok, I figured out what the issue was: an out-of-memory error. I commented out the largest test configurations in MTTKRP (1000x1000x1000). Let me clean up the notebook and squash.

I think we can stick with nbmake, but it could have reported the OOM more explicitly.

Collaborator

@hameerabbasi hameerabbasi left a comment

Thanks for all the work and investigations, @mtsokol!

@mtsokol
Collaborator Author

mtsokol commented Jun 25, 2024

@willow-ahrens You can force-pull the branch (I squashed commits), and the notebook should run and print the execution plan in sparse.compute(..., verbose=True).

I released a new finch-tensor, so you also need pip install --upgrade finch-tensor in your env to convert freely between formats.

@willow-ahrens
Collaborator

wait, I'm not sure why the oom on mttkrp is related to the nondeterministic transposition order of X

@hameerabbasi
Collaborator

> wait, I'm not sure why the oom on mttkrp is related to the nondeterministic transposition order of X

Would you prefer that be resolved in this PR? AFAICT, the code here has no issues related to that. We can, of course, file an issue to track it if you prefer.

@willow-ahrens
Collaborator

I'm happy to resolve in another PR! I'd like to understand though whether we have an explanation for that behavior, and whether it is still observed here. Is it explained by the OOM, for example? Was it just the finch-tensor version all along? Perhaps I can try to reproduce the problem with a simple test in finch-tensor.

@willow-ahrens
Collaborator

I was confused because it didn't seem like we found the original source of nondeterminism, and I wanted to check whether y'all felt like you had found it.

@mtsokol
Collaborator Author

mtsokol commented Jun 26, 2024

> wait, I'm not sure why the oom on mttkrp is related to the nondeterministic transposition order of X

@willow-ahrens They're unrelated. There's an issue where X.T @ X returned a different result order locally on my machine than in the CI. This caused sparse.asarray(arr, format="csr") to fail in the CI, as arr had a different order than CSR required. I implemented a fix in finch-tensor that refines order handling and passes SwizzleArrays to copyto! when we change storage; therefore any format can now be converted to any other format, as copyto! accepts SwizzleArrays as source and destination.

But there's an issue where copyto!(dense_source, any_target) also copies zeros, as if they were non-zero values, so for dense source formats I needed to add dropfills for this case, which creates another copy.

This fixed calling sparse.asarray(arr, format="csr") for any format of arr, but in MTTKRP we call sparse.asarray(dense, format="csf"), which for the largest dimension configuration caused an OOM, as dense was copied twice (in copyto! and dropfills). I disabled running all configurations in the CI (we only want to make sure the notebook generally runs), and now the job passes.
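For scale, a back-of-the-envelope check of why the 1000x1000x1000 configuration hits the CI memory limit when the dense tensor is materialized twice (assuming float64 elements; standard GitHub-hosted runners have roughly 7 GB of RAM):

```python
# One dense 1000x1000x1000 float64 tensor already exceeds a ~7 GB CI
# runner; copying it twice (copyto! plus dropfills) roughly doubles that.
shape = (1000, 1000, 1000)
bytes_per_elem = 8  # float64

one_copy_gib = shape[0] * shape[1] * shape[2] * bytes_per_elem / 2**30
print(f"one dense copy:   {one_copy_gib:.2f} GiB")    # ~7.45 GiB
print(f"two dense copies: {2 * one_copy_gib:.2f} GiB")  # ~14.90 GiB
```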

@willow-ahrens
Collaborator

feel free to merge though, you're right that this can go through.

@willow-ahrens
Collaborator

@mtsokol thanks! Yes, I think dropfills should probably support transposition like copyto! does. I'll add a PR.

@mtsokol
Collaborator Author

mtsokol commented Jun 26, 2024

Thank you! My two comments in finch-tensor/Finch.jl#609 describe these two issues.

@willow-ahrens
Collaborator

Also, it's good to know that the nondeterminism was related to the Finch version. I'll add a more direct test for this to finch-tensor so we can see if it ever fails again nondeterministically in CI.

@hameerabbasi hameerabbasi merged commit a73b20d into main Jun 26, 2024
14 checks passed
@hameerabbasi hameerabbasi deleted the example-notebook branch June 26, 2024 09:26
@hameerabbasi
Collaborator

Merging, thanks for the follow-ups @willow-ahrens and @mtsokol!
