Add persist argument to QuantileDeltaMapping train method #1697

Closed
2 tasks done
saschahofmann opened this issue Apr 3, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@saschahofmann
Contributor

Addressing a Problem?

If you want to reuse the trained dataset for multiple adjustments, the docs already mention that you can trigger the training by calling .load on the .ds dataset object of the class. This loads the trained model into the memory of the main thread. For bigger datasets, it might be necessary to leave the data on the workers while still computing it only once.

This can already be done by doing something like

qdm.set_dataset(qdm.ds.persist())

I could imagine that this is a common enough case to add an argument to the train method that does exactly that.

Potential Solution

Extend the train method with an optional persist=False argument. If true, it updates the trained dataset so that it is backed by persisted dask arrays.
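
A minimal sketch of what the proposed argument could look like. It does not use xclim: `FakeLazyDataset` and `SketchAdjustment` are hypothetical stand-ins for a dask-backed xarray dataset and for a trainable class such as QuantileDeltaMapping. Only the names train, set_dataset, ds and persist come from this issue, and the real signature of train may differ.

```python
class FakeLazyDataset:
    """Hypothetical stand-in for a dask-backed xarray.Dataset."""

    def __init__(self, persisted=False):
        self.persisted = persisted

    def persist(self):
        # Like xarray's Dataset.persist, return a new object and leave this one untouched.
        return FakeLazyDataset(persisted=True)


class SketchAdjustment:
    """Hypothetical stand-in for a trainable class such as QuantileDeltaMapping."""

    def __init__(self, ds):
        self.ds = ds

    def set_dataset(self, ds):
        self.ds = ds

    @classmethod
    def train(cls, ref, hist, *, persist=False, **kwargs):
        # Assumption: real training would build a lazy dataset from ref and hist.
        obj = cls(FakeLazyDataset())
        if persist:
            # Same effect as the manual set_dataset(ds.persist()) call.
            obj.set_dataset(obj.ds.persist())
        return obj


default = SketchAdjustment.train(ref=None, hist=None)
kept_on_workers = SketchAdjustment.train(ref=None, hist=None, persist=True)
print(default.ds.persisted, kept_on_workers.ds.persisted)
```

With persist left at its default, the behaviour is unchanged, so existing callers would not be affected.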

Additional context

No response

Contribution

  • I would be willing/able to open a Pull Request to contribute this feature.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@saschahofmann saschahofmann added the enhancement New feature or request label Apr 3, 2024
@aulemahal
Collaborator

aulemahal commented Apr 16, 2024

Hi again,

I haven't tried that yet, maybe I should! In our latest large-scale workflows, we wrote the training dataset to disk, as it was larger than memory anyway.

I find the qdm.set_dataset(qdm.ds.persist()) line to be simple enough, so I'm not totally convinced this warrants an implementation in xclim. However, that implementation would also be very simple: I suggest adding a persist method to ParametrizableWithDataset here. Would that solve the issue for you?

QDM = QuantileDeltaMapping.train(ref, hist, **kwargs)
QDM.persist()
QDM.adjust(sim, **kwargs)
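
As a rough illustration of that suggestion, here is a sketch that does not depend on xclim. `FakeLazyDataset` and `ParametrizableWithDatasetSketch` are hypothetical stand-ins, the adjust arithmetic is a placeholder rather than real quantile delta mapping, and returning self from persist is an assumption made to allow chaining. The counter shows why persisting matters when the same trained dataset feeds several adjustments.

```python
class FakeLazyDataset:
    """Hypothetical stand-in for a dask-backed dataset that counts its computations."""

    def __init__(self, cached=None, counter=None):
        self._cached = cached
        self.counter = counter if counter is not None else {"computations": 0}

    def _compute(self):
        self.counter["computations"] += 1
        return [0.5, 1.0, 1.5]  # placeholder for trained adjustment factors

    def values(self):
        # A lazy dataset recomputes on every access, a persisted one does not.
        return self._cached if self._cached is not None else self._compute()

    def persist(self):
        # Compute once and return a new object holding the result.
        return FakeLazyDataset(cached=self._compute(), counter=self.counter)


class ParametrizableWithDatasetSketch:
    """Hypothetical stand-in for the base class named in the comment above."""

    def __init__(self, ds):
        self.ds = ds

    def set_dataset(self, ds):
        self.ds = ds

    def persist(self):
        self.set_dataset(self.ds.persist())
        return self  # assumption: returning self to allow chaining

    def adjust(self, sim):
        # Placeholder arithmetic, not the real adjustment.
        return [s * f for s, f in zip(sim, self.ds.values())]


lazy = ParametrizableWithDatasetSketch(FakeLazyDataset())
lazy.adjust([1.0, 2.0, 3.0])
lazy.adjust([4.0, 5.0, 6.0])

persisted = ParametrizableWithDatasetSketch(FakeLazyDataset()).persist()
persisted.adjust([1.0, 2.0, 3.0])
persisted.adjust([4.0, 5.0, 6.0])

print(lazy.ds.counter["computations"], persisted.ds.counter["computations"])
```

In this toy setup the lazy object recomputes the trained data on each adjust call, while the persisted one computes it a single time.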

@saschahofmann
Contributor Author

Ah yes, that would be another way to do it. I agree, maybe it doesn't warrant an extra step, especially if the more common use case is to persist on disk! Feel free to close.
