Memory Error while generating tiles using render_tiles in tilling.ipynb example #1039
Comments
Sounds like you already found a good way to encounter this problem, but I imagine you want to avoid this problem! :-) I would think that a dask dataframe without calling .persist() would avoid memory issues like this, at the cost of being slower than using a persisted Dask dataframe. But if you're using a non-persisted Dask dataframe and still running into issues, it's probably that we are keeping too many tiles in memory somehow, which I'd guess we could avoid. We'd need a reproducible example with runnable code and info about how much memory you have, and I can't promise when we'd be able to look into that, but I'd guess that it would be solvable if we did have time to look at it.
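For illustration, a minimal sketch of that suggestion (not from this thread): the parquet file name and the 'x'/'y' column names are assumptions, and it follows the load_data_func(x_range, y_range) pattern used in the tiling example.

```python
import dask.dataframe as dd

# Lazy Dask DataFrame: no .persist(), so partitions are read on demand
# instead of being pinned in RAM (slower, but lower peak memory use).
ddf = dd.read_parquet("points.parquet")  # assumed file name

def load_data_func(x_range, y_range):
    # Called per supertile with its extent; filter lazily so only the
    # partitions overlapping this extent ever get computed.
    return ddf[(ddf.x >= x_range[0]) & (ddf.x <= x_range[1]) &
               (ddf.y >= y_range[0]) & (ddf.y <= y_range[1])]
```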
Yes, I'm not persisting the Dask dataframe.
This is the trace of the error:
What are the recommended specs (RAM, CPU cores) to make this work?
That's not a reproducible example; we'd need something that fully specifies what code to run and how to get the data it needs (typically by synthesizing it). In any case, I don't know precisely where the bottleneck is, so I can't speculate how much memory would be needed, other than "more" :-/.
For a count aggregation, each super tile (the xarray DataArray returned from the rasterize_func) is roughly 64 MB in size. If you're doing a categorical aggregation, multiply that by the number of categories. The rendering process currently keeps the results for each super tile in memory until they're sliced into the individual TMS tiles. For zoom level 12 with a global dataset, that's 65792 supertiles, so roughly 4 TB of memory usage. There's currently a pull request #1024 that adds a local_cache_path option so these intermediate aggregates can be stored to disk, making the tile rendering process bounded by disk space/IO instead of RAM. You can get the number of super tiles for a specific zoom level of your dataset using something like the sketch below, and multiply that by 64 MB to get an estimate of memory consumption.
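The snippet referred to above isn't preserved in this thread. As a rough stand-in, here's a hedged sketch that estimates the supertile count geometrically, assuming Web Mercator (EPSG:3857) extents and 4096x4096-pixel supertiles (16x16 blocks of 256x256 TMS tiles); exact counts from datashader's own tile definition may differ slightly.

```python
# Rough estimate only; not datashader's own tile math.
WORLD = 20037508.342789244  # half-width of the Web Mercator world, in meters

def estimate_supertile_memory(xmin, ymin, xmax, ymax, zoom,
                              bytes_per_supertile=64 * 1024**2):
    supertiles_per_axis = max(1, 2**zoom // 16)  # supertiles along one axis

    def index(coord):
        # Map a mercator coordinate to a supertile column/row index.
        frac = (coord + WORLD) / (2 * WORLD)
        return min(int(frac * supertiles_per_axis), supertiles_per_axis - 1)

    n = (index(xmax) - index(xmin) + 1) * (index(ymax) - index(ymin) + 1)
    return n, n * bytes_per_supertile / 1024**3  # (count, GiB)

# Example: a global extent at zoom level 12
n, gib = estimate_supertile_memory(-WORLD, -WORLD, WORLD, WORLD, 12)
print(f"{n} supertiles -> ~{gib:,.0f} GiB if all aggregates stay in memory")
```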
Thanks, @hokieg3n1us; that's super useful!
@hokieg3n1us that clearly needs to be refactored if all supertiles are held in memory. I'm happy to take a look and suggest changes. We are also doing tile rendering in mapshader, but it's still experimental.
@brendancol I know where to make the change in my branch; it's an easy enough fix. The only trade-off is that you'll be calling the rasterize_func twice for each super_tile: once when calculating the zoom-level statistics to get the span, and a second time later when you render the sub-tiles.
@brendancol @jbednar I made the change on my branch for #1024 and pushed it. I included some notes in my latest commit that anyone using the render_tiles functionality should be careful to tune the Dask scheduler. Specifically, the Dask Bag default of 'processes' should be avoided if you persist your input DataFrame, since it'll get copied to each process during the load_data_func. And if you're using the MBTiles feature, you'll want to tune num_workers to prevent locking of the SQLite file.
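For anyone following along, a minimal sketch of that kind of tuning (the worker count here is just an example, and how num_workers is honored by the local schedulers can vary by Dask version):

```python
import dask

# Threaded scheduler: a persisted DataFrame is shared between workers rather
# than copied into separate processes; a small worker count also reduces
# contention when many tiles write to one MBTiles (SQLite) file.
dask.config.set(scheduler="threads", num_workers=4)
```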
Hi
I'm trying to generate tiles using datashader's render_tiles.
Datashader Tiling Example
My dataset is 8M records.
During the aggregation step I'm running into a MemoryError:
MemoryError: Unable to allocate 64.0 MiB for an array with shape (4096,4096) and data type uint32
The error occurs inside the rasterize_func function, at cvs.points(df, 'x', 'y').
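For context, this is roughly the rasterize_func pattern from the tiling example; the exact signature and the 'x'/'y' column names are assumptions on the editor's part, not the poster's actual code.

```python
import datashader as ds

def rasterize_func(df, x_range, y_range, height, width):
    # Aggregate points for one supertile; the default count() reduction
    # yields a (height, width) uint32 array, i.e. 64 MiB for a 4096x4096
    # supertile, which matches the MemoryError above.
    cvs = ds.Canvas(x_range=x_range, y_range=y_range,
                    plot_height=height, plot_width=width)
    return cvs.points(df, 'x', 'y')
```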
I'm getting this error at zoom level 12 and above, but up to zoom 11 I'm able to successfully generate tile sets with the same code snippet.
I also tried Dask dataframes to improve processing.
What is the best way to encounter this problem?