MapboxTilesRenderer #1024

Open

wants to merge 13 commits into main

Conversation

hokieg3n1us
Contributor

Implement MapboxTilesRenderer, which adds support for outputting TMS tilesets as an mbtiles file (a SQLite database following the MBTiles 1.3 schema). This file can then be easily used in GIS clients such as QGIS, though the primary motivation was creating an output format that could easily be consumed by GeoServer (via the MBTiles community extension).

Example using data from ACLED (Armed Conflict Location & Event Data):

import dask.dataframe
import datashader
from datashader.tiles import render_tiles
from datashader.utils import lnglat_to_meters

import colorcet


def _get_extents():
    return df.x.min().compute(), df.y.min().compute(), df.x.max().compute(), df.y.max().compute()


def _load_data_func(x_range, y_range):
    return df.loc[df.x.between(*x_range) & df.y.between(*y_range)]


def _rasterize_func(df, x_range, y_range, height, width):
    cvs = datashader.Canvas(x_range=x_range, y_range=y_range,
                            plot_height=height, plot_width=width)
    agg = cvs.points(df, 'x', 'y')
    return agg


def _shader_func(agg, span=None):
    img = datashader.tf.dynspread(datashader.tf.shade(agg, cmap=colorcet.fire))
    return img


# Can be utilized to customize image with watermark, etc.
def _post_render_func(img, **kwargs):
    return img


if __name__ == '__main__':
    df = dask.dataframe.read_csv('ACLED.csv', usecols=['longitude', 'latitude']).persist()

    # Project longitude/latitude to Web Mercator (meters), which is what the tiling expects.
    df['x'], df['y'] = lnglat_to_meters(df['longitude'], df['latitude'])

    # drop() is not in-place, so reassign the result.
    df = df.drop(['longitude', 'latitude'], axis=1)

    min_zoom = 0
    max_zoom = 8
    output_path = 'output/ACLED.mbtiles'

    render_tiles(_get_extents(),
                 range(min_zoom, max_zoom + 1),
                 load_data_func=_load_data_func,
                 rasterize_func=_rasterize_func,
                 shader_func=_shader_func,
                 post_render_func=_post_render_func,
                 output_path=output_path, num_workers=4)

ACLED.mbtiles loaded into QGIS:

[screenshot of the rendered tiles displayed in QGIS]

The render_tiles function now has two big changes to its behavior.

  1. It validates the output_path immediately. If the output_path is a directory, it will create it. If it is a file path ending in mbtiles, it will create the directory structure and then set up the tables and metadata. This should only be done once, so this functionality is exposed as a static method on the MapboxTilesRenderer (see the sketch after this list).
  2. It exposes the num_workers used by the Dask Bag, so that the SQLite file doesn't lock due to extremely high concurrency.
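
For context, the tables and metadata required by the MBTiles 1.3 specification can be created with plain SQLite. Below is a minimal sketch of such a one-time setup step; setup_mbtiles is a hypothetical name, not necessarily the static method added by this PR:

import os
import sqlite3


def setup_mbtiles(output_path, name='datashader tiles'):
    """Create an empty MBTiles file with the tables required by MBTiles 1.3.

    Hypothetical helper; the PR exposes equivalent one-time setup as a
    static method on MapboxTilesRenderer.
    """
    os.makedirs(os.path.dirname(output_path) or '.', exist_ok=True)
    with sqlite3.connect(output_path) as con:
        # Tables defined by the MBTiles 1.3 specification.
        con.execute("CREATE TABLE IF NOT EXISTS metadata (name TEXT, value TEXT)")
        con.execute(
            "CREATE TABLE IF NOT EXISTS tiles ("
            "zoom_level INTEGER, tile_column INTEGER, "
            "tile_row INTEGER, tile_data BLOB)"
        )
        con.execute(
            "CREATE UNIQUE INDEX IF NOT EXISTS tile_index "
            "ON tiles (zoom_level, tile_column, tile_row)"
        )
        # Minimal required metadata entries per the spec.
        con.executemany(
            "INSERT OR REPLACE INTO metadata (name, value) VALUES (?, ?)",
            [('name', name), ('format', 'png')],
        )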

Create MapboxTilesRenderer for outputting TMS tile sets as an mbtiles file (a SQLite database following the MBTiles 1.3 specification).
Compute min/max of the span in place instead of building a list of all values and then computing with Dask. Use the dimensions of the data array for label-based indexing instead of hard-coded x, y labels (which forced the input coordinate columns to be named x and y); a short illustration is sketched below.
Include capability to provide a local_cache_path. If provided, the aggregation for the super tiles will be stored locally as a NetCDF file (instead of keeping all these in memory, which eventually overflows memory at high zoom levels). This allows most of the processing to now be done completely out of core.
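
To illustrate the label-based indexing change, here is a minimal sketch assuming an aggregate produced by a datashader Canvas; select_region is a hypothetical helper, not the PR's exact code:

import xarray as xr


def select_region(agg: xr.DataArray, x_range, y_range) -> xr.DataArray:
    """Slice an aggregate using its own dimension names rather than
    assuming they are literally called 'x' and 'y' (hypothetical helper)."""
    ydim, xdim = agg.dims  # Canvas aggregates are ordered (y, x)
    return agg.sel({xdim: slice(*x_range), ydim: slice(*y_range)})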
@hokieg3n1us
Contributor Author

Included an additional capability for out-of-core processing. An optional parameter, local_cache_path, can be provided. If provided, the aggregates generated by Datashader for the super tiles will be persisted to that local cache as NetCDF files instead of keeping all of these intermediate results in memory. These aggregates are then individually loaded by Dask workers to generate the individual tiles. This allows tile generation for larger datasets at much higher zoom levels, limited only by available disk space.

Use the netCDF4 library for writing/loading xarray DataArrays from the cache, since it properly handles unsigned data types. This is an optional dependency; an error is raised if the caching feature is used without it installed.
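
A minimal sketch of how a super-tile aggregate could be round-tripped through the local cache using the netcdf4 engine; cache_super_tile and load_super_tile are hypothetical names, not necessarily the PR's API:

import os
import uuid
import xarray as xr


def cache_super_tile(agg: xr.DataArray, local_cache_path: str) -> str:
    """Persist a super-tile aggregate to the local cache and return its path."""
    os.makedirs(local_cache_path, exist_ok=True)
    path = os.path.join(local_cache_path, f'{uuid.uuid4().hex}.nc')
    # The netcdf4 engine preserves unsigned integer dtypes correctly.
    agg.to_netcdf(path, engine='netcdf4')
    return path


def load_super_tile(path: str) -> xr.DataArray:
    """Load a cached super-tile aggregate back into memory on a Dask worker."""
    return xr.open_dataarray(path, engine='netcdf4').load()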
@jbednar
Member

jbednar commented Oct 26, 2021

Very cool! We're working on fixing the tests, at which point we should be able to review this. Let me know if you're still planning to add more amazing features!

Allow updates to an existing mbtiles file, inserting or replacing rows in the SQLite database, and make the SQL statements consistent.
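
Insert-or-replace maps naturally onto SQLite's INSERT OR REPLACE, given the unique index over (zoom_level, tile_column, tile_row). A minimal sketch, with write_tile as a hypothetical helper rather than the PR's exact code:

import sqlite3


def write_tile(mbtiles_path, zoom, column, row, png_bytes):
    """Insert a tile, or replace it if that (zoom, column, row) already exists."""
    with sqlite3.connect(mbtiles_path) as con:
        con.execute(
            "INSERT OR REPLACE INTO tiles "
            "(zoom_level, tile_column, tile_row, tile_data) VALUES (?, ?, ?, ?)",
            (zoom, column, row, sqlite3.Binary(png_bytes)),
        )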
@hokieg3n1us
Contributor Author

That's all the features I currently have planned. I'd considered supporting output to pmtiles, either directly or as a conversion step from the mbtiles format, but that can wait until that format is more widely adopted.

calculate_zoom_level_stats has moved to a parallel implementation using Dask bags that does not cache the super tiles in memory. If a local_cache_path is provided, the super tiles are cached to disk as NetCDF files. Super tiles are then either recomputed during the render_super_tiles function or loaded from the local cache.
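
A rough sketch of the bag-based statistics pass; the helper names and the load_or_compute callable are assumptions, not the PR's exact interface:

import dask.bag as db
import numpy as np


def tile_min_max(agg):
    """Return (min, max) of one super-tile aggregate, ignoring NaNs."""
    data = np.asarray(agg.data, dtype='float64')
    return np.nanmin(data), np.nanmax(data)


def calculate_span(super_tile_infos, load_or_compute, num_workers=4):
    """Reduce the global span across super tiles with a Dask bag.

    load_or_compute is a callable that either recomputes an aggregate or
    loads it from the local NetCDF cache (hypothetical interface).
    """
    bag = db.from_sequence(super_tile_infos)
    results = bag.map(lambda info: tile_min_max(load_or_compute(info))).compute(
        num_workers=num_workers)
    return min(r[0] for r in results), max(r[1] for r in results)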

Note: If using Datashader's render_tiles, the Dask scheduler should be set to 'threads' (when running locally) to prevent the Dask bag computations from copying the input DataFrame to multiple processes, which would increase memory overhead. If outputting tiles to the MBTiles format, num_workers should be tuned to prevent the SQLite database from locking during transactions. Both can be configured using dask.config.set(scheduler='threads', num_workers=4), as shown below.
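
For example (the values shown are illustrative, applied once before calling render_tiles):

import dask

# The threaded scheduler shares the input DataFrame between workers, and a
# modest worker count keeps SQLite write contention low for mbtiles output.
dask.config.set(scheduler='threads', num_workers=4)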
Handle setup of output paths.
@ianthomas23 ianthomas23 assigned ianthomas23 and unassigned jbednar Jul 18, 2022
@ianthomas23 ianthomas23 modified the milestones: v0.14.2, v0.14.3 Jul 18, 2022
@codecov

codecov bot commented May 19, 2023

Codecov Report

Merging #1024 (a0e6f1c) into main (8092f4d) will decrease coverage by 0.29%.
The diff coverage is 81.05%.

@@            Coverage Diff             @@
##             main    #1024      +/-   ##
==========================================
- Coverage   84.52%   84.24%   -0.29%     
==========================================
  Files          35       35              
  Lines        8369     8643     +274     
==========================================
+ Hits         7074     7281     +207     
- Misses       1295     1362      +67     
Impacted Files                               Coverage Δ
datashader/data_libraries/pandas.py          100.00% <ø> (ø)
datashader/tiles.py                          58.30% <45.54%> (-11.21%) ⬇️
datashader/utils.py                          80.09% <85.00%> (+0.84%) ⬆️
datashader/reductions.py                     84.68% <93.15%> (+1.56%) ⬆️
datashader/compiler.py                       91.07% <100.00%> (+0.50%) ⬆️
datashader/data_libraries/dask.py            95.23% <100.00%> (+0.07%) ⬆️
datashader/data_libraries/dask_xarray.py     98.95% <100.00%> (ø)

