Let geoms pass parameters to scales. ScaleContinuous methods to map values to colours #5031

zeehio · 2022-11-06T15:20:19Z

This is the base of one of the pillars of:

geom_raster() of a matrix: Performance analysis and improvements #4989

Currently:

geoms map data frame columns to aesthetics.
scales transform those column values into aesthetics values.
geoms turn the transformed values into renderable objects (grobs).

The user can freely combine geoms with scales, and use multiple geoms with the same scale. The independence between geoms and scales lets most geoms work with most scales and that is a great core part of ggplot2.

While this independence is great, some implementation details in ggplot2 leave room for improvement in terms of performance. In particular, as shown in #4989, a plot that can be built and rendered in under 6 seconds may take about 45 seconds in ggplot2.

One of the major bottlenecks is the mapping of column values into aesthetic values, done by the scale$map() method. What can we do to make it faster?

1. Make the underlying implementation of mapping values to colours faster
2. Let the scales know what format the geom expects the aesthetic values in
3. Change how values are mapped to colours

Make the underlying implementation of mapping values to colours faster

The first point is mostly taken care of in other pull requests in the scales and farver packages, and it covers things like:

Improving the missing value replacement
Reducing intermediate memory copies and sweeps on the data

Letting the scales know what format the geom expects the aesthetic values in

The second point needs some communication between geoms and scales. To give an example: when a geom maps a data column to a fill or colour aesthetic, the scale transforms the column values into a character vector ("#ff0000", ...). Some geoms do not use character colours, but rather native colours (integers, as used by nativeRaster objects), and they must convert the format when rendering (e.g. https://github.com/zeehio/ggmatrix/blob/98445bf28caaca1022c03a542b8b4541034566a2/R/geom_matrix_raster.R#L123). If the geom can tell the scale that it would rather have colours in native format, and the scale can tell the palette the same, the intermediate character representation of colours can be avoided, with significant performance benefits. This pull request defines a way for geoms to communicate with scales, but the example described in this paragraph is tackled in a future pull request.

Changing how values are mapped to colours

The third point is how values are mapped to colours, and it is what this pull request is concerned with. The pull request focuses on ScaleContinuous because it is one of the most common scales, but similar adjustments could be applied to other scales if desired.

ScaleContinuous maps values to palette colours as follows:

unique values are found
unique values are mapped to colors
colors are matched to the original vector

ggplot2/R/scale-.r, lines 608 to 610 in 63125db:

    uniq <- unique0(x)
    pal <- self$palette(uniq)
    scaled <- pal[match(x, uniq)]

When most values are unique, this mapping could be faster by simply mapping all values to colours, without finding and matching unique values first. In some cases the geom can guess or know whether that is going to be the case.
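The difference between the two strategies can be sketched in base R. This is only an illustration under assumptions: `palette_fn`, `map_unique` and `map_raw` are hypothetical names, `unique()` stands in for ggplot2's internal `unique0()`, and the input is assumed to be already rescaled to [0, 1].

```r
# Hypothetical stand-in for a continuous palette: maps values in [0, 1]
# to hex colours using grDevices::colorRamp().
ramp <- grDevices::colorRamp(c("#132B43", "#56B1F7"))
palette_fn <- function(x) {
  rgb_mat <- ramp(x)  # one row of R, G, B per input value
  grDevices::rgb(rgb_mat[, 1], rgb_mat[, 2], rgb_mat[, 3], maxColorValue = 255)
}

# "unique" approach: the palette is evaluated once per distinct value,
# then the colours are matched back to the original vector.
map_unique <- function(x, palette) {
  uniq <- unique(x)
  pal <- palette(uniq)
  pal[match(x, uniq)]
}

# "raw" approach: the palette is evaluated on every value directly.
map_raw <- function(x, palette) {
  palette(x)
}

set.seed(1)
x <- runif(1e5)  # doubles with essentially no duplicates

cols_unique <- map_unique(x, palette_fn)
cols_raw <- map_raw(x, palette_fn)
identical(cols_unique, cols_raw)
```

Both functions return the same colours; they differ only in how much work is done. Wrapping each call in `system.time()` gives a rough idea of the relative cost on a given machine.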

This pull request establishes a way for geoms to communicate parameters to scales, and specifically use those parameters to define three different mapping_methods. By default the current "unique" approach is used. The geom may specify "raw" or "binned" instead.

The geom defines a new scale_params member, which is typically a list (or a function that takes the computed params and returns that list). The list is named by aesthetic, and for each aesthetic it provides a list of options.

For instance, the geom may now specify scale_params = list(fill=list(mapping_method = "raw")) to tell the scale corresponding to the fill aesthetic to use the "raw" mapping method, that is, to map values without finding unique values first. The "raw" method is usually faster than the current "unique" method when, for instance, the data consists of doubles without duplicate values.
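A rough sketch of how such a structure could be resolved and queried, in plain R. The helpers `resolve_scale_params` and `scale_param_for`, and the `many_unique` parameter, are hypothetical and are not part of ggplot2 or of this pull request's actual code; only the shape of the `scale_params` list follows the description above.

```r
# Hypothetical helper: scale_params may be a list, or a function of the
# geom's computed params that returns such a list.
resolve_scale_params <- function(scale_params, params = list()) {
  if (is.function(scale_params)) {
    scale_params <- scale_params(params)
  }
  scale_params
}

# Hypothetical helper: look up one option for one aesthetic,
# falling back to a default when the geom did not set it.
scale_param_for <- function(scale_params, aesthetic, option, default = NULL) {
  value <- scale_params[[aesthetic]][[option]]
  if (is.null(value)) default else value
}

# Static form: a list named by aesthetic.
static_params <- list(fill = list(mapping_method = "raw"))

# Function form: decide from the computed params (many_unique is made up).
dynamic_params <- function(params) {
  method <- if (isTRUE(params$many_unique)) "raw" else "unique"
  list(fill = list(mapping_method = method))
}

fill_method <- scale_param_for(
  resolve_scale_params(static_params), "fill", "mapping_method", "unique"
)
colour_method <- scale_param_for(
  resolve_scale_params(static_params), "colour", "mapping_method", "unique"
)
dynamic_method <- scale_param_for(
  resolve_scale_params(dynamic_params, list(many_unique = TRUE)),
  "fill", "mapping_method", "unique"
)
```

Here the fill scale is told to use "raw", while the colour scale, for which the geom gave no hint, keeps the default "unique".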

Besides the default "unique" and the new "raw" mapping methods, we also allow the geom to ask for the "binned" mapping method, where the geom specifies the number of intervals to use, as in scale_params = list(fill=list(mapping_method = "binned", mapping_method_bins = 256)). The mapping process is then as follows:

values are binned in N intervals
intervals are mapped to colors

This approach is "lossy" (we have a maximum of N different colours), but it can be much faster and produce results that are almost indistinguishable from those of the other mapping methods.
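The two steps above can be sketched in base R as follows. This is an illustration only, not the code from this pull request: `map_binned` and `palette_fn` are hypothetical names, the bins are assumed to be of equal width, each bin is assumed to take the colour of its midpoint, and the values are assumed to be already rescaled to [0, 1].

```r
# Hypothetical stand-in for a continuous palette on [0, 1].
ramp <- grDevices::colorRamp(c("#132B43", "#56B1F7"))
palette_fn <- function(x) {
  rgb_mat <- ramp(x)
  grDevices::rgb(rgb_mat[, 1], rgb_mat[, 2], rgb_mat[, 3], maxColorValue = 255)
}

map_binned <- function(x, palette, n_bins = 256) {
  # Step 1: assign each value to one of n_bins equal-width intervals.
  breaks <- seq(0, 1, length.out = n_bins + 1)
  bin <- findInterval(x, breaks, rightmost.closed = TRUE, all.inside = TRUE)
  # Step 2: evaluate the palette once per interval (at its midpoint)
  # and index the resulting colours by bin number.
  mids <- (breaks[-1] + breaks[-length(breaks)]) / 2
  palette(mids)[bin]
}

set.seed(1)
x <- runif(1e5)
cols_binned <- map_binned(x, palette_fn, n_bins = 256)
length(unique(cols_binned)) <= 256
```

The palette is called on only `n_bins` values regardless of the length of `x`, which is where the speed-up would come from; the price is that at most `n_bins` distinct colours appear in the output.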

Questions/Discussion

Shouldn't this "mapping_method" be just a scale argument?
Yes... with a "but maybe". Yes, that makes sense. If the "mapping_method" is a relevant argument for the scale it could be one of the scale_*_gradient(...) arguments. However it seems a rather "internal" argument and it won't be easy for a regular user to see its effect. An alternative could be to sample the vector we want to map and, based on the density of unique values in the sample, we could choose either "unique" or "raw". However, by letting the geom hint the scale we can let the scale use a more efficient default mapping method in some scenarios.
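The sampling alternative mentioned above could look roughly like this. It is a sketch under assumptions: `choose_mapping_method`, the sample size of 1000 and the 0.5 threshold are all made up for illustration and were not part of the pull request.

```r
# Hypothetical heuristic: estimate the fraction of distinct values from a
# random sample and pick "raw" when most sampled values are distinct.
choose_mapping_method <- function(x, sample_size = 1000, threshold = 0.5) {
  n <- length(x)
  if (n == 0) {
    return("unique")
  }
  idx <- if (n > sample_size) sample.int(n, sample_size) else seq_len(n)
  sampled <- x[idx]
  unique_fraction <- length(unique(sampled)) / length(sampled)
  if (unique_fraction > threshold) "raw" else "unique"
}

set.seed(1)
method_doubles <- choose_mapping_method(runif(1e5))                         # distinct doubles
method_levels <- choose_mapping_method(sample(1:10, 1e5, replace = TRUE))   # ten distinct values
```

One caveat of this kind of heuristic is that a small sample tends to overestimate the share of distinct values in the full vector, so the threshold would need tuning before relying on it.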


aphalo · 2022-12-14T23:40:49Z

An alternative is to limit the number of colours used to those that an observer can distinguish and automatically switch to binning when there are more distinct values to be mapped in the data. There is no point in using more hue or lightness values than those that can be perceived as different. (the basis of JPEG).

teunbrand · 2022-12-22T15:24:00Z

Hi Sergio,

I think having extra options to make colour mapping more efficient is a good thing. However, putting scale parameters under the control of geoms seems to me like it goes against the grammar of graphics. I think you're spot on with this point here:

> Shouldn't this "mapping_method" be just a scale argument?

Is there a good reason to implement this at the geom level of things?

zeehio · 2022-12-24T15:45:34Z

> An alternative is to limit the number of colours used to those that an observer can distinguish and automatically switch to binning when there are more distinct values to be mapped in the data. There is no point in using more hue or lightness values than those that can be perceived as different. (the basis of JPEG).

You are right. However, the problem with implementing this suggestion is that the mapping from numbers to colours is not always related just to hue, or just to brightness; it may be a combination of an arbitrary number of gradients. So it is not easy to tell in advance, without mapping the values, at what threshold two numbers become the same colour. And if you have already spent the time mapping all the values to colours, then there is not much more to optimize... :-)

zeehio · 2022-12-24T15:52:12Z

> Hi Sergio,
>
> I think having extra options to make colour mapping more efficient is a good thing. However, putting scale parameters under the control of geoms seems to me like it goes against the grammar of graphics. I think you're spot on with this point here:
>
> > Shouldn't this "mapping_method" be just a scale argument?
>
> Is there a good reason to implement this at the geom level of things?

To be honest, I thought about this when I was writing the pull request. I will have to change it so it becomes a scale argument.

There is a scenario (not related to this PR) where it makes sense for the geom to set a scale parameter: a geom may prefer to render colours in a character format (like "#FF0000"), or it may prefer them in native format (integers, to be used by nativeRaster objects). In that case, it makes sense for the geom to tell the scale "hey, give me the colours as integers if possible", so the scale returns the colours as integers and an extra conversion is avoided. It's just an implementation detail, but it has a significant impact on performance. If you keep on reviewing all my pull requests (sorry for the extra work) you will come across that one...

I will rewrite this pull request whenever I have time (probably in a couple of weeks)

zeehio · 2022-12-24T16:14:08Z

After further reading, most of what I suggest here can already be done with scale_fill_binned(), so I will close this pull request and clean up the pull requests that depend on this one.

This was referenced Nov 6, 2022

Extend palette contract: Palette capabilities #5032

Open

Scale capability: Use native colour format if the geom prefers it #5033

Open

zeehio force-pushed the scale-cont-mapping-method branch from d18d316 to 57997cc Compare November 6, 2022 15:59

zeehio added 4 commits November 6, 2022 17:04

FEAT: Let the scale mapping accept parameters

f995ea1

FEAT: Let geoms define scale parameters and pass those to the scales

7d1dc95

TEST: scale_params "binned" mapping method

3d394b0

zeehio force-pushed the scale-cont-mapping-method branch from 57997cc to 3d394b0 Compare November 6, 2022 16:06

zeehio mentioned this pull request Nov 6, 2022

geom_raster() of a matrix: Performance analysis and improvements #4989

Open

zeehio closed this Dec 24, 2022
