Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the computational complexity of fregrid. #202

Open
ngs333 opened this issue Dec 19, 2022 · 1 comment
Open

Improve the computational complexity of fregrid. #202

ngs333 opened this issue Dec 19, 2022 · 1 comment
Assignees

Comments

@ngs333
Copy link
Contributor

ngs333 commented Dec 19, 2022

The remapping weight generation of the fregrid app has a quadratic complexity of O( N1 * N2) , where N1 and N2 are the number of grid cells in the source and the target mosaics respectively. For current N1 or N2 already in the millions (e.g. C768 mosaics has about 1.4 Million cells) , it requires considerable compute time to generate the weights. There are already NCTools users with workflows with grids two orders of magnitude higher than C768 (issue 127) and even four orders of magnitude higher (issue #181). Regridding such large grids may be impossible without reducing the complexity.

The current fregrid makes an attempt to be efficient by putting what we will call a “min-max index gate” (or just a “min-max gate”) in the code to preclude some calculations. This gate allows the skipping of work in an inner loop if (roughly) the grid indexes of source grid cells are such that they are of cells outside the area of the target grid cell area. Unfortunately this gating scheme is quite suboptimal and does not change the overall quadratic complexity. To get an idea of the effectiveness of this gate relative to the quadratic complexity, we instrument function create_xgrid_2dx2d_order2 in file file create_xgrid.c by placing four counters, which below we call gate_none, gate_min_max, gate_clip and gate_poly_area. There are effectively six nested loops (over the tiles of the source and tiles of the destination, over the latitude and longitude indexes of the source grid, and over the latitude and longitude indexes of the destination grid). Gate_none is incremented when the innermost loop is entered; gate_min_max is incremented when the min-max gate is entered (i.e. when the min-max gate cannot immediately preclude the current pair of indexes for further computation); gate_clip is entered when the polygon clipping algorithm is called and returns a polygon; and gate_poly_area is entered when the polygon area is non-zero. Some of the code is below

// outermost loops over ntiles_in * ntiles_out=36 not shown
for (j1 = 0; j1 < ny1; j1++) //loop over input points.
for (i1 = 0; i1 < nx1; i1++)
  ...
for (ij = istart2[m]; ij <= iend2[m]; ij++) { // loop over all output points 
gate_none += 1;
 ...
i2 = ij % nx2;
j2 = ij / nx2;
...
!! *** This line is the min-max gate : ***
if (lat_out_min_list[ij] >= lat_in_max || lat_out_max_list[ij] <= lat_in_min) continue;
gate_min_max +=1;
  ...
if ((n_out = clip_2dx2d(x1_in, y1_in, n1_in, x2_in, y2_in, n2_in, x_out, y_out)) > 0) {
   gate_clip +=1;
   ...
   xarea = poly_area(x_out, y_out, n_out) * mask_in[j1 * nx1 + i1;
   ...
   if (xarea / min_area > AREA_RATIO_THRESH) {
        gate_poly_area += 1;
      ...
   }}}}}}

When fregrid is run as below:

fregrid --input_mosaic C192_mosaic.nc --output_mosaic C48_mosaic.nc \
--remap_file remap_c192-c48.nc --interp_method conserve_order2

For for a remapping from a source mosaic of 886K source grid cells to a target one of 55K cells, then for the counters mentioned above, then fregrid prints these values :
*** End of create_xgrid_2dx2d_order2 ***
*** gate_none gate_min_max gate_clip gate_poly_area : ***
***: 2592276480 53285186 341467 293376

That is, the min-max gate is effective at precluding 98% of the possible cell pairs from computation of the polygon clipping , but nonetheless in this small example, the polygon clipping is called about 53 million times, and then less than 1 percent of those result in positive clippings (polygons actually overlapping). Additionally, about 2.6 * 10^9 gates are evaluated to find 2.9*10^5 polygon overlaps. Overall the computational complexity of the problem was not really improved over brute force since the min-max gate is still determined N1 * N2 times, with formulas for i2 and j2 (in the code above) calculated that many times, the innermost loop counter iterated that many times, etc.

The complexity of the problem may be greatly improved by recognizing it as a search problem and using a search data structures that allows the efficient determination of which cells in one grid overlap the cells of another grid. For example, one can insert the source grid cells in a near-neighbors search data structure, and for each target grid cell the search data structure is used to determine its neighboring target grid cells in O(log(N1)) operations . Overall, in the current create_xgrid_2dx2d_order2 algorithm, the O(N1) outer loop over the source grid cells and also the min-max gating are removed and replaced with an O(log(N1)) efficient search over the source grid cells. Note that the identified near-neighbors cells will be a super-set that is slightly bigger than those that actually overlap the query (target) polygon as determined by the polygon clipping algorithm.

One data structure that can be used as the search data structure is the axis aligned bounding box binary search tree.
There are several variations possible, but particularly attractive for this problem should be those used in computer graphics for collision detection and ray tracing that include: A) tree construction that includes rotating through the different coordinates at the partitioning step of each tree level ( introduced in "Multidimensional binary search trees used for associative searching" , J. L. Bentley ,1975) and B) partitioning with spatial median and use of only the partitioning coordinate in the nodes axis-aligned bounding box tests (introduced in "Ray Queries with Wide Object Isolation and the DE-Tree", Zuniga & Uhlmann, 2006). Generally, N1 3D axis-aligned bounding boxes of the source cells are inserted into the search tree. Each tree node will have a pointer to one bounding box, the space requirement of the data structure is exactly N1 tree nodes, and its construction time is O(N1 * log (N1 )). The query that the search structure answers is the box query (using the bounding box of a target cell) to determine all source boxes overlapping the query box. The search data structure would perform the queries in O(log(N1)) operations. For the example above, instead of the 2.610^9 min_max gate preparations end evaluations, and 5.310^7 min-max gate passing and subsequent calculations, the problem is doable with less than ~ 2 * 55K * log (886,000) ~ 2*10^6 bounding box overlap tests in using the search tree, plus about the same number of polygon clippings and polygon area evaluations ( 341467 and 293376, respectively) as before.

Another possible data structure to use for this problem is the metric tree (also known as the vp-tree see “Satisfying general proximity/similarity queries with metric trees", Uhlmann, 1991; “Data structures and algorithms for nearest neighbor search in general metric spaces,” Yianilos, 1993). In this tree, points of each source grid cell are what the data structure holds. It may be set up so that only one point per source cell is contained in the data structure (possibly the center of the grid cell).
For each target cell, its center point is determined and a radius is chosen and the query is to find all points that are within a radius of that query point given some metric (e.g. the euclidean metric). A variation of this search is to find the nearest k points. The data structure can be coded to use only N1 tree nodes and O(N1 * log(N1)) construction time.

The above two data structures are optimal (or near optimal) in node visitations (search time), construction time, and space used. The application of multi-dimensional search data structures for the efficient regridding the problem is already in the published literature. See “Fast regridding of large, complex geospatial datasets”, J.D. Blower and A. Clegg; and also “Research on Searching Algorithms for Unstructured Grid Remapping Based on KD Tree”, Yu Cao et all.

Memory Considerations
Only O(N1 + N2) memory should be needed for remapping whether optimal search data structures or the current ad-hock technique is used.

GPU multi-core CPU considerations
If the current ad-hock methods is used in function create_xgrid_2dx2d_order2, then vectorization or parallelization of the simple but O(N1*N2) calculation before the min-max gate should be considered. But these augmentations or variations should be considered:
1a and 1b) Consider vectorizing only the code immediately prior to the polygon clipping call , i.e. the determination and use of the min-max gate. This is because only for a small percentage of the cases is the polygon clipping actually called, and the polygon clipping and polygon area functions appear less amenable to vectorization that the min-max gate.
2) Axis-aligned bounding box "gating test" (or AABB test) should be done prior calling the polygon clipping (function clip_2dx2d). AABB tests are extremely simple and should perform much better at precluding calls to polygon clipping than the min-max gates. Even a serial addition of AABB tests may improve the speed, however, this should be attempted in parallel (e.g. AABB tests of an entire source grid vs one target cell) prior to calling the polygon clippings. The variations of this are:
2a) Replace the min-max gating entirely with AABB tests
2b) Augment the min-max gating with AABB tests, so the coarser min-max gating is done, then AABB test are done
immediately before polygon clipping is done.

This entry was updated 12/13/2022.

@ngs333 ngs333 self-assigned this Nov 13, 2023
@ngs333
Copy link
Contributor Author

ngs333 commented Nov 13, 2023

cpp_fregrid_project_report.pdf
The computational complexity of fregrid was addressed in the fregrid C++ project. Both functions create_xgrid_2dx2d_order2 and create_xgrid_2dx2d_order1 are now merely call wrappers to the new function create_xgrid_2dx2d_st, which uses bounding boxes and a search ( DITree) supporting box queries. Overall, the computational complexity of the original the create_xgrid_2dx2d functions were reduced from quadratic (O(Nt x Ns), where Ns is the number of cells of the source grid, and Nt is the number of cells of the target grid) down to O(Nt x log Nt) + O(Ns x log Nt). The reduced complexity results in remapping weight generation that is faster by up to three orders of magnitude on todays large GFDL grids. Also, the number of times the clip_polygon function is evaluated should be proportional to the number of cells of the larger input grid. Further details are provided in the attached report "The C++ fregrid project - towards instantaneous regridding and exchange grid generation"

The code is currently available in the cpp_dev branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant