Home
In most Point Cloud neural networks, there are three major operations: Neighbor Search, Aggregation, and Feature Computation. Take the first layer of PointNet++ as an example:
Neighbor Search computes the neighbor information for each point, and generates a Neighbor Index Table.
Aggregation gathers the selected neighborhoods using the Neighbor Index Table. (It simply takes point indices, accesses memory, and returns the accessed data.) Each neighborhood consists of the central point's and all its neighbors' feature vectors. In this example, Aggregation generates a 512 x 32 x 3 tensor (512: number of neighborhoods; 32: number of neighbors in each neighborhood; 3: dimension of the feature vectors).
Feature Computation consists of multiple MLPs.
At the end of the layer, there is usually a reduction operation (a Max operation in this example). This reduction is sometimes also referred to as an aggregation operation, but it is not what we mean by Aggregation here. The reduction is trivial from a compute perspective: the three steps above take nearly 100% of the execution time.
In most networks, e.g., PointNet++ and DGCNN, the three steps are executed sequentially in this order.
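As a rough illustration only, here is a minimal NumPy sketch of this sequential pipeline. The helper names (`knn_search`, `mlp`), the layer widths, and the use of k-nearest-neighbor search in place of PointNet++'s ball query are placeholders, not the actual implementation.

```python
import numpy as np

def knn_search(points, centroids, k):
    """Neighbor Search: for each centroid, return the indices of its k nearest points."""
    dists = np.linalg.norm(centroids[:, None, :] - points[None, :, :], axis=-1)
    return np.argsort(dists, axis=1)[:, :k]          # Neighbor Index Table, shape (512, 32)

def mlp(x, weights):
    """Feature Computation: a stack of MLP layers (ReLU after each matmul)."""
    for W in weights:
        x = np.maximum(x @ W, 0.0)
    return x

points    = np.random.rand(1024, 3)                  # input point cloud
centroids = points[np.random.choice(1024, 512, replace=False)]
weights   = [np.random.rand(3, 64), np.random.rand(64, 64)]   # placeholder layer widths

# 1. Neighbor Search -> Neighbor Index Table
idx = knn_search(points, centroids, k=32)            # (512, 32)

# 2. Aggregation: gather each neighborhood; as in PointNet++'s grouping,
#    coordinates are taken relative to the centroid
neighborhoods = points[idx] - centroids[:, None, :]  # (512, 32, 3)

# 3. Feature Computation: MLPs on every (centroid, neighbor) pair
features = mlp(neighborhoods, weights)               # (512, 32, 64)

# Final reduction (the Max at the end of the layer)
out = features.max(axis=1)                           # (512, 64)
```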
Delayed Aggregation performs Feature Computation first and performs Aggregation only after the entire Feature Computation is done; meanwhile, Neighbor Search runs in parallel with Feature Computation. It is a form of approximation, but our experimental results indicate that the accuracy does not drop (with fine-tuning).
- Why do we delay Aggregation? It not only reduces the overall compute workload, but also introduces parallelism into the pipeline (now Neighbor Search can be done in parallel with the MLPs).
- What changes from the algorithm's perspective? The Feature Computation step now computes the features of individual points instead of neighborhoods.
Below is what the first layer of PointNet++ looks like with Delayed Aggregation:
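Reusing the placeholder helpers from the sketch above, the same layer with Delayed Aggregation might look roughly like this; the subtraction of the centroid's feature is an assumption here, mirroring the relative coordinates in the original pipeline.

```python
# Feature Computation on individual points (1024 x 3 -> 1024 x 64), independent of any neighborhood.
# Neighbor Search can run in parallel with this step, since neither depends on the other.
point_features    = mlp(points, weights)                 # (1024, 64)
centroid_features = mlp(centroids, weights)              # (512, 64)
idx = knn_search(points, centroids, k=32)                # (512, 32)

# Delayed Aggregation: gather already-computed per-point features; subtracting the centroid's
# feature approximates mlp(x_j - x_i) by mlp(x_j) - mlp(x_i), hence the need for fine-tuning.
neighborhood_feats = point_features[idx] - centroid_features[:, None, :]   # (512, 32, 64)

out = neighborhood_feats.max(axis=1)                     # (512, 64)
```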
Limited delayed-aggregation is a variant of the full delayed-aggregation idea discussed above. It is inspired by the implementations of some graph neural networks (GNNs), most notably GCN and GraphSAGE. Point cloud networks and graph neural networks share many similarities; after all, they both deal with spatial data structures.
In short, GCN/GraphSAGE use a limited form of delayed-aggregation, which is mathematically precise but has limited benefits. The full delayed-aggregation is approximate (retraining recovers the accuracy) but is much more effective.
To understand the difference, consider a simple graph with two vertices: $v_1$, whose neighbors are $v_2$ and $v_3$, and $v_2$, whose neighbors are $v_1$ and $v_3$.
Here is what the vanilla GCN does:
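Ignoring normalization and self-loops for clarity, and writing $x_i$ for the feature of $v_i$, $W$ for the layer's weight matrix, and $\sigma$ for the non-linearity, the vanilla computation amounts to:

$$y_1 = \sigma(W x_2 + W x_3), \qquad y_2 = \sigma(W x_1 + W x_3)$$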
Clearly, $W x_3$ is computed twice.
Below is the optimized GCN in PyTorch Geometric, which is what the reviewer means by “delayed-aggregation”:
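The snippet below is not the PyTorch Geometric source but a minimal plain-PyTorch sketch of the same order of operations: the linear transform (the MVM) is applied to every vertex exactly once, and only the transformed features are then aggregated along edges.

```python
import torch

def gcn_layer_hoisted_mvm(x, edge_index, weight):
    """x: (N, F) vertex features; edge_index: (2, E) with rows (source, target); weight: (F, F_out)."""
    t = x @ weight                           # MVM done once per vertex (W x_j): the "hoisted" part
    src, dst = edge_index                    # aggregate transformed neighbor features per target vertex
    out = torch.zeros(x.size(0), weight.size(1), dtype=t.dtype)
    out.index_add_(0, dst, t[src])           # sum over j in N(i) of W x_j
    return torch.relu(out)                   # non-linearity applied after aggregation
```

For the two-vertex example above, $W x_3$ is computed once (as one row of `t`) and reused when aggregating for both $v_1$ and $v_2$.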
So, GCN’s “delayed-aggregation” is a form of memoization/common subexpression elimination, which is precise because only the MVM part (i.e., $Wx$) of an MLP is hoisted before aggregation.
Our delayed-aggregation is more general but approximate. Consider a point cloud with two points: $p_1$, whose neighbors are $p_2$ and $p_3$, and $p_2$, whose neighbors are $p_1$ and $p_3$; as in PointNet++, each neighbor's input to the MLP is taken relative to its centroid (e.g., $x_3 - x_1$ for $p_3$ in $p_1$'s neighborhood).
It’s possible to apply the same optimization above by hoisting the MVM and computing $W x_3$ just once. However, our delayed-aggregation goes a step beyond that: we hoist the entire MLP rather than just the MVM inside the MLP:
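Concretely, writing one MLP layer as $\phi(x) = \sigma(Wx)$ and keeping the centroid-relative input from above, the feature of $p_3$ in $p_1$'s neighborhood is obtained from per-point results that are each computed only once:

$$\sigma\big(W(x_3 - x_1)\big) \;\approx\; \sigma(W x_3) - \sigma(W x_1) \;=\; \phi(x_3) - \phi(x_1)$$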
This is approximate (because of the non-linear operation), but the accuracy can be recovered with fine-tuning (see Figure 16).
A critical advantage of our delayed-aggregation is that it can be applied to all MLP layers in a point cloud application, whereas the limited delayed-aggregation applies only to the first layer. To see why, consider a two-layer point cloud module:
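As a sketch in the same notation (with $W_1$ and $W_2$ the weights of the two MLP layers), the original module computes, for each centroid $p_i$ and each of its neighbors $p_j$:

$$h_{ij} = \sigma\big(W_1 (x_j - x_i)\big), \qquad y_i = \max_{j \in N(i)} \sigma\big(W_2\, h_{ij}\big)$$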
Our delayed-aggregation could be easily applied:
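Sketched in the same notation, both MLP layers are evaluated once per individual point, and only the results are aggregated (again an approximation that fine-tuning compensates for):

$$\Phi(x_j) = \sigma\big(W_2\, \sigma(W_1 x_j)\big), \qquad y_i \approx \max_{j \in N(i)} \big(\Phi(x_j) - \Phi(x_i)\big)$$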
But with the limited delayed-aggregation, the input feature vectors after the first MLP layer become $\sigma(W_1(x_3 - x_1))$ and $\sigma(W_1(x_3 - x_2))$ for the two neighborhoods, which are two different vectors (i.e., no redundancy), so the same MVM hoisting can't be applied in the next MLP layer.