Authors:
Jannis Unkrig and Markus Friedrich
Affiliation:
Department of Computer Science and Mathematics, Munich University of Applied Sciences, Munich, Germany
Keyword(s):
3D Point Cloud Processing, 3D Computer Vision, Deep Learning, Transformer Architecture.
Abstract:
The Point Transformer, and especially its successor Point Transformer V2, are among the state-of-the-art architectures for point cloud processing in terms of accuracy. However, like many other point cloud processing architectures, they suffer from the inherently irregular structure of point clouds, which makes efficient processing computationally expensive. Common workarounds include reducing the point cloud density, or cropping out partitions, processing them sequentially, and then stitching them back together. However, these approaches inherently limit the architecture by providing either less detail or less context. This work presents strategies that directly address efficiency bottlenecks in the Point Transformer architecture, allowing larger point clouds to be processed in a single feed-forward operation. Specifically, we propose using uniform point cloud sizes in all stages of the architecture, a k-D tree-based k-nearest neighbor search algorithm that is not only efficient on large point clouds but also generates intermediate results that can be reused for downsampling, and a technique for normalizing local densities that improves overall accuracy. Furthermore, our architecture is simpler to implement and does not require custom CUDA kernels to run efficiently.
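The abstract only summarizes the approach and does not give the algorithm itself. As an illustration of the general idea, the following is a minimal pure-Python sketch of a k-D tree k-nearest neighbor search in which a construction by-product (the splitting points at a fixed tree depth) can double as a roughly uniform subsample of the cloud. The function names and the specific downsampling rule are assumptions for illustration, not the paper's implementation.

```python
import heapq

def build_kdtree(points, depth=0):
    """Recursively build a k-D tree; each node stores its splitting
    point, the splitting axis, and left/right subtrees."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2  # median split keeps the tree balanced
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }

def knn(node, query, k, heap=None):
    """k-nearest-neighbor search: maintain a max-heap of the k best
    candidates (as negated squared distances) while descending the tree."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    d2 = sum((a - b) ** 2 for a, b in zip(node["point"], query))
    if len(heap) < k:
        heapq.heappush(heap, (-d2, node["point"]))
    elif d2 < -heap[0][0]:
        heapq.heapreplace(heap, (-d2, node["point"]))
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    knn(near, query, k, heap)
    # Visit the far side only if the splitting plane may hide closer points.
    if len(heap) < k or diff ** 2 < -heap[0][0]:
        knn(far, query, k, heap)
    return heap

def downsample(node, depth, target_depth):
    """Reuse the tree built for kNN: the splitting points at a fixed
    depth give one representative per subtree, i.e. a cheap subsample."""
    if node is None:
        return []
    if depth == target_depth:
        return [node["point"]]
    return (downsample(node["left"], depth + 1, target_depth)
            + downsample(node["right"], depth + 1, target_depth))
```

The point of the sketch is the reuse: once the tree exists for neighbor queries, downsampling costs only a traversal, rather than a separate farthest-point-sampling pass.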