MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

Authors: Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin

This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset. Inspired by the scene-voxel-point hierarchy in downstream 3D object detectors, we design masking and reconstruction strategies accounting for voxel distributions in the scene and local point distributions within the voxel. We employ a Reversed-Furthest-Voxel-Sampling strategy to address the uneven distribution of LiDAR points and propose MV-JAR, which combines two techniques for modeling the aforementioned distributions, resulting in superior performance. Our experiments reveal limitations in previous data-efficient experiments, which uniformly sample fine-tuning splits with varying data proportions from each LiDAR sequence, leading to similar data diversity across splits. To address this, we propose a new benchmark that samples scene sequences for diverse fine-tuning splits, ensuring adequate model convergence and providing a more accurate evaluation of pre-training methods. Experiments on our Waymo benchmark and the KITTI dataset demonstrate that MV-JAR consistently and significantly improves 3D detection performance across various data scales, achieving up to a 6.3% increase in mAPH compared to training from scratch. Codes and the benchmark will be available at https://github.com/SmartBot-PJLab/MV-JAR.
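The abstract compresses two technical ideas that are easier to see in code. First, the masking step: below is a minimal sketch of what a reversed farthest-voxel-sampling mask could look like, assuming voxel centroids as input. The function name, the `mask_ratio` parameter, and the keep-the-farthest-sampled-voxels interpretation are assumptions for illustration, not the authors' released implementation.

```python
import numpy as np

def reversed_furthest_voxel_sampling_mask(voxel_centers: np.ndarray,
                                          mask_ratio: float = 0.75,
                                          seed: int = 0) -> np.ndarray:
    """Hypothetical sketch: choose voxels to KEEP via farthest-point
    sampling over voxel centroids and mask the rest, so that masking
    never erases sparse regions of the LiDAR scene entirely.

    voxel_centers: (N, 3) array of voxel centroid coordinates.
    Returns a boolean array of shape (N,), True = masked voxel.
    """
    rng = np.random.default_rng(seed)
    n = len(voxel_centers)
    n_keep = max(1, round(n * (1.0 - mask_ratio)))

    # Plain farthest-point sampling over the voxel centroids.
    keep = np.empty(n_keep, dtype=np.int64)
    keep[0] = rng.integers(n)
    dist = np.linalg.norm(voxel_centers - voxel_centers[keep[0]], axis=1)
    for i in range(1, n_keep):
        keep[i] = int(dist.argmax())  # farthest from everything kept so far
        dist = np.minimum(
            dist, np.linalg.norm(voxel_centers - voxel_centers[keep[i]], axis=1))

    masked = np.ones(n, dtype=bool)
    masked[keep] = False  # all non-kept voxels are masked for reconstruction
    return masked
```

Keeping the farthest-sampled voxels visible roughly preserves scene coverage, so distant, sparse regions are not hidden from the encoder wholesale. Second, the benchmark fix: previous protocols uniformly subsampled frames from every sequence, so even a small split still saw almost every scene; sampling whole sequences instead makes scene diversity scale with split size. A hypothetical helper illustrating sequence-level sampling (not the released benchmark code):

```python
import random

def sample_sequence_split(sequence_ids, fraction, seed=0):
    """Pick a fraction of whole LiDAR sequences (scene-level sampling),
    so a small fine-tuning split genuinely contains fewer distinct scenes."""
    rng = random.Random(seed)
    n = max(1, round(len(sequence_ids) * fraction))
    return sorted(rng.sample(list(sequence_ids), n))
```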

Paper link: http://arxiv.org/pdf/2303.13510v1

