Authors: Zeqi Xiao, Wenwei Zhang, Tai Wang, Chen Change Loy, Dahua Lin, Jiangmiao Pang
DEtection TRansformer (DETR) started a trend that uses a group of learnable queries for unified visual perception. This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline. Although the naive adaptation obtains fair results, the instance segmentation performance is noticeably inferior to previous works. By diving into the details, we observe that instances in the sparse point clouds are relatively small to the whole scene and often have similar geometry but lack distinctive appearance for segmentation, which are rare in the image domain. Considering instances in 3D are more featured by their positional information, we emphasize their roles during the modeling and design a robust Mixed-parameterized Positional Embedding (MPE) to guide the segmentation process. It is embedded into backbone features and later guides the mask prediction and query update processes iteratively, leading to Position-Aware Segmentation (PA-Seg) and Masked Focal Attention (MFA). All these designs impel the queries to attend to specific regions and identify various instances. The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% PQ on the SemanticKITTI and nuScenes benchmarks, respectively. The source code and models are available at https://github.com/SmartBot-PJLab/P3Former .
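The abstract does not spell out the exact form of the Mixed-parameterized Positional Embedding. As an illustrative sketch only (the function names, frequency scheme, and choice of Cartesian plus cylindrical parameterizations are assumptions, not the paper's actual formulation), a "mixed-parameterized" embedding might concatenate sinusoidal features computed under two different coordinate parameterizations of each point:

```python
import numpy as np

def sinusoidal_embed(values, num_freqs=4):
    """Map each scalar to [sin, cos] features at geometrically spaced frequencies."""
    freqs = 2.0 ** np.arange(num_freqs)             # (F,)
    angles = values[..., None] * freqs              # (..., F)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def mixed_positional_embedding(points, num_freqs=4):
    """Hypothetical mixed-parameterized embedding: concatenate sinusoidal
    features of Cartesian (x, y, z) and cylindrical (r, theta, z) coordinates.
    Illustrative only; P3Former's actual MPE may differ."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x ** 2 + y ** 2)                    # radial distance from sensor
    theta = np.arctan2(y, x)                        # azimuth angle
    cartesian = np.stack([x, y, z], axis=-1)        # (N, 3)
    cylindrical = np.stack([r, theta, z], axis=-1)  # (N, 3)
    return np.concatenate([
        sinusoidal_embed(cartesian, num_freqs).reshape(len(points), -1),
        sinusoidal_embed(cylindrical, num_freqs).reshape(len(points), -1),
    ], axis=-1)                                     # (N, 6 * 2 * num_freqs)

pts = np.random.rand(5, 3)
emb = mixed_positional_embedding(pts)
print(emb.shape)  # (5, 48)
```

The intuition behind mixing parameterizations is that LiDAR instances at different ranges and azimuths become more separable when both absolute position and sensor-centric geometry are encoded.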
Paper link: http://arxiv.org/pdf/2303.13509v1