Authors: Yuta Yoshitake, Mai Nishimura, Shohei Nobuhara, Ko Nishino
We propose a novel method for joint estimation of shape and pose of rigid objects from their sequentially observed RGB-D images. In sharp contrast to past approaches that rely on complex non-linear optimization, we propose to formulate it as a neural optimization that learns to efficiently estimate the shape and pose. We introduce Deep Directional Distance Function (DeepDDF), a neural network that directly outputs the depth image of an object given the camera viewpoint and viewing direction, for efficient error computation in 2D image space. We formulate the joint estimation itself as a Transformer which we refer to as TransPoser. We fully leverage the tokenization and multi-head attention to sequentially process the growing set of observations and to efficiently update the shape and pose with a learned momentum, respectively. Experimental results on synthetic and real data show that DeepDDF achieves high accuracy as a category-level object shape representation and TransPoser achieves state-of-the-art accuracy efficiently for joint shape and pose estimation.
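The abstract describes DeepDDF only at a high level. As a rough illustration of the idea it names, a network that maps a shape code, camera viewpoint, and viewing direction directly to depth, here is a minimal PyTorch sketch. The layer sizes, the concatenation-based conditioning, and all class and variable names are assumptions made for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class DeepDDFSketch(nn.Module):
    """Hypothetical directional distance network: maps a latent shape code,
    a camera position, and a per-pixel viewing direction to a depth value.
    Dimensions and conditioning are illustrative, not the paper's design."""
    def __init__(self, latent_dim=64, hidden_dim=256):
        super().__init__()
        # input: latent shape code + 3D camera position + 3D unit view direction
        in_dim = latent_dim + 3 + 3
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # predicted depth along the ray
        )

    def forward(self, latent, cam_pos, view_dirs):
        # latent: (B, latent_dim); cam_pos: (B, 3); view_dirs: (B, N, 3),
        # one unit ray direction per pixel of the rendered depth image
        n = view_dirs.shape[1]
        cond = torch.cat([latent, cam_pos], dim=-1)   # (B, latent_dim + 3)
        cond = cond.unsqueeze(1).expand(-1, n, -1)    # broadcast over rays
        x = torch.cat([cond, view_dirs], dim=-1)      # (B, N, latent_dim + 6)
        return self.mlp(x).squeeze(-1)                # (B, N) depths
```

Likewise, a hedged sketch of a TransPoser-style neural optimizer: observation tokens and two learnable query tokens pass through a Transformer encoder, and the query outputs are decoded into shape and pose updates scaled by learned momenta. Again, the token dimension, the sigmoid gating, and every identifier here are hypothetical readings of the abstract, not the published model.

```python
import torch
import torch.nn as nn

class TransPoserSketch(nn.Module):
    """Hypothetical neural optimizer: attends over observation tokens and
    emits shape/pose updates scaled by learned per-parameter momenta."""
    def __init__(self, token_dim=256, latent_dim=64, pose_dim=7, num_layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # learnable query tokens whose outputs are decoded into updates
        self.shape_query = nn.Parameter(torch.randn(1, 1, token_dim) * 0.02)
        self.pose_query = nn.Parameter(torch.randn(1, 1, token_dim) * 0.02)
        self.shape_head = nn.Linear(token_dim, latent_dim)
        self.pose_head = nn.Linear(token_dim, pose_dim)
        # learned momenta gating the step sizes (sigmoid keeps them in (0, 1))
        self.shape_momentum = nn.Parameter(torch.zeros(latent_dim))
        self.pose_momentum = nn.Parameter(torch.zeros(pose_dim))

    def forward(self, obs_tokens, shape, pose):
        # obs_tokens: (B, T, token_dim); T grows as new RGB-D frames arrive
        b = obs_tokens.shape[0]
        queries = torch.cat([self.shape_query, self.pose_query],
                            dim=1).expand(b, -1, -1)
        out = self.encoder(torch.cat([queries, obs_tokens], dim=1))
        d_shape = self.shape_head(out[:, 0])          # shape-query output
        d_pose = self.pose_head(out[:, 1])            # pose-query output
        shape = shape + torch.sigmoid(self.shape_momentum) * d_shape
        pose = pose + torch.sigmoid(self.pose_momentum) * d_pose
        return shape, pose
```

In this reading, DeepDDF supplies per-pixel predicted depths that can be compared against the observed RGB-D depth with a simple image-space residual (e.g. an L1 loss), and TransPoser would be invoked once per incoming frame on the token set accumulated so far.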
Paper: http://arxiv.org/pdf/2303.13477v1