A Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition

Authors: Andong Deng, Taojiannan Yang, Chen Chen


The goal of building a benchmark (suite of datasets) is to provide a unified protocol for fair evaluation and thus facilitate the evolution of a specific area. Nonetheless, we point out that existing protocols of action recognition could yield partial evaluations due to several limitations. To comprehensively probe the effectiveness of spatiotemporal representation learning, we introduce BEAR, a new BEnchmark on video Action Recognition. BEAR is a collection of 18 video datasets grouped into 5 categories (anomaly, gesture, daily, sports, and instructional), which covers a diverse set of real-world applications. With BEAR, we thoroughly evaluate 6 common spatiotemporal models pre-trained by both supervised and self-supervised learning. We also report transfer performance via standard finetuning, few-shot finetuning, and unsupervised domain adaptation. Our observation suggests that current state-of-the-art cannot solidly guarantee high performance on datasets close to real-world applications, and we hope BEAR can serve as a fair and challenging evaluation benchmark to gain insights on building next-generation spatiotemporal learners. Our dataset, code, and models are released at: https://github.com/AndongDeng/BEAR

Paper link: http://arxiv.org/pdf/2303.13505v1

More computer science papers: http://cspaper.cn/
