Authors: Jeya Maria Jose Valanarasu, Rahul Garg, Andeep Toor, Xin Tong, Weijuan Xi, Andreas Lugmayr, Vishal M. Patel, Anne Menini
Most video restoration networks are slow, have a high computational load, and cannot be used for real-time video enhancement. In this work, we design an efficient and fast framework to perform real-time video enhancement for practical use cases like live video calls and video streams. Our proposed method, called Recurrent Bottleneck Mixer Network (ReBotNet), employs a dual-branch framework. The first branch learns spatio-temporal features by tokenizing the input frames along the spatial and temporal dimensions using a ConvNext-based encoder and processing these abstract tokens with a bottleneck mixer. To further improve temporal consistency, the second branch employs a mixer directly on tokens extracted from individual frames. A common decoder then merges the features from the two branches to predict the enhanced frame. In addition, we propose a recurrent training approach where the last frame's prediction is leveraged to efficiently enhance the current frame while improving temporal consistency. To evaluate our method, we curate two new datasets that emulate real-world video call and streaming scenarios, and show extensive results on multiple datasets where ReBotNet outperforms existing approaches with lower computation, reduced memory requirements, and faster inference time.
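To make the dual-branch design above concrete, here is a minimal PyTorch sketch under stated assumptions: the simplified convolutional encoder standing in for the ConvNext stages, the MLP-mixer-style `MixerBlock`, and all sizes (`dim`, `tokens`, the 224x224 input) are illustrative choices, not the authors' implementation. It only shows the data flow: branch one tokenizes the concatenated (previous prediction, current frame) pair and runs a bottleneck mixer, branch two mixes tokens from the current frame alone, and a common decoder fuses both, with the previous output fed back recurrently.

```python
# Illustrative sketch only; layer shapes and the encoder are assumptions.
import torch
import torch.nn as nn


class MixerBlock(nn.Module):
    """MLP-mixer-style block: mixes across tokens, then across channels."""
    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_tokens, num_tokens), nn.GELU(),
            nn.Linear(num_tokens, num_tokens),
        )
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim),
        )

    def forward(self, x):  # x: (B, N, C)
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mlp(self.norm2(x))
        return x


class ReBotNetSketch(nn.Module):
    # Defaults assume 224x224 inputs, giving 14x14 = 196 tokens per branch.
    def __init__(self, dim: int = 64, tokens: int = 196):
        super().__init__()
        # Branch 1: conv encoder tokenizes the (current, previous-prediction)
        # pair along space and time; a bottleneck mixer processes the tokens.
        self.encoder = nn.Sequential(  # stand-in for the ConvNext stages
            nn.Conv2d(6, dim, kernel_size=4, stride=4), nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=4, stride=4),
        )
        self.bottleneck = MixerBlock(tokens, dim)
        # Branch 2: mixer applied to tokens from the current frame alone,
        # targeting temporal consistency of per-frame detail.
        self.frame_proj = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.frame_mixer = MixerBlock(tokens, dim)
        # Common decoder merges both branches and upsamples to a frame.
        self.decoder = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=3, padding=1), nn.GELU(),
            nn.Conv2d(dim, 3 * 16 * 16, kernel_size=1), nn.PixelShuffle(16),
        )

    def forward(self, curr, prev_pred):
        b = curr.shape[0]
        f1 = self.encoder(torch.cat([curr, prev_pred], dim=1))  # (B, C, h, w)
        t1 = self.bottleneck(f1.flatten(2).transpose(1, 2))     # (B, N, C)
        f2 = self.frame_proj(curr)
        t2 = self.frame_mixer(f2.flatten(2).transpose(1, 2))
        grid = f1.shape[-2:]
        m1 = t1.transpose(1, 2).reshape(b, -1, *grid)
        m2 = t2.transpose(1, 2).reshape(b, -1, *grid)
        return self.decoder(torch.cat([m1, m2], dim=1))


# Recurrent inference mirroring the recurrent training idea: feed the
# previous prediction back in (zeros for the first frame is an assumption).
net = ReBotNetSketch()
frames = torch.rand(8, 1, 3, 224, 224)  # (T, B, C, H, W) toy clip
prev = torch.zeros(1, 3, 224, 224)
for t in range(frames.shape[0]):
    prev = net(frames[t], prev)
```

Reusing the previous output this way is what lets a single-frame-sized forward pass carry temporal context, which is how the abstract motivates the method's low computation and fast inference relative to networks that process multi-frame windows.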
Paper link: http://arxiv.org/pdf/2303.13504v1