Authors: Aneeshan Sain, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Subhadeep Koley, Tao Xiang, Yi-Zhe Song
In this paper, we leverage CLIP for zero-shot sketch-based image retrieval (ZS-SBIR). We are largely inspired by recent advances on foundation models and the unparalleled generalisation ability they seem to offer, but for the first time tailor it to benefit the sketch community. We put forward novel designs on how best to achieve this synergy, for both the category setting and the fine-grained setting ("all"). At the very core of our solution is a prompt learning setup. First we show that just via factoring in sketch-specific prompts, we already have a category-level ZS-SBIR system that overshoots all prior arts, by a large margin (24.8%) – a great testimony on studying the CLIP and ZS-SBIR synergy. Moving onto the fine-grained setup is however trickier, and requires a deeper dive into this synergy. For that, we come up with two specific designs to tackle the fine-grained matching nature of the problem: (i) an additional regularisation loss to ensure the relative separation between sketches and photos is uniform across categories, which is not the case for the gold-standard standalone triplet loss, and (ii) a clever patch shuffling technique to help establish instance-level structural correspondences between sketch-photo pairs. With these designs, we again observe significant performance gains in the region of 26.9% over previous state-of-the-art. The take-home message, if any, is that the proposed CLIP and prompt learning paradigm carries great promise in tackling other sketch-related tasks (not limited to ZS-SBIR) where data scarcity remains a great challenge. Code and models will be made available.
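To make the patch shuffling idea in (ii) concrete, here is a minimal NumPy sketch of the general technique: an image is cut into an n × n grid of patches and reassembled under a random permutation, and applying the *same* permutation to a sketch and its paired photo preserves their structural correspondence. The function name, grid size, and array layout are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def patch_shuffle(img, n=3, perm=None, rng=None):
    """Split a square (H, W, C) image into an n x n grid of patches
    and reassemble them under a permutation.

    H and W must be divisible by n. If `perm` is None a random
    permutation is drawn; returning it lets the caller apply the
    identical shuffle to the paired image.
    """
    h, w = img.shape[0] // n, img.shape[1] // n
    # Extract patches in row-major grid order.
    patches = [img[i * h:(i + 1) * h, j * w:(j + 1) * w]
               for i in range(n) for j in range(n)]
    if perm is None:
        perm = (rng or np.random.default_rng()).permutation(n * n)
    # Reassemble: permuted patches stitched back into the grid.
    rows = [np.concatenate([patches[perm[i * n + j]] for j in range(n)], axis=1)
            for i in range(n)]
    return np.concatenate(rows, axis=0), perm

# Shuffle a sketch, then reuse its permutation on the paired photo
# so both views are scrambled identically.
# shuffled_sketch, perm = patch_shuffle(sketch)
# shuffled_photo, _ = patch_shuffle(photo, perm=perm)
```

Sharing `perm` across the pair is the key design point: the model must match corresponding patches rather than global silhouettes, which encourages instance-level structural alignment.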
Paper link: http://arxiv.org/pdf/2303.13440v1