ReVersion: Diffusion-Based Relation Inversion from Images

Authors: Ziqi Huang, Tianxing Wu, Yuming Jiang, Kelvin C. K. Chan, Ziwei Liu

Diffusion models have gained increasing popularity for their generative capabilities. Recently, there has been a surge in demand for generating customized images by inverting diffusion models from exemplar images. However, existing inversion methods mainly focus on capturing object appearances. How to invert object relations, another important pillar of the visual world, remains unexplored. In this work, we propose ReVersion for the Relation Inversion task, which aims to learn a specific relation (represented as a “relation prompt”) from exemplar images. Specifically, we learn a relation prompt from a frozen pre-trained text-to-image diffusion model. The learned relation prompt can then be applied to generate relation-specific images with new objects, backgrounds, and styles. Our key insight is the “preposition prior”: real-world relation prompts can be sparsely activated upon a set of basis prepositional words. Specifically, we propose a novel relation-steering contrastive learning scheme to impose two critical properties on the relation prompt: 1) the relation prompt should capture the interaction between objects, enforced by the preposition prior; 2) the relation prompt should be disentangled from object appearances. We further devise relation-focal importance sampling to emphasize high-level interactions over low-level appearances (e.g., texture, color). To comprehensively evaluate this new task, we contribute the ReVersion Benchmark, which provides various exemplar images with diverse relations. Extensive experiments validate the superiority of our approach over existing methods across a wide range of visual relations.
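
The abstract names two training components but gives no formulas. The Python sketch below illustrates one plausible reading of each and is not the authors' implementation. Assumptions: the relation-steering loss is written as a multi-positive InfoNCE objective with preposition embeddings as positives and appearance-word embeddings as negatives, using a hypothetical temperature `tau`; relation-focal importance sampling is modeled as a timestep distribution proportional to `1 - alpha * cos(pi * t / T)`, which favors large (noisier) timesteps, where coarse layout and interactions are assumed to form; embeddings are plain Python lists standing in for text-encoder outputs. All function names are hypothetical.

```python
# Hypothetical sketch: the loss form, tau, alpha and the cosine-shaped
# timestep schedule are illustrative assumptions, not the paper's code.
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _logsumexp(xs: List[float]) -> float:
    # Numerically stable log(sum(exp(x))).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def steering_contrastive_loss(relation: Vector, prepositions: List[Vector],
                              appearance_words: List[Vector],
                              tau: float = 0.07) -> float:
    """Assumed multi-positive InfoNCE: pull the relation embedding toward
    preposition embeddings, push it away from appearance-word embeddings."""
    r = _normalize(relation)

    def logit(w: Vector) -> float:
        # Cosine similarity scaled by the (assumed) temperature.
        return sum(a * b for a, b in zip(r, _normalize(w))) / tau

    pos = [logit(p) for p in prepositions]
    neg = [logit(n) for n in appearance_words]
    # -log( sum_pos exp(logit) / (sum_pos exp(logit) + sum_neg exp(logit)) )
    return _logsumexp(pos + neg) - _logsumexp(pos)


def relation_focal_weights(T: int, alpha: float = 0.5) -> List[float]:
    """Probabilities for timesteps t = 1..T, increasing with t, using the
    assumed shape 1 - alpha * cos(pi * t / T), renormalized to sum to 1."""
    raw = [1.0 - alpha * math.cos(math.pi * t / T) for t in range(1, T + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_timesteps(n: int, T: int, alpha: float = 0.5,
                     seed: int = 0) -> List[int]:
    # Draw n training timesteps from the relation-focal distribution.
    rng = random.Random(seed)
    return rng.choices(range(1, T + 1),
                       weights=relation_focal_weights(T, alpha), k=n)
```

For example, `steering_contrastive_loss([1.0, 0.0], [[1.0, 0.0]], [[0.0, 1.0]])` is close to zero because the relation vector matches the preposition, while changing the relation to `[0.0, 1.0]` gives roughly `1 / tau` (about 14.3) because it now matches the appearance word. Summing over all positives, rather than picking a single one, is a design choice of this sketch: it rewards similarity to any word in the preposition basis instead of one fixed preposition.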

Paper link: http://arxiv.org/pdf/2303.13495v1
