Diffree is a diffusion-based image editing tool designed to add objects to an image from a text description. Users do not need to draw any mask or bounding box by hand; the model predicts the position and shape of the new object itself, so the object blends seamlessly into the image. <strong><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3" data-immersive-translate-walked="83a060d5-7691-47d2-aa53-bcd87d80cae5">- Stays consistent with the original image (lighting, tone, color, etc.)</span> </strong> <strong><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3" data-immersive-translate-walked="83a060d5-7691-47d2-aa53-bcd87d80cae5"> - No boxes or masks to draw </span></strong> <strong><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3" data-immersive-translate-walked="83a060d5-7691-47d2-aa53-bcd87d80cae5">- Adds objects to an image from a text description alone </span></strong> <strong><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3" data-immersive-translate-walked="83a060d5-7691-47d2-aa53-bcd87d80cae5">- Automatically decides where to place the new object</span>.</strong> For example, you only need to provide a descriptive prompt such as "add a dog" or "put a vase on the table", and Diffree finds a suitable location in the image and adds the object. <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong><img class="aligncenter size-full wp-image-11979" src="https://img.xiaohu.ai/2024/07/Jietu20240728-133116@2x.jpg" alt="" width="2100" height="962" />What problems does it solve?</strong></p> <ul data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Background consistency</strong>:</p> <ul data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Existing text-guided image inpainting methods often break the consistency of the background when they add an object. Diffree combines a diffusion model with an object mask predictor module so that the new object blends seamlessly with the background and the image stays visually consistent.</li> </ul> </li> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Manual intervention</strong>:</p> <ul 
data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Traditional methods require the user to specify the object's position and shape by hand, for example by drawing a bounding box or scribbling a mask, which is time-consuming and takes some drawing skill. Diffree predicts the position and shape from the text description, so no manual annotation is needed.</li> </ul> </li> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">High-quality object addition</strong>:</p> <ul data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Diffree is trained on a large-scale synthetic dataset (OABench), which lets it add high-quality objects to a wide range of natural scenes while keeping the object relevant to the prompt and preserving image quality.</li> </ul> </li> </ul> [video width="3640" height="2048" mp4="https://img.xiaohu.ai/2024/07/video_demo.mp4"][/video] <h3 data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Main Features of Diffree</h3> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">1. <strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Text-guided object addition</strong></p> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Diffree lets users add a new object to an image with a simple text description. Given a prompt such as "add a dog" or "put a vase on the table", Diffree finds a suitable location in the image and adds the object.</p> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">2. <strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Background consistency preservation</strong></p> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">An important feature of Diffree is that it preserves the consistency of the image background. When it adds a new object, the object is blended seamlessly into the original background, avoiding a damaged background or unnatural visual artifacts.</p> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><img class="aligncenter size-full wp-image-11977" src="https://img.xiaohu.ai/2024/07/Jietu20240728-133140@2x.jpg" alt="" width="2366" height="740" />3. 
<strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Automatic position and shape prediction</strong></p> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Diffree has a built-in object mask predictor module that predicts the best position and shape for the new object. This spares users from drawing a mask or specifying a bounding box by hand, making object addition simpler and faster.</p> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">4. <strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">High-quality object addition</strong></p> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Trained on the large-scale synthetic dataset OABench, Diffree can add high-quality objects to a wide range of natural scenes. This keeps the new object relevant to the prompt, plausible in context, and visually convincing.</p> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">5. <strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Iterative multi-object addition</strong></p> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Diffree also supports adding multiple objects iteratively: objects are added to the same image one at a time, and each new object stays consistent with the background and with the objects added before it. This is useful for complex editing tasks such as interior design or advertising work.</p> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">For example:</p> <ul> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">User input: "Add a child on the grass, then add a kite."</li> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Result: Diffree first adds a child on the grass, then adds a kite in the sky; both objects blend naturally with the background and with each other.</li> </ul> <strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><img class="aligncenter size-full wp-image-11976" src="https://img.xiaohu.ai/2024/07/Jietu20240728-133159@2x.jpg" alt="" width="2334" height="1266" />6. Combination with other methods</strong>: Diffree can be combined with other existing image inpainting methods to further improve editing results. For example, it can be paired with AnyDoor to add a specific object. <h3 data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><img class="aligncenter size-full wp-image-11980" src="https://img.xiaohu.ai/2024/07/Jietu20240728-134004@2x.jpg" alt="" width="2298" height="644" />Diffree's Technical Approach</h3> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">1. 
<strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Diffusion Model</strong></p> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Diffree is built on a diffusion model, a technique that has recently performed very well in image generation and editing. A diffusion model is trained by gradually adding noise to images and learning to remove it; at generation time it produces an image by iterative denoising, which helps maintain quality and consistency.</p> <img class="aligncenter size-full wp-image-11974" src="https://img.xiaohu.ai/2024/07/Jietu20240728-133221@2x.jpg" alt="" width="2322" height="1262" /> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">2. <strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Object Mask Predictor Module</strong></p> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">One of Diffree's key innovations is its object mask predictor module, which predicts the best position and shape of the new object in the image. It works as follows:</p> <ul data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Noise prediction</strong>: during the reverse (denoising) process of the diffusion model, the noise distribution in the region where the object will be added is predicted.</li> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Mask generation</strong>: an object mask is generated from the predicted noise distribution and is used to locate the position and shape of the new object.</li> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Mask refinement</strong>: the mask can already be predicted in the early steps of generation and is then refined at every denoising step, which keeps the object and the background consistent.</li> </ul> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><img class="aligncenter size-full wp-image-11973" src="https://img.xiaohu.ai/2024/07/Jietu20240728-133231@2x.jpg" alt="" width="2352" height="674" />3. 
<strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">The OABench Dataset</strong></p> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">To train and evaluate Diffree, the research team created a dataset called OABench. It contains 74K triplets, each consisting of:</p> <ul data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Original image</strong>: the complete image, which contains the object to be added.</li> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Inpainted image</strong>: the same image after the object has been removed by inpainting, used to train the model to preserve background consistency.</li> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Object description and mask</strong>: text describing the object and the corresponding object mask, used to train the model to predict the object's position and shape.</li> </ul> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><img class="aligncenter size-full wp-image-11975" src="https://img.xiaohu.ai/2024/07/Jietu20240728-133211@2x.jpg" alt="" width="2358" height="936" />4. 
<strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Model Training</strong></p> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Diffree's training process consists of the following steps:</p> <ul data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Data preprocessing</strong>: inpainted images are generated with image inpainting techniques so that the background stays consistent.</li> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Model initialization</strong>: a pretrained Stable Diffusion model serves as the starting point and is trained jointly with the object mask predictor module.</li> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Loss optimization</strong>: the diffusion model's loss and the mask predictor module's loss are combined and optimized jointly.</li> </ul> <h3 data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Experimental Results and Conclusions</h3> <ol data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Success rate</strong>: Diffree reaches success rates of 98.5% on COCO and 98.0% on OpenImages, far higher than existing methods.<img class="aligncenter size-full wp-image-11971" src="https://img.xiaohu.ai/2024/07/Jietu20240728-133249@2x.jpg" alt="" width="2014" height="960" /></p> </li> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Background consistency</strong>: Diffree clearly outperforms existing methods on background consistency, achieving a lower LPIPS score (lower means the background changes less).</p> </li> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong 
data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Location reasonableness</strong>: Diffree scores higher than existing methods in the evaluation of how reasonable the object's placement is.<img class="aligncenter size-full wp-image-11972" src="https://img.xiaohu.ai/2024/07/Jietu20240728-133239@2x.jpg" alt="" width="2354" height="648" /></p> </li> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Object relevance and quality</strong>: Diffree scores well on both relevance to the object description and generation quality, beating existing methods on the Local CLIP Score and Local FID metrics.</p> </li> <li data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"> <p data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f"><strong data-immersive-translate-walked="ef74dfdc-29b3-4269-8921-17776b30341f">Unified metric</strong>: on a unified metric that combines the success rate with the quantitative metrics above, Diffree scores significantly higher than existing methods, showing that its advantage holds across the board.</p> </li> </ol> Project page: <a href="https://opengvlab.github.io/Diffree/" target="_blank" rel="noopener">https://opengvlab.github.io/Diffree/</a> GitHub: <a href="https://github.com/OpenGVLab/Diffree" target="_blank" rel="noopener">https://github.com/OpenGVLab/Diffree</a> Paper: <a href="https://arxiv.org/pdf/2407.16982" target="_blank" rel="noopener">https://arxiv.org/pdf/2407.16982</a> Online demo: <a href="https://huggingface.co/spaces/LiruiZhao/Diffree" target="_blank" rel="noopener">https://huggingface.co/spaces/LiruiZhao/Diffree</a>
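<p>To make the OABench triplet and the joint loss from the training section more concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not Diffree's actual code: the <code>OABenchSample</code> field names, the <code>mask_weight</code> parameter, and the use of mean squared error for the mask term are all hypothetical, and the toy lists stand in for real image tensors and network outputs.</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OABenchSample:
    """One training triplet; field names are hypothetical, not OABench's schema."""
    original_image: List[float]   # pixels of the image that contains the object
    inpainted_image: List[float]  # same image after the object was removed
    description: str              # text describing the object
    object_mask: List[float]      # 1.0 where the object is, 0.0 elsewhere


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between two equally sized flat lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(noise_pred, noise_true, mask_pred, mask_true, mask_weight=1.0):
    """Diffusion (noise) loss plus a weighted mask-prediction loss.

    mask_weight is an assumed hyperparameter, and MSE for the mask term is
    an assumption made for this sketch.
    """
    return mse(noise_pred, noise_true) + mask_weight * mse(mask_pred, mask_true)


sample = OABenchSample(
    original_image=[0.2, 0.9, 0.9, 0.1],
    inpainted_image=[0.2, 0.3, 0.3, 0.1],
    description="a red ball",
    object_mask=[0.0, 1.0, 1.0, 0.0],
)

# Toy predictions standing in for real network outputs.
loss = joint_loss(
    noise_pred=[0.0, 1.0], noise_true=[0.0, 0.0],
    mask_pred=[0.0, 1.0, 0.0, 0.0], mask_true=sample.object_mask,
    mask_weight=2.0,
)
print(loss)
```

<p>In this toy setup the model would take the inpainted image and the description as input, while the original image and the object mask act as targets; both loss terms are minimized together, which is the sense in which the diffusion model and the mask predictor are trained jointly.</p>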