Omost: automatically refines your prompt and generates images that match what you have in mind

by 小互
6 months ago

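Omost is a project that uses large language models (LLMs) to generate images. It composes images through a virtual canvas agent, with the aim of turning a model's coding ability into image generation ability.

Specifically, a language model writes code that describes each part of the image and its layout, and those descriptions are then used to generate the actual image.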

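Omost uses trained large language models (variants of Llama3 and Phi3) to generate code that describes an image. These models can understand and produce both natural-language descriptions and program code.
The code a model produces is not the image itself but a detailed description of its content: the features, position, and style of each element in the image (people, objects, background, and so on).
Omost's models compose an image by generating this description code, which comes in two kinds:
- Global description: describes the overall scene and atmosphere of the whole image.
- Local descriptions: describe the details and position of each part of the image.
For example, to generate an image of warriors fighting a dragon, the model writes code describing the warriors, the dragon, the battlefield, and the sky in the background. The code carries each part's position, style, detailed description, and relative depth.
Omost provides a virtual canvas agent that receives and interprets this description code. Following the code, the canvas agent combines the parts of the image into one complete image description.
The canvas agent then hands that description to an image generator, which produces the final image. The generator can be any tool or model able to interpret these descriptions and produce an image.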

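How does Omost work, in plain terms?

Text description: You start by describing the image you want in words, for example: "A group of warriors is fighting a dragon, against a sky full of lightning and storm clouds."
Code generation: Omost uses a large language model to turn your text description into a kind of "code". This code is not a general-purpose program; it is a detailed description, written as calls on a canvas object, that tells the computer how to lay out the image you want.
Virtual canvas: The code is passed to a virtual canvas. The virtual canvas works like a drawing program: it arranges the parts of the image according to the descriptions in the code.
Image generator: Finally, the virtual canvas passes this arrangement to an image generator, which draws the final image.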
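
To make the hand-off from generated code to canvas more concrete, here is a minimal sketch of one way it could work. It assumes the model's output is plain Python that calls methods on a Canvas object, as in the conversation example further down. RecordingCanvas and run_generated_code are hypothetical names made up for illustration; this is not Omost's own implementation.

```python
# Hypothetical stand-in for the virtual canvas: it only records the calls
# made by the generated code. This is NOT Omost's actual Canvas class.
class RecordingCanvas:
    def __init__(self):
        self.global_description = None
        self.local_descriptions = []

    def set_global_description(self, **fields):
        self.global_description = fields

    def add_local_description(self, **fields):
        self.local_descriptions.append(fields)


# Text of the kind the language model is described as producing.
generated_code = """
canvas = Canvas()
canvas.set_global_description(
    description='A fierce battle between warriors and a dragon.',
    tags='battle, warriors, dragon',
)
canvas.add_local_description(
    location='in the center',
    distance_to_viewer=7.0,
    description='A mighty dragon.',
)
"""


def run_generated_code(code: str) -> RecordingCanvas:
    """Execute generated code with only the canvas class in scope.

    Emptying the builtins narrows what the code can reach, but it is not
    a security sandbox.
    """
    namespace = {"__builtins__": {}, "Canvas": RecordingCanvas}
    exec(code, namespace)
    return namespace["canvas"]


canvas = run_generated_code(generated_code)
print(canvas.global_description["description"])
print(len(canvas.local_descriptions), "local description(s) recorded")
```

Once the calls are recorded as plain data, a later stage can sort, lay out, and encode them without having to run the model's code again.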

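An example

Suppose you want an image of "warriors fighting a dragon":

You tell Omost: "I want an image of warriors fighting a dragon, against a sky full of lightning and storm clouds."
Omost's model generates code that describes in detail where the warriors stand, what the dragon looks like, the sky in the background, and so on.
The virtual canvas arranges the parts of the image according to these descriptions.
The image generator draws a complete image from that arrangement.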

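Scenario: generating an image of warriors fighting a dragon

The following example conversation between a user and the assistant shows how an image of a complex scene is built step by step:

1. The user requests an image: the user asks for an image of "warriors fighting a dragon".

User:
Generate an image of warriors in a fierce battle with a dragon

2. The assistant generates the global description: the assistant first initializes a canvas and sets the global description for the whole image, covering the overall scene and atmosphere.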

# Initialize the canvas
canvas = Canvas()
# Set the global description
canvas.set_global_description(
    description='A fierce battle between warriors and a dragon.',
    detailed_descriptions=[
        'In this intense scene, a group of fierce warriors is engaged in an epic battle with a mighty dragon.',
        'The warriors, clad in armor and wielding swords and shields, are positioned on the left side of the image.',
        'The dragon, with its massive wings spread wide and its fiery breath illuminating the scene, dominates the center of the image.',
        'The background is a dark, stormy sky with flashes of lightning, adding to the drama and tension of the battle.'
    ],
    tags='battle, warriors, dragon, fierce, armor, swords, shields, stormy sky, lightning, intense',
    HTML_web_color_name='darkslategray'
)

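3. The assistant generates local descriptions: the assistant then adds a detailed description for each part of the image, for example the appearance and position of the warriors and the dragon.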

# Add the description of the warriors
canvas.add_local_description(
    location='on the left',
    offset='no offset',
    area='a large horizontal area',
    distance_to_viewer=5.0,
    description='A group of fierce warriors.',
    detailed_descriptions=[
        'The warriors, clad in gleaming armor, are positioned on the left side of the image.',
        'They are armed with swords, shields, and spears, ready for battle.',
        'Their faces are set with determination and focus, reflecting their resolve to defeat the dragon.'
    ],
    tags='warriors, armor, swords, shields, determined, focused, battle',
    atmosphere='Determined and focused, ready for the fierce battle.',
    style='Highly detailed and dynamic, capturing the intensity of the warriors.',
    quality_meta='High resolution with intricate details and dynamic poses.',
    HTML_web_color_name='darkgoldenrod'
)

# Add the description of the dragon
canvas.add_local_description(
    location='in the center',
    offset='no offset',
    area='a large square area',
    distance_to_viewer=7.0,
    description='A mighty dragon.',
    detailed_descriptions=[
        'The dragon is a massive creature, dominating the center of the image with its wide-spread wings and fiery breath.',
        'Its scales glisten with a metallic sheen, reflecting the light from its fiery breath.',
        "The dragon's eyes burn with a fierce intensity, and its teeth are sharp and menacing.",
        'The wings of the dragon are powerful and spread wide, casting shadows over the battlefield.',
        "The dragon's body is muscular and covered in protective scales, with a long, sinewy tail that adds to its formidable appearance.",
        'The fiery breath of the dragon illuminates the scene, casting a reddish glow over the warriors and the battlefield.'
    ],
    tags='dragon, massive, wings, fiery breath, glistening scales, metallic sheen, fierce eyes, sharp teeth, powerful wings, shadows, battlefield, muscular body, protective scales, sinewy tail, formidable, reddish glow, illumination',
    atmosphere='Intense and menacing, with a powerful presence.',
    style='Epic and dramatic, emphasizing the grandeur and danger of the dragon.',
    quality_meta='High resolution with dramatic lighting and detailed textures.',
    HTML_web_color_name='firebrick'
)

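4. Background descriptions: the assistant adds descriptions for the background of the image, such as the stormy sky and the debris-covered battlefield.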

# Add the description of the stormy sky
canvas.add_local_description(
    location='on the top',
    offset='no offset',
    area='a large horizontal area',
    distance_to_viewer=10.0,
    description='A stormy sky with flashes of lightning.',
    detailed_descriptions=[
        'The background of the image is a dark, stormy sky filled with swirling clouds and flashes of lightning.',
        'The sky is turbulent, with clouds dark and foreboding, adding to the dramatic tension of the battle.',
        'The lightning flashes illuminate the scene, casting sharp, brief lights over the warriors and the dragon.',
        'The stormy sky creates a sense of chaos and unpredictability, heightening the intensity of the battle below.',
        'The overall atmosphere is one of impending doom and relentless conflict, with the storm mirroring the fierce struggle between the warriors and the dragon.'
    ],
    tags='stormy sky, dark clouds, lightning, turbulent, foreboding, dramatic tension, illumination, chaos, unpredictability, intensity, impending doom, relentless conflict, battle, warriors, dragon, swirling clouds, sharp lights, brief lights',
    atmosphere='Chaotic and intense, mirroring the fierce battle below.',
    style='Dramatic and turbulent, emphasizing the conflict and tension.',
    quality_meta='High resolution with dynamic lighting and detailed cloud textures.',
    HTML_web_color_name='midnightblue'
)

# Add the description of the battlefield
canvas.add_local_description(
    location='on the bottom',
    offset='no offset',
    area='a large horizontal area',
    distance_to_viewer=5.0,
    description='A debris-covered battlefield.',
    detailed_descriptions=[
        'The ground of the battlefield is littered with debris, remnants of previous battles.',
        'Broken weapons, shattered shields, and scattered armor pieces are strewn across the battlefield.',
        'The terrain is uneven, with patches of mud and dirt, adding to the sense of a long and brutal conflict.',
        'The debris-covered battlefield enhances the chaotic and intense atmosphere of the scene, reflecting the ferocity and duration of the battle.',
        'The overall appearance is one of destruction and turmoil, with the remnants of previous battles serving as a grim reminder of the ongoing struggle.'
    ],
    tags='battlefield, debris, broken weapons, shattered shields, scattered armor, uneven terrain, mud, dirt, brutal conflict, chaos, intensity, destruction, turmoil, previous battles, ongoing struggle, remnants, ferocity, duration',
    atmosphere='Chaotic and intense, reflecting the ferocity of the battle.',
    style='Detailed and realistic, emphasizing the destruction and chaos.',
    quality_meta='High resolution with detailed textures and realistic debris.',
    HTML_web_color_name='darkolivegreen'
)

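5. Conversational editing: the user can keep talking to the assistant to edit the image further. For example, the user asks to change the dragon into a dinosaur.

User:
Change the dragon into a dinosaur

The assistant then updates the image description: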

# Replace the dragon with a dinosaur in the global description
canvas.set_global_description(
    description='A fierce battle between warriors and a dinosaur.',
    detailed_descriptions=[
        'In this intense scene, a group of fierce warriors is engaged in an epic battle with a mighty dinosaur.',
        'The warriors, clad in armor and wielding swords and shields, are positioned on the left side of the image.',
        'Their expressions are determined and focused, reflecting their resolve to defeat the dinosaur.',
        'The dinosaur, with its massive body and powerful legs, dominates the center of the image.',
        'Its scales glisten with a metallic sheen, and its eyes burn with a fierce intensity.',
        'The background is a dark, stormy sky with flashes of lightning, adding to the drama and tension of the battle.',
        'The ground is littered with debris and the remnants of previous battles, enhancing the sense of a long and brutal conflict.',
        'The overall atmosphere is one of chaos and intensity, with the warriors and the dinosaur locked in a fierce struggle for survival.'
    ],
    tags='battle, warriors, dinosaur, fierce, armor, swords, shields, stormy sky, lightning, intense',
    HTML_web_color_name='darkslategray'
)

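Through this kind of conversational editing, users can adjust and refine the image step by step until it matches what they had in mind.

The example above shows how a complex image is produced through a conversation with the assistant. The user makes a specific request, and the assistant writes detailed description code that builds the image up on the virtual canvas. The user can keep adjusting details in the conversation, such as swapping one element for another (the dragon for a dinosaur). In this way Omost uses the code generation ability of large language models, together with detailed descriptions and a virtual canvas, to produce the final image.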

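How Omost works in detail

The Omost project combines large language models (LLMs) with image generation to build a system that can compose and generate images automatically. Its working principles and key components are described below.

Core components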

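Virtual canvas (Canvas)
- Role: the virtual canvas is the core of the Omost system. It is responsible for the layout and arrangement of image elements. Local descriptions for the different image elements can be added to the canvas.
- Methods:
  - set_global_description(description, detailed_descriptions, tags, HTML_web_color_name): sets the global description of the whole image.
  - add_local_description(location, offset, area, distance_to_viewer, description, detailed_descriptions, tags, atmosphere, style, quality_meta, HTML_web_color_name): adds a local description of one part of the image.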
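
As a rough illustration of the two method signatures listed above, the sketch below stores each call in a small data class. The parameter names are taken from the list; everything else (the class names GlobalDescription, LocalDescription, and SketchCanvas, and the choice to store calls as data classes) is an assumption for illustration and should not be read as Omost's actual Canvas class.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GlobalDescription:
    description: str
    detailed_descriptions: List[str]
    tags: str
    HTML_web_color_name: str


@dataclass
class LocalDescription:
    location: str
    offset: str
    area: str
    distance_to_viewer: float
    description: str
    detailed_descriptions: List[str]
    tags: str
    atmosphere: str
    style: str
    quality_meta: str
    HTML_web_color_name: str


class SketchCanvas:
    """Illustrative container only; not Omost's own Canvas implementation."""

    def __init__(self) -> None:
        self.global_description: Optional[GlobalDescription] = None
        self.local_descriptions: List[LocalDescription] = []

    def set_global_description(self, description, detailed_descriptions,
                               tags, HTML_web_color_name) -> None:
        # A later call replaces the earlier global description.
        self.global_description = GlobalDescription(
            description, detailed_descriptions, tags, HTML_web_color_name)

    def add_local_description(self, location, offset, area,
                              distance_to_viewer, description,
                              detailed_descriptions, tags, atmosphere,
                              style, quality_meta, HTML_web_color_name) -> None:
        # Each call appends one more element to the canvas.
        self.local_descriptions.append(LocalDescription(
            location, offset, area, distance_to_viewer, description,
            detailed_descriptions, tags, atmosphere, style, quality_meta,
            HTML_web_color_name))
```

Because every parameter is required in this sketch, a generated call that leaves one out fails with a TypeError straight away instead of producing an incomplete element.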
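Pretrained models
- Omost provides three pretrained models based on variants of Llama3 and Phi3. They are trained on a mix of data and techniques to improve their image generation ability.
- Training data:
  - Ground-truth annotations from the Open-Images dataset
  - Automatically annotated image data
  - DPO (Direct Preference Optimization) data, in particular preferences on whether the generated code compiles under Python 3.10
  - A small amount of tuning data produced with the multimodal capability of OpenAI GPT4o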

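Workflow

Image description generation
- The user submits a request describing an image, and the LLM generates the corresponding descriptions, both brief and detailed. These comprise a global description and local descriptions of the image.
Canvas initialization
- The virtual canvas class is used to initialize a canvas, and the global and local descriptions are set on it. Each description is kept strictly under 75 tokens so that it stands on its own and can be encoded accurately.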
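
The 75-token limit can be checked mechanically. The sketch below is only an illustration: rough_tokenize is a placeholder that counts words and punctuation marks, whereas a real pipeline would count tokens with the tokenizer of the text encoder it actually uses. The function names are hypothetical.

```python
import re
from typing import Callable, List

MAX_TOKENS = 75  # limit stated in the text above


def rough_tokenize(text: str) -> List[str]:
    """Placeholder tokenizer: each word or punctuation mark is one token."""
    return re.findall(r"\w+|[^\w\s]", text)


def over_limit(descriptions: List[str],
               tokenize: Callable[[str], List[str]] = rough_tokenize,
               limit: int = MAX_TOKENS) -> List[str]:
    """Return the descriptions that are not strictly under the limit."""
    return [d for d in descriptions if len(tokenize(d)) >= limit]


descriptions = [
    "A mighty dragon.",
    "The dragon is a massive creature, dominating the center of the image.",
]
print(over_limit(descriptions))  # both are short, so nothing is flagged
```

Passing the tokenizer in as a parameter keeps the check the same when the placeholder is replaced by a real tokenizer.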
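Position and area definition
- The system divides the canvas into 9 locations and subdivides each location into 9 offsets, giving 81 positions in total. Around each position it then defines 9 kinds of bounding box, giving 729 different boxes for specifying precisely where an image element goes.
  First, the canvas is divided into 3*3=9 locations.
  Then each location is further divided into 3*3 offsets, giving 9*9=81 positions.
  With these positions as centers, 9 kinds of bounding box are defined.
  This allows 9*9*9=729 different bounding boxes, covering almost all the common places where an object may appear in an image.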
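
The counting argument above can be reproduced in a few lines. The sketch below enumerates the 81 centers on a unit canvas and pairs each with 9 box types. The split of the 9 box types into 3 sizes times 3 shapes, the size fractions, and all function names are assumptions made for illustration; the text only states that there are 9 kinds.

```python
from itertools import product

GRID = 3  # the canvas is split 3*3, and each cell is split 3*3 again

# Assumed box types: 3 sizes x 3 shapes = 9. The fractions are illustrative.
SIZES = {"small": 0.3, "medium": 0.5, "large": 0.7}
SHAPES = {"square": (1.0, 1.0), "vertical": (0.6, 1.0), "horizontal": (1.0, 0.6)}


def box_types():
    """Map a label such as 'large horizontal' to a (width, height) pair."""
    return {
        f"{size} {shape}": (scale * w, scale * h)
        for (size, scale), (shape, (w, h)) in product(SIZES.items(), SHAPES.items())
    }


def center(loc_row, loc_col, off_row, off_col):
    """Center of one of the 81 positions, in unit-canvas coordinates."""
    x = (loc_col * GRID + off_col + 0.5) / (GRID * GRID)
    y = (loc_row * GRID + off_row + 0.5) / (GRID * GRID)
    return x, y


def all_boxes():
    """Every (center_x, center_y, width, height) combination, unclamped."""
    cells = list(product(range(GRID), repeat=2))  # 9 (row, col) pairs
    boxes = []
    for (lr, lc), (o_r, o_c), (w, h) in product(cells, cells, box_types().values()):
        cx, cy = center(lr, lc, o_r, o_c)
        boxes.append((cx, cy, w, h))
    return boxes


print(len(all_boxes()))  # 9 * 9 * 9 = 729
```

The boxes are left unclamped here, so a large box centered near an edge extends past the canvas; a real layout step would need to decide how to handle that.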
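Depth and color settings
- The relative depth parameter (distance_to_viewer) and an HTML color name (HTML_web_color_name) set the layering and color of each image element. Relative depth is used to order elements from front to back, and the color name is used to paint a rough initial sketch of the image.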
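
One plausible reading of this step is a far-to-near paint order: elements with a larger distance_to_viewer are drawn first, and nearer ones are drawn over them in their named color. The sketch below illustrates that idea on a tiny 9x9 grid. The depth values and color names come from the conversation example, the RGB triples are the standard values for those CSS color names, and the box coordinates and function names are hypothetical.

```python
# RGB values of the CSS named colors used in the conversation example.
CSS_COLORS = {
    "darkslategray": (47, 79, 79),
    "darkgoldenrod": (184, 134, 11),
    "firebrick": (178, 34, 34),
    "midnightblue": (25, 25, 112),
    "darkolivegreen": (85, 107, 47),
}

# "box" is a hypothetical (left, top, right, bottom) in grid cells,
# with right and bottom exclusive.
elements = [
    {"name": "warriors", "distance_to_viewer": 5.0,
     "HTML_web_color_name": "darkgoldenrod", "box": (0, 2, 4, 7)},
    {"name": "dragon", "distance_to_viewer": 7.0,
     "HTML_web_color_name": "firebrick", "box": (2, 2, 7, 7)},
    {"name": "sky", "distance_to_viewer": 10.0,
     "HTML_web_color_name": "midnightblue", "box": (0, 0, 9, 4)},
    {"name": "battlefield", "distance_to_viewer": 5.0,
     "HTML_web_color_name": "darkolivegreen", "box": (0, 6, 9, 9)},
]


def paint_order(items):
    """Farthest element first, so nearer elements are painted over it."""
    return sorted(items, key=lambda e: e["distance_to_viewer"], reverse=True)


def color_sketch(items, background, size=9):
    """Fill a size x size grid of RGB tuples in far-to-near order."""
    grid = [[CSS_COLORS[background]] * size for _ in range(size)]
    for element in paint_order(items):
        left, top, right, bottom = element["box"]
        color = CSS_COLORS[element["HTML_web_color_name"]]
        for row in range(top, bottom):
            for col in range(left, right):
                grid[row][col] = color
    return grid


print([e["name"] for e in paint_order(elements)])
sketch = color_sketch(elements, background="darkslategray")
```

Python's sort is stable, so elements with equal depth (the warriors and the battlefield here) keep the order in which they were added.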
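Encoding and rendering
- All descriptions are encoded as text embeddings in a way that avoids semantic truncation errors. The encodings are then used to produce an initial sketch of the image, and a diffusion model renders the final image.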

Example code

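# Initialize the canvas
canvas = Canvas()

# Set the global description
canvas.set_global_description(
    description='A fierce battle between warriors and a dinosaur.',
    detailed_descriptions=[
        'In this tense scene, a group of fierce warriors is locked in an epic battle with a mighty dinosaur.',
        'The warriors, clad in armor and wielding swords and shields, stand on the left side of the image.',
        'Their expressions are determined and focused, reflecting their resolve to defeat the dinosaur.',
        'The dinosaur is in the center of the image, its massive body and powerful legs dominating the frame.',
        'The scales of the dinosaur glisten in the light, and its eyes burn with a fierce fire.',
        'The background is a dark, stormy sky, with lightning splitting the air now and then, adding drama and tension to the battle.',
        'The ground is covered with debris and the marks of battle, reinforcing the sense of a long and brutal fight.',
        'The overall atmosphere is chaotic and tense, with the warriors and the dinosaur giving everything in a fight to the death.',
    ],
    tags='battle, warriors, dinosaur, fierce, armor, swords, shields, determined, focused, epic, tense, metallic, glistening, stormy sky, lightning, debris, chaos, survival, intense, massive, scales, eyes, burning, illumination',
    HTML_web_color_name='darkslategray'
)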

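# Add a local description
canvas.add_local_description(
    location='on the left',
    offset='no offset',
    area='a large horizontal area',
    distance_to_viewer=5.0,
    description='A group of fierce warriors.',
    detailed_descriptions=[
        'The warriors, clad in armor, stand on the left side of the image.',
        'They carry swords, shields, and spears, ready for battle.',
        'Their expressions are determined and focused, reflecting their resolve to defeat the dinosaur.',
        'Some warriors are swinging their weapons, while others stand poised to attack.',
        'Their armor is finely crafted, bearing patterns and symbols of their rank and courage.',
        'The ground beneath their feet is uneven and strewn with debris, adding to the chaos of the battle.',
    ],
    tags='warriors, armor, swords, shields, spears, determined, focused, in action, fine craftsmanship, symbols, patterns, rank, courage, uneven ground, debris, chaos, fierce, battle',
    atmosphere='Determined and focused, ready for battle.',
    style='Highly detailed and dynamic, capturing the intensity of the warriors in battle.',
    quality_meta='High resolution with rich detail and dynamic poses.',
    HTML_web_color_name='darkgoldenrod'
)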