Diffusers

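Diffusers is Hugging Face's open-source Python library for working with diffusion models. It can generate images, audio, and even 3D molecular structures. It offers simple inference pipelines as well as tooling for training your own diffusion models, supports many diffusion models and sampling algorithms, and produces results in only a few lines of code. Open models from sites such as civitai can also be plugged in with little effort.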

Website: https://huggingface.co/docs/diffusers/v0.19.3/en/index

GitHub: https://github.com/huggingface/diffusers

0. Installation

See the official installation guide: https://huggingface.co/docs/diffusers/v0.19.3/en/installation

1. Running simple inference with a diffusion model

A simple DDPM example

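from diffusers import DDPMPipeline

# Load the pretrained unconditional DDPM (256x256 cat images) and move it to the GPU.
ddpm = DDPMPipeline.from_pretrained("google/ddpm-cat-256").to("cuda")
# Run 50 denoising steps and take the first generated image.
img = ddpm(num_inference_steps=50).images[0]
img.save("/app/images/ddpm.png")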

A simple Stable Diffusion example

import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

# Load Stable Diffusion v1.5 in half precision; safety_checker=None disables the NSFW filter.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, safety_checker=None)
# Swap in the Euler scheduler, reusing the existing scheduler config.
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# webui-style weights such as (text:1.2) are not parsed by diffusers; they are tokenized as plain text.
prompt = '1girl,curly hair,(in the dark:1.2),deep shadow,water,'
negative_prompt = '(worst quality:2),(low quality:2),(normal quality:2),lowres,watermark,badhandv4,ng_deepnegative_v1_75t,'

with torch.no_grad():
    for i in range(10):
        image = pipe(prompt=prompt,
                     negative_prompt=negative_prompt,
                     height=640,
                     width=360,
                     num_inference_steps=30,
                     guidance_scale=8).images[0]
        # One file per sample, named after the first 10 characters of the prompt.
        image.save("./{}_{}.png".format(prompt[:10], i))

# Parameter list for __call__: https://huggingface.co/docs/diffusers/v0.19.3/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__
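
Prompts copied from civitai or a web UI often carry tags of the form `<lora:name:weight>`. Diffusers does not interpret these tags; the text goes to the tokenizer unchanged. The following is a minimal sketch, assuming that tag shape (with optional extra fields after the weight), of stripping the tags from a prompt and collecting the names and weights. The helper name `split_lora_tags` and the LoRA name `my_style` are hypothetical.

```python
import re

# Matches tags such as <lora:name:0.5> or <lora:name:0.5:EXTRA> (tag shape is an assumption).
LORA_TAG = re.compile(r"<lora:([^:>]+):([0-9]*\.?[0-9]+)(?::[^>]*)?>")


def split_lora_tags(prompt):
    """Return (clean_prompt, [(name, weight), ...]) for a web-UI-style prompt."""
    loras = [(name, float(weight)) for name, weight in LORA_TAG.findall(prompt)]
    clean = LORA_TAG.sub("", prompt)
    # Removing a tag leaves empty comma-separated items behind; drop them.
    parts = [p.strip() for p in clean.split(",") if p.strip()]
    return ",".join(parts), loras


clean_prompt, loras = split_lora_tags("1girl,curly hair,<lora:my_style:0.5:BODY>,water,")
print(clean_prompt)  # 1girl,curly hair,water
print(loras)         # [('my_style', 0.5)]
```

The cleaned prompt can then be passed to the pipeline, and each collected LoRA loaded separately with `pipe.load_lora_weights`.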

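2. How to load a civitai model

If civitai provides the model as a checkpoint, convert it with convert_original_stable_diffusion_to_diffusers.py from the scripts folder of the diffusers repo. The converted model can then be loaded directly with the demo above.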
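
As a rough sketch of that conversion step, the helper below builds the command line for the script. The flag names `--checkpoint_path`, `--dump_path`, and `--from_safetensors` are taken from the script's argument parser as found in the repo around this version; they are worth confirming with `--help` on your checkout. The file names and the helper name `build_convert_command` are hypothetical.

```python
import subprocess
import sys


def build_convert_command(checkpoint_path, dump_path,
                          script="scripts/convert_original_stable_diffusion_to_diffusers.py"):
    """Build the argv for the conversion script (flag names assumed; check --help)."""
    cmd = [sys.executable, script,
           "--checkpoint_path", checkpoint_path,
           "--dump_path", dump_path]
    # .safetensors checkpoints need an extra flag; .ckpt files do not.
    if checkpoint_path.endswith(".safetensors"):
        cmd.append("--from_safetensors")
    return cmd


cmd = build_convert_command("my_model.safetensors", "./my_model_diffusers")
print(" ".join(cmd))
# Uncomment to run from the root of a diffusers checkout:
# subprocess.run(cmd, check=True)
```

After the script finishes, the output folder (here `./my_model_diffusers`) can be passed to `StableDiffusionPipeline.from_pretrained` in place of a Hub model id.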

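3. How to load a civitai LoRA model

There are two options:
Option 1: convert the downloaded file with convert_lora_safetensor_to_diffusers.py from the scripts folder of the diffusers repo, then load the result with pipe.unet.load_attn_procs(lora_path). Note: in recent versions of the repo this script appears to merge the LoRA into a base model and save a full pipeline, in which case the output is loaded with from_pretrained instead.
Option 2: load the downloaded file directly with pipe.load_lora_weights(lora_path). This is the recommended way.

## Download the LoRA model: wget https://civitai.com/api/download/models/15603 -O light_and_shadow.safetensors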


import torch
from diffusers import StableDiffusionPipeline

model_path = 'gsdf/Counterfeit-V2.5'
lora_path = "light_and_shadow.safetensors"
pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16, safety_checker=None)
# pipe.unet.load_attn_procs(lora_path)  # option 1: load a converted LoRA
pipe.to("cuda")
pipe.load_lora_weights(lora_path)  # option 2: load the file directly; same purpose as load_attn_procs above, but recommended

prompt = "a woman with little liquid on big face"
negative_prompt = '(worst quality:2),(low quality:2),(normal quality:2),lowres,watermark,badhandv4,ng_deepnegative_v1_75t,'
image = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("liquid.png")

4. How to train a Stable Diffusion LoRA

Prepare the dataset first. It needs both images and captions, laid out as follows:

# Folder layout:
folder/train/metadata.jsonl
folder/train/0001.png
folder/train/0002.png
folder/train/0003.png

## Contents of metadata.jsonl (one JSON object per line):
{"file_name": "0001.png", "text": "xxx"}
{"file_name": "0002.png", "text": "xxx"}
{"file_name": "0003.png", "text": "xxx"}


## A simple script that generates the metadata file:

# coding: utf-8
import json
import os

# Path to the folder containing the images
imagepath = "/app/src/my_dataset/0"
imagelist = os.listdir(imagepath)

captions = []
for img in imagelist:
    # Skip non-image files, e.g. a metadata.jsonl left over from an earlier run
    if not img.lower().endswith((".png", ".jpg", ".jpeg")):
        continue
    # Every image gets the same caption here; replace it with per-image text if available
    captions.append({"file_name": img, "text": "A photograph of a woman with liquid"})

print(captions)

# Write metadata.jsonl into the same folder, one JSON object per line
with open(os.path.join(imagepath, "metadata.jsonl"), "w") as f:
    for item in captions:
        f.write(json.dumps(item) + "\n")
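
Before starting a training run it can help to confirm that the metadata file and the images agree. The following is a minimal sketch using only the standard library; the function name `check_metadata` is hypothetical, and it assumes the one-JSON-object-per-line format with `file_name` and `text` keys described above.

```python
import json
import os
import tempfile


def check_metadata(folder):
    """Return a list of problems found in folder/metadata.jsonl (empty if none)."""
    meta_path = os.path.join(folder, "metadata.jsonl")
    if not os.path.isfile(meta_path):
        return ["metadata.jsonl not found in " + folder]
    problems = []
    with open(meta_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                problems.append(f"line {lineno}: not valid JSON")
                continue
            if not isinstance(row, dict) or "file_name" not in row or "text" not in row:
                problems.append(f"line {lineno}: needs 'file_name' and 'text'")
            elif not os.path.isfile(os.path.join(folder, row["file_name"])):
                problems.append(f"line {lineno}: image '{row['file_name']}' not found")
    return problems


# Demo on a throwaway folder: one listed image exists, the other is missing.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "0001.png"), "wb").close()
    with open(os.path.join(folder, "metadata.jsonl"), "w", encoding="utf-8") as f:
        f.write(json.dumps({"file_name": "0001.png", "text": "xxx"}) + "\n")
        f.write(json.dumps({"file_name": "0002.png", "text": "xxx"}) + "\n")
    print(check_metadata(folder))  # reports that 0002.png is missing
```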

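Once the data is ready, train with the examples/text_to_image/train_text_to_image_lora.py script from the diffusers repo.
When training finishes, the LoRA weights are saved, and they can be tested with the LoRA loading method shown earlier.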

Some background notes:

(1) Training:

The optimizer is still AdamW.

DDPMScheduler is the important piece here: it defines how noise is added, like this:

noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)
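
To make the noising step concrete, here is a pure-Python sketch of the standard DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. It assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps, which is believed to match the DDPMScheduler defaults; it illustrates the formula and is not the library's implementation.

```python
import math
import random

NUM_TRAIN_TIMESTEPS = 1000
BETA_START, BETA_END = 1e-4, 0.02  # assumed: linear schedule with these end points

# beta_t rises linearly; alpha_bar_t is the running product of (1 - beta_t).
betas = [BETA_START + (BETA_END - BETA_START) * t / (NUM_TRAIN_TIMESTEPS - 1)
         for t in range(NUM_TRAIN_TIMESTEPS)]
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alphas_cumprod.append(running)


def add_noise(clean, noise, t):
    """Elementwise x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    a = math.sqrt(alphas_cumprod[t])
    b = math.sqrt(1.0 - alphas_cumprod[t])
    return [a * x + b * n for x, n in zip(clean, noise)]


clean = [0.5, -0.25, 1.0]  # a tiny stand-in for image pixels
noise = [random.gauss(0.0, 1.0) for _ in clean]
print(add_noise(clean, noise, 0))    # early timestep: almost identical to clean
print(add_noise(clean, noise, 999))  # last timestep: almost pure noise
```

The larger the timestep, the smaller the share of the original signal that survives, which is why the model has to see all timesteps during training.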

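The main training steps are:

Take a batch of images
Generate noise with torch.randn()
Sample random timesteps
Add the noise to the images
Run the model forward to predict the noise
Compute the loss between the predicted noise and the actual noise
backward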

diffusers provides many pipeline classes, for example:

from diffusers import DiffusionPipeline 

from diffusers import StableDiffusionPipeline 

from diffusers import StableDiffusionImg2ImgPipeline

(2) Safety checker

By default the model comes with an NSFW detector. Passing safety_checker=None disables this checker:

stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, safety_checker=None)

(3) Reusing components

from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

model_id = "runwayml/stable-diffusion-v1-5"
stable_diffusion_txt2img = StableDiffusionPipeline.from_pretrained(model_id)
# Build the img2img pipeline from the already-loaded modules so the weights are shared, not loaded twice.
stable_diffusion_img2img = StableDiffusionImg2ImgPipeline(
    vae=stable_diffusion_txt2img.vae,
    text_encoder=stable_diffusion_txt2img.text_encoder,
    tokenizer=stable_diffusion_txt2img.tokenizer,
    unet=stable_diffusion_txt2img.unet,
    scheduler=stable_diffusion_txt2img.scheduler,
    safety_checker=None,
    feature_extractor=None,
    requires_safety_checker=False,
)