---
library_name: transformers
license: bsd-3-clause
tags:
- StableDiffusion
- RaspberryPi5
- M.2 AI Card
language:
- en
base_model:
- latent-consistency/lcm-lora-sdv1-5
- Lykon/dreamshaper-7
---

# SD1.5-LCM.Axera

Based on the StableDiffusion 1.5 LCM project, this repo demonstrates how to deploy its **text-to-image** and **image-to-image** pipelines on AX650N-based products.

Supported chips:

- AX650N

Supported hardware:

- [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
- [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)

For the original models, see:

- [Latent Consistency Model (LCM) LoRA: SDv1-5](https://huggingface.co/latent-consistency/lcm-lora-sdv1-5)
- [Dreamshaper 7](https://huggingface.co/Lykon/dreamshaper-7)

## Performance Comparison

- Extending a Raspberry Pi 5 with an AX650N-based compute module speeds up SD1.5 inference by more than 40× compared with the Pi's CPU alone (120 s vs. 2.68 s for a 4-step txt2img run).
- Input image size: 512x512

| Models                 | Raspberry Pi 5 (CPU only) | Intel i7-13700 | Raspberry Pi 5 + M.2 card |
| ---------------------- | ------------------------- | -------------- | ------------------------- |
| UNet (1 step)          | 14 s                      | 1.7 s          | 0.43 s                    |
| VAE Encoder            | 25 s                      | 1.7 s          | 0.46 s                    |
| VAE Decoder            | 58 s                      | 3.8 s          | 0.91 s                    |
| Total txt2img, 4 steps | 120 s                     | 10.6 s         | 2.68 s                    |
| Total img2img, 2 steps | 113 s                     | 8.9 s          | 2.25 s                    |

## Model Conversion

- If you are interested, you can try exporting the models yourself.
- See the [model conversion guide](https://github.com/BUG1989/sd1.5-lcm.axera/tree/main/model_convert).

## Running the Demos

- Copy the compiled `unet.axmodel`, `vae_encoder.axmodel`, and `vae_decoder.axmodel` models into `./models`.
- Copy the `text_encoder` folder from the `Dreamshaper 7` repo into `./models`.
- The Hugging Face repo already ships the models required for the demos under `./models`.

### Environment Setup

- System memory: more than 5 GiB
- Python: 3.10 or later (newer versions have not been verified); using a virtual environment such as `miniconda` for isolation is recommended
- NPU Python API: [pyaxengine](https://github.com/AXERA-TECH/pyaxengine)

```
pip install -r requirements.txt
```

### Text-to-Image

- Run `run_txt2img_axe_infer.py` (a sketch of its inference flow follows at the end of this subsection)

**Input Prompt**

```
Self-portrait oil painting, a beautiful cyborg with golden hair, 8k
```

**Output**

```
(sd1_5) axera@raspberrypi:~/samples/sd1.5-lcm.axera $ python run_txt2img_axe_infer.py
[INFO] Available providers: ['AXCLRTExecutionProvider']
prompt: Self-portrait oil painting, a beautiful cyborg with golden hair, 8k
text_tokenizer: ./models/tokenizer
text_encoder: ./models/text_encoder
unet_model: ./models/unet.axmodel
vae_decoder_model: ./models/vae_decoder.axmodel
time_input: ./models/time_input_txt2img.npy
save_dir: ./txt2img_output_axe.png
text encoder take 2891.1ms
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.3 972f38ca
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.3 972f38ca
load models take 26628.9ms
unet once take 437.5ms
unet once take 433.4ms
unet once take 433.6ms
unet once take 433.6ms
unet loop take 1741.2ms
vae inference take 914.8ms
save image take 210.5ms
(sd1_5) axera@raspberrypi:~/samples/sd1.5-lcm.axera $
```

**Output Image**

![](./asserts/txt2img_output_axe.png)
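For orientation, here is a minimal sketch of what a txt2img run like the one above does on the NPU: encode the prompt, run the UNet for 4 LCM steps, then decode the latents with the VAE decoder. It assumes pyaxengine's onnxruntime-style `InferenceSession` interface; the tensor names (`sample`, `encoder_hidden_states`, `timestep`, `latent`) and the simplified scheduler update are illustrative assumptions, not the actual contents of `run_txt2img_axe_infer.py`.

```python
# Illustrative sketch only -- tensor names and the scheduler update are assumptions.
import numpy as np
import axengine as axe  # pyaxengine: onnxruntime-style API for .axmodel files

# Load the compiled NPU models from ./models
unet = axe.InferenceSession("./models/unet.axmodel")
vae_decoder = axe.InferenceSession("./models/vae_decoder.axmodel")

# In the real script, the prompt embedding comes from the CLIP text encoder
# (run on CPU) and the per-step timestep inputs from time_input_txt2img.npy.
text_emb = np.zeros((1, 77, 768), dtype=np.float32)      # placeholder prompt embedding
time_input = np.load("./models/time_input_txt2img.npy")  # precomputed step inputs

# A 512x512 image corresponds to a 1x4x64x64 latent in SD1.5
latents = np.random.randn(1, 4, 64, 64).astype(np.float32)

for step in range(4):  # 4 LCM steps, matching the "unet once" lines in the log above
    noise_pred = unet.run(None, {
        "sample": latents,                  # assumed input name
        "encoder_hidden_states": text_emb,  # assumed input name
        "timestep": time_input[step],       # assumed input name
    })[0]
    latents = latents - noise_pred          # stand-in for the real LCM scheduler step

image = vae_decoder.run(None, {"latent": latents})[0]  # assumed input name
```

In the actual pipeline the UNet output goes through a proper LCM scheduler update rather than the single subtraction above; the sketch only shows how each `.axmodel` is loaded and invoked.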
### Image-to-Image

- Run `run_img2img_axe_infer.py`

**Input Prompt**

```
Astronauts in a jungle, cold color palette, muted colors, detailed, 8k
```

**Input Image**

![](./asserts/img2img-init.png)

**Output**

```
(sd1_5) axera@raspberrypi:~/samples/sd1.5-lcm.axera $ python run_img2img_axe_infer.py
[INFO] Available providers: ['AXCLRTExecutionProvider']
prompt: Astronauts in a jungle, cold color palette, muted colors, detailed, 8k
text_tokenizer: ./models/tokenizer
text_encoder: ./models/text_encoder
unet_model: ./models/unet.axmodel
vae_encoder_model: ./models/vae_encoder.axmodel
vae_decoder_model: ./models/vae_decoder.axmodel
init image: ./models/img2img-init.png
time_input: ./models/time_input_img2img.npy
save_dir: ./img2img_output_axe.png
text encoder take 4494.8ms
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.3-dirty 2ecead35-dirty
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.3 972f38ca
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.3 972f38ca
load models take 27331.3ms
vae encoder inference take 460.4ms
unet once take 433.7ms
unet once take 433.5ms
unet loop take 871.7ms
vae decoder inference take 914.5ms
grid image saved in ./lcm_lora_sdv1-5_imgGrid_output.png
save image take 427.5ms
(sd1_5) axera@raspberrypi:~/samples/sd1.5-lcm.axera $
```

**Output Image**

![](./asserts/lcm_lora_sdv1-5_imgGrid_output.png)

## Related Projects

- NPU toolchain: [Pulsar2 online documentation](https://pulsar2-docs.readthedocs.io/zh-cn/latest/)

## Technical Discussion

- Github issues
- QQ group: 139953715

## Disclaimer

- This project only demonstrates how to deploy the models of the open-source [Latent Consistency Model (LCM) LoRA: SDv1-5](https://huggingface.co/latent-consistency/lcm-lora-sdv1-5) project on AX650N-based products.
- The model has inherent limitations; any erroneous, harmful, offensive, or otherwise undesirable output it may produce is unrelated to AX650N and the owner of this repository.
- [Disclaimer](./Disclaimer.md)