---
license: apache-2.0
base_model:
- THUDM/CogView4-6B
base_model_relation: quantized
tags:
- quanto
---

## Quantization settings

- `vae.`: `torch.bfloat16`. No quantization.
- `text_encoder.layers.`:
  - Int8 with [Optimum Quanto](https://github.com/huggingface/optimum-quanto)
  - Target layers: `["q_proj", "k_proj", "v_proj", "o_proj", "mlp.down_proj", "mlp.gate_up_proj"]`
- `diffusion_model.`:
  - Int8 with [Optimum Quanto](https://github.com/huggingface/optimum-quanto)
  - Target layers: `["to_q", "to_k", "to_v", "to_out.0", "ff.net.0.proj", "ff.net.2"]`

## VRAM consumption

- Text encoder (`text_encoder.`): about 11 GB
- Denoiser (`diffusion_model.`): about 10 GB

## Samples

| `torch.bfloat16` | Quanto Int8 |
| - | - |
| | |
| VRAM 40 GB (without offloading) | VRAM 28 GB (without offloading) |
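The settings above could be reproduced with Optimum Quanto's `quantize`/`freeze` API roughly as follows. This is a sketch, not the exact script used for this checkpoint: the `include_patterns` helper and the glob-style `*<name>` matching against submodule paths are assumptions, and the pipeline is assumed to be loaded via `diffusers`.

```python
# Sketch: apply the model card's Int8 settings with Optimum Quanto.
# Assumptions: glob patterns for quantize()'s `include` argument, and a
# diffusers pipeline exposing `text_encoder` and `transformer` submodules.

# Target layer names taken from the quantization settings above.
TEXT_ENCODER_TARGETS = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "mlp.down_proj", "mlp.gate_up_proj",
]
TRANSFORMER_TARGETS = [
    "to_q", "to_k", "to_v", "to_out.0",
    "ff.net.0.proj", "ff.net.2",
]


def include_patterns(names):
    # quanto matches `include` entries against fully qualified module
    # names, so prefix each target with a wildcard to catch every block.
    return [f"*{name}" for name in names]


def quantize_pipeline(pipe):
    # Imported lazily so the helpers above stay importable without quanto.
    from optimum.quanto import freeze, qint8, quantize

    # Int8 weights for the text encoder's attention and MLP projections.
    quantize(pipe.text_encoder, weights=qint8,
             include=include_patterns(TEXT_ENCODER_TARGETS))
    freeze(pipe.text_encoder)

    # Int8 weights for the denoiser's attention and feed-forward layers.
    quantize(pipe.transformer, weights=qint8,
             include=include_patterns(TRANSFORMER_TARGETS))
    freeze(pipe.transformer)

    # The VAE is left untouched in torch.bfloat16, as stated above.
    return pipe
```

Usage would be something like `quantize_pipeline(CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16))`; quantizing only the listed projection layers is what brings peak VRAM from about 40 GB down to about 28 GB.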
Generation parameters

- prompt: `""" A photo of a nendoroid figure of hatsune miku holding a sign that says "CogView4" """`
- negative_prompt: `"blurry, low quality, horror"`
- height: `1152`
- width: `1152`
- cfg_scale: `3.5`
- num_inference_steps: `20`
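The parameters above map onto a `diffusers` pipeline call roughly as sketched below. This assumes the `CogView4Pipeline` class from `diffusers`, and that the model card's `cfg_scale` corresponds to the pipeline's `guidance_scale` argument.

```python
# Sketch: regenerate the sample images with the parameters listed above.
# Assumption: `cfg_scale` in the card maps to diffusers' `guidance_scale`.

GEN_PARAMS = {
    "prompt": (
        'A photo of a nendoroid figure of hatsune miku '
        'holding a sign that says "CogView4"'
    ),
    "negative_prompt": "blurry, low quality, horror",
    "height": 1152,
    "width": 1152,
    "guidance_scale": 3.5,  # listed as cfg_scale above
    "num_inference_steps": 20,
}


def generate(output_path="sample.png"):
    # Imported lazily: the heavy dependencies are only needed at call time.
    import torch
    from diffusers import CogView4Pipeline

    pipe = CogView4Pipeline.from_pretrained(
        "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")

    image = pipe(**GEN_PARAMS).images[0]
    image.save(output_path)
    return image
```

Calling `generate()` downloads the base checkpoint and runs inference on GPU; apply the Int8 quantization settings from this card first to stay within the ~28 GB VRAM figure reported above.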