[Bug] VAE not decoding correctly with Vulkan #1471

### Git commit

$ git rev-parse HEAD
3d6064b37ef4607917f8acf2ca8c8906d5087413

### Operating System & Version

Ubuntu 26.04

### GGML backends

Vulkan

### Command-line arguments used

sd-cli -m models/checkpoints/v1-5-pruned-emaonly.safetensors -i test.png --strength 0 -v

### Steps to reproduce

I compiled sd-cli, ran it, and was surprised by the broken output.
This issue is probably the same as https://github.com/leejet/stable-diffusion.cpp/issues/1455 or at least related.

I created this input image, which might help visualize the error.

<img width="512" height="512" alt="Image" src="https://github.com/user-attachments/assets/5a8473fe-bd18-43c1-8545-70947e3f6b19" />



### What you expected to happen

The output image should be the same as the input image (minus encoding/decoding errors).
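To put a number on "the same as the input image", one option is to compare the two files with PSNR. The following is a minimal sketch in plain Python, not part of sd-cli; the function name `psnr` is hypothetical, and it assumes both images have already been loaded as flat sequences of 8-bit samples of equal length.

```python
import math


def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences.

    `a` and `b` can be lists of ints or `bytes` objects (iterating over
    `bytes` yields ints), e.g. the raw RGB data of two images.
    """
    if len(a) != len(b):
        raise ValueError("images must have the same number of samples")
    if len(a) == 0:
        raise ValueError("images must not be empty")
    # Mean squared error over all samples.
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

A VAE round trip is lossy, so the value will not be infinite even when decoding works. The useful comparison is between the Vulkan output and the `--vae-on-cpu` output: the garbled image should score much lower against `test.png`. If Pillow is available (an assumption), `Image.open("test.png").convert("RGB").tobytes()` yields a suitable input.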

### What actually happened

The output image looks as if VAE decoding failed somehow. The channels seem to be mixed up across the whole picture, and there are repeating patterns. This may be a modulo or striding error.

Note that the silhouette of the circle (128px\*128px) appears correctly once in the 512px\*512px image, then 8 times as a 64px\*32px ellipse, and then 64 times as a 32px\*16px ellipse.

<img width="512" height="512" alt="Image" src="https://github.com/user-attachments/assets/1b59c3e8-2267-4186-8a0b-88472993f1ad" />
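As a toy illustration of why a striding error is a plausible suspect: reading a row-major buffer with the wrong row width turns one shape into several squashed copies. The sketch below is not a diagnosis of the Vulkan backend and does not try to reproduce the exact 1/8/64 pattern; `reinterpret_rows` and `make_disc` are hypothetical helpers written only for this demonstration.

```python
def reinterpret_rows(pixels, wrong_width):
    """Split a flat row-major buffer into rows of `wrong_width` samples."""
    if len(pixels) % wrong_width != 0:
        raise ValueError("buffer length must be a multiple of wrong_width")
    return [pixels[i:i + wrong_width]
            for i in range(0, len(pixels), wrong_width)]


def make_disc(size, radius):
    """Flat row-major size*size image: 1 inside a centered disc, 0 outside."""
    c = (size - 1) / 2
    return [1 if (x - c) ** 2 + (y - c) ** 2 <= radius ** 2 else 0
            for y in range(size) for x in range(size)]


if __name__ == "__main__":
    size = 16
    flat = make_disc(size, 6)
    # Read the 16x16 buffer as if each row were 32 samples wide:
    # even source rows land on the left, odd source rows on the right,
    # giving two half-height copies of the disc side by side.
    for row in reinterpret_rows(flat, 2 * size):
        print("".join("#" if p else "." for p in row))
```

Doubling the assumed row width yields two copies at full width and half height, that is, ellipses with a 2:1 aspect ratio like the 64px\*32px ones in the output.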

### Logs / error messages / stack trace

```
sd-cli -m models/checkpoints/v1-5-pruned-emaonly.safetensors -i test.png --strength 0 -v
[DEBUG] main.cpp:550  - version: stable-diffusion.cpp version master-593-3d6064b, commit 3d6064b
[DEBUG] main.cpp:551  - System Info:
    SSE3 = 1 |     AVX = 1 |     AVX2 = 1 |     AVX512 = 1 |     AVX512_VBMI = 1 |     AVX512_VNNI = 1 |     FMA = 1 |     NEON = 0 |     ARM_FMA = 0 |     F16C = 1 |     FP16_VA = 0 |     WASM_SIMD = 0 |     VSX = 0 |
[DEBUG] main.cpp:552  - SDCliParams {
  mode: img_gen,
  output_path: "output.png",
  image_path: "",
  metadata_format: "text",
  verbose: true,
  color: false,
  canny_preprocess: false,
  convert_name: false,
  preview_method: none,
  preview_interval: 1,
  preview_path: "preview.png",
  preview_fps: 16,
  taesd_preview: false,
  preview_noisy: false,
  metadata_raw: false,
  metadata_brief: false,
  metadata_all: false
}
[DEBUG] main.cpp:553  - SDContextParams {
  n_threads: 8,
  model_path: "models/checkpoints/v1-5-pruned-emaonly.safetensors",
  clip_l_path: "",
  clip_g_path: "",
  clip_vision_path: "",
  t5xxl_path: "",
  llm_path: "",
  llm_vision_path: "",
  diffusion_model_path: "",
  high_noise_diffusion_model_path: "",
  vae_path: "",
  taesd_path: "",
  esrgan_path: "",
  control_net_path: "",
  embedding_dir: "",
  embeddings: {
  }
  wtype: NONE,
  tensor_type_rules: "",
  lora_model_dir: ".",
  hires_upscalers_dir: "",
  photo_maker_path: "",
  rng_type: cuda,
  sampler_rng_type: NONE,
  offload_params_to_cpu: false,
  enable_mmap: false,
  control_net_cpu: false,
  clip_on_cpu: false,
  vae_on_cpu: false,
  flash_attn: false,
  diffusion_flash_attn: false,
  diffusion_conv_direct: false,
  vae_conv_direct: false,
  circular: false,
  circular_x: false,
  circular_y: false,
  chroma_use_dit_mask: true,
  qwen_image_zero_cond_t: false,
  chroma_use_t5_mask: false,
  chroma_t5_mask_pad: 1,
  prediction: NONE,
  lora_apply_mode: auto,
  force_sdxl_vae_conv_scale: false
}
[DEBUG] main.cpp:554  - SDGenerationParams {
  loras: "{
  }",
  high_noise_loras: "{
  }",
  prompt: "",
  negative_prompt: "",
  clip_skip: -1,
  width: -1,
  height: -1,
  batch_count: 1,
  init_image_path: "test.png",
  end_image_path: "",
  mask_image_path: "",
  control_image_path: "",
  ref_image_paths: [],
  control_video_path: "",
  auto_resize_ref_image: true,
  increase_ref_index: false,
  pm_id_images_dir: "",
  pm_id_embed_path: "",
  pm_style_strength: 20,
  skip_layers: [7, 8, 9],
  sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 0, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: inf, shifted_timestep: 0, flow_shift: inf),
  high_noise_skip_layers: [7, 8, 9],
  high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 0, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: inf, shifted_timestep: 0, flow_shift: inf),
  custom_sigmas: [],
  cache_mode: "",
  cache_option: "",
  cache: disabled (threshold=inf, start=0.15, end=0.95),
  moe_boundary: 0.875,
  video_frames: 1,
  fps: 16,
  vace_strength: 1,
  strength: 0,
  control_strength: 0.9,
  seed: 42,
  upscale_repeats: 1,
  upscale_tile_size: 128,
  hires: { enabled: false, upscaler: "Latent", model_path: "", scale: 2, target_width: 0, target_height: 0, steps: 0, denoising_strength: 0.7, upscale_tile_size: 128 },
  vae_tiling_params: { 0, 0, 0, 0.5, 0, 0 },
}
[INFO ] common.cpp:1801 - set width x height to 512 x 512
[DEBUG] ggml_extend.hpp:58   - ggml_vulkan: Found 1 Vulkan devices:
[DEBUG] ggml_extend.hpp:58   - ggml_vulkan: 0 = AMD Radeon RX 580 2048SP (RADV POLARIS10) (radv) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
[DEBUG] util.cpp:713  - Found 2 backend devices:
[DEBUG] util.cpp:716  - #0: Vulkan0
[DEBUG] util.cpp:716  - #1: CPU
[DEBUG] ggml_extend.hpp:108  - Initializing backend: Vulkan0
[INFO ] stable-diffusion.cpp:210  - loading model from 'models/checkpoints/v1-5-pruned-emaonly.safetensors'
[INFO ] model.cpp:219  - load models/checkpoints/v1-5-pruned-emaonly.safetensors using safetensors format
[DEBUG] model.cpp:294  - init from 'models/checkpoints/v1-5-pruned-emaonly.safetensors', prefix = ''
[INFO ] stable-diffusion.cpp:303  - Version: SD 1.x
[INFO ] stable-diffusion.cpp:331  - Weight type stat:                      f32: 1131
[INFO ] stable-diffusion.cpp:332  - Conditioner weight type stat:          f32: 196
[INFO ] stable-diffusion.cpp:333  - Diffusion model weight type stat:      f32: 686
[INFO ] stable-diffusion.cpp:334  - VAE weight type stat:                  f32: 248
[DEBUG] stable-diffusion.cpp:336  - ggml tensor size = 400 bytes
[DEBUG] clip_tokenizer.cpp:65   - vocab size: 49408
[DEBUG] ggml_extend.hpp:2067 - clip params backend buffer size =  469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:2067 - unet params backend buffer size =  2155.33 MB(VRAM) (686 tensors)
[INFO ] stable-diffusion.cpp:629  - using VAE for encoding / decoding
[INFO ] auto_encoder_kl.hpp:517  - vae decoder: ch = 128
[DEBUG] ggml_extend.hpp:2067 - vae params backend buffer size =  159.68 MB(VRAM) (248 tensors)
[DEBUG] stable-diffusion.cpp:753  - loading weights
[DEBUG] model.cpp:742  - using 8 threads for model loading
[DEBUG] model.cpp:764  - loading tensors from models/checkpoints/v1-5-pruned-emaonly.safetensors
  |==================================================| 1131/1131 - 3.97GB/s
[INFO ] model.cpp:993  - loading tensors completed, taking 1.00s (process: 0.00s, read: 0.12s, memcpy: 0.00s, convert: 0.16s, copy_to_backend: 0.41s)
[DEBUG] stable-diffusion.cpp:793  - finished loaded file
[INFO ] stable-diffusion.cpp:845  - total params memory size = 2784.45MB (VRAM 2784.45MB, RAM 0.00MB): text_encoders 469.44MB(VRAM), diffusion_model 2155.33MB(VRAM), vae 159.68MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:918  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:3342 - generate_image 512x512
[INFO ] denoiser.hpp:499  - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:2789 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:2907 - IMG2IMG
[INFO ] stable-diffusion.cpp:2914 - target t_enc is 0 steps
[DEBUG] ggml_extend.hpp:1880 - vae compute buffer size: 848.50 MB(VRAM)
[DEBUG] vae.hpp:154  - computing vae encode graph completed, taking 1.64s
[INFO ] stable-diffusion.cpp:3081 - encode_first_stage completed, taking 1.65s
[DEBUG] conditioner.hpp:407  - parse '' to [['', 1], ]
[DEBUG] bpe_tokenizer.cpp:183  - split prompt "" to tokens []
[DEBUG] ggml_extend.hpp:1880 - clip compute buffer size: 1.42 MB(VRAM)
[DEBUG] conditioner.hpp:533  - computing condition graph completed, taking 26 ms
[DEBUG] conditioner.hpp:407  - parse '' to [['', 1], ]
[DEBUG] bpe_tokenizer.cpp:183  - split prompt "" to tokens []
[DEBUG] ggml_extend.hpp:1880 - clip compute buffer size: 1.42 MB(VRAM)
[DEBUG] conditioner.hpp:533  - computing condition graph completed, taking 27 ms
[INFO ] stable-diffusion.cpp:3143 - get_learned_condition completed, taking 0.06s
[INFO ] stable-diffusion.cpp:3376 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1880 - unet compute buffer size: 559.90 MB(VRAM)
  |==================================================| 1/1 - 1.48s/it
[INFO ] stable-diffusion.cpp:3407 - sampling completed, taking 1.49s
[INFO ] stable-diffusion.cpp:3425 - generating 1 latent images completed, taking 1.49s
[INFO ] stable-diffusion.cpp:3167 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1880 - vae compute buffer size: 1984.06 MB(VRAM)
[DEBUG] vae.hpp:207  - computing vae decode graph completed, taking 10.65s
[INFO ] stable-diffusion.cpp:3183 - latent 1 decoded, taking 10.65s
[INFO ] stable-diffusion.cpp:3187 - decode_first_stage completed, taking 10.65s
[INFO ] stable-diffusion.cpp:3562 - generate_image completed in 13.85s
[INFO ] main.cpp:441  - save result image 0 to 'output.png' (success)
[INFO ] main.cpp:490  - 1/1 images saved

```

### Additional context / environment details

CPU: 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
GPU: Polaris 20 XL [Radeon RX 580 2048SP]
RAM: 64GB

Using `--vae-on-cpu` on the command line gives the desired result.
Using another VAE via `--vae models/vae/vae-ft-mse-840000-ema-pruned.safetensors` gives the same results: with `--vae-on-cpu` it works; without it, the result is a garbled image.
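The A/B comparison above can be scripted so both runs use identical arguments apart from `--vae-on-cpu`. This is a minimal sketch: `build_cmd` and the output file names are hypothetical, and `-o` as the output-path flag is an assumption not shown in this report (the log only shows the default `output.png`).

```python
import shlex

# Arguments taken from the command line in this report.
BASE = [
    "sd-cli",
    "-m", "models/checkpoints/v1-5-pruned-emaonly.safetensors",
    "-i", "test.png",
    "--strength", "0",
    "-v",
]


def build_cmd(output, vae_on_cpu=False, vae_path=None):
    """Return the sd-cli argument list for one run of the comparison."""
    cmd = list(BASE)
    if vae_path is not None:
        cmd += ["--vae", vae_path]
    if vae_on_cpu:
        cmd.append("--vae-on-cpu")
    # Assumption: `-o` selects the output file.
    cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    # Print both commands; run them with subprocess.run(cmd, check=True)
    # or paste them into a shell.
    print(shlex.join(build_cmd("vulkan.png")))
    print(shlex.join(build_cmd("cpu.png", vae_on_cpu=True)))
```

Writing the two results to separate files makes it possible to compare each against `test.png` afterwards.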
