Stable Cascade configuration
Stable Cascade (built on the Würstchen v3 architecture) splits generation across three stages that work in a heavily compressed latent space. Stage C is the text-conditioned prior that does the creative work, Stage B decodes its output into a larger latent, and Stage A is a small VQGAN decoder (playing the role a VAE plays in Stable Diffusion) that produces the final pixels. Most of your tuning happens on Stage C and Stage B, and this tool picks sensible values for your quality target and VRAM.
How it works
Because Stage C operates on a latent compressed about 42x per side (a 1024x1024 image becomes roughly a 24x24 latent), it needs surprisingly few steps and a low guidance scale to produce strong results. Stage B mostly decodes, so it needs even fewer steps and almost no guidance. The tool maps a draft/balanced/maximum target to step counts for each stage, suggests a CFG pair (one value for Stage C, one for Stage B), and recommends a latent resolution that fits your GPU, warning you when the full bf16 pipeline is tight on smaller cards.
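The mapping described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual table: the `PRESETS` step counts, the Stage B CFG of 1.1, and the names `CascadeSettings` and `latent_grid` are all hypothetical, chosen only to respect the rules of thumb on this page (Stage B at 10 steps or fewer, Stage C CFG around 4). The 42.67 divisor is an assumption about the exact compression factor behind the "42x" figure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CascadeSettings:
    stage_c_steps: int   # prior (Stage C) sampling steps
    stage_b_steps: int   # decoder (Stage B) sampling steps
    stage_c_cfg: float   # guidance scale for Stage C
    stage_b_cfg: float   # guidance scale for Stage B


# Hypothetical presets for illustration only; the real tool may use other values.
PRESETS = {
    "draft":    CascadeSettings(stage_c_steps=12, stage_b_steps=6,  stage_c_cfg=4.0, stage_b_cfg=1.1),
    "balanced": CascadeSettings(stage_c_steps=20, stage_b_steps=10, stage_c_cfg=4.0, stage_b_cfg=1.1),
    "maximum":  CascadeSettings(stage_c_steps=30, stage_b_steps=10, stage_c_cfg=4.0, stage_b_cfg=1.1),
}

# Assumed per-side compression factor for the Stage C latent (about 42x).
COMPRESSION = 42.67


def latent_grid(width: int, height: int) -> tuple[int, int]:
    """Return the approximate Stage C latent size (width, height) for a pixel size."""
    return math.ceil(width / COMPRESSION), math.ceil(height / COMPRESSION)


settings = PRESETS["balanced"]
print(settings, latent_grid(1024, 1024))
```

Note that the extra budget in the "maximum" preset goes entirely to Stage C; Stage B stays capped at 10 steps.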
Tips and notes
- Don’t over-step Stage B. Beyond ~10 steps it adds time without real quality gains; spend your budget on Stage C instead.
- Keep Stage C guidance low. A CFG of ~4 is the sweet spot; high CFG over-saturates and distorts in the compressed latent.
- Use bf16. Stable Cascade ships with bfloat16 weights; running in fp32 doubles the VRAM needed for the weights with no quality benefit.
- Stick to 1024x1024 for the safest results. The prior was trained around that size, so push to 1536 only with ample VRAM.
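As one way to apply these tips, here is a hedged sketch that assumes you run Stable Cascade through the Hugging Face diffusers library, which this page does not itself specify. The helper names `stage_kwargs` and `generate` are hypothetical, and the 20-step count for Stage C is an assumed starting point rather than a value taken from this page. In diffusers a decoder `guidance_scale` of 0.0 turns classifier-free guidance off for Stage B.

```python
def stage_kwargs(prompt: str, size: int = 1024) -> tuple[dict, dict]:
    """Build call arguments for Stage C (prior) and Stage B (decoder)."""
    prior_kwargs = dict(
        prompt=prompt,
        height=size,
        width=size,
        num_inference_steps=20,  # assumed Stage C step count
        guidance_scale=4.0,      # keep Stage C guidance low
    )
    decoder_kwargs = dict(
        prompt=prompt,
        num_inference_steps=10,  # don't over-step Stage B
        guidance_scale=0.0,      # effectively no guidance on Stage B
        output_type="pil",
    )
    return prior_kwargs, decoder_kwargs


def generate(prompt: str, size: int = 1024):
    """Run both pipelines in bf16 and return one PIL image."""
    # Imported lazily so stage_kwargs works without torch or diffusers installed.
    import torch
    from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

    prior = StableCascadePriorPipeline.from_pretrained(
        "stabilityai/stable-cascade-prior", variant="bf16", torch_dtype=torch.bfloat16
    )
    decoder = StableCascadeDecoderPipeline.from_pretrained(
        "stabilityai/stable-cascade", variant="bf16", torch_dtype=torch.bfloat16
    )
    # Offloading idle submodules to CPU helps on smaller cards.
    prior.enable_model_cpu_offload()
    decoder.enable_model_cpu_offload()

    prior_kwargs, decoder_kwargs = stage_kwargs(prompt, size)
    prior_out = prior(**prior_kwargs)
    result = decoder(
        image_embeddings=prior_out.image_embeddings.to(torch.bfloat16),
        **decoder_kwargs,
    )
    return result.images[0]
```

Calling `generate("a lighthouse at dusk")` downloads the weights on first use, so expect a long first run; `stage_kwargs` alone is enough to inspect the settings without a GPU.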