ControlNet Mode Reference

Quick reference for all ControlNet preprocessors and their ideal use cases

ControlNet conditions an image model on a structural map extracted from a reference image (edges, a depth map, a pose skeleton, a segmentation mask), so the diffusion process follows that structure instead of being guided by the prompt alone. The hard part is remembering which of the dozen-plus preprocessors does what and how hard to push it. Pick your goal above and this card surfaces the right mode, a sane weight range and the models it pairs with.

How the modes differ

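Almost every ControlNet mode is a pair: a preprocessor that turns your reference into a control map, and a model trained to read that specific map. (Reference is the exception: it works from the reference image directly and needs no separate control model.) The map type is what matters: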

  • Edge maps (Canny, Lineart, Softedge, MLSD) preserve outlines and detail.
  • Geometry maps (Depth, Normal) preserve 3D form and spatial layout.
  • Pose maps (OpenPose) preserve human/animal articulation only.
  • Semantic maps (Segmentation) preserve what is where by region.
  • Loose maps (Scribble, Tile, Reference) give the model the most freedom.
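
The grouping above can be written down as a small lookup table. This is only a sketch: the names MAP_FAMILIES and family_of are hypothetical, and the entries simply restate the list rather than any tool's internal data.

```python
# Hypothetical lookup table restating the map families listed above.
MAP_FAMILIES = {
    "edge": {
        "modes": ("Canny", "Lineart", "Softedge", "MLSD"),
        "preserves": "outlines and detail",
    },
    "geometry": {
        "modes": ("Depth", "Normal"),
        "preserves": "3D form and spatial layout",
    },
    "pose": {
        "modes": ("OpenPose",),
        "preserves": "human/animal articulation only",
    },
    "semantic": {
        "modes": ("Segmentation",),
        "preserves": "what is where by region",
    },
    "loose": {
        "modes": ("Scribble", "Tile", "Reference"),
        "preserves": "the least; the model has the most freedom",
    },
}


def family_of(mode: str) -> str:
    """Return the family key for a mode name (case-insensitive)."""
    wanted = mode.lower()
    for family, info in MAP_FAMILIES.items():
        if wanted in (m.lower() for m in info["modes"]):
            return family
    raise KeyError(f"unknown ControlNet mode: {mode}")


if __name__ == "__main__":
    for name in ("Canny", "Depth", "OpenPose", "Scribble"):
        fam = family_of(name)
        print(f"{name}: {fam} map, preserves {MAP_FAMILIES[fam]['preserves']}")
```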

The tighter the map, the more faithfully the output matches the reference, and the less room the prompt has to change things. That trade-off is exactly what the control weight tunes.
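
One way to picture the weight: in common implementations the control model's output is multiplied by the weight before it is added to the base model's features. The toy sketch below uses plain Python lists in place of feature tensors; apply_control, base_features and control_residual are illustrative names, not any library's API.

```python
def apply_control(base_features, control_residual, weight):
    """Add the control signal to the base features, scaled by the control weight."""
    if len(base_features) != len(control_residual):
        raise ValueError("feature and residual lengths must match")
    return [b + weight * r for b, r in zip(base_features, control_residual)]


if __name__ == "__main__":
    base = [1.0, 2.0, 3.0]       # stand-in for the base model's features
    residual = [0.5, -1.0, 2.0]  # stand-in for the control model's output
    for weight in (0.0, 0.5, 1.0):
        print(weight, apply_control(base, residual, weight))
```

At weight 0 the base features pass through untouched, so the prompt alone decides the result; as the weight rises, the control signal takes up a larger share.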

Tips

  • Stack modes carefully. OpenPose + Depth is a powerful combo for posed characters in a scene, but two strong controls fight each other — drop each weight to ~0.6 when stacking.
  • Match the aspect ratio. Generate at the same aspect ratio as your reference so the control map isn’t stretched or cropped.
  • Use guidance end to free the late steps. Ending control at around 0.7–0.8 of the sampling steps lets the final denoising steps add texture and realism the rigid map would otherwise suppress.
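
The three tips can be sketched as small helpers. Everything here is illustrative: the function names are hypothetical, the default single-mode weight of 1.0, the 768-pixel long side and the rounding to multiples of 8 are assumptions, and guidance end is treated as a fraction of the sampling steps.

```python
def stack_weights(modes, single=1.0, stacked=0.6):
    """Give every mode the lower 'stacked' weight when more than one is active."""
    weight = single if len(modes) <= 1 else stacked
    return {mode: weight for mode in modes}


def match_size(ref_width, ref_height, long_side=768, multiple=8):
    """Pick a generation size with the reference's aspect ratio.

    Each side is rounded to a multiple of `multiple` (assumed requirement).
    """
    longest = max(ref_width, ref_height)
    width = round(ref_width * long_side / longest / multiple) * multiple
    height = round(ref_height * long_side / longest / multiple) * multiple
    return max(width, multiple), max(height, multiple)


def controlled_steps(num_steps, guidance_start=0.0, guidance_end=1.0):
    """Return the step indices during which the control map is applied."""
    return [
        i
        for i in range(num_steps)
        if i / num_steps >= guidance_start and (i + 1) / num_steps <= guidance_end
    ]


if __name__ == "__main__":
    print(stack_weights(["OpenPose", "Depth"]))
    print(match_size(1920, 1080))
    active = controlled_steps(20, guidance_end=0.75)
    print(f"control active on {len(active)} of 20 steps")
```

With 20 steps and a guidance end of 0.75, the last five steps run without the control map, which is where the extra texture comes from.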