Depth Map ControlNet Guide

Control spatial composition in Stable Diffusion images with depth map preprocessing

A depth map ControlNet steers the spatial composition of an image (the sense of near and far, of volume and perspective) without locking down fine edges. It is the right tool when you want to preserve a scene’s three-dimensional layout but give the model freedom over surface detail. Two choices drive quality: the depth estimator you use to build the map, and the control strength you apply.

How it works

A depth estimator analyzes the source image and outputs a grayscale depth map: lighter pixels are nearer the camera, darker pixels are farther away. MiDaS is the fast, general-purpose default; LeReS resolves finer foreground detail and crisper depth boundaries; ZoeDepth estimates metric (absolute-scale) depth, which helps when the true distances between objects matter. ControlNet conditions the diffusion model on this map, so the generated image follows the map's spatial structure. Control strength scales how strictly that structure is enforced.
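
To make the grayscale convention concrete, here is a minimal sketch in plain Python of turning per-pixel distances into a depth map where nearer pixels are lighter. The function name depth_to_grayscale and the tiny 2x3 "scene" are hypothetical, chosen only for illustration. The sketch assumes the input holds distances (larger means farther); some estimators return inverse depth instead, in which case the flip shown here would not be needed.

```python
def depth_to_grayscale(distances):
    """Map per-pixel distances (larger = farther) to 0-255 gray levels.

    Nearer pixels come out lighter, matching the depth-map convention
    described above. Illustrative helper, not part of any library.
    """
    flat = [d for row in distances for d in row]
    d_min, d_max = min(flat), max(flat)
    span = d_max - d_min
    if span == 0:
        # No depth variation at all: return a uniform mid-gray map.
        return [[128 for _ in row] for row in distances]
    # Flip (d_max - d) so the smallest distance gets the brightest value.
    return [[round(255 * (d_max - d) / span) for d in row] for row in distances]


# Hypothetical scene: a near object on the left, a far wall on the right.
scene = [
    [1.0, 1.0, 5.0],
    [1.0, 2.0, 5.0],
]
depth_map = depth_to_grayscale(scene)
for row in depth_map:
    print(row)
```

In the printed map the near object lands at the bright end of the range and the far wall at the dark end, which is the separation the diffusion model relies on.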

Tips for spatial control

  • Start with MiDaS. It is reliable for most scenes; only switch to LeReS or ZoeDepth when foreground detail or true scale matters.
  • Inspect the map first. If foreground and background blur into the same gray, the model will not separate them either — try a different estimator.
  • Use moderate strength for natural depth. High strength enforces layout rigidly; mid strength keeps the composition while letting the scene breathe.
  • Stack with other modules. Combine depth with Canny for composition plus edges, or with OpenPose to seat a figure correctly in 3D — lower each strength when stacking so they don’t fight.
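
Two of the tips above can be sketched in a few lines of plain Python: checking whether a depth map actually separates near from far, and lowering strengths when stacking modules. Both helpers (gray_spread and stacked_strengths), the percentile cutoffs, and the idea of a shared strength budget of 1.0 are assumptions made for illustration, not values taken from ControlNet or any particular UI.

```python
def gray_spread(depth_map):
    """Gap between roughly the 10th and 90th percentile gray levels (0-255).

    A small gap suggests foreground and background sit at similar grays,
    so the map carries little usable depth separation.
    """
    values = sorted(v for row in depth_map for v in row)
    lo = values[int(0.10 * (len(values) - 1))]
    hi = values[int(0.90 * (len(values) - 1))]
    return hi - lo


def stacked_strengths(strengths, budget=1.0):
    """Scale strengths down proportionally so they sum to at most `budget`.

    `budget` is a hypothetical heuristic: it only encodes the advice to
    lower each strength when several modules are combined.
    """
    total = sum(strengths.values())
    if total <= budget:
        return dict(strengths)
    factor = budget / total
    return {name: round(s * factor, 2) for name, s in strengths.items()}


# A muddy map (everything mid-gray) versus a clearly separated one.
muddy = [[120, 125], [122, 128]]
clear = [[250, 240], [20, 10]]
print(gray_spread(muddy), gray_spread(clear))

# Depth plus Canny, each requested at 0.8, scaled to share the budget.
print(stacked_strengths({"depth": 0.8, "canny": 0.8}))
```

A low spread is a cue to try a different estimator before generating anything. The scaled values would then be entered as each module's control strength and tuned by eye from there.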