IP-Adapter style reference guide
IP-Adapter lets Stable Diffusion take an image as a prompt alongside your text, borrowing the reference’s style, content, or facial identity. The results hinge on two choices: which model variant you load and what weight you set. This guide recommends both based on your transfer goal, and explains how to combine IP-Adapter with ControlNet.
How it works
IP-Adapter encodes your reference image with an image encoder (CLIP in the released models) and injects those features into the cross-attention layers of the diffusion model through a separate attention branch that runs in parallel with your text prompt. The weight scales that image branch, so it controls how loudly the reference speaks: low weights leave your text prompt in charge with a stylistic nudge, while high weights let the reference dominate. The model variant changes what gets captured: Base and Plus handle general style and content (Plus captures finer detail), Plus-Face is tuned for facial identity, and Light gives a gentle stylistic touch. Pairing with ControlNet separates concerns cleanly: IP-Adapter carries look, ControlNet carries structure.
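The role of the weight can be illustrated with a toy sketch in plain Python. It is a deliberate simplification, not IP-Adapter's actual code: the real adapter uses learned key and value projections for the image features, which are skipped here, and the function names `attention` and `blended_cross_attention` are made up for this example. It only shows the idea that the image branch is computed separately and added to the text branch, scaled by the weight.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        probs = softmax(scores)
        dim = len(values[0])
        outputs.append(
            [sum(p * v[d] for p, v in zip(probs, values)) for d in range(dim)]
        )
    return outputs


def blended_cross_attention(queries, text_tokens, image_tokens, weight):
    """Text attention plus weight-scaled image attention (hypothetical helper).

    Assumption: tokens serve as both keys and values, with no learned
    projections, to keep the sketch short.
    """
    text_out = attention(queries, text_tokens, text_tokens)
    image_out = attention(queries, image_tokens, image_tokens)
    return [
        [t + weight * i for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(text_out, image_out)
    ]
```

Calling `blended_cross_attention(queries, text_tokens, image_tokens, 0.0)` returns the text-only result, and raising the weight moves the output steadily toward the image features, which is the behavior the weight ranges in this guide rely on.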
Settings and best practice
- Style transfer: Plus model at weight 0.4–0.6 keeps your prompt’s subject but adopts the reference’s palette, texture, and mood.
- Face transfer: Plus-Face at weight 0.7–0.9; pair with a face-focused ControlNet, or describe the body and scene in your text prompt, since the reference mainly supplies the face.
- Subtle inspiration: Light model at 0.2–0.4 nudges aesthetics without hijacking composition.
- Stacking with ControlNet: lower the IP-Adapter weight by about 0.1 from the ranges above so style and structure conditioning don’t overpower each other, and feed ControlNet a separate control image for pose or depth.
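The recommendations above can be captured in a small lookup, sketched here in plain Python. The names `Recommendation`, `PRESETS`, and `recommend` are hypothetical and do not belong to any IP-Adapter library. The variant labels and weight ranges are copied from the list, and the ControlNet adjustment is read as subtracting 0.1 from both ends of the range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    variant: str        # model variant label as used in this guide
    weight_low: float   # lower end of the suggested weight range
    weight_high: float  # upper end of the suggested weight range


# Values taken from the settings list in this guide.
PRESETS = {
    "style": Recommendation("Plus", 0.4, 0.6),
    "face": Recommendation("Plus-Face", 0.7, 0.9),
    "subtle": Recommendation("Light", 0.2, 0.4),
}

# Assumed reading of "about 0.1" when stacking with ControlNet.
CONTROLNET_REDUCTION = 0.1


def recommend(goal, with_controlnet=False):
    """Return the suggested variant and weight range for a transfer goal."""
    preset = PRESETS[goal]
    if not with_controlnet:
        return preset
    # Shift the whole range down, never below zero; round away float noise.
    low = round(max(0.0, preset.weight_low - CONTROLNET_REDUCTION), 2)
    high = round(max(0.0, preset.weight_high - CONTROLNET_REDUCTION), 2)
    return Recommendation(preset.variant, low, high)
```

For example, `recommend("style", with_controlnet=True)` yields the Plus variant with a range of 0.3 to 0.5. Treat the output as a starting point to tune by eye rather than a fixed rule.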