TCFG: Tangential Damping Classifier-free Guidance

¹Yonsei University, ²University of Michigan
Animation 1: Classifier-free guidance (CFG)

Animation 2: Ours (TCFG)

In CFG, the unconditional score includes components that point in directions other than toward the target distribution (here, the half-moon), so the final samples deviate from the target distribution. TCFG removes these inconsistent tangential components from the unconditional score and, combined with the conditional score, reaches the target distribution more accurately.

Abstract

Diffusion models have achieved remarkable success in text-to-image synthesis, largely attributed to the use of classifier-free guidance (CFG), which enables high-quality, condition-aligned image generation. CFG combines the conditional score (e.g., text-conditioned) with the unconditional score to control the output. However, the unconditional score is in charge of estimating the transition between manifolds of adjacent timesteps from x_t to x_(t-1), which may inadvertently interfere with the trajectory toward the specific condition. In this work, we introduce a novel approach that leverages a geometric perspective on the unconditional score to enhance CFG performance when conditional scores are available. Specifically, we propose a method that filters the singular vectors of both conditional and unconditional scores using singular value decomposition. This filtering process aligns the unconditional score with the conditional score, thereby refining the sampling trajectory to stay closer to the manifold. Our approach improves image quality with negligible additional computation. We provide deeper insights into the score function behavior in diffusion models and present a practical technique for achieving more accurate and contextually coherent image synthesis.
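The filtering idea in the abstract can be illustrated with a small NumPy sketch. This page does not spell out the exact filtering rule, so the code below makes a loud assumption: the two scores are stacked into a 2 x D matrix, and only the component of the unconditional score along the dominant right singular vector is kept. The function names `damp_unconditional` and `cfg_combine` are hypothetical and are not taken from the authors' code. The CFG combination itself is the standard one, `uncond + scale * (cond - uncond)`.

```python
import numpy as np


def damp_unconditional(eps_cond, eps_uncond):
    """Keep only the part of the unconditional score that lies along the
    dominant direction shared with the conditional score.

    ASSUMPTION: a rank-1 filter on the stacked scores. This is an
    illustrative reading of the abstract, not the authors' exact rule.
    """
    shape = eps_uncond.shape
    cond = eps_cond.ravel()
    uncond = eps_uncond.ravel()

    # Stack the flattened scores as rows of a 2 x D matrix.
    stacked = np.stack([cond, uncond])

    # Thin SVD: the rows of vt are orthonormal directions in score space,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    v1 = vt[0]

    # Project the unconditional score onto the dominant direction and
    # drop the remaining (assumed tangential) component.
    damped = (uncond @ v1) * v1
    return damped.reshape(shape)


def cfg_combine(eps_cond, eps_uncond, scale):
    """Standard classifier-free guidance combination."""
    return eps_uncond + scale * (eps_cond - eps_uncond)


def tcfg_step(eps_cond, eps_uncond, scale):
    """CFG with the damped unconditional score (hypothetical helper)."""
    return cfg_combine(eps_cond, damp_unconditional(eps_cond, eps_uncond), scale)
```

Under this assumption, an unconditional score that is orthogonal to a larger conditional score is removed entirely, while an unconditional score identical to the conditional score passes through unchanged. The only extra work per sampling step is an SVD of a two-row matrix, which fits the abstract's claim of negligible additional computation.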

Illustration of CFG process

Results

Our results on SD3, SDXL, and SD1.5

Our results on DiT

Quantitative Results on Stable Diffusion

Model      Method      FID ↓    CLIPScore ↑
SD v1.5    original    13.26    0.31
SD v1.5    + ours      13.12    0.31
SDXL       original    13.36    0.32
SDXL       + ours      12.65    0.32
SD v3      original    16.66    0.32
SD v3      + ours      13.74    0.32

Zero-shot FID and CLIPScore measured on MSCOCO 30k. Our method improves FID on all three models (Stable Diffusion v1.5, SDXL, and SD v3) while leaving CLIPScore unchanged to two decimal places.

Quantitative Results on DiT

Method      FID ↓    sFID ↓    Precision ↑    Recall ↑    IS ↑
DiT         32.67    17.92     0.90           0.13        271.1
DiT+ours    29.5     13.27     0.90           0.19        270.0

FID-CLIP Curve

FID-CLIP curves on SDXL with 50 sampling steps.

BibTeX

@misc{kwon2025tcfgtangentialdampingclassifierfree,
      title={TCFG: Tangential Damping Classifier-free Guidance},
      author={Mingi Kwon and Shin seong Kim and Jaeseok Jeong and Yi Ting Hsiao and Youngjung Uh},
      year={2025},
      eprint={2503.18137},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.18137},
}