Diffusion With Offset Noise: Fixing Dark and Light Image Generation

It turns out that diffusion models struggle to produce very dark or very light images out of the box, and ordinary fine-tuning does not fix this. Even after fine-tuning for a large number of steps, these models almost always generate images whose average pixel value stays close to 0.5 (where 0 is entirely black and 1 is entirely white).
The Problem
If you ask Stable Diffusion to generate a dark alleyway in a rainstorm or a snow-covered ski slope on a sunny day, the results tend to be washed out: patches of bright fog appear to offset dark regions, grey backgrounds replace what should be white or black, and high-frequency textures fill areas that should be empty. In effect, the model is softly constrained to produce images with a mid-range average value.
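This isn't just a quirk of the training data or the architecture; it follows from how the diffusion process is formulated. The forward process adds independent and identically distributed (iid) Gaussian noise to every pixel at each step. Because the noise is independent across pixels, it largely averages out over the image: it destroys fine, high-frequency detail quickly but perturbs global, low-frequency signals such as overall brightness far more slowly. With the noise schedules used in practice, the image-wide mean is therefore never fully erased during training, so the model learns to carry the mean of its noisy input through to its output rather than to generate it. At sampling time the input is pure noise whose mean is close to zero, and the model keeps the result near the middle of the range.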
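One way to see how iid noise treats a global signal differently from local detail is to compare the spread of a single noise pixel with the spread of the noise averaged over a whole image. The sketch below is only an illustration using the Python standard library; the function name `noise_stats`, the 64x64 image size, and the number of trials are assumptions chosen for the example, not values from any particular model.

```python
import random
import statistics


def noise_stats(num_pixels, trials, seed=0):
    """Compare the std of one noise pixel with the std of the image-wide noise mean."""
    rng = random.Random(seed)
    single_pixels, image_means = [], []
    for _ in range(trials):
        # One "image" of iid unit-variance Gaussian noise, flattened to a list.
        noise = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
        single_pixels.append(noise[0])               # how much one pixel moves
        image_means.append(statistics.fmean(noise))  # how much the global mean moves
    return statistics.pstdev(single_pixels), statistics.pstdev(image_means)


pixel_std, mean_std = noise_stats(num_pixels=64 * 64, trials=200)
print(f"std of a single noise pixel: {pixel_std:.3f}")
print(f"std of the image-wide mean:  {mean_std:.4f}")
```

For N independent unit-variance pixels, the mean has standard deviation 1/sqrt(N), so with 4096 pixels the second number should land near 1/64 while the first stays near 1. In other words, iid noise shifts the average brightness of an image far less than it shifts any individual pixel.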
The Fix
The proposed solution is simple: during training, add a small amount of “offset noise”, meaning noise that is perfectly correlated across all pixels. Concretely, the offset is a single random value per image and channel, broadcast across all spatial dimensions and added on top of the usual per-pixel noise. Because the training noise now includes a component that shifts the whole image up or down, the model learns to predict and reconstruct the overall brightness of an image, not just its local details.
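The noise construction described above can be sketched in a few lines. This is a minimal NumPy illustration rather than the code of any specific training script: the function name `sample_offset_noise`, the parameter `offset_strength`, and its default of 0.1 are assumptions made for the example, and the (batch, channels, height, width) layout is assumed as well.

```python
import numpy as np


def sample_offset_noise(shape, offset_strength=0.1, rng=None):
    """Return per-pixel Gaussian noise plus a shared per-channel offset.

    shape is (batch, channels, height, width). offset_strength is a
    hypothetical knob for the scale of the shared offset; 0.1 is only
    an illustrative default.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch, channels, _, _ = shape
    # Usual iid noise: an independent draw for every pixel.
    per_pixel = rng.standard_normal(shape)
    # One draw per (image, channel); the trailing 1x1 axes let NumPy
    # broadcast that single value across every spatial position.
    offset = rng.standard_normal((batch, channels, 1, 1))
    return per_pixel + offset_strength * offset


rng = np.random.default_rng(0)
noise = sample_offset_noise((8, 4, 64, 64), offset_strength=0.1, rng=rng)
plain = sample_offset_noise((8, 4, 64, 64), offset_strength=0.0, rng=rng)

# Spread of the per-channel means with and without the shared offset.
print("with offset:   ", noise.mean(axis=(2, 3)).std())
print("without offset:", plain.mean(axis=(2, 3)).std())
```

In a training loop, a tensor like this would presumably stand in for the standard noise sample, both when noising the input and as the target the model is trained to predict. The per-channel means should vary noticeably more with the offset than without it, which is the extra low-frequency signal the model gets to learn from.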
This simple hack significantly improves results for prompts that call for particularly dark or light images, without degrading quality on other prompts.