Stable Diffusion CUDA Out of Memory — How to Fix It

The 'CUDA out of memory' error in Stable Diffusion WebUI occurs when your GPU does not have enough VRAM to process the image generation request. It is most commonly triggered when running high-resolution outputs, large batch sizes, or memory-heavy models on consumer-grade GPUs. Users with 4GB to 8GB VRAM cards encounter this error most frequently.

Why does this error happen?

Stable Diffusion loads the full model weights, attention maps, and intermediate latent tensors directly into GPU VRAM during inference. At higher resolutions, the attention mechanism in the U-Net scales quadratically, meaning a 768x768 image requires significantly more memory than a 512x512 image. When the cumulative memory demand of the model, VAE, and active tensors exceeds the physical VRAM capacity of your GPU, PyTorch throws a CUDA OutOfMemoryError and halts the process. This is compounded when running multiple images in a batch or using full-precision (float32) weights instead of half-precision (float16).
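To make the quadratic scaling concrete, here is a small sketch (pure Python; it assumes the standard SD 1.5 setup of an 8x VAE downsampling factor and a tokens-by-tokens self-attention map, and ignores constant factors, channel dimensions, and multiple heads):

```python
# Rough illustration of why attention memory grows quadratically with
# resolution. Assumption: latent size is image size / 8 (SD 1.5 VAE),
# and one attention map is tokens x tokens.

def attention_tokens(width: int, height: int, vae_factor: int = 8) -> int:
    """Number of latent tokens the first U-Net attention block sees."""
    return (width // vae_factor) * (height // vae_factor)

def attention_map_ratio(res_a: int, res_b: int) -> float:
    """How much larger the tokens x tokens attention map is at res_b vs res_a."""
    a = attention_tokens(res_a, res_a)
    b = attention_tokens(res_b, res_b)
    return (b * b) / (a * a)

print(attention_tokens(512, 512))     # 4096 tokens
print(attention_tokens(768, 768))     # 9216 tokens
print(attention_map_ratio(512, 768))  # ~5.06x larger attention map
```

So a 768x768 generation needs roughly five times the attention-map memory of a 512x512 one, even though the pixel count only grows by 2.25x.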

How to fix it

1. Reduce Image Resolution to 512x512

Start by setting your output resolution to 512x512 pixels, the native training resolution for most SD 1.5 models. This dramatically reduces the memory required for attention computations in the U-Net. Once generation is stable, you can use the hi-res fix option to reach higher final resolutions: it generates at the low resolution first and then upscales with a second, shorter denoising pass.

2. Enable xformers in Launch Settings

xformers is a memory-efficient attention library that replaces the default PyTorch attention mechanism with a highly optimized version. Enable it by adding the --xformers flag to your launch command or toggling it in the WebUI settings under 'Optimizations'. This alone can reduce VRAM usage by 30-50% and also speeds up generation on most NVIDIA GPUs.

3. Add --medvram or --lowvram Launch Flag

The --medvram flag instructs Stable Diffusion to keep only the active model component in VRAM at a time, offloading others to system RAM. If you have 4GB or less VRAM, use --lowvram instead, which applies even more aggressive memory splitting at the cost of slower generation speed. Add the appropriate flag to your webui-user.bat or webui-user.sh file in the COMMANDLINE_ARGS variable.
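As a sketch of what the edit looks like (flags per the standard AUTOMATIC1111 WebUI install; combine the memory flag with xformers from step 2):

```shell
# webui-user.sh (Linux/macOS): add the memory flag to COMMANDLINE_ARGS
export COMMANDLINE_ARGS="--medvram --xformers"
```

On Windows, the equivalent line in webui-user.bat is `set COMMANDLINE_ARGS=--medvram --xformers`. Swap in `--lowvram` for `--medvram` on cards with 4GB or less.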

4. Reduce Batch Size to 1

Generating multiple images simultaneously multiplies VRAM consumption almost linearly per image in the batch. Set your batch size to 1 in the WebUI to ensure only a single image is processed at a time. If you need multiple outputs, use the batch count setting instead, which generates images sequentially and reuses the same VRAM allocation.
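A toy model of the difference (the per-image and base costs below are made-up illustrative numbers, not measurements from any specific GPU or model):

```python
# Illustrative only: peak VRAM grows with batch *size*, while batch *count*
# reuses the same allocation by generating images one after another.
# per_image_gb and base_gb are invented numbers for the sketch.

def peak_vram_gb(batch_size: int, per_image_gb: float = 2.0, base_gb: float = 2.5) -> float:
    """Rough peak VRAM: fixed model cost plus a per-image cost per batch."""
    return base_gb + batch_size * per_image_gb

# 4 images at once vs 4 images sequentially (batch size 1, batch count 4):
print(peak_vram_gb(4))  # 10.5 GB peak - likely OOM on an 8GB card
print(peak_vram_gb(1))  # 4.5 GB peak - same 4 images, generated one at a time
```

The total work is identical either way; only the peak allocation changes, which is exactly what the OOM error cares about.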

Code example

# Launch with low VRAM mode
python launch.py --lowvram --xformers --no-half-vae

Pro tip

Add --no-half-vae to your launch flags alongside --lowvram to prevent the VAE decoder from producing black or corrupted images, which is a common secondary issue when running in low VRAM mode.

Frequently asked questions

Does --lowvram significantly slow down image generation?
Yes, --lowvram increases generation time because model components are constantly swapped between VRAM and system RAM. Using --medvram is a better balance of speed and memory savings if your GPU has at least 5-6GB VRAM.
Can I run Stable Diffusion SDXL on a 6GB VRAM GPU without this error?
SDXL requires significantly more VRAM than SD 1.5, typically 8GB minimum for standard use. On a 6GB card, you will need --medvram, xformers, and should avoid resolutions above 1024x1024 to prevent out of memory crashes.
Why does the error only happen sometimes and not every generation?
VRAM fragmentation and other GPU processes running in the background can cause inconsistent available memory between runs. Restarting the WebUI clears the VRAM cache, and closing other GPU-accelerated applications like browsers or games before generating can help stabilize memory availability.
Will upgrading to more system RAM fix the CUDA out of memory error?
Adding system RAM does not directly fix CUDA OOM errors because GPU VRAM is a separate memory pool. However, more system RAM helps when using --lowvram or --medvram flags, as those modes offload model parts to system RAM during generation.
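Relatedly, for the fragmentation issue described above, PyTorch's caching allocator can be tuned via an environment variable set before launching the WebUI (a sketch; the 512 value is a commonly suggested starting point, not a universal setting):

```shell
# Limit how large allocator blocks can be before splitting, which can
# reduce VRAM fragmentation between generations
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
```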
