Stable Diffusion CUDA Out of Memory — How to Fix It
The 'CUDA out of memory' error in Stable Diffusion WebUI occurs when your GPU does not have enough free VRAM to complete the image generation request.
Why does this error happen?
Image generation keeps the model weights, the latent tensors, and the attention buffers in VRAM simultaneously, so the error is most commonly triggered by high-resolution outputs, large batch sizes, or memory-heavy models on consumer-grade GPUs. Users with 4GB to 8GB VRAM cards encounter it most frequently.
How to fix it
Reduce Image Resolution to 512x512
Start by setting your output resolution to 512x512 pixels, which is the native training resolution for most SD 1.5 models. This dramatically reduces the memory required for attention computations in the U-Net. Once generation is stable, you can use a hi-res fix pass to upscale the image without holding the full high-res tensor in VRAM at once.
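If you drive the WebUI through its built-in API instead of the browser interface, resolution is set per request. A minimal sketch, assuming a default local install launched with the --api flag and listening on port 7860 (the prompt text is just a placeholder):

```shell
# Request a single 512x512 image from AUTOMATIC1111's txt2img endpoint
curl -s -X POST http://127.0.0.1:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a photo of a cat", "width": 512, "height": 512, "steps": 20}'
```

Keeping width and height at 512 here has the same VRAM effect as setting the sliders to 512x512 in the UI.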
Enable xformers in Launch Settings
xformers is a memory-efficient attention library that replaces the default PyTorch attention mechanism with a highly optimized version. Enable it by adding the --xformers flag to your launch command or toggling it in the WebUI settings under 'Optimizations'. This alone can reduce VRAM usage by 30-50% and also speeds up generation on most NVIDIA GPUs.
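On Linux or macOS the flag goes into the stock launcher script; a sketch, assuming the standard webui-user.sh shipped with the repo:

```shell
# webui-user.sh -- enable memory-efficient attention
# (webui.sh reads COMMANDLINE_ARGS when it starts the server)
export COMMANDLINE_ARGS="--xformers"
./webui.sh
```

Restart the WebUI after editing the file; the flag is only read at launch.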
Add --medvram or --lowvram Launch Flag
The --medvram flag instructs Stable Diffusion to keep only the active model component in VRAM at a time, offloading others to system RAM. If you have 4GB or less VRAM, use --lowvram instead, which applies even more aggressive memory splitting at the cost of slower generation speed. Add the appropriate flag to your webui-user.bat or webui-user.sh file in the COMMANDLINE_ARGS variable.
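On Windows the equivalent edit goes into webui-user.bat; a sketch of the relevant line (combine flags as needed for your card):

```shell
:: webui-user.bat -- low-VRAM launch flags for a 4-8GB card
set COMMANDLINE_ARGS=--medvram --xformers
```

Swap --medvram for --lowvram on cards with 4GB or less, accepting slower generation in exchange for the more aggressive offloading.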
Reduce Batch Size to 1
Generating multiple images simultaneously multiplies VRAM consumption almost linearly per image in the batch. Set your batch size to 1 in the WebUI to ensure only a single image is processed at a time. If you need multiple outputs, use the batch count setting instead, which generates images sequentially and reuses the same VRAM allocation.
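The same distinction exists in the API, where batch size and batch count are separate request fields. A sketch, assuming the WebUI API described above, where "batch_size" is the per-pass batch size and "n_iter" is the batch count:

```shell
# Four images generated sequentially (batch count = 4) with only
# one image's worth of VRAM in use at a time (batch size = 1)
curl -s -X POST http://127.0.0.1:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a photo of a cat", "batch_size": 1, "n_iter": 4}'
```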
Code example
# Launch with low VRAM mode
python launch.py --lowvram --xformers --no-half-vae
Pro tip
Add --no-half-vae to your launch flags alongside --lowvram to prevent the VAE decoder from producing black or corrupted images, which is a common secondary issue when running in low VRAM mode.