SDXL 1.0

A Replicate guide

Or, how I learned to make really weird cats

Stable Diffusion XL 1.0 is a new text-to-image model by Stability AI. It creates beautiful 1024x1024 images with simple prompts.

We’re going to look at how to get the best images by exploring different settings, resolutions and the refiner.

Jump to the resolution section if you’re just here for weird cats.

Compare settings

Try changing the scheduler, guidance_scale and num_inference_steps to see what happens.

scheduler: defines the noise strategy used during the denoising process
num_inference_steps: sets the number of denoising steps
guidance_scale: tells the model how closely the output should follow the prompt

{
  "prompt": "A studio portrait photo of a cat",
  "num_inference_steps": 20,
  "guidance_scale": 7,
  "negative_prompt": "ugly, soft, blurry, out of focus, low quality, garish, distorted, disfigured",
  "seed": 1000,
  "width": 1024,
  "height": 1024,
  "scheduler": "K_EULER"
}

Guidance scale

The guidance scale tells the model how closely the output should follow the prompt. Start with a value of about 7.
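Under the hood, Stable Diffusion pipelines such as Hugging Face’s diffusers apply the guidance scale through classifier-free guidance. We assume Replicate’s SDXL model works the same way. At each step the model predicts the noise twice: once with your prompt and once without it (or with the negative prompt). It then extrapolates from the second prediction toward the first.

Below is a minimal sketch of that combination step. It uses plain Python lists in place of real noise tensors, and `apply_guidance` is a hypothetical helper name:

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Combine two noise predictions with classifier-free guidance.

    noise_uncond: prediction without the prompt (or with the negative prompt)
    noise_cond:   prediction with the prompt
    Both are flat lists of floats standing in for real noise tensors.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


# A scale of 1 returns the prompt-conditioned prediction unchanged;
# larger scales push further in the direction of the prompt.
print(apply_guidance([0.0, 1.0], [0.5, 1.0], 1))
print(apply_guidance([0.0, 1.0], [0.5, 1.0], 7))
```

The negative prompt takes the place of the "without prompt" prediction. As a result, guidance pushes the image toward your prompt and away from the negative prompt at the same time.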

Number of inference steps

More denoising steps generally improve output quality, but generation takes longer. Start with a value of about 20 steps. Don’t go too high: beyond a certain point, each extra step helps less and less.

Schedulers

Schedulers (or samplers) define the denoising process. Most will get a decent image in as few as 10 steps with SDXL. Euler and Euler Ancestral give the sharpest and fastest results.
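To compare settings fairly, keep the seed fixed and change one parameter at a time. That way, any difference in the output comes from that parameter alone.

Below is a minimal sketch that builds those inputs from the example in this section. Note the assumptions:

- The helper name `vary` is our own.
- Sending each input to Replicate (for example, as the `input` of a prediction through Replicate’s client) is deliberately left out.

```python
import json

# Base input taken from the example above; the seed stays fixed so runs are comparable.
BASE_INPUT = {
    "prompt": "A studio portrait photo of a cat",
    "num_inference_steps": 20,
    "guidance_scale": 7,
    "negative_prompt": "ugly, soft, blurry, out of focus, low quality, garish, distorted, disfigured",
    "seed": 1000,
    "width": 1024,
    "height": 1024,
    "scheduler": "K_EULER",
}


def vary(base, param, values):
    """Return one copy of `base` per value, changing only `param`."""
    return [{**base, param: value} for value in values]


if __name__ == "__main__":
    # One sweep per setting; everything else stays at the base values.
    runs = vary(BASE_INPUT, "num_inference_steps", [10, 20, 40])
    runs += vary(BASE_INPUT, "guidance_scale", [3, 7, 12])
    for run in runs:
        summary = {k: run[k] for k in ("num_inference_steps", "guidance_scale", "seed")}
        print(json.dumps(summary))
```

You can add a scheduler sweep the same way. This guide only names K_EULER, so check the model’s input schema for the other accepted scheduler values rather than guessing them.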

Compare resolutions

SDXL works best at 1024x1024, but what happens when you go bigger or smaller, or use a different aspect ratio?

Try changing width and height to see what happens.

Aspect ratio


{
  "prompt": "A studio portrait photo of a cat",
  "num_inference_steps": 50,
  "guidance_scale": 7.5,
  "negative_prompt": "ugly, soft, blurry, out of focus, low quality, garish, distorted, disfigured",
  "seed": 1000,
  "width": 1024,
  "height": 1024,
  "scheduler": "K_EULER"
}

Try these dimensions for common aspect ratios:

Aspect ratio | Resolution
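To work out dimensions for another aspect ratio, here is a minimal sketch. It makes two assumptions of our own, neither of which is a rule from this guide:

- You want to keep roughly the same pixel count as 1024x1024.
- Each side is rounded to a multiple of 64.

The helper `dimensions_for` is hypothetical.

```python
import math


def dimensions_for(aspect_w, aspect_h, target_pixels=1024 * 1024, multiple=64):
    """Pick a width and height close to `target_pixels` for a given aspect ratio,
    rounding each side to the nearest multiple of `multiple`."""
    ratio = aspect_w / aspect_h
    # Solve width * height = target_pixels with width / height = ratio.
    width = math.sqrt(target_pixels * ratio)
    height = math.sqrt(target_pixels / ratio)
    # Round each side independently, so the pixel count is only approximately kept.
    return (round(width / multiple) * multiple, round(height / multiple) * multiple)


for w, h in [(1, 1), (4, 3), (3, 4), (16, 9), (9, 16)]:
    print(f"{w}:{h} -> {dimensions_for(w, h)}")
```

Because each side is rounded separately, the result only approximates the ratio you asked for. Treat it as a starting point and compare the output against 1024x1024.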

Refiner

With SDXL you can use a separate refiner model to add finer detail to your output.

You can use the refiner in two ways:

One after the other

In this mode, the SDXL base model generates a complete image, which is then passed to the refiner. You can set how many steps the refiner takes.

Ensemble of experts

In this mode, the SDXL base model handles the early, high-noise steps, then hands over to the refiner for the final, low-noise steps.

You get a more detailed image from fewer steps.

You can change the point at which the handover happens with high_noise_fraction. We default to 0.8 (80%).

high_noise_fraction: tells the base model when to hand off to the refiner
{
  "prompt": "A studio portrait photo of a cat",
  "num_inference_steps": 100,
  "guidance_scale": 7.5,
  "negative_prompt": "ugly, soft, blurry, out of focus, low quality, garish, distorted, disfigured",
  "seed": 1000,
  "width": 1024,
  "height": 1024,
  "scheduler": "K_EULER",
  "refiner": "expert_ensemble_refiner",
  "high_noise_fraction": 0.8
}
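With 100 steps and a high_noise_fraction of 0.8, the base model in ensemble-of-experts mode runs roughly the first 80 steps, and the refiner runs the last 20.

Below is a minimal sketch of that arithmetic. The helper `split_steps` is hypothetical, and it assumes the split is proportional to the step count. Pipelines such as diffusers define the handover as a fraction of the noise schedule instead, so exact counts can differ slightly.

```python
def split_steps(num_inference_steps, high_noise_fraction):
    """Approximate how many steps each model runs in ensemble-of-experts mode.

    The base model handles the first (high-noise) fraction of the steps and
    the refiner handles the rest.
    """
    if not 0.0 <= high_noise_fraction <= 1.0:
        raise ValueError("high_noise_fraction must be between 0 and 1")
    base_steps = round(num_inference_steps * high_noise_fraction)
    return base_steps, num_inference_steps - base_steps


base, refiner = split_steps(100, 0.8)
print(f"base model: {base} steps, refiner: {refiner} steps")
```

Lowering high_noise_fraction hands more of the steps to the refiner, while a value of 1.0 leaves the refiner with nothing to do.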