SDXL 1.0

A Replicate guide

Or, how I learned to make really weird cats

Stable Diffusion XL 1.0 is a new text-to-image model by Stability AI. It creates beautiful 1024x1024 images with simple prompts.

We’re going to look at how to get the best images by exploring:

guidance scales
number of steps
the scheduler (or sampler) you should use
what happens at different resolutions
🆕 refiners, and how to use them

Jump to the resolution section if you’re just here for weird cats.

Use SDXL on Replicate

Compare settings

Try changing the scheduler, guidance_scale and num_inference_steps to see what happens.

Scheduler: defines the noise strategy used during the denoising process
Steps: sets the number of denoising steps
Guidance scale: tells the model how similar the output should be to the prompt

{
  "prompt": "A studio portrait photo of a cat",
  "num_inference_steps": 20,
  "guidance_scale": 7,
  "negative_prompt": "ugly, soft, blurry, out of focus, low quality, garish, distorted, disfigured",
  "seed": 1000,
  "width": 1024,
  "height": 1024,
  "scheduler": "K_EULER"
}
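The same settings can be built in code. This is a minimal Python sketch, not an official Replicate example: `build_input` and `run_sdxl` are hypothetical helper names, and the model reference `stability-ai/sdxl` is an assumption that you may need to replace with a full versioned identifier. It also assumes the `replicate` Python client is installed and an API token is configured.

```python
def build_input(prompt, steps=20, guidance_scale=7, scheduler="K_EULER",
                seed=1000, width=1024, height=1024, negative_prompt=""):
    """Return an input dict using the same keys as the settings above."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
        "negative_prompt": negative_prompt,
        "seed": seed,
        "width": width,
        "height": height,
        "scheduler": scheduler,
    }


def run_sdxl(model_input, model="stability-ai/sdxl"):
    """Send the settings to Replicate (the model reference is an assumption)."""
    # Imported lazily so build_input works even without the client installed.
    import replicate
    return replicate.run(model, input=model_input)


cat_input = build_input(
    "A studio portrait photo of a cat",
    negative_prompt="ugly, soft, blurry, out of focus, low quality, "
                    "garish, distorted, disfigured",
)
```

Calling `run_sdxl(cat_input)` would then submit the request. Keeping the seed fixed means any change in the image comes from the setting you changed.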


Guidance scale

The guidance scale tells the model how similar the output should be to the prompt. Start with a value of about 7.
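For intuition, guidance in Stable Diffusion pipelines is usually implemented as classifier-free guidance: the model predicts noise once with the prompt and once without it, and the guidance scale stretches the difference between the two. The sketch below assumes SDXL follows that scheme and uses plain Python lists in place of real noise tensors. `apply_guidance` is a hypothetical name.

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Blend unconditional and prompt-conditioned noise predictions."""
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(noise_uncond, noise_cond)
    ]


# Toy values standing in for the two noise predictions.
uncond = [0.0, 1.0]
cond = [1.0, 1.0]

# A scale of 1 returns the prompt-conditioned prediction unchanged.
prompt_only = apply_guidance(uncond, cond, 1)

# A scale of 7 moves seven times as far along the difference.
guided = apply_guidance(uncond, cond, 7)
```

Where the two predictions already agree, the scale has no effect. Where they differ, larger values amplify whatever the prompt adds.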

Steps

A larger number of denoising steps increases the quality of the output, but it takes longer to generate. Start with about 20 steps. Don’t go too high: after a point, each extra step helps less and less.

Scheduler

Schedulers (or samplers) define the denoising process. Most will get a decent image in as few as 10 steps with SDXL. Euler and Euler Ancestral give the sharpest and fastest results.
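One way to compare these settings systematically is to generate every combination with a fixed seed. The sketch below only builds the input dicts and does not call any API. `settings_grid` is a hypothetical helper, the step and guidance values are arbitrary examples, and `K_EULER_ANCESTRAL` is an assumed identifier for Euler Ancestral (only `K_EULER` appears in this guide).

```python
from itertools import product

BASE = {
    "prompt": "A studio portrait photo of a cat",
    "seed": 1000,  # fixed seed so only the swept settings change
    "width": 1024,
    "height": 1024,
}

# "K_EULER" appears in this guide; "K_EULER_ANCESTRAL" is an assumption.
SCHEDULERS = ["K_EULER", "K_EULER_ANCESTRAL"]
STEPS = [10, 20, 50]
GUIDANCE_SCALES = [3, 7, 12]


def settings_grid(base, schedulers, steps, guidance_scales):
    """Return one input dict per combination of the swept settings."""
    grid = []
    for scheduler, n_steps, scale in product(schedulers, steps, guidance_scales):
        grid.append({
            **base,  # copy the shared settings without mutating BASE
            "scheduler": scheduler,
            "num_inference_steps": n_steps,
            "guidance_scale": scale,
        })
    return grid


grid = settings_grid(BASE, SCHEDULERS, STEPS, GUIDANCE_SCALES)
```

Each entry in `grid` can be sent as a separate request, which makes it easy to lay the results out side by side.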

Compare resolutions

SDXL works best at 1024x1024, but what happens when you go bigger or smaller, or use a different aspect ratio?

Try changing width and height to see what happens.

Width: 512 to 2048 (shown at 1024)
Height: 512 to 2048 (shown at 1024)
Aspect ratio: 1:1

{
  "prompt": "A studio portrait photo of a cat",
  "num_inference_steps": 50,
  "guidance_scale": 7.5,
  "negative_prompt": "ugly, soft, blurry, out of focus, low quality, garish, distorted, disfigured",
  "seed": 1000,
  "width": 1024,
  "height": 1024,
  "scheduler": "K_EULER"
}


Try these dimensions for common aspect ratios:

Aspect ratio	Resolution
1:1	1024x1024
4:3	1152x864
3:2	1248x832
16:9	1344x768
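A quick check of the table, as a sketch: each resolution keeps roughly 95% to 100% of the pixel count of 1024x1024, and the 16:9 entry works out to a ratio of 1.75, slightly narrower than a true 16:9 (about 1.78). `RESOLUTIONS` and `describe` are hypothetical names used only for this illustration.

```python
# Dimensions copied from the table above.
RESOLUTIONS = {
    "1:1": (1024, 1024),
    "4:3": (1152, 864),
    "3:2": (1248, 832),
    "16:9": (1344, 768),
}

NATIVE_PIXELS = 1024 * 1024  # pixel count of the 1024x1024 default


def describe(aspect):
    """Return the size, actual ratio, and share of the native pixel count."""
    width, height = RESOLUTIONS[aspect]
    return {
        "width": width,
        "height": height,
        "ratio": width / height,
        "pixel_share": width * height / NATIVE_PIXELS,
    }


summary = {aspect: describe(aspect) for aspect in RESOLUTIONS}
```

If you need a ratio that is not in the table, picking dimensions with a similar total pixel count is a reasonable starting point, though that is an inference from these four rows rather than a documented rule.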

Refiner

With SDXL you can use a separate refiner model to add finer detail to your output.

You can use the refiner in two ways:

one after the other
as an ‘ensemble of experts’

One after the other

In this mode you take the final output from the SDXL base model and pass it to the refiner. You can define how many steps the refiner takes.

Ensemble of experts

In this mode the SDXL base model handles the steps at the beginning (high noise), before handing over to the refining model for the final steps (low noise).

You get a more detailed image from fewer steps.

You can change the point at which that handover happens. We default to 0.8 (80%).
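To make the handover concrete, here is a rough sketch of how a step budget divides between the two models. It assumes the fraction applies directly to the step count. The real pipeline may round differently, and `split_steps` is a hypothetical helper.

```python
def split_steps(num_inference_steps, high_noise_fraction=0.8):
    """Return (base_steps, refiner_steps) for an ensemble-of-experts run."""
    if not 0 <= high_noise_fraction <= 1:
        raise ValueError("high_noise_fraction must be between 0 and 1")
    # The base model takes the early, high-noise share of the steps.
    base_steps = round(num_inference_steps * high_noise_fraction)
    # The refiner takes whatever is left for the low-noise steps.
    refiner_steps = num_inference_steps - base_steps
    return base_steps, refiner_steps


base_steps, refiner_steps = split_steps(100, 0.8)
```

With 100 steps and the default fraction of 0.8, the base model handles about 80 steps and the refiner about 20. Lowering the fraction hands over earlier and gives the refiner more of the work.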

High noise fraction: tells the base model when to hand off to the refiner

{
  "prompt": "A studio portrait photo of a cat",
  "num_inference_steps": 100,
  "guidance_scale": 7.5,
  "negative_prompt": "ugly, soft, blurry, out of focus, low quality, garish, distorted, disfigured",
  "seed": 1000,
  "width": 1024,
  "height": 1024,
  "scheduler": "K_EULER",
  "refiner": "expert_ensemble_refiner",
  "high_noise_fraction": 0.8
}
