ComfyUI was created by comfyanonymous, who made the tool to understand how Stable Diffusion works.
Make sure you also check out the full ComfyUI beginner's manual.
Compared to other tools which hide the underlying mechanics of generation beneath the user interface, ComfyUI's user interface closely follows how Stable Diffusion actually works. This means if you learn how ComfyUI works, you will end up learning how Stable Diffusion works.
ComfyUI won't take as much time to set up as you might expect. Instead of creating a workflow from scratch, you can simply download a workflow optimized for SDXL v1.0.
In this guide, I'll use the popular Sytan SDXL workflow and provide a couple of other recommendations.
After you're all set up, you'll be able to generate SDXL images with one click:
ComfyUI VS AUTOMATIC1111
(That's not to ignore other great interfaces, such as vladmandic's SD.Next and InvokeAI.)
Here's why you would want to use ComfyUI for SDXL:
Workflows do many things at once
Imagine that you follow a similar process for all your images: first, you do text-to-image. Then you send the result to img2img. Finally, you upscale that.
In AUTOMATIC1111, you would have to do all these steps manually. In ComfyUI, on the other hand, you can perform all of these steps in a single click.
This is well suited for SDXL v1.0, which comes with 2 models and a 2-step process: the base model is used to generate noisy latents, which are processed with a refiner model specialized for the final denoising steps. The base model can be used alone, but the refiner model can add a lot of sharpness and quality to the image. You can read more about this process in the official paper.
Because they are so configurable, ComfyUI generations can be optimized in ways that AUTOMATIC1111 generations cannot.
The workflow we're using generates part of the image with the base model, sends the incomplete image to the refiner, and finishes from there.
This greatly improves speed, with people reporting 6-10x faster generation times with ComfyUI compared to AUTOMATIC1111.
Step 1: Download SDXL v1.0
You'll need to download both the base and the refiner models:
We'll be using NMKD Superscale x4 to upscale your images to 2048x2048. There are other upscalers out there like 4x Ultrasharp, but NMKD works best for this workflow.
Go to this link: https://icedrive.net/s/14BM8qlGO6
Click on the folder Superscale 4x and download the file inside with extension
Step 2: Download ComfyUI
Direct download only works for NVIDIA GPUs. For AMD (Linux only) or Mac, check the beginner's guide to ComfyUI.
Simply download this file and extract it with 7-Zip.
The extracted folder will be called
- Place the models you downloaded in the previous step in the folder:
- If you downloaded the upscaler, place it in the folder:
Step 3: Download Sytan's SDXL Workflow
Go to this link and download the JSON file by clicking the button labeled Download raw file.
Step 4: Start ComfyUI
Run run_nvidia_gpu.bat, and ComfyUI will automatically open in your web browser.
Click the Load button and select the .json workflow file you downloaded in the previous step.
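If you're curious what's inside a workflow file before you load it, you can peek at it with a few lines of Python. This is only a rough sketch: it assumes the file is a JSON object with a top-level "nodes" list whose entries each carry a "type" key, which is how saved workflows are typically laid out. The function name and the sample data are made up for illustration.

```python
import json
from collections import Counter


def summarize_workflow(workflow: dict) -> Counter:
    """Count how many nodes of each type a workflow contains.

    Assumes the workflow dict has a top-level "nodes" list and that each
    node is a dict with a "type" key (an assumption about the file layout).
    """
    nodes = workflow.get("nodes", [])
    return Counter(node.get("type", "unknown") for node in nodes)


# Tiny hand-written stand-in for a real workflow file (hypothetical data).
sample = json.loads("""
{
  "nodes": [
    {"id": 1, "type": "CheckpointLoaderSimple", "title": "Base Model"},
    {"id": 2, "type": "CheckpointLoaderSimple", "title": "Refiner Model"},
    {"id": 3, "type": "KSamplerAdvanced"}
  ]
}
""")

for node_type, count in summarize_workflow(sample).items():
    print(f"{node_type}: {count}")
```

To try it on the file you downloaded, read it with json.load and pass the result to summarize_workflow instead of the sample.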
Sytan's SDXL Workflow will load:
Testing Your First Prompt
You can scroll to zoom, and click and hold on blank space to pan around.
Let's run our first prompt.
First, you'll need to connect your models. In the yellow Refiner Model box, select
Then in the yellow Base Model box, select
If you downloaded the upscaler, make sure you select it in the Upscale Model node:
Now you're ready to run the default prompt. Click
Boxes will be outlined in green when they are running.
You'll notice that the generation time will be much faster than that of AUTOMATIC1111.
After you get the result, you can right click the image -> Save Image to download to your computer.
Let's take a look at some of our settings. If you're coming from another Stable Diffusion interface, some of these settings will look familiar, while others will be completely new.
1. Linguistic Positive
Notice that you can input 2 prompts: Linguistic Positive and Supporting Terms.
Stable Diffusion needs to "understand" the text prompts that you give it. To do this, it uses a text encoder called CLIP.
Whereas previous Stable Diffusion models only had one text encoder, SDXL v1.0 has two text encoders:
- text_encoder (CLIPTextModel), also known as CLIP_L: this is the CLIP ViT-L encoder that was used for Stable Diffusion v1.4 & v1.5
- text_encoder_2 (CLIPTextModelWithProjection), also known as CLIP_G: this is the larger OpenCLIP ViT-bigG encoder; Stable Diffusion v2.0 & v2.1 used a smaller encoder from the same OpenCLIP family
This means you can pass each text encoder a different prompt.
Sytan's SDXL workflow gives the Linguistic Positive to CLIP_G.
CLIP_G is better at interpreting natural language sentences, so that's the style you would use for the Linguistic Positive, e.g.
35mm photo of a person on a park bench
2. Supporting Terms
Sytan's SDXL workflow gives the Supporting Terms to the CLIP_L text encoder.
CLIP_L is better at interpreting comma-separated tags, so that's the style you would use for the Supporting Terms, e.g.
nikkor lens, kodachrome, gravel floor, shrubbery, foliage.
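To make the split between the two prompts concrete, here's a small sketch of how you might keep them together in Python. Nothing here comes from Sytan's workflow itself: the SDXLPrompt class is hypothetical, and the "text_g" and "text_l" key names are an assumption chosen to match the CLIP_G and CLIP_L naming above.

```python
from dataclasses import dataclass


@dataclass
class SDXLPrompt:
    linguistic: str          # natural-language sentence, intended for CLIP_G
    supporting_terms: list   # short tags, intended for CLIP_L

    def encoder_inputs(self) -> dict:
        """Return one text string per encoder.

        The key names "text_g" and "text_l" are assumptions for illustration.
        """
        return {
            "text_g": self.linguistic.strip(),
            "text_l": ", ".join(t.strip() for t in self.supporting_terms if t.strip()),
        }


prompt = SDXLPrompt(
    linguistic="35mm photo of a person on a park bench",
    supporting_terms=["nikkor lens", "kodachrome", "gravel floor", "shrubbery", "foliage"],
)
print(prompt.encoder_inputs())
```

Keeping the tags as a list makes it easy to add or drop individual terms without touching the sentence that goes to CLIP_G.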
3. Fundamental Negative
The negative prompt goes here.
4. Image Resolution
For best results, keep height and width at 1024 x 1024, or use resolutions with approximately the same total number of pixels as 1024 x 1024 (1,048,576 pixels).
Here are some examples:
- 896 x 1152
- 1536 x 640
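If you want more options than the examples above, you can list resolutions whose pixel count lands close to 1024 x 1024. This is a rough sketch: the 64-pixel step, the 7% tolerance, and the side limits are my own assumptions for illustration, not values from the workflow.

```python
TARGET_PIXELS = 1024 * 1024  # 1,048,576


def candidate_resolutions(step=64, tolerance=0.07, min_side=512, max_side=2048):
    """List (width, height) pairs whose pixel count is close to 1024 x 1024.

    step, tolerance, and the side limits are illustrative assumptions.
    """
    results = []
    for width in range(min_side, max_side + 1, step):
        for height in range(min_side, max_side + 1, step):
            pixels = width * height
            if abs(pixels - TARGET_PIXELS) <= tolerance * TARGET_PIXELS:
                results.append((width, height))
    return results


for width, height in candidate_resolutions():
    print(f"{width} x {height} = {width * height:,} pixels")
```

Both examples above show up in the output: 896 x 1152 is 1,032,192 pixels and 1536 x 640 is 983,040 pixels, each within a few percent of the target.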
SDXL does support resolutions for higher total pixel values, however res
5. Steps
This is the combined number of steps for both the base model and the refiner model. The default value of 20 is sufficient for high-quality images.
Notice the nodes First Pass Latent and Second Pass Latent. Both have the fields start_at_step and end_at_step.
For best results, your Second Pass Latent end_at_step should be the same as your Steps value.
The end_at_step value of the First Pass Latent (base model) should be equal to the start_at_step value of the Second Pass Latent (refiner model).
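Here's a quick sketch of how those step values relate to each other. The base_fraction knob and the 0.8 default are hypothetical, picked only to show the arithmetic; the workflow simply exposes the start and end fields, and I'm assuming the first pass starts at step 0.

```python
def split_steps(total_steps=20, base_fraction=0.8):
    """Split a shared step budget between the base and refiner passes.

    base_fraction is a hypothetical knob for illustration; the workflow
    itself just exposes start_at_step / end_at_step fields.
    """
    if not 0 < base_fraction <= 1:
        raise ValueError("base_fraction must be in (0, 1]")
    handoff = round(total_steps * base_fraction)
    first_pass = {"start_at_step": 0, "end_at_step": handoff}
    second_pass = {"start_at_step": handoff, "end_at_step": total_steps}
    return first_pass, second_pass


first, second = split_steps(total_steps=20, base_fraction=0.8)
print(first)   # base model pass
print(second)  # refiner pass
```

The point to notice is that the base pass ends on the same step where the refiner pass begins, and the refiner pass ends at the total step count.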
6. Base CFG
CFG is a measure of how strictly your generation adheres to the prompt. Higher values mean rigid adherence to the prompt, lower values mean more freedom.
The default value of 7.5 is a good starting point.
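If you'd like to see what the CFG number does under the hood, here's a toy sketch of the usual classifier-free guidance formula. It isn't taken from ComfyUI's code, and plain Python lists stand in for the tensors a real sampler works with.

```python
def apply_cfg(uncond, cond, cfg_scale=7.5):
    """Blend unconditional and prompt-conditioned noise predictions.

    Standard classifier-free guidance: uncond + scale * (cond - uncond).
    Plain lists stand in for the tensors a real sampler would use.
    """
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0, 2.0]   # toy prediction without the prompt
cond = [1.0, 1.0, 0.0]     # toy prediction with the prompt

print(apply_cfg(uncond, cond, cfg_scale=1.0))  # scale 1 reproduces cond
print(apply_cfg(uncond, cond, cfg_scale=7.5))  # pushed further toward the prompt
```

A higher scale exaggerates the difference between the two predictions, which is why high values follow the prompt more rigidly.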
7. Seed
You can optionally set a seed.
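Why bother with a seed? A fixed seed makes the starting noise repeatable, so the same settings give you the same image. The sketch below illustrates the principle with Python's built-in random module as a stand-in; the starting_noise helper is hypothetical, and a real sampler draws a latent tensor instead.

```python
import random


def starting_noise(seed, size=4):
    """Return reproducible pseudo-random noise for a given seed.

    Uses Python's random module as a stand-in; real samplers draw a
    latent tensor instead, but the principle is the same.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


print(starting_noise(42) == starting_noise(42))  # same seed, same noise
print(starting_noise(42) == starting_noise(43))  # different seed, different noise
```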
8. Refiner CFG
The default of 7.5 is fine.
9. Positive A Score
SDXL comes with a new setting called Aesthetic Scores. This is used for the refiner model only.
The training data of SDXL had an aesthetic score for every image, with 0 being the ugliest and 10 being the best-looking.
By setting your SDXL high aesthetic score, you're biasing your prompt towards images that had that aesthetic score (theoretically improving the aesthetics of your images).
10. Negative A Score
SDXL low aesthetic score is more confusing. It sets the "bias" of your negative prompt. Usually, you want this bias to resemble images that had a low aesthetic score, so this works in the reverse of high aesthetic score. The lower this value is, the better your images will look, and vice versa.
You can use any existing ComfyUI workflow with SDXL (base model, since previous workflows don't include the refiner). Here are some to try:
Check out these prompts to try in SDXL v1.0!