You can think of Stable Diffusion checkpoint models as giant databases of numbers; these numbers encode the learned associations that tell the generator how to produce images.
StabilityAI released the first public checkpoint model, Stable Diffusion v1.4, in August 2022. In the following months came v1.5, v2.0, and v2.1.
Soon after these models were released, users started to train their own custom models (AKA fine-tunes) on top of the base models. Today, most custom models are built on top of either SD v1.5 or SD v2.1.
A common point of confusion: "Stable Diffusion model" can refer to the official base models by StabilityAI, but is also a generic phrase used to refer to all of these custom models.
Also see: The Best Stable Diffusion Anime Models
Merges and Mixes
Later on, people discovered that they could combine different models so that the resulting model (called a merge or mix) would produce better results than the individual models.
Some popular merges include Realistic Vision, Deliberate & Dreamshaper. We'll get to these below.
The Stable Diffusion community often complains that:
- There are far too many merges, most of which are barely differentiated from one another
- Creators of merges don't credit the models they used
- Some merges are blatant plagiarism of other people's work
Other Training Methods
Besides checkpoint models, there are a few alternative training methods:
- Textual Inversion (aka embeddings)
- LoRA (Low-Rank Adaptation)
- Hypernetworks
These all function differently, but think of them as smaller databases (models) that only contain information about a particular subject or aesthetic.
You use them in addition to your checkpoint models. For example, you could use an ink painting LoRA to alter the aesthetic style of your outputs.
A popular use of these models is to train them on your own face, so you can produce images of yourself.
Where to find SD models
The two most popular places to find and download models are Civitai and Hugging Face. Personally, I prefer Civitai because you can see examples of the images people generate with different models.
SD models come in two different formats: .ckpt and .safetensors. Checkpoint (.ckpt) is the older, pickle-based format; .safetensors is the newer format, and is safer because loading it cannot execute arbitrary code.
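To see why the pickle-based .ckpt format is considered unsafe, here's a minimal, harmless sketch of the underlying problem: unpickling a file can invoke an arbitrary callable chosen by whoever created the file. (The class name and marker string below are made up for illustration.)

```python
import pickle

# A .ckpt file is a pickled Python object. During unpickling, pickle
# calls whatever callable an object's __reduce__ method returns --
# so loading an untrusted .ckpt can run attacker-chosen code.
class NotReallyAModel:
    def __reduce__(self):
        # A real attack would return something like (os.system, ("...",)).
        # Here the chosen callable is just str(), building a marker string.
        return (str, ("this ran during loading",))

payload = pickle.dumps(NotReallyAModel())
loaded = pickle.loads(payload)
print(loaded)  # prints: this ran during loading
```

The .safetensors format sidesteps this entirely: it stores only raw tensor data plus a JSON header, so loading it never executes code.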
How to run SD models
To use different SD models, you just have to download them, plus a user interface to run them with. The most popular user interface is AUTOMATIC1111's Stable Diffusion web UI. Installation instructions for different platforms:
The "one best model" doesn't exist: each one does something better than the others.
At this point however, the merges are much better than the base models.
Pick one or two models that can produce the kind of images you require, then stick with them. Models are shiny objects - downloading every model that comes out might be counterproductive.
Stable Diffusion v1.5
v1.5 was released in October 2022. It produces slightly different results compared to v1.4. The general consensus is that v1.5 is slightly better, but the difference is small.
Stable Diffusion v2.1
A considerable improvement over the poorly received v2.0, which was widely criticized because it was bad at generating people. What happened was that v2.0's training data was aggressively filtered for NSFW content, which also removed many ordinary images of people; v2.1 relaxed this filter.
Best Merge Models
Note: you will need a civitai.com account to view some of these models, because many have NSFW capabilities.
A broad, general-purpose model with something for everybody - try photorealism, illustrations, or fantasy.
A merge of many models. This is a good model to start with, because it's good at so many things - you can try portraits, environments, fantasy etc.
"I only just tried Deliberate a couple days ago, and holy shit have I been missing out. I don't know if it's just the way I write my prompts or what, but Deliberate is the best fuckin all-around model I've used, by far." - TherronKeen
Photorealism model, used for both celebrities and original characters. Try it in conjunction with ControlNet to turn cartoon characters into real-life versions.
Merge of many photorealism-focused models.
The most well-known fantasy art model.
Merge of fantasy models - great at producing a fantasy illustration style.
SD v1.5 fine-tuned on high quality art.
Download link and Instructions
One of the more well known anime models. Unlike the other models in this list, this is a fine-tune of NAI Diffusion.
Many custom checkpoint models also come in different file sizes: fp16 (floating-point 16, aka half-precision floating-point) and fp32 (floating point 32, aka full-precision floating-point). This refers to the format in which the data are stored inside the model, which can either be 16-bit (2 bytes), or 32-bit (4 bytes). Unless you want to train or mix your own custom model, the smaller (usually 2 GiB) fp16 version is all you need. For most casual users, the difference between the image quality produced by fp16 and fp32 is insignificant.
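As a rough sanity check on those file sizes, here's the arithmetic, assuming a parameter count of about 1.07 billion for a full SD v1.5 checkpoint (UNet + VAE + text encoder; the exact figure varies by model):

```python
# Rough size estimate for a full SD v1.5 checkpoint. The ~1.07 billion
# parameter figure (UNet + VAE + text encoder) is an approximation and
# varies between custom models.
N_PARAMS = 1_070_000_000

def checkpoint_size_gib(n_params: int, bytes_per_param: int) -> float:
    """Storage needed for the raw weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30

fp16_gib = checkpoint_size_gib(N_PARAMS, 2)  # half precision: 2 bytes/weight
fp32_gib = checkpoint_size_gib(N_PARAMS, 4)  # full precision: 4 bytes/weight
print(f"fp16: ~{fp16_gib:.1f} GiB, fp32: ~{fp32_gib:.1f} GiB")
# prints: fp16: ~2.0 GiB, fp32: ~4.0 GiB
```

This matches the sizes you'll typically see on Civitai: roughly 2 GiB for fp16 downloads, roughly 4 GiB for fp32.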
1.5: Supports NSFW; more custom models; better ControlNet support; more LoRAs and TIs; more artists and celebrities in the training data. The training set is 512x512, so the optimal size for quick exploration is 512x512, which is kind of limited.
2.x: Better support for photos and landscapes. The training set is 768x768, so you get more interesting composition and detail, and it's easier to explore and experiment starting at 768x768. Has one great model, Illuminati v1.1, that can produce interesting images with minimal prompting, kind of like Midjourney. Some "controversial" artists such as Greg Rutkowski have been removed. No nudity, and no custom model that supports nudity (yet). There is only one anime model (Replicant). (Update: the creator has removed Illuminati from CivitAI, so use either rmadaMerge or MangledMerge as a replacement.)
The reason you get better composition and interesting images with SD 2.1 based models is because the AI starts generating at 768x768. With an SD 1.5 based model, the starting point is 512x512, even if you specify 768x768 as your image size. That's 589,824 vs 262,144 pixels, i.e. over twice as much space for the AI to work with, so that it can include more "stuff" into the composition.
It is easy to prove this for yourself. Try generating a 768x768 image with an SD 1.5 model without hires-fix, and you'll often get twin heads. You don't have that problem with an SD 2.1 based model. When you do turn on hires-fix, all the AI is doing is upscaling the 512x512 initial image, so the composition is already fixed.
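The pixel counts in that comparison are easy to verify:

```python
# Native starting resolutions of the two base model families.
sd15_pixels = 512 * 512  # SD 1.x training resolution
sd21_pixels = 768 * 768  # SD 2.x "768" training resolution

print(sd15_pixels)                # 262144
print(sd21_pixels)                # 589824
print(sd21_pixels / sd15_pixels)  # 2.25, i.e. 2.25x the canvas area
```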
Here are more insights from u/Delerium76
SD 1.4 and 1.5 are 512x512 models that generate pretty well and are the base for most of the models people have created. They have a lot more celebrity and artist prompt recognition than 2.0.
SD 2.0 was created to address what the creators saw as a potential for lawsuits, by implementing a custom prompt interpreter that filters out words they don't want you to use in prompts, purely for liability reasons (deepfake porn, etc.). 2.0 was very heavy-handed in its filtering, and it became somewhat difficult to build good prompts for it. Also, 2.0+ models seem to rely more on negative prompting than 1.4 and 1.5.
SD 2.1 improves on the prompting and restores a lot of the celebrities and artists that were missing from 2.0. The filters were also improved to not overreact with false positives.
SD 2.0 and 2.1 "768" are models trained on 768x768 images to provide better quality. Everything before this is 512x512. When using a 768 model you have to use 768x768 resolution, or things get very distorted.
All the inpaint versions are just custom models built for modifying existing pictures with img2img inpainting.
I would use 1.4 or 1.5 over 2.0 or 2.1 any day of the week, but that's also because I'm more familiar with the style of prompt building that is most effective in 1.4 & 1.5. I'd completely skip 2.0, btw - it's inferior to 2.1 in every way. I've heard people describe 2.1 as "not as good as the 1.x versions, but an important step in laying the foundation of SD features for the future." I don't really know what features they spoke of, but I just don't have much luck with 2.1. The 768 model does generate higher quality images, but it's such a pain in the butt to get it to cooperate with your prompt and actually generate exactly what you want that it's not even worth it. Again, some people are better at using 2.1 than I am, so YMMV.
Deliberate, Realistic Vision, RealBiter, Ares Mix, aEros, and its update "Liberty" all do a great job on realistic people, but each one still has its own distinct style. Ares Mix, aEros, and Liberty need some negative prompting to keep them SFW, but they really do generate realistic looking faces, depending on the sampling method (strangely, some methods aren't great with some models, so you should play around with the sampling methods on each model to find the "best" one for your needs). For realistic human expressions, the Emotion Puppeteer LoRA comes in handy because you can use it with the other models above. ChilloutMix I don't use, due to its license drama that I just don't want to deal with. Plus, it seems to be trained predominantly on Asian women, so it's very niche.
Strangely enough, the base 1.4 and 1.5 are great at that! I created a list of artists with their styles inside the "styles" dropdown of AUTOMATIC1111's webui. It's very useful for drawing inspiration from if you're having prompt writer's block. Just build one prompt and generate it in many different artistic styles for ideas.
Fantasy-wise: RPG and Dreamshaper. Dreamshaper's creator has a new model named "NeverEnding Dream" that seems to be in between Dreamshaper and a fully realistic model. aEros does fantasy well too.
For sci-fi I use Experience, Protogen Infinity, and Synthwavepunk (very 80s sci-fi inspired). In fact, Protogen has several versions that each specialize in a specific style. Worth checking out.