Stable Diffusion VAE Guide and Comparison

by Yubin

VAE stands for Variational Autoencoder. These are files you use alongside your Stable Diffusion checkpoint models to get more vibrant colors and crisper images. VAEs often have the added benefit of improving hands and faces.

If you're getting washed-out images, you should download a VAE. Sometimes you don't even realize an image is washed out until you see the alternative.

Technically, every model has a built-in VAE, but sometimes a different VAE will work better than the built-in one.
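If you use Stable Diffusion through the diffusers library instead of the WebUI, this override is explicit: you pass a standalone VAE into the pipeline. Here's a minimal sketch; the model IDs are just common examples, not something this guide depends on:

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Load a standalone VAE to replace the checkpoint's built-in one.
# "stabilityai/sd-vae-ft-mse" hosts the vae-ft-mse-840000 weights
# discussed below, in diffusers format.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# Any SD 1.x checkpoint works here; v1-5 is just an example.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    vae=vae,  # overrides the checkpoint's built-in VAE
)

image = pipe("a watercolor landscape").images[0]
image.save("landscape.png")
```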

VAE Comparison

Pretty much 90%+ of VAEs in circulation are renamed versions of the following:

  • kl-f8-anime2 VAE (for anime)
  • anything VAE aka orangemix VAE (for anime)
  • vae-ft-mse-840000-ema-pruned (for realism)

Some checkpoints already have a VAE baked in (e.g., the Anything VAE is built into the Anything checkpoint), so applying the same VAE on top will have no effect.

Here's a comparison using the Anything V5 checkpoint:

[Comparison images: No VAE vs. kl-f8-anime2 vs. vae-ft-mse-840000-ema-pruned]

And here's the same comparison using the Deliberate checkpoint for realistic images:

[Comparison images: No VAE vs. kl-f8-anime2 vs. vae-ft-mse-840000-ema-pruned]

VAEs come in the older .vae.pt and .ckpt formats as well as the newer .safetensors format; all of them work.
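For diffusers users, these single-file formats can be loaded in code as well. A sketch assuming diffusers' from_single_file loader; the local paths are hypothetical placeholders:

```python
from diffusers import AutoencoderKL

# Diffusers-format repos load with from_pretrained:
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# Single-file .safetensors or .ckpt VAEs load with from_single_file
# (paths below are hypothetical examples):
vae_st = AutoencoderKL.from_single_file(
    "models/VAE/vae-ft-mse-840000-ema-pruned.safetensors"
)
vae_ck = AutoencoderKL.from_single_file("models/VAE/kl-f8-anime2.ckpt")
```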

kl-f8-anime2

Most of the anime VAEs out there are actually just this one, renamed. It was created by Hakurei by fine-tuning the SD 1.4 VAE on a large set of anime-style images.

Download link

vae-ft-mse-840000-ema-pruned

For realistic models or styles. Created by StabilityAI. It can also make anime look crisper.

Download link

Usage

Download any of the VAEs listed above and place them in stable-diffusion-webui\models\VAE.
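If you'd rather script the download, here's a minimal Python sketch. The URL assumes the 840000 VAE file in StabilityAI's Hugging Face repo, and the target folder assumes a default WebUI install; adjust both for your setup:

```python
import urllib.request
from pathlib import Path

# Assumed Hugging Face URL for the vae-ft-mse-840000 weights.
url = (
    "https://huggingface.co/stabilityai/sd-vae-ft-mse-original/"
    "resolve/main/vae-ft-mse-840000-ema-pruned.safetensors"
)

# Default A1111 WebUI VAE folder; change if your install lives elsewhere.
vae_dir = Path("stable-diffusion-webui/models/VAE")
vae_dir.mkdir(parents=True, exist_ok=True)

urllib.request.urlretrieve(
    url, vae_dir / "vae-ft-mse-840000-ema-pruned.safetensors"
)
```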

In the WebUI, go to the Settings tab > User Interface subtab. Then, in the Quicksettings list, add sd_vae after sd_model_checkpoint, separated by a comma.

Restart the UI by scrolling to the footer and clicking "Reload UI".

Now, you'll see an SD VAE dropdown at the top. Pick your VAE there.
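You can also make the same Quicksettings change by editing the WebUI's config.json directly. The key name used below, quicksettings_list, is what recent WebUI versions use (older versions stored a comma-separated quicksettings string instead), so treat this as an assumption and check your own file first:

```python
import json
from pathlib import Path

config_path = Path("stable-diffusion-webui/config.json")
config = json.loads(config_path.read_text())

# Add sd_vae to the quick-access bar next to the checkpoint picker.
quick = config.get("quicksettings_list", ["sd_model_checkpoint"])
if "sd_vae" not in quick:
    quick.append("sd_vae")
config["quicksettings_list"] = quick

config_path.write_text(json.dumps(config, indent=4))
```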

Note: previously, you had to rename the VAE file to match the model name. That's no longer essential, since you can manually select the VAE you want in the UI. If a VAE file does have the same name as the model, it will still be selected automatically when you load that model.

How does a VAE work?

Imagine you have a set of 512x512 images, and you have to shrink them down to 256x256, then later restore them to 512x512 without losing detail. Seems impossible, right?

As it turns out, you can train a neural network to both encode and decode the images.

It can look at millions of images to calculate probabilities, so when it looks at a 256x256 "latent" image, it can make a really accurate guess about the original 512x512 data that was encoded into it.

Now let's say you have to do the same thing, but with 128x128 images.

So now you would train the neural network on millions of 256x256 "latent" images, teaching the system to encode that data into 128x128 pixels, so that later it can expand them back into the larger 256x256 latent images.

And now you have a nifty two-step process that takes 512x512 images and compresses them into 256x256, then further compresses those into 128x128 latent images. And it can do it all in reverse, turning those 128x128 latent images into a close approximation of the original 512x512 image. Magic.

Now do the exact same thing with 64x64 images.

Now your neural network can take any 512x512 image, send it through three layers of the VAE, and produce a 64x64 latent image. Then you can send that 64x64 latent image back through those three layers in reverse and get something close to your original image out.

And that's a variational autoencoder.
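To make the compression concrete: in the actual Stable Diffusion VAE, the 8x shrink happens in a single encoder rather than three separate stages, but the round trip looks like this. A minimal sketch with diffusers, where a random tensor stands in for a real 512x512 image:

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae.eval()

# Stand-in for a real 512x512 RGB image, scaled to [-1, 1].
image = torch.rand(1, 3, 512, 512) * 2 - 1

with torch.no_grad():
    # Encode: 512x512x3 pixels -> 64x64x4 latent.
    latents = vae.encode(image).latent_dist.sample()
    print(latents.shape)  # torch.Size([1, 4, 64, 64])

    # Decode: recover a close approximation of the original image.
    recon = vae.decode(latents).sample
    print(recon.shape)  # torch.Size([1, 3, 512, 512])
```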
