Get Started with AI Generation
When you want to get started with generative AI, everyone will dump a huge list of links on you.
They don’t tell you which ones to check out if you’re a beginner, or which ones to look at if you want to try [X] thing.
In this guide I will also dump a huge list of links on you.
But I’ll try my best to explain what’s what, and give you different starting points depending on what you want to do.
Items on this list with a 🤩 next to them represent my top pick for the category. This rating is entirely subjective and represents my preferences, not what is necessarily “the best”.
This will be long, so use the Contents below to jump to the sections you’re interested in.
Image Generation Apps
I just want to get started fast
In general, a good place to get started is one of the popular image generation apps:
- Midjourney 🤩 produces beautiful results straight out of the box without much effort. It works great across a broad spectrum of styles. It’s also on track to become the most popular image generation tool of 2023, simply because it produces the most viral and shareable results.
- DreamStudio is the web app version of Stable Diffusion, created by StabilityAI, the same company that made Stable Diffusion in the first place. They made an app for people who didn’t want the hassle of installing Stable Diffusion.
- 200 credit free trial. 1 generation costs around 1 credit. $10 for 1000 credits.
- DALL·E 2 is a popular app by the company OpenAI (creators of the massively popular AI chat bot ChatGPT). It leans towards a realistic style.
- For 1024×1024 images, the cost is $0.02
For anime generation:
- Niji Journey 🤩 is the Anime version of Midjourney
- Free trial of 25 generations. It comes as a 2-in-1 deal with Midjourney ($10/mo or $30/mo plans).
- List of cool things to try
- NovelAI is one of the most popular online anime generators.
- No free trial for images. Pricing runs from $10/mo to $25/mo.
A discussion of AI image generation wouldn’t be complete without a section on popular mobile apps.
Mobile apps aren’t the most robust generative AI tools. However, they will be what your friends talk about when they talk about AI.
- Lensa is the AI photo editing app that went viral on Instagram and other social media.
- Costs $29.99/year for the premium subscription
- Meitu is a Chinese photo retouching tool. They released an AI anime filter that catapulted them to the top of the international app charts.
- Costs $33.99/year for VIP subscription
- TikTok now has a feature for text-to-image AI generation
Running Stable Diffusion on your Computer
I DON’T WANT to pay and I want high customizability
You don’t want to pay? You want to try all these custom generation models that people are talking about?
You will need to install Stable Diffusion.
For clarification: Stable Diffusion is the name of the model. There are many different models you can use.
All you have to do is download the model file. Model files use the checkpoint file extension (.ckpt).
Most likely, you don’t want to enter long commands in the command line either. You want a nice user interface with sliders and toggles.
The most popular and feature-rich User Interface to use with Stable Diffusion is AUTOMATIC1111’s Web-GUI.
- Installation for Windows with NVIDIA GPU: official instructions
- Installation for Windows with AMD GPU: official instructions
- Installation for Apple Silicon (Mac M1/M2): official instructions | my instructions
- Installation for Linux: instructions by Joshua Kinsey (overall not recommended)
I don’t want to use Stable Diffusion, I want to use [X] model I saw on Reddit/Twitter!
That’s OK. You will install AUTOMATIC1111’s Web-GUI all the same.
Then, instead of downloading the Stable Diffusion model, you’ll download the model that you want.
Huggingface.co is the most popular model hosting website. It features Machine Learning models for all kinds of cool applications. In this section, we’re only interested in the most popular text-to-image models.
You can jump to the Models section below.
Things you can do with Stable Diffusion
Text-to-image is the reason most people use Stable Diffusion. You type in words or phrases (called a prompt) and the AI generates an image for you.
Image-to-image (img2img): You can input an image along with a text prompt, and generate a new image similar to your original.
Inpainting: You can edit/change parts of the image you don’t like.
PROMPT: What you want the AI to think about. Whatever you put in here, the AI will attempt to include it in the output.
- Start with “masterpiece, best quality,” and keep your tokens under 75. To the right of the text box is a counter, which will read as x/75. This is the number of “tokens” or things the AI is thinking about. The AI will let you go over, but staying under 75 tokens yields more reliable output.
- Protip: Try to group related items together in a short phrase. For example: “a busty android girl” vs “girl, android, busty”. With the second prompt you’re more likely to end up with a girl with a robot at her side, and who knows who gets the bigger chest!
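The 75-token budget check is easy to picture in code. This is only a rough sketch: the real Web-GUI counts tokens with CLIP’s subword tokenizer, so a word-based count like this one will usually undercount, and the function names are my own.

```python
# Crude approximation of the Web-GUI's x/75 token counter.
# The real counter uses CLIP's subword tokenizer, which splits
# differently; this sketch just counts comma/space-separated words.

def approx_token_count(prompt: str) -> int:
    """Approximate a prompt's token count by counting words."""
    return len(prompt.replace(",", " ").split())

def within_budget(prompt: str, limit: int = 75) -> bool:
    """True if the prompt stays under the reliable-output budget."""
    return approx_token_count(prompt) <= limit

prompt = "masterpiece, best quality, a busty android girl"
print(f"{approx_token_count(prompt)}/75 tokens, ok={within_budget(prompt)}")
```

Running this prints `7/75 tokens, ok=True`; the real counter would report a slightly different number, but the budgeting idea is the same.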
NEGATIVE PROMPT: What you want the AI to avoid. Whatever you put in here, the AI will attempt to avoid having it in the output.
- Start with the NAI default above, and be sparing in the number of additional items you add. If you overdo it, you will back the AI into a corner and you may end up with the same image over and over!
SAMPLING STEPS: How long the AI spends working on the image. General rule of thumb is the longer the better, but with diminishing returns.
- Start with 20-70. To save time, try to use as few steps as possible while getting an output you’re happy with.
SAMPLING METHOD: How the AI thinks about your image. Different methods use different approaches, and in testing you may discover some yield very similar results.
- Euler and Euler a are popular because they generally produce predictable results.
Protip: While Euler tends to get sharper with more steps, Euler a varies its output greatly while maintaining steady quality from around step 20, which gives it the potential to produce good output with fewer steps.
WIDTH/HEIGHT: How big you want the output. Size correlates with the time and amount of VRAM needed per output, so at larger sizes you may hit a VRAM out-of-memory error.
- Start with 256-1024px in both directions. Potato systems may not even be able to go larger than the 512×512 default; it depends on your system spec.
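The resolution/VRAM link comes from how Stable Diffusion works internally: it denoises a compressed “latent” image at 1/8 of your requested resolution, so memory use grows with the pixel area. Here is a small sketch of that arithmetic; the 4-channel, 1/8-scale figures apply to Stable Diffusion v1, and other models may differ.

```python
def latent_shape(width: int, height: int) -> tuple:
    """Shape of the latent tensor Stable Diffusion v1 denoises:
    4 channels at 1/8 the requested pixel resolution."""
    if width % 8 or height % 8:
        raise ValueError("dimensions must be multiples of 8")
    return (4, height // 8, width // 8)

# Doubling both dimensions quadruples the latent area (and, roughly, VRAM).
print(latent_shape(512, 512))    # (4, 64, 64)
print(latent_shape(1024, 1024))  # (4, 128, 128)
```

This is also why dimensions snap to multiples of 8 in the UI: the latent grid has no room for fractional pixels.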
CFG SCALE: How “focused” you want the AI to be on your prompt. Lower value = less “focused”, higher = more.
- Start with 5-15. Going below this range may yield random content, whereas going too high will limit the variety of outputs.
SEED: Source number for the beginning of AI processing. Two images with the same parameters and same seeds should yield identical pictures.
- Start at -1 (random). Until you find a composition/arrangement you like, you can keep rolling random seeds. Once you find the image you’d like to refine, you can save it by clicking the “recycle” icon next to the box.
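The “same seed, same picture” behavior is just deterministic pseudo-randomness. This toy sketch substitutes Python’s random module for a real diffusion run to show the principle:

```python
import random

def toy_generate(seed: int, steps: int = 5) -> list:
    """Stand-in for a diffusion run: the seed fixes the starting
    noise, so a run with identical settings and an identical seed
    replays exactly."""
    rng = random.Random(seed)
    return [round(rng.random(), 4) for _ in range(steps)]

print(toy_generate(42) == toy_generate(42))    # True: reproducible
print(toy_generate(42) == toy_generate(1234))  # False: new composition
```

In the real tool, the caveat is “identical settings”: change the sampler, steps, or resolution and the same seed will no longer reproduce the same image.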
Other Ways to Run Stable Diffusion Locally
While we recommend you use AUTOMATIC1111’s Web-GUI, there are lots of other great implementations to try:
- InvokeAI 🤩 Runs on Windows, Linux and Mac. It has a great community and the developers are adding features quickly.
- Speaking of cool features, InvokeAI has an insanely great canvas mode that lets you do inpainting, outpainting, txt2img and img2img in the same place!
- DiffusionBee is Mac only. It’s an actual Mac app with an easy one-click install. It has inpainting, outpainting, txt2img and img2img. Its only drawback is that it’s not as feature-rich as other implementations.
Running Stable Diffusion Online
I want to run Stable Diffusion but my computer is really bad / Stable Diffusion is too slow on my computer
You can try any of the following free online services:
- Stable Diffusion v2.1 DEMO: StabilityAI’s free demo of Stable Diffusion on HuggingFace.
- The demo is not very fast: it exists to upsell their paid service, DreamStudio
- Mage.space: free generations! You get more features if you login
- Craiyon: Formerly called DALL-E mini, no relationship with OpenAI’s DALL·E 2. Quality is not very good compared to other tools, but they launched early and still have a lot of users.
Or you can jump straight to the big guns:
What’s Google Colab? It’s a service that allows you to run Python code, but on Google’s servers.
It might seem intimidating if you’ve never coded, but it’s really just pressing a series of “run” buttons to execute different sections of code.
- TheLastBen’s Fast Stable Diffusion: Most popular Colab for running Stable Diffusion
- AnythingV3 Colab: anime generation Colab
- Deforum Colab: for creating animations in Colab
Models determine what the AI knows
Which model you use will determine what the AI “knows” and can generate.
Top models such as Stable Diffusion and Waifu Diffusion are trained on large datasets, and are regularly updated to improve their performance.
Anybody can create their own model.
For a comprehensive list, check out: https://rentry.org/sdmodels
Here are the most popular models:
- Stable Diffusion v1.4: Trained on billions of images; many derivative models are trained on it (download)
- Stable Diffusion v1.5: Not too different from v1.4, but handles realism a bit better. #1 most popular model on HuggingFace
- Stable Diffusion v2.0: Supposedly an upgrade over Stable Diffusion v1, though many people prefer v1
- OpenJourney: Stable Diffusion model fine tuned to look more like Midjourney. #2 most popular model on HuggingFace
Here are the most popular anime models:
- NAI Diffusion: #1 most popular anime model. A custom model created to reproduce the style of the Official NovelAI model.
- It has been renamed to Anything, but it is still more commonly known as NAI Diffusion. (NovelAI’s own proprietary model is also called NAI Diffusion; to avoid confusion, we refer to that one as Official NovelAI.)
- Hugely popular in China and Japan. There is a long list of resources in Chinese and Japanese, which we have summarized here and will begin translating into English.
- Waifu Diffusion: #2 most popular anime model. Trained on Stable Diffusion 1.4 + 680k images from Danbooru
- Eimis Anime Diffusion: #3 most popular anime model. Haven’t tried it yet so no comments.
- TrinArt: #4 most popular anime model. Haven’t tried it yet so no comments.
Here are some popular thematic / general purpose models:
- Redshift Diffusion: fine-tuned Stable Diffusion model for high resolution 3D artworks
- Pokemon Diffusion: fine-tuned Stable Diffusion model to generate Pokemon
- MoDi Diffusion: fine-tuned Stable Diffusion model for modern Disney style
- Arcane Diffusion: fine-tuned model for the style of the popular Netflix show Arcane
While all these models may seem confusing, it’s really just plug and play.
If you’ve already downloaded AUTOMATIC1111’s Web-GUI in the previous section, all you have to do is download your model with the link above (navigate to Files and versions tab, then download the .ckpt file).
After downloading the model, rename your .ckpt file to “model.ckpt” and place it in the /models/Stable-diffusion folder of your AUTOMATIC1111 installation.
Note: Place as many models as you want in the folder; “model.ckpt” is just the one it will load by default
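As a sketch, the copy-and-rename step could be scripted like this. The helper name is my own and the folder layout assumes a default AUTOMATIC1111 checkout:

```python
from pathlib import Path
import shutil

def install_model(ckpt_path: str, webui_dir: str, make_default: bool = False) -> Path:
    """Copy a downloaded .ckpt into the Web-GUI's model folder.
    With make_default=True it is renamed model.ckpt, so the GUI
    loads it on startup; otherwise it keeps its original name."""
    src = Path(ckpt_path)
    dest_dir = Path(webui_dir) / "models" / "Stable-diffusion"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / ("model.ckpt" if make_default else src.name)
    shutil.copy(src, dest)
    return dest
```

Keeping the original filenames for the extra models makes it easy to tell them apart in the GUI’s model dropdown later.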
Warning: Be careful when downloading models
Both .ckpt files and Python files can execute code. This means people can create models with malicious code that infects your computer with viruses. AUTOMATIC1111’s Web-GUI attempts to detect and block malicious code in model files, but that’s only one line of defense.
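To see why, note that .ckpt files contain Python pickles, and loading a pickle can import and call arbitrary objects. The toy scanner below lists which module.name references a raw pickle stream would import on load. Real scanners work similarly but compare the list against an allowlist, since legitimate model files also contain harmless globals such as torch’s tensor classes. The function name and approach are my own illustration, not A1111’s actual checker:

```python
import pickle
import pickletools

def pickle_globals(payload: bytes) -> list:
    """List the module.name references a pickle would import when
    loaded. Anything unexpected (e.g. os.system) is a red flag."""
    found, strings = [], []
    for opcode, arg, _pos in pickletools.genops(payload):
        if opcode.name == "GLOBAL":        # old-style: "module name" in one arg
            found.append(arg.replace(" ", "."))
        elif "UNICODE" in opcode.name:     # string pushes feed STACK_GLOBAL
            strings.append(arg)
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            found.append(f"{strings[-2]}.{strings[-1]}")
    return found

print(pickle_globals(pickle.dumps({"steps": 20})))  # [] – plain data, no imports
print(pickle_globals(pickle.dumps(len)))            # ['builtins.len']
```

Because inspection like this only covers the pickle layer, the safest habit is still to download models only from accounts and hosts you trust.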
Writing Prompts
You will need to find the right combination of words to direct the tool toward the content you want to generate – these are called prompts.
There is a ridiculous number of resources on how to write better prompts, and different prompts work better with different tools.
So here are some very high-level tips:
- In general, the model knows what you want. You can be highly descriptive with long sentences, or use short words and phrases; both work.
Here are sites where you can find existing prompts:
- Midjourney Gallery: Midjourney makes its users’ generations publicly available (unless they choose private mode)… you can also see every single prompt!
- Lexica.art: There are a number of sites which aggregate images and prompts, however Lexica is my favorite. Just search for what you’re looking for. Quality will vary though, so be prepared to dig around.
- Side note: Lexica also has its own model for photorealism called Aperture.
- List of Artists Represented in Stable Diffusion 1.4: A list of all the artists in the v1.4 Stable Diffusion model, with example generations
- Note that different models have different levels of knowledge about artists. However, since many models use Stable Diffusion as a starting point, this is a good baseline knowledge base.
- Arthive Artist Database: 74,000 artists – not all of them are represented in Stable Diffusion, but great for inspiration.
You can also use image-to-text (img2txt) tools to attempt to reverse engineer the prompt that created an image.
Anime is where prompt resources really shine:
- The Codex of Quintessence: A gigantic Chinese series about NAI Diffusion prompting, divided into many parts.
- P1atdev’s Website: A gigantic repository of prompt knowledge in Japanese, by @p1atdev