DALL·E 2: The Ultimate Guide for Beginners

Created by OpenAI, DALL·E 2 lets you generate unique images without needing any artistic skills.

It’s relatively easy to use and is available online for everyone to access. It pairs a text prompt, connected to an AI that generates original imagery, with an upload feature that creates new variants of uploaded pictures. It’s similar to Stable Diffusion and Midjourney, but it’s more user-friendly and easily accessible. The possibilities are practically endless - excluding NSFW content.

In this article, we’ll go through exactly what DALL·E 2 is, how to use DALL·E, how it works, the controversy behind the app, and some useful guides to help you along your new journey.

What is DALL·E 2?

DALL·E 2 is an AI tool that creates realistic images, art, and illustrations from natural-language descriptions typed into a text prompt. Its algorithm combines concepts, attributes and styles. It can also expand images beyond their original canvases, creating new compositions. So, in addition to producing original imagery, it can make realistic edits to real photographs using captions. It does this by adding or removing objects, shadows, reflections, textures and other details.

The software is web-based, so you access it online through a web browser. In general, it is easy for beginners to use and has a familiar appearance, so you can quickly get the hang of it and generate some incredible artwork based on existing work - or something entirely original. Experimenting with the AI tool is also simple and straightforward, because all it requires is a text description and, if you’re expanding on existing artwork, an image.

Similar to Midjourney, it can create multiple variations of original images. Its deep learning AI has been trained to understand the relationship between images and their descriptions. It uses diffusion, a process that begins with a pattern of random dots and gradually alters that pattern towards an image as it recognizes specific aspects of that image. At the time of writing, DALL·E 2 is the most up-to-date version of the AI tool.

Below are 2 examples of what you could produce with DALL·E 2:

Example

Prompt: An impressionist painting in the style of vincent van gogh.

Image Variation Example

Original Image:

4 New Variants:

How to Use DALL·E 2

Using DALL·E 2 is a straightforward process that’s easy to get the hang of. The detailed steps below will help you use the AI tool effectively and clear up any doubts you may have about the app.

Step 1: Image Generation

When it comes to AI image generation, it’s all about coming up with good text prompts to feed into the algorithm. To generate an image, simply type a text description into the prompt in front of you. This can be anything from an artist's impression of Bangkok’s nightlife scene in the style of Banksy to a self-portrait of an elderly person in the style of Picasso.

Alternatively, you can try something along the lines of a 1980s pixelated version of the ancient writings found in the underground tomb near the Pyramid of Djoser. So, you can get creative with this. The possibilities are endless: you can generate anything you want, provided it isn’t harassment, doxing or illegal content.

Once you have come up with a text description, click on ‘Generate’. After that, all you have to do is wait for around 30 seconds to 1 minute. Then 4 results will appear in front of you. These results are original generated images, produced entirely by the AI.

Then, select the image that you want to keep, and either click ‘Create Collection’ or right-click it and choose ‘Save As’ to download it to your computer.

When you create a collection, you can name it and choose whether or not you want it to be private.

Creating artwork with AI and a text prompt is considered an art form. It’s all about accurately describing how you want your generated images to be. It’s also worth learning how to use DALL·E 2 the correct way to fully take advantage of its algorithm and capabilities, so you can show off some incredible masterpieces to loved ones and friends.

It’s worth keeping in mind that DALL·E 2 doesn’t let you change the aspect ratio of your AI-generated images. You can upscale them in other apps, but it’s not going to be as good as having an in-app upscaler where you can alter the original image’s size. Also, many upscalers are out of reach for many people because of the cost of signing up to them.

Step 2: Adding Variations to Pre-Existing Images

DALL·E 2 is an excellent image editing tool, as it generates variations of images. This includes different angles, styles, and themes. It can also expand on original photographs. To start generating variants of your photographs, simply click the ‘Upload’ button, which is shown in the screenshot below. It is above ‘Generate’, and to the right of ‘Surprise Me’.

Upon upload, you can either crop your image or skip the cropping process entirely. I suggest cropping your image for maximum effect with the AI.

Then, you click on ‘Generate Variations’. Alternatively, you can go back to editing your image with the ‘Edit Image’ button. Upon clicking the second button, you will come across the below screen. It’s nothing to worry about. It’s a progress bar to tell you how long the image generation process is taking.

You will get 3-4 original variations of your original image. They will differ in style, angles, and themes. If you don’t like what the AI has generated, you can repeat the process.

If you like what you see, you can click ‘Save to Collection’ or ‘Create Collection’.

To create a collection, simply give it a name. You can choose to make your collection private by flicking the ‘Make This Collection Private’ switch. To proceed, simply click on ‘Create Collection’.

Your collection will appear under the ‘Save to Collection’ option. You can save multiple generations into it. This will allow you to compare them later.

To publish your collection, simply click on ‘Publish’. This will enable other people to see what you’ve generated.

Now that you have published your collection, you will be provided with a unique link to it. Click ‘Copy Link’ to share with loved ones and friends. Below is the collection I created for your convenience:

http://labs.openai.com/s/bUirT32KR1YGAwAFd5cEWpMt

Also, below are the images included in the collection:

You can also edit variations in the in-app editor by clicking on ‘Edit’ when viewing one of your results. In addition, you can generate more variations by clicking on ‘Variations’.

Below is a screenshot of the in-app photo editor. From here, you can erase features and objects. You can also download your edit with the download button in the top right corner of the screen.

Here’s a screenshot of 4 variants based on this AI-generated variation of the original image. It’s also included in the above collection for your convenience.

Step 3: Styling

When it comes to styling, you can choose from practically any style in recorded history. If you’re trying to generate photo-style images in DALL·E 2, it’s wise to get your wording right. The best AI-generated art comes from detailed text prompts that use styling keywords correctly. That’s why learning how to use DALL·E 2 correctly gives you access to a vast number of possibilities.

Below are some styles you can experiment with in-app:

Autochrome: Color photography emerged in the 1900s and reached more people in the 1930s, although it was too expensive for the Average Joe at the time. In this style, various shades of green and pink often occur, whereas blue and orange tend to be faded or lost among the green and pink.
Surveillance: Try keywords like ‘CCTV’, ‘surveillance’ and ‘grayscale’ for an eerie effect. You may end up with an image of a quiet supermarket or an animal in the middle of the street.
Daguerreotype: If you want to generate semi-dystopian-type imagery without mimicking Banksy’s style, it’s worth trying out the daguerreotype, the first publicly available photography process, which was popular in the mid-19th century.
Disposable Camera: Generating images in the style of a 1980s disposable camera gives your imagery a retro feel. It makes your art look low-quality and grainy, adding a sense of nostalgia.
Double Exposure: This style lets you combine two images to create a mesmerizing artistic effect in your artwork. Consider diversifying your backgrounds with color or a variety of textures with silhouettes.
Aerial Photography: An aerial view lets you see far more than you can from the ground. Consider using it in your artwork by typing something along the lines of ‘view from above’.
Catwalk and Fashion: If you include ‘editorial fashion photography’ within your query, along with ‘masked’ or ‘viewed from behind’, you can get masked figures in red angular suits, people in mustard and mauve outfits, or hooded individuals in an eerie setting. It’s also worth including the names of famous fashion photographers and magazine styles for maximum effect.

Those are just a few ideas of what the AI is capable of generating. You can also type queries like ‘Award-winning stage show design, influenced by Queen’s We Will Rock You, and high-resolution theater images’. In addition, you can include the above styles within prompts like this. In essence, the world is your oyster!
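If you like to keep your prompt experiments organized, the way style keywords slot into a prompt can be sketched in a few lines of Python. This is only an illustration: the `build_prompt` helper and its parameters are hypothetical names invented for this example, and DALL·E 2 itself simply receives the final string that you paste into the text prompt.

```python
def build_prompt(subject, styles=(), modifiers=()):
    """Join a subject, style keywords and bracketed modifiers into one prompt.

    Hypothetical helper for illustration only; it is not part of DALL·E 2.
    """
    # Start with the subject, then append each non-empty style keyword.
    parts = [subject.strip()]
    parts.extend(s.strip() for s in styles if s.strip())
    prompt = ", ".join(parts)
    # Modifiers go in brackets at the end, like "(off-hours)".
    if modifiers:
        prompt += " (" + ", ".join(m.strip() for m in modifiers) + ")"
    return prompt


prompt = build_prompt(
    "Stage set inspired by Queen's We Will Rock You",
    styles=["CCTV footage", "grayscale"],
    modifiers=["off-hours"],
)
print(prompt)
```

Swapping the `styles` list for something like `["daguerreotype"]` or `["aerial view"]` gives you a quick way to try the same subject in several of the styles listed above.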

Below is an example with the aerial view effect:

Prompt: ‘Theatrical design inspired by Queen's We Will Rock You stage show (Aerial View of stage set)’.

You can also do it with theater costumes:

Prompt: Theater costumes inspired by Queen's We Will Rock You (press release).

Let’s try two more:

Prompt: Stage set inspired by queen's we will rock you cctv footage (off-hours).

Just because I can, I’ve run the same prompt as above, but with an elephant in the middle of the stage. The potential is truly limitless.

Prompt: Stage set inspired by queen's we will rock you cctv footage (off-hours) with a huge elephant in the middle.

Step 4: Generating Ancient & Modern Art (& Everything In-Between)

If you want to take it up a notch, you can generate art based on the following styles from throughout human history:

Early Human Art
Ancient Egyptian Tomb Paintings
Ancient Rome Mosaics & Roman Works
Mozarabic Art
Middle Ages
19th Century
Modern Art

Early Human Art

Early humans started creating art by painting in caves roughly 43,000 years ago. It’s possible to bring history back to life by typing a prompt like:

Prompt: Cave Paintings in the Style of 43,000 Years Ago

Crazy Thought: Isn’t it amazing how humans went from drawing stick figures in caves to using AI to generate imagery? We have advanced immensely in 43,000 years.

Ancient Egyptian Tombs

Ancient Egyptian art dates back around 5,000 years (or 7,000, according to local Egyptians), and it is mysterious and immensely interesting. To generate art in the style of this period, simply type a prompt similar to the following (or feel free to try the same description!):

Prompt: Ancient Egyptian mural found on a Giza tomb.

If you want to go completely off the rails, try something like this:

Prompt: Ancient Egyptian mural with modern humans walking with their heads down on their smartphones in the style of the art.

This one would drive conspiracy theorists crazy!

Ancient Rome Mosaics & Roman Works

Ancient Rome has a variety of different mosaics around the city. To generate a mosaic of a man holding an apple, a time traveler from the distant future, or anything else of your choice, type something like the following:

Prompt: Ancient Roman Mosaic depicting a time traveler visiting Ancient Rome, well-preserved opus from Ancient Rome.

Roman paintings have been found across the Roman Empire, but especially in its capital. As with mosaics (and any other style), you can generate literally anything in the style of a Roman painting.

Prompt: Ancient Roman painting depicting a man holding an apple while wearing Roman attire. In the style of Roman Turkey.

Mozarabic Art

In the Middle Ages (8th - 11th century), Mozarabic art was huge on the Iberian Peninsula during the Arab rule of the region, in what is now Spain. Like other styles, you can experiment with it by staying authentic or going completely off the rails! To generate Mozarabic art, simply use the prompt below:

Prompt: Mozarabic art of Cat in Sunglasses, illustration from a Mozarabic Bible, 11th Century Leon

Middle Ages

Bestiaries were illustrated collections of real and mythical beasts that were popular during the Middle Ages. To generate images in this style, simply use the following prompt:

Prompt: Detail from Middle Ages bestiary of humanoid dog in a straw hat, in the style of illuminated 13th century manuscript.

19th Century

In the 19th century, paintings of royals were popular, along with photos of the dead. If you go to a British museum, you’ll find plenty of these images. The image below was the closest result I got when asking DALL·E to produce a British Victorian photograph of the dead. Below is the prompt I used:

Prompt: 19th century photograph of the dead (Victorian Britain Style).

Modern Art

Modern art is vast and comes in a seemingly infinite number of styles. We have come a long way since the Stone Age. We get sculptures made from old radios, carpets that everyone can draw on with their fingers, paintings, and NFTs, among many other art forms. For modern art, we’ll focus on dystopia.

Prompt: A Dystopian train station with cameras everywhere. High resolution image in the style of Bansky.

Step 5: Inpainting

Inpainting is the process of erasing part of an image, such as its background, and having an AI image generator replace it with something else.

To inpaint, simply go onto DALL·E, and click on ‘Upload an Image’.

Then, press ‘Crop’.

Next, click on ‘Edit Image’.

Following that, use the eraser to erase the background.

Then, enter a prompt and click on ‘Generate’.

After around 30 seconds, a new AI generated background should appear.

It’s as simple as that.

Outpainting

Outpainting is the process of expanding an image beyond its boundaries. First, repeat the same process as before: upload an image and crop it. Then, remove the background with the eraser tool.

Then, click on ‘Add Generation Frame’.

From here, you can select an AI-generated image to include in your new, expanded image. Once chosen, click ‘Accept’, and your image will be expanded. Unfortunately, we ran out of credits, so we were unable to finish the process. The app allows a fixed number of credits per month, so I advise you to be careful with them.

How DALL·E 2 Works

DALL·E 2 has been trained to map text typed into a text prompt to a representation space. The model encodes the prompt using a text encoder that captures the semantic information within it. The image decoder then generates 4 original images, which are visual representations of that semantic information. So, you're telling the AI tool exactly how you want your picture to look - and it generates 4 unique pieces of artwork based on what you’ve asked it to create.

It generates images based on text descriptions that are fed into the AI model. It also generates variations of images based on pre-existing art. Below is a breakdown of how the app actually works, but first, take a look at what I generated with the AI tool.

Below is an image generated by DALL·E 2:

The above image was generated from a description entered by a user (the writer of this article, in this case). It features a Japanese couple overlooking either a staircase or a waterfall, in the style of traditional Japanese watercolor painting. This is an ancient art form that has been practiced in Japan for millennia. Speaking from personal experience, an art museum in Kyiv has Japanese watercolor paintings from thousands of years ago, and they look just like the images DALL·E has generated based upon my query.

Below is the query:

“An artist impression of a couple, Japanese watercolor style”. The output of this query is included above, and it looks remarkably authentic. No bias intended, but it looks like a human being painted the piece in the above image.

Step 1: Linking Text and Visual Content

First, you type a query into the AI text prompt. It can be anything along the lines of artist impressions, pixelated art, or any style of your choosing. In DALL·E 2, the link between a user’s description and visual imagery is learned by an OpenAI model called ‘CLIP’ (Contrastive Language-Image Pre-training). This deep learning model has been trained on a diverse range of images and their captions. Rather than attempting to predict a caption for a given image, it learns how closely related any given text snippet is to an image.

CLIP is trained with a contrastive objective, which lets it learn the link between textual and visual representations of the same abstract concept. The whole DALL·E 2 model depends on CLIP’s ability to learn semantics from natural language.

Before we go into how the process works, we need to understand the terminology. Here’s a list for your convenience:

Cosine Similarity: Cosine similarity measures how similar two vectors are. It is the cosine of the angle between the 2 vectors in a vector space, calculated as their dot product divided by the product of their magnitudes. To put it plainly, the AI compares two vectors to decide whether or not they’re similar, so DALL·E can generate a matching result.
Training Data: CLIP is trained on the WebImageText (WIT) dataset, which is composed of 400 million image-text pairs. The text is natural language of the kind a human would type on a daily basis.
Parallelizability: The parallelizability of CLIP’s training process is immediately evident. Every text and image encoding, along with the cosine similarities, can be computed in parallel.
Text Encoder Architecture: The text encoder is a Transformer that turns text into a vector encoding.
Image Encoder Architecture: The image encoder does the same thing, but with images. The only difference is that it’s a Vision Transformer instead of a text Transformer.
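To make the first term in the list concrete, here is a minimal sketch of cosine similarity in plain Python. The function name and the toy vectors are made up for illustration. Real CLIP encodings have hundreds of dimensions, but the calculation is the same.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product of the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    # Magnitude (Euclidean length) of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Vectors pointing in the same direction score close to 1,
# perpendicular vectors score 0, and opposite vectors score close to -1.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))
```

Note that the length of the vectors doesn't matter, only their direction: `[1, 2, 3]` and `[2, 4, 6]` count as a perfect match.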

Below is a breakdown of how CLIP is trained:

Respective Encoding: All images and text descriptions are passed through their respective encoders, which map every object into an m-dimensional space.
Pairing: The cosine similarity of every (image, text) pair is computed.
Maximize and Minimize Cosine Similarity: The training objective is to maximize the cosine similarity of the N correct encoded image/caption pairs while minimizing the cosine similarity of the N² - N incorrect encoded image/caption pairs.
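The three training steps above can be sketched in plain Python. Treat this as a rough illustration rather than OpenAI's actual training code: the function names, the tiny 2-dimensional "encodings" and the temperature value of 0.07 are assumptions made for this example, and no real encoder is involved. It scores every image against every caption and uses a symmetric cross-entropy loss, which is small when the N correct pairs are the most similar and large when they are not.

```python
import math


def normalize(v):
    """Scale a vector to length 1 so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits, target):
    """Negative log-probability of the target index under a softmax."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over an N x N cosine-similarity matrix."""
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    n = len(images)
    # Pairing step: cosine similarity of every (image, text) combination.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in texts]
              for i in images]
    # Each image should pick its own caption (row i -> column i) ...
    loss_images = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # ... and each caption should pick its own image (column j -> row j).
    loss_texts = sum(cross_entropy([logits[i][j] for i in range(n)], j)
                     for j in range(n)) / n
    return (loss_images + loss_texts) / 2


# Toy batch of N = 2 made-up encodings.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[0.9, 0.1], [0.1, 0.9]]   # captions line up with their images
swapped_texts = [[0.1, 0.9], [0.9, 0.1]]   # captions attached to the wrong images

matched_loss = contrastive_loss(image_embs, matched_texts)
swapped_loss = contrastive_loss(image_embs, swapped_texts)
print(matched_loss, swapped_loss)
```

In this toy batch the loss for the matched captions comes out far lower than the loss for the swapped ones, which is exactly the signal training uses to pull correct pairs together and push incorrect pairs apart.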

CLIP plays an important role in DALL·E 2 because it drives the whole operation: it uses natural language to link text and visual concepts semantically, based on cosine similarity.

Step 2: Visual Semantic-Based Image Automation

After training, the CLIP model is frozen. DALL·E 2 then begins its second task, which is learning to reverse the image encoding mapping that CLIP has just learned.

CLIP learns a representation space in which it is easy to determine how related textual and visual encodings are. However, we are focusing on image generation in this article, so it’s important to learn how that representation space can be exploited to generate images.

To generate images, OpenAI uses a modified version of one of its earlier models, ‘GLIDE’. This model learns to invert the image encoding process in order to decode CLIP image encodings.

For example, if you have an image of a woman playing the guitar and run it through CLIP’s image encoder, GLIDE uses the resulting encoding to generate an original piece of art that has the salient features of the pre-existing picture.

Below is an example of the process:

As illustrated in the above graph, it’s important to keep in mind that the aim isn’t to build an autoencoder that reconstructs an image exactly. Instead, the goal is to generate original imagery that keeps the salient features of the original. To do this, GLIDE takes advantage of diffusion, so it can generate new artwork that’s original while respecting the essential features of the original photograph.
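For a feel of what diffusion means, here is a toy sketch in plain Python. It is not GLIDE's implementation: the function names, the 4-"pixel" image and the noise level are invented for this example. The forward step blends a clean image with random noise. A diffusion model is a network trained to predict that noise so it can be removed again. Here the true noise stands in for the network's prediction, which is why the image is recovered exactly.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Forward step: blend the clean signal with Gaussian noise.

    alpha_bar near 1 keeps most of the image; near 0 it is almost pure noise.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, noise)]
    return xt, noise


def remove_noise(xt, predicted_noise, alpha_bar):
    """Reverse step: estimate the clean signal given a noise prediction."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, predicted_noise)]


rng = random.Random(0)
image = [0.2, 0.8, 0.5, 0.1]  # a made-up 4-"pixel" image

# Heavily noised version: mostly random dots.
noisy, true_noise = add_noise(image, alpha_bar=0.1, rng=rng)

# A trained network would predict the noise; this sketch reuses the true noise.
recovered = remove_noise(noisy, true_noise, alpha_bar=0.1)
print(noisy)
print(recovered)
```

In a real model the prediction is imperfect, so the noise is removed gradually over many small steps, and the text prompt steers each step towards an image that matches the description.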

Conclusion

DALL·E and its successor are great ways to generate original images for publication purposes, along with personal use. In addition, the inpainting and outpainting features can expand on pre-existing images and modify them. The image variation feature also makes it possible to have different variations of your picture. This includes camera angles, styles, and scenarios, among many other things.

Learning how to use DALL·E the right way allows you to fully take advantage of its features, generating some masterpieces with a few clicks of a button. Experimenting with the text prompt makes it possible to generate any art you want, as long as it’s legal and doesn’t infringe anyone's rights. Thankfully, the AI blocks NSFW requests, so you don’t have to worry too much about being harassed with this software.

In recent years, text prompting with AI has become a recognized modern art. The possibilities are limitless. One day you may feel like generating an original piece of the secret Chinese pyramids that are covered in grass and trees (potentially). The next, you could generate an image of a man holding a coffee mug in a cafe, 19th-century style. So, the world’s your oyster. Feel free to type anything you can think of in DALL·E’s text prompt, and check out your results!