Midjourney V5 has just been released. The hype is real and very justified: V5 sets a new bar for realism and aesthetics.
The first version of Midjourney was released in July 2022. Newcomers to Midjourney may not realize how much it has improved since then.
Starting with strange, painterly images and moving to full-on masterpieces with a tremendous number of real world applications, Midjourney has become much, much smarter over time.
Let's get to the comparison:
As AI generated art becomes more and more commonplace, people will look back and think of Midjourney V1 as a bad dream. They won’t realize how revolutionary it was at the time.
Take a look at this example:
/imagine organic house embedded into the hilly terrain designed by Kengo Kuma, architectural photography, style of archillect, futurism, modernist architecture.
Unlike later versions, the art looks more like a painting than a realistic photograph. Still, you can tell that it’s related to the prompt. At the time of release, the color, depth, and materials were quite remarkable.
V1 didn’t let you add details when you upload. This was updated in subsequent versions.
Here’s an example of Midjourney V1 generating a character:
/imagine pixiv, hyper detailed, harajuku fasion, futuristic fashion, anime girl, headphone, colorful reflective fabric inner, transparent PVC jacket, backpack
The image is colorful, and includes many of the details mentioned in the prompt. However, it looks more like some kind of psychedelic modern art than what the prompter intended, which was a cohesive character design.
In the variations below, the character wears slightly different clothes, and the angle differs in each image. The background is plain, whereas later versions generate stunning features behind the character. It’s also worth mentioning that the face is still slightly disfigured.
Disclaimer: To ensure a fair test, we used the same prompts to compare all versions. However, be aware that every version has a different optimal prompting style.
V2 improves on V1 considerably. It gave us more cohesive, recognizable objects with more depth.
Version 2’s upscaling feature produces higher-resolution results than V1’s. There is more detail, but noise and grit still appear in images generated with version 2.
Here’s the anime prompt. You can see the character’s face is much better proportioned and her edges are more clearly delineated. The clothes look much more like real clothes, and the lighting effects are getting better.
The images on the grid are also an improvement on the first version, with better angles and better detail. Some of the character’s clothing items also differ from image to image, which adds to the variation of the results.
In Midjourney Version 3, you can use the light upscaler to add fewer details to your images. Version 3 introduced the --uplight argument to keep upscaled images from looking overcooked by the AI, preventing distortion and noise. It also helps to balance out softness, among other factors.
Version 3 came with the following perks:
- New Imagine Algorithm
- New Upscaler with far less distortion and artifacts.
The improved model added stylize and quality arguments, which were new to the AI.
The higher you set the stylize value, the less strictly the model follows your prompt. You could also now increase the quality of the image, at the cost of more compute.
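To make the two arguments concrete, here is a minimal Python sketch that appends them to a prompt string. The helper name `build_prompt` is hypothetical, and the `--stylize` and `--quality` spellings and the sample values are assumptions based on Midjourney's double-dash parameter style. The helper only formats text, so check the current documentation for the values each version actually accepts.

```python
def build_prompt(description, stylize=None, quality=None):
    """Assemble a prompt string with optional parameters appended.

    Hypothetical helper for illustration: it only formats text and
    does not communicate with Midjourney in any way.
    """
    parts = [description.strip()]
    if stylize is not None:
        # Higher stylize values give the model more artistic freedom.
        parts.append(f"--stylize {stylize}")
    if quality is not None:
        # Higher quality values spend more compute per image.
        parts.append(f"--quality {quality}")
    return " ".join(parts)


# Sample values are placeholders, not recommended settings.
print(build_prompt("organic house embedded into the hilly terrain",
                   stylize=2500, quality=2))
```

Leaving either argument as `None` omits the flag, so the model falls back to its own default.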
As you can see in the upscaled V3 image, the features and quality are far superior to the previous two versions. The depth, materials, and composition have all improved. We’re moving from a painterly look towards a photographic look.
Compared to versions 1 and 2, the organic house appears more structured and well-defined. The background has a lot more to show, with stunning views and features on the horizon.
Version 3 showed a greater understanding of the second prompt, and included more specific details in accordance with the description. The palette is richer, and the colors are blended more smoothly.
In terms of variation, the model went as far as adding the character to a backpack instead of having her wear it herself. It’s an interesting take on the prompt, and a prime example of Midjourney’s gradual improvement.
With Midjourney version 4, the developers went back to the drawing board. They created a whole new architecture for the model, and it shows. V4 was revolutionary when it was released. Nobody, not even Stable Diffusion power users, had seen anything like it.
So, how does V4 differ from its predecessors?
- Better Knowledge of Creatures, Humanoids, Places, Skies, Among Others.
- Improved Attention to Detail.
- Able to Take On Highly Complex Prompts with Great Detail
- Better at Generating Scenes with Multiple Objects and Characters.
- Supports Multiple Prompting and Image Prompting
- Includes a --chaos argument to control the variety of image grids. It can be set from 0 to 100.
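The 0 to 100 range for the chaos argument is easy to enforce before a prompt is submitted. The sketch below is a hypothetical helper (`chaos_flag` is not part of any Midjourney tooling) that rejects out-of-range values, assuming the flag is written as `--chaos`.

```python
def chaos_flag(value):
    """Return a '--chaos N' fragment, rejecting values outside 0-100.

    Hypothetical helper for illustration; the 0-100 range comes from
    the feature list above.
    """
    if not 0 <= value <= 100:
        raise ValueError(f"chaos must be between 0 and 100, got {value}")
    return f"--chaos {value}"


print(chaos_flag(50))
```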
There is a significant improvement in image quality, complexity, and depth in the images below, generated by running prompt 1 through V4. Even so, it’s nothing compared to V5! You’ll see.
The image contains a variety of lighting, tones, and details. This is the moment Midjourney images truly begin to look more like photographs than paintings.
The structures are well-defined, the trees are thoroughly detailed, and the frames look precise. The image looks almost three-dimensional with a great level of depth.
Inevitably, the upscaling has massively improved in quality and detail.
Editor’s comment: I love the evening sky with the dark rocks and lit-up houses. It’s truly incredible, and utterly picturesque.
As for the grid, the effort that went into developing V4 is highly noticeable. The variation is so high that each image almost seems to show a slightly different climate.
Although the highest resolution of V4 output is 1024px by 1024px, you still get highly detailed images. And you haven’t seen anything yet! V5 is even better.
The second prompt in version 4 has turned out extremely well. The hair, face, clothing, background, and other features are remarkably detailed. The art is rich in color and vibrance, which shows the true potential of this model.
Unlike the first version of this image, all features are clear. It goes to show the amount of effort that went into developing Midjourney as an AI image generator.
The grid features a variety of takes on this prompt in V4. It’s safe to say these images look less like cartoons, especially with the clothing items - all while maintaining the Anime look. It would take a human many man-hours to produce an image of this quality without using an AI.
Midjourney V5, released on March 15th, 2023, is the newest and most extensively trained version of the AI tool.
In comparison to previous versions, the model is better at interpreting natural language prompts.
The model introduces a new way to prompt: you can write your description as a plain sentence. Don’t worry about parameters, style methods, commas, and slashes.
So, what’s new with Midjourney V5?
- A greater level of stylistic range, along with more responsive prompting.
- A significantly higher level of image quality with a 2x resolution increase, and enhanced dynamic range.
- A higher quantity of detail with enhanced precision - resulting in less unwanted text.
- Faster image prompting with better performance.
- The --iw argument for weighting image prompts versus text prompts.
The latest version also comes with experimental features:
- The --tile argument for seamless tiling
- The --ar argument with aspect ratios greater than 2:1
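The 2:1 aspect-ratio threshold in the list above comes down to simple arithmetic. As a rough sketch, the hypothetical helper below takes a ratio written as `W:H` and reports whether it is more extreme than 2:1. Treating portrait ratios such as 1:3 the same as 3:1 is an assumption of this example, not something the release notes spell out.

```python
def exceeds_two_to_one(ratio):
    """Return True if a 'W:H' ratio is more extreme than 2:1.

    Portrait ratios such as 1:3 are treated the same as 3:1; that
    symmetry is an assumption of this sketch.
    """
    width, height = (int(part) for part in ratio.split(":"))
    if width <= 0 or height <= 0:
        raise ValueError(f"ratio parts must be positive, got {ratio!r}")
    long_side, short_side = max(width, height), min(width, height)
    # Integer comparison avoids floating-point rounding at the boundary.
    return long_side > 2 * short_side


for candidate in ("16:9", "2:1", "3:1", "1:3"):
    print(candidate, exceeds_two_to_one(candidate))
```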
What does all of this mean to you?
- Stylistics: The model’s aesthetic range is far greater than before, and it understands natural language prompts more accurately.
- Quality: The dynamic range and resolution are double what they were before, producing higher quality images at a faster rate. Whereas version 4 required users to wait for each image to be upscaled, version 5 shows larger images immediately after you click the upscale button.
- Detailing: With Midjourney V5, the neural network excels when it comes to minor detail. It goes as far as drawing 5 fingers on each hand it generates, and doesn’t add extra facial features like an additional eyelash. The company states it has greatly enhanced facial detailing.
One of the main perks of Midjourney 5 is that you can create photorealistic visuals, which may appear uncanny to others due to how real they look. Many people are bound to say, “Wow, that person really isn’t real?”
Before we get to our 2 prompts, I ran a photovisual prompt to demonstrate how advanced Midjourney has gotten since the first version was released.
We are truly able to generate images with far greater graphic detail than anyone thought possible 9 months ago. The image looks as if it was taken yesterday, but the truth is it’s 100% AI generated.
As for prompt 1, the result is impeccable. Comparing it to the first version of this image is like comparing a high resolution photograph to an artist’s impression.
While the organic house looks as if it's a real build, the trees in the fore and background blend with the image perfectly.
The inside of the house can also be seen, which makes this image look even more convincing.
The rocks below the organic house are detailed enough to look like a real cliff-top, while the wooden structure is thoroughly defined and precise. The features of the image, including the inside of the house, add depth and give the observer an effective three-dimensional experience.
With the grid, variation is at its peak. The images include different angles of the building, which reveal features that otherwise wouldn’t be seen in this prompt. To state the obvious, this means version 5 is far more thorough than its predecessors.
The collage also shows different parts of the day, as the way the sunlight is positioned differs on each image.
Below is prompt 2 generated by version 5. Again, there’s a significant difference in the upscaling. The resolution, detail, and palette are flawless. The background shows the distinctive features of a city at night.
The lights in the background, along with the colors of the character’s clothes and hair blend well together in the image. The background is out of focus, which adds depth to the image.
The images on the grid are well-varied. Like version 4, the variants show the art from different angles, revealing features you wouldn’t see in previous versions. Version 1 didn’t generate much of a background, whereas V5 includes vibrant backgrounds.
The character’s clothes differ from image to image and vary in color. The backpack is also different in each image.
How Well Do They Generate Variants?
Now, let’s dive into how well the different versions of Midjourney generate variants with image-to-image prompts. As with the prompt examples, we used the same stock image for every version to ensure a fair comparison.
Below is the stock image from Pixabay. We felt it would be best to use a real image in these examples to create a real-life scenario.
As with the imagine prompts, version 1 handled our image-to-image prompt by generating a painting of the glasses and bottle. It looks like something you’d find in an art gallery from the 14th century, painted on a wooden canvas.
Version 2 came up with some interesting takes on the prompt - generating patterns that vary in color. There isn’t much going for it, but it did generate something rather original.
Version 3 also created something original. It looks like the skyline of a megacity, whereas the other images in the grid look like the glass up close, resembling pyramid shapes.
Now, this is where it gets interesting. Version 4 has come up with something completely unique with this prompt. Not only did it generate variants on the wine glasses and bottle, but it also added slightly haunting figures to its images.
I didn’t add any text to the image-to-image prompt; the AI did this of its own accord.
It’s clear version 4 was the most experimental Midjourney model, and some could say the AI is trying to create an appearance for itself, a lot like Loab appearing in hundreds of AI images.
The grid contains 2 figures, one male and the other female. Could the model be trying to tell us a story? It’s quite possible.
The backgrounds look distinct and vast, resembling a view from a great height, so the setting could be an apartment, a high-up restaurant, or a rooftop bar.
It’s evident version 5 is a little less experimental in handling image-to-image prompts, although maybe this is a good thing.
Rather than coming up with cool-looking figures, this model created variations of the original image’s layout.
Version 5 has been faithful to the original image, which I find pretty impressive. It has come up with nametags and different camera angles. Also, the positioning of the wine glasses differs from image to image.
Although version 5 focuses on detail and is more thorough than previous models, it’s safe to say it’s a little less experimental than its counterparts - unless you instruct it otherwise.
Midjourney has changed a lot since its initial release. I believe version 6 will continue the trend and continue to wow us.
Midjourney went from producing psychedelic and strange artwork to high quality images that can be used in a number of different applications.
If you're new to Midjourney we do recommend checking out the older versions, if only to see how far we've come in a very short time.