Dall-E vs Midjourney comparison

Recently, I was chatting with a coworker, and we discussed two cool AI tools: Dall-E and Midjourney. I’ve been using Midjourney for a bit, but I didn’t know much about Dall-E. So I got myself a Dall-E subscription to try it alongside the Midjourney one I already had.

Dall-E vs Midjourney

Which one is better? To find out, I picked seven categories from Instagram for ideas, wrote one prompt per category, and ran each prompt through both Dall-E and Midjourney. Then I compared the results on their accuracy, how well they put things together, and how closely they stuck to the prompts.

To keep things fair, I didn’t use any special settings or filters specific to either tool. I just kept the prompts simple and generic, like someone new to all this.

Categories
1. Portrait
2. Landscape
3. Fashion
4. Food
5. Architecture
6. Wildlife
7. Astro

Criteria — rated from 1–10

a. Composition: balance, symmetry, leading lines, framing, and the rule of thirds; how the subjects or focal points are positioned and how they interact with the background; and how the use of colour, lighting, and perspective adds to the overall visual impact.
b. Photorealism: how accurately the images depict real-life subjects or scenes. This covers the level of detail, clarity, and sharpness; whether colours, textures, and lighting closely resemble reality; and factors like depth of field, perspective, and shadows as a gauge of authenticity.
c. Closer to the vision: how close the result is to what I pictured when writing the prompt. This is a bit subjective, but then realising a vision is the prime use case of text-to-image AI, isn’t it?
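
To make the scoring concrete, a single round can be sketched in Python as three ratings from 1 to 10, where a criterion that does not apply is left as `None` and simply skipped. This is only an illustration: the names `CRITERIA` and `round_total` and the example ratings are made up and are not part of either tool.

```python
from typing import Optional

# The three criteria described above (identifiers are hypothetical).
CRITERIA = ("composition", "photorealism", "closer_to_vision")


def round_total(ratings: dict[str, Optional[int]]) -> int:
    """Sum one tool's ratings for a round, skipping criteria marked None (NA)."""
    total = 0
    for name in CRITERIA:
        rating = ratings.get(name)
        if rating is None:
            continue  # criterion not applicable in this round
        if not 1 <= rating <= 10:
            raise ValueError(f"{name} must be rated from 1 to 10, got {rating}")
        total += rating
    return total


# Made-up ratings, just to show the NA handling.
example = {"composition": 8, "photorealism": None, "closer_to_vision": 7}
print(round_total(example))  # 15
```

Skipping an NA criterion for both tools keeps the comparison even, since neither side gains or loses points from it.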

So here we go

1. Portrait — {Baby’s portrait at dusk in the garden, smiling amid lush foliage, bathed in golden evening light.}

Comments: Both models came very close to each other. Dall-E prioritised photorealism, while Midjourney went full creative mode, which I like. Midjourney created four distinct styles from the same prompt: photorealism, digital print, oil painting, and possibly watercolour.

So the verdict

a. Composition: Dall-E 8 / Midjourney 9

Both models picked up the rather simple prompt very accurately and came up with brilliant images, but Midjourney edges ahead by creating four different versions of the composition, which I believe gives the user a lot of freedom. Dall-E went with one framing and just changed the babies.

b. Photorealism: Dall-E 7 / Midjourney 9

This is hands down Midjourney all the way. Dall-E has improved recently, but Midjourney V6 is possibly the most accurate model available right now. If you zoom in closely, the photos from Dall-E still have that AI sheen on the skin, and realistic skin is, I believe, one of the most difficult things to reproduce. Midjourney knocks this out of the park, even with just one photorealistic pic.

c. Closer to Vision: Dall-E 8 / Midjourney 9

I think Midjourney takes the cake here too. I know it is subjective, but when I saw the three other renditions of the same prompt I was rather surprised, and I would use any of those images without further editing, which saves money as well. With Dall-E, I would have to edit the images to get that photorealism.

So after round 1: Dall-E 23 / Midjourney 27

2. Landscape — {Awash in the morning sunlight, a lush valley unfolds: a meandering river, an alpine cottage nestled among verdant hills, smoke curling from its chimney. Nature’s tranquillity envelops the scene, inviting peace and serenity in this picturesque landscape.}

a. Composition: Dall-E 9 / Midjourney 9

I feel both models performed very well in this round. I know many of you might want to pick Midjourney here, but the prompt did not specify a style, so the models were free to choose one. Dall-E chose a more surrealistic painting, while Midjourney went for a detailed one. Neither is bad; it just comes down to your perception.

b. Photorealism: Dall-E NA / Midjourney NA

I would rather abstain from scoring this criterion, as the prompt did not ask for photorealism and each model picked its own style.

c. Closer to Vision: Dall-E 8 / Midjourney 9

If you ask me which of the two gives me more serenity and peace, it has to be the detailed one from Midjourney, though that is just my preference.

So after round 2: Dall-E 40 / Midjourney 45

3. Fashion — {A high-fashion shoot featuring a stunning model amidst the iconic streets of Paris. Radiating sophistication and charm, the model exudes effortless grace in chic attire, set against the backdrop of historic architecture and bustling boulevards. Capturing the essence of Parisian haute couture in every frame.}

Comments: I mean, what can I say? The consistency of Midjourney here is too good; it is basically showing off now.

a. Composition: Dall-E 7 / Midjourney 9

The composition from both models is spot on, but Midjourney’s ability to vary the elements and create a whole new set of pictures is very liberating. It works like an assistant offering options, whereas Dall-E’s compositions all look rather similar. I am not saying Dall-E is bad, but Midjourney is much better here.

b. Photorealism: Dall-E 6 / Midjourney 10

I would say this is the defining moment in this analysis. Midjourney comes as close to a real photograph as it can. Dall-E, on the other hand, still looks very “AI”, if you know what I mean.

c. Closer to Vision: Dall-E 8 / Midjourney 9

I will not mark Dall-E down again here for its very fake-looking model, so I am giving both close scores, as both matched what I had in mind. But by god, Midjourney’s latest V6 is so good.

So after round 3: Dall-E 61 / Midjourney 73

4. Food — {A tantalizing photo for your food blog or magazine, showcasing a mouthwatering dish as the star. With an expertly blurred background creating a bokeh effect, the focus remains on the delectable food, inviting viewers to indulge in its flavours through the screen.}

Comments: In this round both come very close to each other. Not as close as they did in Landscape, but still very close. Let’s get into it.

a. Composition: Dall-E 8 / Midjourney 8

I would say the composition from both models is spot on. I cannot separate them here, as the food placement, the rule of thirds, the lighting, and the background are all very similar and amazingly done. So, a tie.

b. Photorealism: Dall-E 8 / Midjourney 9

Both Dall-E and Midjourney have delivered very photorealistic images, but think about it from an eater’s perspective: which one looks more palatable? I would say Midjourney’s rendition is closer to what real food looks like. Dall-E seems to be creating some weird versions of ratatouille.

c. Closer to Vision: Dall-E 8 / Midjourney 9

For me it is Midjourney again, because the food looks much closer to what I had in mind. It also looks more posh, more fine dining.

So after round 4: Dall-E 85 / Midjourney 99

5. Architecture — {A nostalgic feature in Architectural Digest highlighting a post-modern masterpiece from the 70s. With a grainy filter reminiscent of the era, capture the imposing beauty of this brutalist gem nestled in the highlands of Eastern Europe. Evening hues cast a captivating glow, framing breathtaking panoramic views.}

a. Composition: Dall-E 8 / Midjourney 7

I would say Dall-E has surpassed Midjourney in setting this scene up. This is an increasingly common issue with Midjourney: it focuses too much on the subject. Although it has added an amazing, captivating background, Dall-E has been truer to this particular part of the prompt: “Evening hues cast a captivating glow, framing breathtaking panoramic views.”

b. Photorealism: Dall-E 8 / Midjourney 9

On photorealism, I have no doubt that the sampling Midjourney is doing is above par, and that shows here too. The picture looks crisper, more detailed, and very true to life.

c. Closer to Vision: Dall-E 8 / Midjourney 8

I would say that, in one way or another, both have come close to my vision for this picture. Yes, they do not look alike, but both match the kind of image I had in mind when writing this prompt. I will end this section with a footnote: Midjourney’s image still looks more like a picture that would get featured in Architectural Digest.

So after round 5: Dall-E 109 / Midjourney 123

6. Wildlife — {Tiger drinks water from a lake and looks into the camera. With intense focus on the tiger’s piercing gaze, ensure the reflection shimmers on the water’s surface, conveying a moment of raw beauty and connection between predator and lens. Capture the majesty of wildlife in a semi-arid landscape as a tiger pauses to drink from a tranquil lake.}

a. Composition: Dall-E 7 / Midjourney 9

By now I would say Midjourney has been talking to all the right people in photography circles. Look at those pics: had a photographer taken them, they would have been featured in Nat Geo, to say the least. The frame from Dall-E is similar to Midjourney’s, but Midjourney’s trick of zooming out just a wee bit makes such a big difference that it delivers a complete picture.

b. Photorealism: Dall-E 6 / Midjourney 10

I think Dall-E is still stuck in 2022, while Midjourney is pretty much as photorealistic as it gets. Let’s leave it at that.

c. Closer to Vision: Dall-E 8 / Midjourney 10

Both are very close, but Midjourney’s result was what I was looking for. The ferocity of the tiger shows in its eyes as it looks at the camera, and the ripples and reflections on the water are just top-notch stuff. I would not edit Midjourney’s images one bit.

So after round 6: Dall-E 130 / Midjourney 152

7. Astro — {Through the lens of a bubble telescope, immerse viewers in the vast expanse and vibrant hues of the Crab Nebula. Captured with a super wide-angle perspective, unveil the magnanimity of this celestial wonder, showcasing its intricate details and mesmerizing colours against the backdrop of the cosmos.}

a. Composition: Dall-E 5 / Midjourney 8

Although I liked the representation of the nebula that Dall-E came up with, it is not the composition I was looking for. This goes to show that these models sometimes give weight to the wrong keywords. Dall-E put an actual lens in the frame, so I have to deduct some marks for that. Had that been just one of the four images, as a creative choice, things would have been different. As for Midjourney, this is how astrophotography is done and this is what the Crab Nebula looks like, so pretty standard there.

b. Photorealism: Dall-E 6 / Midjourney 9

As I have already alluded to, Midjourney is very close to actual Hubble pics, so top marks there. Dall-E, it seems, does not know what the Crab Nebula is, which is rather surprising: if ChatGPT sits behind Dall-E, why did it not have that context?

c. Closer to Vision: Dall-E 5 / Midjourney 8

It is now evident that Midjourney did what it does: it makes great, accurate, true-to-prompt images. Dall-E needs to work harder; it still lags a lot in all of those departments.

RESULTS

Dall-E 146 and Midjourney 177

Midjourney knocked out its big brother, OpenAI’s Dall-E.
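
For anyone who wants to re-check the arithmetic, the running totals can be recomputed from the per-criterion scores listed in each round. The Python sketch below does that; the `ROUNDS` table is transcribed from the scores above, `None` stands for the NA in the Landscape round, and the helper names `round_sum` and `running_totals` are made up for illustration.

```python
from itertools import accumulate

# (category, Dall-E scores, Midjourney scores)
# Score order: composition, photorealism, closer to vision. None means NA.
ROUNDS = [
    ("Portrait",     (8, 7, 8),    (9, 9, 9)),
    ("Landscape",    (9, None, 8), (9, None, 9)),
    ("Fashion",      (7, 6, 8),    (9, 10, 9)),
    ("Food",         (8, 8, 8),    (8, 9, 9)),
    ("Architecture", (8, 8, 8),    (7, 9, 8)),
    ("Wildlife",     (7, 6, 8),    (9, 10, 10)),
    ("Astro",        (5, 6, 5),    (8, 9, 8)),
]


def round_sum(scores):
    """Add up one tool's scores for a round, ignoring NA entries."""
    return sum(s for s in scores if s is not None)


def running_totals(rounds, column):
    """Cumulative score after each round for the tool in the given column."""
    return list(accumulate(round_sum(r[column]) for r in rounds))


dalle = running_totals(ROUNDS, 1)
midjourney = running_totals(ROUNDS, 2)

for (category, _, _), d, m in zip(ROUNDS, dalle, midjourney):
    print(f"After {category}: Dall-E {d} / Midjourney {m}")
```

The last line printed is the overall result, and swapping in your own ratings for the same prompts is just a matter of editing the table.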

14 years of understanding users, business, and products. Love AI as much as UX and want to see how either of them can match to provide a better world for all.