Tom’s Guide
Ryan Morrison

I put 5 of the best AI image generators to the test using NightCafe — this one took the top spot

Barista hands woman a coffee in AI generated clip.

Competition in the AI image generator space is intense, with multiple companies like Ideogram, Midjourney and OpenAI hoping to convince you to use their offerings. That is why I'm a fan of NightCafe and have been using it for a few years. It has all the major models in one place, including DALL-E 3, Flux, Google Imagen and Ideogram.

I've created a lot of AI images over the years and every model brings something different. For example, Flux is a great general-purpose model that comes in several versions, Imagen 4 is incredible for realism, and Ideogram does text better than anything but GPT-4o.

With NightCafe you can try the same prompt across multiple models to see which result you prefer. You can even create a realistic image of, say, a train station using Google Imagen, then use that as a starter image for an Ideogram project to overlay a caption or stylized logo.

NightCafe also offers most of the major video models including Kling, Runway Gen-4, Luma Dream Machine and Wan 2.1. For this test we’re focusing on image models.

Picking a favorite model

Having all those models to hand is a great way to test each of them to find the one that best matches your personal aesthetic — and they’re each more different than you think.

As well as the 'headline' models like Flux and Imagen, there are also community models that are fine-tuned versions of Flux and Stable Diffusion. For this test I focused on the core models: OpenAI GPT Image-1, Recraft v3, Google Imagen 4, Ideogram 3 and Flux Kontext.

I’ve come up with a prompt to try across each model. It requires a degree of photorealism, presents a complex scene and includes a subtle text requirement.

The prompt: “A small independent coffee van parked on a quiet cobblestone street in Paris during early autumn, captured in candid 35mm street photography style with natural light and shallow depth of field. Golden morning sunlight reflects off the damp stones after a light rain. The van is a matte forest green Citroën Type H, with a hand-painted chalkboard sign leaning against it that reads ‘Café du Matin’ in elegant cursive. A barista in a denim apron hands a coffee to a smiling elderly woman in a beige trench coat holding a small umbrella. Fallen leaves gather near the tyres, and gentle steam rises from takeaway cups on the wooden counter.”

1. Google Imagen 4

(Image credit: NightCafe/Future/Ryan Morrison)

Google’s Imagen 4 is the model you’ll use if you ask the Gemini app to create an image of something for you. It's also the model used in Google Slides when you create images.

This was the first image for this test. It captured the steam rising from the cups but overemphasised it a little. It did create a visually compelling scene and followed the requirement for the two people in it. It captured the correct vehicle, but there’s no sign of the text.

2. Flux Kontext Max

(Image credit: NightCafe/Future/Ryan Morrison)

Black Forest Labs' Flux models are among the most versatile, and several of them are released with open weights. With the arrival of the Kontext variant, we got image models that also understand natural language better. This means that, a bit like OpenAI’s native image generation in GPT-4o, Kontext gives much more accurate results, especially when rendering text or complex scenes.

Flux Kontext captured the 'Cafe Matin' sign perfectly, got the woman right and somehow feels more French than Imagen, but I don't think it's as photographically accurate.

3. OpenAI GPT Image-1

(Image credit: NightCafe/Future/Ryan Morrison)

GPT Image-1, not to be confused with the 2018 original GPT-1 model, is a multimodal model from OpenAI designed for improved render accuracy. It is used by Adobe, Figma, Canva and NightCafe. Like Kontext, it has a better understanding of natural language prompts.

One downside to this model is that it can’t do 9:16 or 16:9 images, only variants of square. It captured the truck and the name, but I don't think the scene is as good. It also randomly generated a second umbrella, and the placement of the hands feels unreal.
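One way to work around the aspect-ratio limit is to generate at the closest available size and crop afterwards. The sketch below is only an illustration: the size list is an assumption based on the output sizes OpenAI documents for GPT Image-1 (NightCafe may expose different options), and the function names are hypothetical.

```python
import math

# Assumed (width, height) output sizes, based on the sizes OpenAI documents
# for GPT Image-1. NightCafe's own options may differ.
SUPPORTED_SIZES = [(1024, 1024), (1536, 1024), (1024, 1536)]


def closest_size(ratio_w, ratio_h, sizes=SUPPORTED_SIZES):
    """Return the supported size whose aspect ratio is nearest the target."""
    target = ratio_w / ratio_h
    # Compare ratios on a log scale so wide and tall shapes are treated evenly.
    return min(sizes, key=lambda s: abs(math.log((s[0] / s[1]) / target)))


def crop_dimensions(width, height, ratio_w, ratio_h):
    """Return the largest (width, height) box of the target ratio that fits."""
    if width * ratio_h > height * ratio_w:
        # Image is wider than the target shape, so height is the limit.
        return (height * ratio_w // ratio_h, height)
    # Image is taller than (or equal to) the target shape, so width is the limit.
    return (width, width * ratio_h // ratio_w)


if __name__ == "__main__":
    size = closest_size(16, 9)
    print(size)                             # (1536, 1024)
    print(crop_dimensions(*size, 16, 9))    # (1536, 864)
```

Cropping trims the top and bottom (or the sides) of the scene, so it is worth leaving some breathing room around the subject in the prompt if you plan to do this.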

4. Ideogram 3

(Image credit: NightCafe/Future/Ryan Morrison)

Ideogram has been one of my favorite AI image models since it launched. Always able to generate legible text, it is also more flexible in terms of style than the other models. The Ideogram website includes a well designed canvas and built-in upscaler.

The result isn’t perfect, as the barista leans at an odd angle, but the lighting is more realistic. The scene is also more believable, with the truck on the sidewalk instead of the road. It feels more modern too, and the text is both legible and well designed.

5. Recraft v3

(Image credit: NightCafe/Future/Ryan Morrison)

Recraft is more of a design model, perfect for both rendered text and illustration, but that doesn’t mean it can’t create a stunning image. When it hit the market it shook things up, beating other models to the top of leaderboards.

I wasn’t overly impressed with the output. Yes, it's the most visually striking, in part thanks to the space given to the scene. But it overemphasises the steam, and where is the barista? Also, for a model geared around text, there’s no sign writing.

My favorite: Flux Kontext Max

(Image credit: NightCafe/Future/Ryan Morrison)

While Flux had a number of issues visually, it was the most consistent and it included legible sign writing. If I were using this commercially, as a stock image, I’d go with the Google Imagen 4 image, but from a purely visual perspective — Flux wins.

What you also get with Flux Kontext is easy adaptation. You could make a secondary prompt to change the truck color or replace the old lady with a businessman. You can do that in Gemini but not with Imagen. You’d need to use native image generation from Gemini 2+.

If you want to make a change to any image using Kontext, even if it wasn't a Kontext image originally, just click on the image in NightCafe and select "Prompt to Edit". It costs about 2.5 credits and needs only a simple descriptive text prompt.

Final thoughts on NightCafe

I used the most expensive version of each model for this test, the one that takes the most processing time on each image. This allowed for the fairest comparison. What surprises me is just how differently each model interprets the same descriptive prompt. But it doesn't surprise me how much better they’ve all got at following that description.

What I love about NightCafe, though, is that it's a one-stop shop for AI content. It isn’t just a place to use all the leading image and video models; it also has a large community with a range of games, activities and groups centered around content creation. You can also edit, enhance, fix faces, upscale and expand any image you create within the app.
