ChatGPT’s New Images Model Excels at Creating Text Within Images

Not long ago, spotting AI-generated images was easy: misspelled food names such as “enchuita” or “margartas” gave them away immediately.

However, newer systems like ChatGPT Images 2.0 now produce outputs so polished that something like a restaurant menu can look completely usable, with only subtle details raising doubt.

ChatGPT Images 2.0 Launches With Improved Image Creation

Earlier models such as DALL-E 3 struggled heavily with text because they relied on diffusion processes, which rebuild visuals from noise and often overlook small elements like words.

As Asmelash Teka Hadgu explained, text occupies a tiny portion of pixels, so models prioritized broader visual patterns instead of accurate lettering.

Meanwhile, researchers began exploring alternatives like autoregressive systems, which predict image components step by step, functioning more like language models.
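The autoregressive loop described here can be sketched in a few lines. The stand-in “model” below is purely illustrative, since OpenAI has not disclosed its actual architecture; the point is only the generation pattern: each image token is predicted from the full sequence so far, exactly as a language model predicts the next word.

```python
# Toy sketch of autoregressive generation (illustrative only).
# An image is treated as a flat sequence of "patch tokens" produced
# one at a time, each conditioned on everything generated so far.

def generate(next_token, length, start):
    """Generate a token sequence step by step.

    next_token: callable mapping the tokens so far to the next token
    length: number of new tokens to produce
    start: initial context token
    """
    seq = [start]
    for _ in range(length):
        seq.append(next_token(seq))  # each step sees the full prefix
    return seq

# A stand-in "model": just increment the previous token (mod 16).
toy_model = lambda seq: (seq[-1] + 1) % 16

tokens = generate(toy_model, length=8, start=0)
# tokens == [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Because every step conditions on the whole prefix, small but crucial elements like letterforms are generated deliberately rather than emerging from a global denoising pass, which is the advantage researchers cite over diffusion for text rendering.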

Even so, OpenAI has not disclosed the exact architecture behind Images 2.0.

The company does highlight new “thinking capabilities,” allowing the model to verify outputs, browse information, and generate multiple variations from a single prompt.

Because of this, it can now create structured outputs like marketing materials or even multi-panel comic strips with consistent design.

Additionally, it handles non-Latin scripts more effectively, including Hindi, Bengali, Japanese, and Korean, marking a significant improvement in multilingual rendering.

Its knowledge base extends only up to December 2025, which may limit accuracy for recent events.

OpenAI Highlights High-Precision, 2K-Ready Image Generation Capabilities

OpenAI claims the system achieves high precision and detail, accurately following instructions while handling elements that previously caused issues, such as small text, icons, and dense layouts.

These outputs can reach resolutions up to 2K, making them suitable for professional use.

That said, generating complex visuals still takes longer than text responses, though results like comic strips can be completed within minutes.

The model became available to ChatGPT and Codex users, with advanced features reserved for paid tiers, alongside the release of the gpt-image-2 API.
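A request to the new API would presumably resemble the following sketch. The field names are assumptions modeled on OpenAI’s earlier image endpoints, not confirmed documentation; check the official API reference before relying on them.

```python
# Hypothetical parameters for a gpt-image-2 request. Field names are
# assumptions based on OpenAI's earlier image APIs, not confirmed docs.
request = {
    "model": "gpt-image-2",
    "prompt": "A four-panel comic strip with consistent characters "
              "and legible speech bubbles",
    "size": "2048x2048",  # up to 2K output, per the announcement
    "n": 4,               # multiple variations from a single prompt
}

# With the official SDK these would be passed to an image-generation
# call such as client.images.generate(**request) -- again, an assumption.
```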

A major breakthrough lies in solving the long-standing “text problem,” replacing distorted or unreadable words with clear, accurate typography.

It can now produce detailed UI mockups, multilingual designs, and up to eight connected images for storytelling purposes.

Notably, its reasoning mode analyzes prompts and checks for real-world accuracy, enabling precise placement of labels in diagrams or layouts.

As a result, creators are already using it for posters, infographics, and other high-quality assets that previously required manual editing.

Although minor visual flaws can still appear in highly complex scenes, overall feedback suggests the model has effectively “shattered the text barrier” in AI-generated imagery.


