Baidu's ERNIE-Image: Open-Source AI Model Beats Bigger Rivals
Baidu released ERNIE-Image on April 15, 2026: an 8-billion-parameter Diffusion Transformer that posts a GenEval score of 0.8856 and a LongTextBench score of 0.9733, runs on a single 24 GB consumer GPU, and ships under an Apache 2.0 license. The release landed within weeks of Pope Leo XIV’s May 2026 encyclical “Magnifica Humanitas,” which calls for tighter AI oversight. The result is a clear flashpoint: the highest-scoring open-weight text-to-image model on these benchmarks is now free for anyone to download and modify.
The Aitrepreneur YouTube tutorial already shows users installing the model in ComfyUI with an 8-step Turbo mode, while Vatican footage of the Pope signing the encyclical circulates on AP video feeds. Both developments landed within weeks of each other, forcing a real-time debate over speed versus safeguards.
Baidu Drops ERNIE-Image: Open-Source AI Model Beats Bigger Rivals on Text Rendering
Atlanta, GA – June 7, 2026 — Baidu released ERNIE-Image, an 8-billion-parameter Diffusion Transformer, on April 15, 2026, under the Apache 2.0 license. The model records a GenEval score of 0.8856 and a LongTextBench score of 0.9733, the highest marks reported for any open-weight system at the time of launch. It supports bilingual Chinese-English prompts, includes a built-in Prompt Enhancer, and runs inference on a single 24 GB VRAM card with an optional 8-step Turbo mode.
Baidu’s Technical Breakthrough
The 8-billion-parameter DiT architecture marks Baidu’s first public open-weight text-to-image release. Engineers at the company’s Beijing headquarters trained the model on a mixture of Chinese and English image-text pairs, enabling native handling of both languages without translation layers. The Prompt Enhancer automatically rewrites short user inputs into detailed prompts before generation begins.
Benchmark Dominance Over Competitors
ERNIE-Image’s GenEval score of 0.8856 surpasses every previously released open-weight model, while its LongTextBench result of 0.9733 reflects strong rendering of long, legible text inside generated images. These numbers were measured on standard evaluation suites and released alongside the model weights on April 15, 2026. As of that date, no larger closed model had published higher scores on comparable evaluations.
Running on Consumer Hardware
The full model fits inside 24 GB of VRAM, allowing operation on a single high-end consumer graphics card. An 8-step Turbo mode further reduces generation time while preserving most of the quality. Users can load the model directly into ComfyUI workflows, as demonstrated in the Aitrepreneur YouTube video posted shortly after release.
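As a rough sanity check on the 24 GB figure, the weights-only footprint of an 8-billion-parameter model can be estimated from the bytes stored per parameter. The sketch below is a back-of-the-envelope illustration, not Baidu's published memory profile: only the parameter count and the 24 GB budget come from the article, and the precisions listed (fp32, bf16, int8) are assumptions chosen for comparison. It also ignores the text encoder, the image decoder, and activation memory, all of which add to real usage.

```python
# Weights-only memory estimate for a model of a given parameter count.
# Assumption: the precisions below are illustrative; the article does not
# state which numeric format ERNIE-Image ships in.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits(num_params: int, dtype: str, vram_gb: float) -> bool:
    """True if the weights alone fit in the given VRAM budget."""
    return weight_memory_gb(num_params, dtype) <= vram_gb


if __name__ == "__main__":
    params = 8_000_000_000  # 8 billion parameters, per the article
    budget_gb = 24          # single 24 GB card, per the article
    for dtype in BYTES_PER_PARAM:
        size = weight_memory_gb(params, dtype)
        verdict = "fits" if fits(params, dtype, budget_gb) else "does not fit"
        print(f"{dtype}: {size:.0f} GB of weights, {verdict} in {budget_gb} GB")
```

Under these assumptions, 16-bit weights take about 16 GB, leaving roughly 8 GB of a 24 GB card for everything else, while 32-bit weights alone would exceed the budget.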
Regulatory Shadows from Vatican
Pope Leo XIV signed the encyclical “Magnifica Humanitas” in late May 2026, explicitly addressing societal risks from rapidly advancing AI systems. The AP video of the signing ceremony shows the Pope calling for international guardrails on open-source model distribution. ERNIE-Image’s Apache 2.0 release, which permits unrestricted commercial use and modification, lands squarely inside that debate.
Community Adoption Through Tutorials
The Aitrepreneur YouTube channel released a step-by-step installation guide titled “RIP Z-IMAGE! NEW FREE NSFW IMAGE AI IS HERE! LESS THAN 8GB VRAM!” within days of the April 15 launch. The video walks viewers through ComfyUI node setup and Turbo-mode configuration, accelerating grassroots testing of the bilingual text-rendering capabilities.
What This Means
ERNIE-Image supplies verifiable state-of-the-art performance under fully open terms at the exact moment global institutions debate new AI controls. The model’s documented scores, hardware requirements, and license terms are now public data points in that discussion. Whether regulators move before or after wider adoption will shape the next phase of open-source image generation.
By Jessica Ali, Staff Writer