Google Gemini Diffusion: What's It About?
The world of artificial intelligence is buzzing with new ways to create text, and Google DeepMind's "Gemini Diffusion" is one of the latest experimental models catching attention. It's not your typical language model. Instead of building text word by word, it uses a technique called diffusion, aiming for faster, more creative, and more controllable text generation.
Let's break down what makes Gemini Diffusion interesting, based on what Google DeepMind has shared.
What's Different About Diffusion Models?
Most language models you hear about, like many in the Gemini family or OpenAI's GPT series, are "autoregressive." Think of them as careful writers, putting down one word (or "token") after another, each choice depending on the words already written. This can be a bit slow and sometimes, if the model makes a less-than-perfect choice early on, it can affect the rest of the text.
Gemini Diffusion takes a different path. It works more like a sculptor starting with a block of stone and gradually refining it. According to Google DeepMind, "Diffusion models work differently. Instead of predicting text directly, they learn to generate outputs by refining noise, step-by-step." This means the model can look at a whole chunk of text and improve it iteratively. Google suggests this is great for "tasks like editing, including in the context of math and code," where getting things just right is key.
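To make the contrast concrete, here's a toy Python sketch of the two generation styles. It's purely illustrative: both "models" below pick tokens at random rather than using a neural network, and the refinement loop is just a stand-in for real denoising, not Gemini Diffusion's actual algorithm.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "a"]

def autoregressive_generate(length: int) -> list[str]:
    """Left-to-right: commit to one token at a time.
    A real model would condition each choice on the tokens chosen so far,
    so an early misstep constrains everything that follows."""
    tokens: list[str] = []
    for _ in range(length):
        tokens.append(random.choice(VOCAB))
    return tokens

def diffusion_generate(length: int, steps: int = 5) -> list[str]:
    """Start from 'noise' (a random draft of the WHOLE sequence) and refine it.
    A real denoiser re-predicts positions given the entire draft, which is
    what lets it revise earlier mistakes; resampling one random position
    below is a crude stand-in for that refinement step."""
    tokens = [random.choice(VOCAB) for _ in range(length)]
    for _ in range(steps):
        i = random.randrange(length)
        tokens[i] = random.choice(VOCAB)
    return tokens

print(autoregressive_generate(6))
print(diffusion_generate(6))
```

The structural difference is the point: the autoregressive loop only ever appends, while the diffusion loop always holds a complete draft it can keep improving.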
You can read more about it on Google's official blog: https://deepmind.google/models/gemini-diffusion/
What Can Gemini Diffusion Do?
Based on the information released, Gemini Diffusion shines in a few key areas:
- Speedy Responses: It's reportedly much faster than Google DeepMind's previous fastest models. They've shared a specific number: 1479 tokens per second of sampling speed, excluding an initial overhead of about 0.84 seconds. This speed comes from its ability to work on whole blocks of text at once (see the rough estimate after this list).
- More Joined-Up Text: Because it "generates entire blocks of tokens at once," the text it produces can be more coherent. It has a better sense of the overall picture, leading to smoother and more logically connected writing.
- Fixes as It Goes: The step-by-step refinement process means it "corrects errors during generation." This is a big plus for consistency and accuracy.
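To put those figures in perspective, here's a quick back-of-the-envelope latency estimate in Python. The additive model below (fixed overhead plus tokens divided by throughput) is our simplifying assumption, not something Google has stated.

```python
# Rough latency estimate from the figures Google DeepMind shared.
# Assumption: total time = fixed overhead + tokens / throughput.
THROUGHPUT_TOK_PER_S = 1479  # reported average sampling speed
OVERHEAD_S = 0.84            # reported initial overhead

def estimated_latency_s(num_tokens: int) -> float:
    """Approximate end-to-end seconds to generate num_tokens tokens."""
    return OVERHEAD_S + num_tokens / THROUGHPUT_TOK_PER_S

for n in (256, 1024, 4096):
    print(f"{n:>5} tokens ~ {estimated_latency_s(n):.2f} s")
```

Under these assumptions even a 4096-token response lands in under four seconds, which is what makes the speed claim notable.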
How Does It Stack Up? (The Benchmarks)
Google DeepMind has compared Gemini Diffusion to "Gemini 2.0 Flash-Lite" (another of their models) on several tests. These tests, called benchmarks, measure performance on different tasks. All scores are "pass @1," meaning the model got it right on the first try without multiple attempts.
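If the "pass @1" term is new to you, it's simply the fraction of problems the model solves on its first attempt. A toy illustration (the outcomes below are made up):

```python
# pass@1: share of benchmark problems solved on the first attempt.
# Each benchmark defines "solved" its own way (unit tests passing, exact match, etc.).
first_try_solved = [True, False, True, True, False]  # made-up outcomes for 5 problems
pass_at_1 = 100 * sum(first_try_solved) / len(first_try_solved)
print(f"pass@1 = {pass_at_1:.1f}%")  # -> pass@1 = 60.0%
```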
Here’s a look at the numbers:
| Category | Benchmark | Gemini Diffusion (%) | Gemini 2.0 Flash-Lite (%) |
|---|---|---|---|
| Code | LiveCodeBench (v6) | 30.9 | 28.5 |
| Code | BigCodeBench | 45.4 | 45.8 |
| Code | LBPP (v2) | 56.8 | 56.0 |
| Code | SWE-Bench Verified* | 22.9 | 28.5 |
| Code | HumanEval | 89.6 | 90.2 |
| Code | MBPP | 76.0 | 75.8 |
| Science | GPQA Diamond | 40.4 | 56.5 |
| Mathematics | AIME 2025 | 23.3 | 20.0 |
| Reasoning | BIG-Bench Extra Hard | 15.0 | 21.0 |
| Multilingual | Global MMLU (Lite) | 69.1 | 79.0 |

*Note: For SWE-Bench Verified, the result reflects a specific type of single-edit task.*
What These Numbers Tell Us (Simply Put):
- Good at Code and Math: Gemini Diffusion matches or edges out Flash-Lite on several coding tests (LiveCodeBench, LBPP, MBPP) and notably leads on a math test (AIME 2025). Its ability to refine and correct as it generates seems helpful here.
- Flash-Lite Stronger Elsewhere: Gemini 2.0 Flash-Lite scored higher on tests for science knowledge (GPQA Diamond), complex reasoning (BIG-Bench Extra Hard), and understanding multiple languages (Global MMLU).
- Fast and Capable: Google DeepMind says Gemini Diffusion's performance is "comparable to much larger models, whilst also being faster." That combination is the headline result: similar quality delivered at noticeably higher speed.
Trying Out Diffusion Models: Exploring LLaDA
Gemini Diffusion is currently an experimental demo, and you need to join a waitlist to potentially try it. Since it's not open source or widely available yet, you might be curious about other diffusion models for text.
One such project is LLaDA (Large Language Diffusion with Masking), an open-source model you can find on GitHub: https://github.com/ML-GSAI/LLaDA.
Here's a bit about LLaDA from its repository:
- It's an 8 billion parameter diffusion model, trained from scratch.
- Its creators state it rivals the performance of LLaMA3 8B (a well-known autoregressive model).
- LLaDA's goal is to explore "a theoretically complete language modeling approach — masked diffusion models."
- Like many modern LLMs, it uses the Transformer architecture but applies a diffusion-based probabilistic approach.
- It employs a masking strategy where the masking ratio varies randomly, which is key to its generative capabilities (a minimal sketch of this noising step appears after this list).
- You can use it with Hugging Face Transformers. Here's a snippet adapted from their page (note the `import torch` that the original snippet omits):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True is needed because LLaDA ships custom modeling code.
tokenizer = AutoTokenizer.from_pretrained('GSAI-ML/LLaDA-8B-Base', trust_remote_code=True)
model = AutoModel.from_pretrained(
    'GSAI-ML/LLaDA-8B-Base',
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # load weights in bfloat16 to cut memory use
)
```
- They also provide scripts like `chat.py` for conversations and an `app.py` for a Gradio demo.
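Based on the masking description above, here is a minimal sketch of the forward (noising) step in PyTorch. It reflects our reading of the masked-diffusion recipe, not LLaDA's actual training code, and `MASK_ID` is a hypothetical placeholder (LLaDA defines its own mask token); see the repository for the real implementation.

```python
import torch

MASK_ID = 0  # hypothetical placeholder; LLaDA uses its own mask token id

def forward_mask(input_ids: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Masked-diffusion forward process (sketch): sample a masking ratio
    t ~ Uniform(0, 1) per sequence, then mask each token independently
    with probability t. Training the model to recover the masked tokens
    at every corruption level is what makes generation possible."""
    b, l = input_ids.shape
    t = torch.rand(b, 1)             # one random masking ratio per sequence
    mask = torch.rand(b, l) < t      # True where a token gets masked
    noisy = torch.where(mask, torch.full_like(input_ids, MASK_ID), input_ids)
    return noisy, mask

ids = torch.randint(5, 1000, (2, 8))  # fake token ids for demonstration
noisy, mask = forward_mask(ids)
print(noisy)
```

Because the ratio varies from nearly 0 to nearly 1, the model sees everything from lightly corrupted text to pure masks, which is the range it must handle when generating from scratch.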
The LLaDA team notes that its current sampling speed is slower than autoregressive models for a few reasons, but they believe there's "significant room for optimization," drawing parallels to how image diffusion models became much faster over time.
Running LLaDA Locally on a Mac with MLX?
If you're a Mac user with Apple silicon (M1, M2, M3 chips), you might be interested in MLX, Apple's framework for running machine learning models efficiently on their hardware.
There's active development in the open-source community to bring more models to MLX. For LLaDA specifically, there's a pull request (a proposed change) on the `mlx-lm` GitHub repository to add support for it: https://github.com/ml-explore/mlx-lm/pull/14.

While the specifics depend on this pull request being finalized and merged, if LLaDA support is integrated into `mlx-lm`, you could potentially run it locally using commands similar to how other models are run with `mlx-lm`.
For example, you might be able to use commands like these:

- For text generation:

```bash
python -m mlx_lm.generate --model GSAI-ML/LLaDA-8B-Base --prompt "Your prompt here"
```

- For an interactive chat (with the Instruct version):

```bash
python -m mlx_lm.chat --model GSAI-ML/LLaDA-8B-Instruct
```
Important Note: You'd need to have `mlx-lm` installed and ensure you're using a version that includes LLaDA support (once the pull request is merged and released). Always check the official `mlx-lm` documentation for the most up-to-date instructions.
What's Next?
Gemini Diffusion is a peek into how Google is exploring new ways to make AI generate text better and faster. While it's still experimental, the ideas behind it—like iterative refinement and block-by-block generation—are exciting. And with open-source projects like LLaDA, the broader community can also explore and contribute to the advancement of diffusion-based language models. It's a rapidly evolving field, and we're likely to see many more innovations ahead!