Mistral 3 vs. DeepSeek-V3.2: A Deep Dive into the Newest Open-Source AI Models
The world of open-source AI is buzzing with excitement as two major players, Mistral and DeepSeek, have unveiled their latest models. This article breaks down the new Mistral 3 series and DeepSeek-V3.2, analyzing their architecture, performance, and where they stand in the competitive landscape of large language models (LLMs).
This analysis will cover the latest advancements from Mistral and DeepSeek, two of the most influential companies in the open-source model space. While DeepSeek made a significant impact with its previous V3 and R1 models, Mistral was one of the first Western companies to release high-quality, open models that could be run locally with impressive performance.
Both Mistral and DeepSeek have a history of pushing the boundaries of what’s possible with open models. Mistral initially gained popularity with models like Mistral Nemo, which was a top choice for local execution. However, the company later shifted towards more restrictive licenses and kept its larger models closed-source. Now, with the launch of Mistral 3, they are re-engaging with the open-source community.
Introducing Mistral 3 and Mistral Large 3
Mistral has launched two significant updates: Mistral Large 3 and the Mistral 3 series, which includes 14B, 8B, and 3B parameter models. These models are presented as state-of-the-art (SOTA) based on a selection of benchmarks provided by Mistral.
According to Mistral, these models offer the best performance-to-cost ratio in their category, while Mistral Large 3 joins the ranks of frontier instruction-fine-tuned open-source models.
One of the most important things to note about Mistral Large 3 is that it’s categorized as a non-reasoning model: it answers directly rather than producing an extended chain of thought, so it offers no specialized reasoning capabilities out of the box.
Architecturally, Mistral Large 3 is a sparse Mixture of Experts (MoE) model. It contains a massive 675 billion total parameters but only activates about 41 billion parameters during inference. This approach is similar to the architecture used by DeepSeek-V3, allowing for high performance while managing computational costs. The new Mistral models are fully compatible with most libraries that support this architecture.
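To make the total-versus-active distinction concrete, here is a toy sketch of sparse MoE routing in NumPy. The expert count, layer sizes, and top-k value are invented for illustration and are not Mistral’s actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 16, 32
num_experts, top_k = 8, 2          # all experts exist, but only 2 fire per token

# Each expert is a small feed-forward block; together they hold most of the params.
experts = [(rng.standard_normal((d_model, d_ff)) * 0.1,
            rng.standard_normal((d_ff, d_model)) * 0.1) for _ in range(num_experts)]
router = rng.standard_normal((d_model, num_experts)) * 0.1

def moe_forward(x):
    """Route token x to its top-k experts and mix their outputs."""
    logits = x @ router
    idx = np.argsort(logits)[-top_k:]                          # indices of the top-k experts
    weights = np.exp(logits[idx]) / np.exp(logits[idx]).sum()  # softmax over chosen experts
    out = np.zeros_like(x)
    for w, i in zip(weights, idx):
        w1, w2 = experts[i]
        out += w * (np.maximum(x @ w1, 0) @ w2)                # ReLU feed-forward expert
    return out

token = rng.standard_normal(d_model)
y = moe_forward(token)

total_params = num_experts * (d_model * d_ff * 2)
active_params = top_k * (d_model * d_ff * 2)
print(total_params, active_params)  # prints: 8192 2048
```

Per token, only `top_k` of the `num_experts` feed-forward blocks actually run, which is the same reason a 675B-parameter MoE can infer at roughly the cost of a ~41B dense model.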
DeepSeek-V3.2: Pushing the Frontier with Sparse Attention
Simultaneously, DeepSeek has released its new model, DeepSeek-V3.2. This isn’t just an incremental update; it introduces significant architectural changes designed to tackle one of the biggest bottlenecks in LLMs: attention.
The key technical breakthroughs of DeepSeek-V3.2 are as follows:
- DeepSeek Sparse Attention (DSA), an efficient attention mechanism that substantially reduces computational complexity…
- Scalable Reinforcement Learning Framework…
- Large-Scale Agentic Task Synthesis Pipeline.
As the context length of an LLM increases, the computational cost of standard dense attention grows quadratically, making it expensive and slow to run models with large context windows. DeepSeek’s solution is DeepSeek Sparse Attention (DSA).
DSA introduces a “lightning indexer” that acts like a spotlight. It quickly scans the entire context to identify the most relevant “top-k” tokens for a given query and focuses the model’s attention on them, while ignoring the rest. This innovation allows the model to process massive contexts of up to 128,000 tokens with only a fraction of the computational power required by traditional dense attention models.
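A minimal NumPy sketch of that idea: a cheap indexer scores every cached token in a small projection space, keeps only the top-k, and full attention runs on just that subset. The dimensions and the indexer design here are illustrative assumptions, not DeepSeek’s actual implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

seq_len, d_model, d_index, top_k = 1024, 64, 8, 32

keys = rng.standard_normal((seq_len, d_model))
values = rng.standard_normal((seq_len, d_model))
query = rng.standard_normal(d_model)

# Lightweight indexer: score all tokens in a tiny projection space (cheap),
# instead of running full attention over all seq_len keys (expensive).
proj = rng.standard_normal((d_model, d_index)) * 0.1
index_scores = (keys @ proj) @ (query @ proj)
selected = np.argsort(index_scores)[-top_k:]   # the "spotlight": top-k tokens

# Dense attention, but only over the selected subset.
logits = keys[selected] @ query / np.sqrt(d_model)
weights = np.exp(logits - logits.max())
weights /= weights.sum()
output = weights @ values[selected]

print(output.shape, len(selected))  # attention touched 32 of 1024 cached tokens
```

Scoring in the 8-dimensional indexer space costs far less than full attention over all 1,024 keys, and the expensive softmax-weighted sum only ever touches the 32 selected tokens.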
The “Speciale” Variant for Advanced Reasoning
DeepSeek also released DeepSeek-V3.2-Speciale, a variant specifically optimized for complex reasoning tasks. During its training, the constraints on output length were relaxed, allowing the model to “think” for as long as it needs to arrive at a solution. This makes it a distinct and powerful tool for developers who require advanced reasoning capabilities.
Benchmark Analysis: Mistral Large 3 vs. DeepSeek V3.2
So, how do these new models perform on real-world tasks? Here’s a look at their performance on the KingBench benchmark suite.
Mistral Large 3 Performance
Despite the hype, Mistral Large 3 showed disappointing results on several coding and generation tasks.
- 3D Floor Plan: Failed to generate a usable 3D model.
- SVG Panda: Produced a crude, poorly drawn image.
- Pokéball in Three.js: The generated 3D object was broken and misplaced.
- Chessboard with Autoplay: The feature did not work at all.
While it successfully solved a riddle, its performance on coding and complex generation tasks was underwhelming.
DeepSeek V3.2 Performance
DeepSeek V3.2 delivered much stronger results, particularly in coding tasks.
- 3D Floor Plan: It didn’t create a 3D model either, but it generated a detailed text-based floor plan, a more graceful failure than Mistral’s.
- SVG Panda: The output was significantly better than Mistral’s.
- Pokéball in Three.js: The generation was quite good, with only minor details missing.
- Chessboard with Autoplay: The model produced a fully functional chessboard with seamless autoplay logic.
The general version of DeepSeek V3.2 showed impressive coding capabilities, even though the reasoning-focused “Speciale” variant sometimes produced buggy or incomplete code via API calls.
Final Leaderboard and Conclusion
On the final leaderboard, the results speak for themselves:
- DeepSeek V3.2 (New) secures the 11th position, outperforming models like GPT-5.1 Codex and GLM-4.6. This is an impressive feat, demonstrating the power of its architectural improvements.
- Mistral Large 3 lands at the 27th position. While a respectable placement, it falls short of the top-tier performance suggested by its initial announcement.
In conclusion, while it’s exciting to see new, powerful open-source models from both Mistral and DeepSeek, the latest benchmarks indicate that DeepSeek-V3.2 currently holds a significant edge, especially in coding and complex task generation.
The introduction of DeepSeek Sparse Attention is a game-changer for handling long contexts efficiently, making DeepSeek-V3.2 not only powerful but also incredibly cost-effective. While Mistral Large 3 is a solid new entry, it appears that other models, including those from GLM and Minimax, may still offer better overall performance for now. The open-source community continues to benefit from this healthy competition, and it will be fascinating to see how these models evolve.