
DeepSeek V3.2: The Open-Source AI That Outsmarts GPT-5 in Math

A new powerhouse has entered the artificial intelligence arena. DeepSeek V3.2 has arrived, marking another significant “DeepSeek moment” in AI. This isn’t just another model release; it’s a historic achievement. For the first time, an open-source model has officially achieved a gold-medal score at the International Mathematical Olympiad (IMO). It’s not just keeping up; it’s actively outperforming top-tier, closed-source models from giants like OpenAI and Anthropic. Remarkably, DeepSeek has accomplished this with incredible efficiency and on a fraction of the budget of its competitors. Let’s break down what makes this model so revolutionary.

[00:28.169]

[A tweet from DeepSeek AI announcing the launch of DeepSeek V3.2 and V3.2-Speciale with a bar chart showing its performance against competitors.]

The official announcement from the DeepSeek team on X (formerly Twitter) revealed two new versions: DeepSeek V3.2 and a more powerful variant, DeepSeek V3.2 Speciale. The key takeaway from their launch is their focus on creating “Reasoning-first models built for agents!” This signifies a strategic shift towards building models that are not just conversationalists but capable, autonomous agents that can reason and execute complex tasks. The regular V3.2 model is designed for balanced performance, while the Speciale version is engineered to push the absolute boundaries of reasoning capabilities.

[00:50.311]

[A table comparing the benchmark scores of various AI models including GPT-5 High, Gemini 3.0 Pro, and the new DeepSeek models across different reasoning tasks.]

A glance at the benchmark scores reveals just how formidable these new models are. In a direct comparison with models like GPT-5 High and Gemini 3.0 Pro, DeepSeek V3.2 Speciale consistently takes the lead. On the AIME 2025 math benchmark, it scores a remarkable 96.0, surpassing both Gemini 3.0 Pro (95.0) and GPT-5 High (94.6). The trend continues across other challenging reasoning benchmarks like HMMT, where it again claims the top spot. The numbers in parentheses in the chart indicate the number of tokens used for the task. While the standard V3.2 model is highly token-efficient, the Speciale version uses a larger token budget to achieve its state-of-the-art results, demonstrating a trade-off between cost and peak performance. It also shows strong performance on coding benchmarks like LiveCodeBench.

[01:57.251]

[A benchmark table specifically focusing on the “Thinking in Tool-Use” capabilities of DeepSeek V3.2 compared to other models.]

DeepSeek’s innovation isn’t just about raw scores; it’s about pioneering new algorithmic approaches. The team has introduced several novel techniques to enhance the model’s performance, particularly in agentic tasks and tool usage. This focus on algorithmic improvement is a hallmark of DeepSeek’s research and is what enables them to compete at the highest level.

[02:10.127]

[The abstract from the DeepSeek V3.2 technical paper, outlining its key technical breakthroughs.]

The technical paper’s abstract reveals the three core breakthroughs behind the model’s success. First is the DeepSeek Sparse Attention (DSA), a highly efficient attention mechanism. It fundamentally changes how the model processes information by reducing computational complexity, especially for long sequences of text. This allows for massive context windows without the typical slowdown, moving from a quadratic O(l²) complexity to a more manageable near-linear one. Second is a Scalable Reinforcement Learning (RL) Framework. By implementing a robust RL protocol and scaling up post-training compute, DeepSeek V3.2 achieves performance comparable to GPT-5. The paper proudly states:

Notably, our high-compute variant, DeepSeek-V3.2-Speciale, surpasses GPT-5 and exhibits reasoning proficiency on par with Gemini-3.0-Pro, achieving gold-medal performance in both the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI).

[02:10.127]

[The abstract from the DeepSeek V3.2 technical paper, highlighting the third key breakthrough.]

The third major innovation is a Large-Scale Agentic Task Synthesis Pipeline. To make the model a proficient agent, DeepSeek developed a novel system to systematically generate vast amounts of high-quality training data for tool-use scenarios. This methodology allows for scalable agentic post-training, leading to substantial improvements in the model’s ability to generalize and follow complex instructions in interactive environments.
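The paper reports the scale of this pipeline rather than its code, but the underlying loop can be sketched: pair an environment’s tool definitions with a task prompt, ask a generator model for a tool-call trajectory, and keep only trajectories that pass a verifier. Everything in the snippet below (the `call_llm` stand-in, the environment spec, the filtering rules) is a hypothetical illustration of that idea, not DeepSeek’s actual pipeline.

```python
import json

def call_llm(prompt: str) -> str:
    # Stand-in for whatever generator model the real pipeline queries (assumption);
    # here it returns a canned trajectory so the sketch runs end to end.
    return json.dumps([{"tool": "search_docs", "args": {"query": "refund policy"}},
                       {"tool": "send_email", "args": {"to": "customer"}}])

def synthesize_example(env, task_template):
    """One iteration of a hypothetical agentic-data synthesis loop:
    prompt a generator for a tool-call trajectory, then keep it only if a
    simple verifier accepts it (valid JSON, known tools only)."""
    task = task_template.format(env=env["name"])
    prompt = (f"Environment tools: {json.dumps(env['tools'])}\n"
              f"Task: {task}\n"
              "Respond with a JSON list of tool calls that completes the task.")
    try:
        calls = json.loads(call_llm(prompt))
    except json.JSONDecodeError:
        return None                                   # discard malformed output
    known = {t["name"] for t in env["tools"]}
    if not all(c.get("tool") in known for c in calls):
        return None                                   # discard calls to unknown tools
    return {"prompt": task, "environment": env["name"], "tool_calls": calls}

# Toy usage with one made-up environment and one made-up task template.
environment = {"name": "helpdesk",
               "tools": [{"name": "search_docs"}, {"name": "send_email"}]}
example = synthesize_example(environment, "Resolve an open ticket in the {env} environment")
print(example is not None)  # True: the trajectory passed the verifier
```

Repeating a loop like this across many environments and task templates, then filtering aggressively, is one plausible way to arrive at the kind of large, verified tool-use dataset the paper describes.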

[04:20.176]

[The speaker in a slightly different shirt, introducing the video’s sponsor, Zapier.]

For those inspired by AI models specializing in tool use, the video presents a practical application through its sponsor, Zapier. Zapier is an automation platform that allows you to connect over 8,000 different apps and AI tools, enabling them to work together autonomously. You can set a trigger in one app and define a series of actions for other apps and AI to perform, creating powerful, hands-off workflows.

[04:27.108]

[An animation showing the Zapier Copilot interface where a user asks it to design a system for triaging support tickets.]

With Zapier, you can inject AI directly into your workflows to supercharge them. This could involve automatically drafting emails, summarizing meeting notes, generating social media content, or updating database records, all without manual intervention. You can simply describe the automation you want in plain English, and Zapier’s AI Copilot will build the workflow for you.

Collect leads, enrich them with AI, and follow up automatically

[04:41.879]

[A visual diagram of an automated workflow in Zapier for lead qualification and scheduling.]

The power of this automation is vast. For example, a workflow can be created to take a new lead from a form, send it to a Lead Qualification Agent, update a table with the results, and then automatically send a scheduling link via email. This level of integration streamlines business processes and saves a significant amount of time.

[05:31.026]

[A snippet from the DeepSeek technical paper with a sentence highlighted about the post-training computational budget.]

Diving back into the technical details, the paper reveals the significant resources dedicated to refining the model after its initial training. This investment in post-training is crucial for unlocking its advanced capabilities.

Notably, this framework allocates a post-training computational budget exceeding 10% of the pre-training cost, unlocking advanced capabilities.

[05:52.709]

[A sentence from the paper is highlighted, explaining that extensive synthesized data drives the reinforcement learning process.]

To excel at agentic tasks, the model was trained on a massive, custom-built dataset. The team generated data from over 1,800 distinct environments and created 85,000 complex prompts. This extensive synthesized data is the fuel for the reinforcement learning process, dramatically enhancing the model’s generalization and instruction-following abilities within an agent context.

[06:15.528]

[A section of the paper titled “Inference Costs” with a sentence highlighted about reducing core attention complexity.]

The DeepSeek Sparse Attention (DSA) mechanism is the key to the model’s efficiency. It reduces the complexity of the attention mechanism from quadratic O(l²) to near-linear O(lk), where l is the sequence length and k is the much smaller number of tokens each query actually attends to. In simple terms, as the context length increases, the computational cost grows in a controlled, roughly linear fashion rather than quadratically. This makes running the model much less expensive, especially for tasks requiring a long context.
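The complexity claim can be made concrete with a minimal top-k sparse attention sketch in Python/NumPy. The selection rule, `top_k` value, and shapes below are assumptions for illustration; this is not DeepSeek’s actual DSA indexer, which uses its own lightweight scoring pass.

```python
import numpy as np

def sparse_attention(q, k, v, top_k=64):
    """Illustrative top-k sparse attention for a single head.

    q, k, v: (seq_len, head_dim) arrays. For each query, a cheap selection
    pass picks at most `top_k` keys, and the softmax plus value aggregation
    run only over that subset, so the expensive step costs O(seq_len * top_k)
    instead of O(seq_len^2).
    """
    seq_len, head_dim = q.shape
    out = np.empty_like(v)
    scale = 1.0 / np.sqrt(head_dim)
    for i in range(seq_len):
        # Cheap causal selection pass (a real indexer would keep this very light).
        proxy = q[i] @ k[: i + 1].T
        keep = np.argsort(proxy)[-top_k:]            # indices of the top_k keys
        # Expensive attention pass restricted to the selected keys.
        scores = (q[i] @ k[keep].T) * scale
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[keep]
    return out

# Toy usage: 1,024 tokens, 64-dim head, each query attends to at most 64 keys.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(1024, 64)) for _ in range(3))
print(sparse_attention(q, k, v).shape)  # (1024, 64)
```

The point of the sketch is that the softmax and value aggregation only ever touch `top_k` keys per query, which is where the near-linear O(lk) scaling comes from as the context grows.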

[06:53.300]

[A section of the paper discussing the results of DeepSeek V3.2-Speciale on tool-use benchmarks.]

When it comes to tool use, DeepSeek V3.2 substantially narrows the performance gap between open-source and leading closed-source models. While it doesn’t yet surpass the absolute frontier in every tool-use scenario, it represents a massive leap forward and establishes itself as a highly cost-effective and powerful alternative for building AI agents.

This new model is a frontier-level AI, but it is also relatively small for its capabilities. It’s a Mixture of Experts (MoE) model with a total of 671 billion parameters, of which 37 billion are active during inference. The best part? The model is fully open-source, with open weights and a permissive MIT license, allowing for widespread use in both research and commercial applications. DeepSeek V3.2 is not just a new model; it’s a testament to the power of open-source innovation and a sign of exciting things to come in the world of AI agents.
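To make the total-versus-active distinction concrete, here is a minimal Mixture-of-Experts routing sketch. The expert count, `top_k`, and dimensions are illustrative assumptions and bear no relation to DeepSeek’s actual configuration, but the mechanism is the same: a router scores all experts, and only a handful of them do any work for a given token.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MoE layer: many experts exist (total parameters), but each token is
# routed to only a few of them (active parameters). All sizes are illustrative.
d_model, n_experts, top_k = 128, 16, 2
experts = [rng.normal(size=(d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.normal(size=(d_model, n_experts)) * 0.02

def moe_forward(x):
    """x: (d_model,) token activation; returns the gated mixture of top_k experts."""
    gate_logits = x @ router                       # score every expert
    chosen = np.argsort(gate_logits)[-top_k:]      # keep only the top_k experts
    gates = np.exp(gate_logits[chosen])
    gates /= gates.sum()
    # Only the chosen experts' weights participate in this token's computation.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

out = moe_forward(rng.normal(size=d_model))
print(out.shape)  # (128,)
```

Scaled up, this routing scheme is what lets a 671-billion-parameter model run inference with only 37 billion parameters active for each token.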