
GLM-4.7: A Deep Dive into the New King of Open Source Coding Models

Zhipu AI has once again raised the bar with its latest release, GLM-4.7, a model that significantly advances the coding capabilities of open-source AI. Having had early access to this powerful new tool, I can confidently say it stands out as the best open model for coding tasks available today, and by a significant margin. This analysis will break down its features, performance on various benchmarks, and why it might be your new go-to coding partner.

A Legacy of Excellence

Before diving into GLM-4.7, it’s worth remembering the strong foundation laid by its predecessors. I’ve previously covered GLM-4.5, GLM-4.6, and even the earlier GLM code models, all of which have been impressive in their own right. A key reason for their popularity is Zhipu’s commitment to open weights, a philosophy that continues with the latest iteration. In fact, the influence of these models is so widespread that Cursor’s highly regarded Composer 1 model is rumored to be a fine-tune of a GLM-4.5 variant. These models have consistently delivered great performance, setting high expectations for what’s next.

[00:56.090]

When GLM-4.6 was released on the exact same day as Claude Sonnet 4.5, it directly challenged the established players. Now we have an even further improved version in GLM-4.7. So what exactly makes this new model so special? Let’s look at the features and performance metrics.

Advancing Core Coding Capabilities

[01:17.070]

GLM-4.7 introduces substantial improvements across the board, particularly in its core coding functionalities. The model brings clear gains over its predecessor, GLM-4.6, in multilingual agentic coding and terminal-based tasks.

“In benchmarks alone, it scores about a 6% improvement in SWE-bench. Multilingual SWE-bench is up by 13%, and Terminal Bench 2.0 has a score of 16.5%.”

[01:41.050]

Beyond raw benchmark scores, GLM-4.7 also demonstrates significant improvements in complex tasks within mainstream agent frameworks. It supports “thinking before acting,” enhancing its performance in frameworks such as Claude Code, Kilo Code, Cline, and Roo Code.

[01:53.330]

One of the most exciting new features is Vibe Coding. GLM-4.7 takes a major step forward in UI quality, producing cleaner, more modern webpages and generating better-looking presentation slides with more accurate layouts and sizing. This focus on aesthetics sets it apart from many other coding models.

[02:06.110]

The model also achieves significant improvements in Tool Using. This is reflected in its strong performance on benchmarks like τ²-Bench and on web browsing via BrowseComp, indicating a more robust and reliable ability to interact with and utilize external tools.

[02:21.570]

Finally, GLM-4.7 delivers a substantial boost in mathematical and complex reasoning. It achieves an impressive 42.8% on the HLE (Humanity’s Last Exam) benchmark, a significant jump compared to GLM-4.6. Based on these scores and the model’s track record, it’s clear that these are not just “benchmark-maxed” models; they deliver genuine, practical improvements.

Real-World Performance: King Bench Analysis

To put GLM-4.7 to the test, I ran it through my custom benchmark suite, King Bench, which evaluates models on a series of general and coding-specific tasks.

[03:19.340]

The model achieved an overall score of 65%. Let’s break down some of the specific results.

[03:27.790]

For a task requiring the creation of a 3D floor plan, the functionality was impressive. You could hover over different areas to see room names like “Kitchen & Dining.” However, the overall layout was a bit chaotic, showing there’s still room for improvement in complex spatial design.
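
For context, the hover behavior described above is usually wired up with a raycaster: cast a ray from the pointer into the scene and read a label off whatever mesh it hits. Here’s a minimal sketch of that pattern; the stand-in room layout and names are my own assumptions, not the benchmark output.

```ts
import * as THREE from "three";

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(50, innerWidth / innerHeight, 0.1, 100);
camera.position.set(0, 8, 8);
camera.lookAt(0, 0, 0);

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(innerWidth, innerHeight);
document.body.appendChild(renderer.domElement);

// Stand-in "rooms": flat boxes tagged with a display name via userData.
const roomDefs: Array<[string, number]> = [["Kitchen & Dining", -2], ["Living Room", 2]];
const rooms = roomDefs.map(([name, x]) => {
  const mesh = new THREE.Mesh(
    new THREE.BoxGeometry(3, 0.2, 3),
    new THREE.MeshBasicMaterial({ color: 0x88aaff })
  );
  mesh.position.x = x;
  mesh.userData.name = name;
  scene.add(mesh);
  return mesh;
});

const raycaster = new THREE.Raycaster();
const pointer = new THREE.Vector2();

window.addEventListener("pointermove", (e) => {
  // Convert pixel coordinates to normalized device coordinates (-1..1).
  pointer.set((e.clientX / innerWidth) * 2 - 1, -(e.clientY / innerHeight) * 2 + 1);
  raycaster.setFromCamera(pointer, camera);
  const hit = raycaster.intersectObjects(rooms)[0];
  document.title = hit ? hit.object.userData.name : "Floor plan"; // swap for a tooltip in practice
});

renderer.setAnimationLoop(() => renderer.render(scene, camera));
```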

[03:41.870]

In a more creative task, generating an SVG of a panda holding a burger, the model performed remarkably well. The result was a clean, well-drawn panda that was also animated to float gently and blink its eyes, showcasing its strength in visual and graphical code generation.

[04:01.400]

The model’s skill with 3D graphics was further proven with a Three.js Pokeball generation. The output was excellent, with accurate dimensions and realistic light reflection, highlighting its capability in more advanced web graphics.
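
For readers who haven’t built this kind of scene, here’s a rough sketch of what the task involves: two hemispheres, an equator band, and a front button, with key and ambient lights doing most of the “realistic reflection” work. The proportions and material values are my guesses, not GLM-4.7’s actual output.

```ts
import * as THREE from "three";

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(45, innerWidth / innerHeight, 0.1, 50);
camera.position.z = 4;

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(innerWidth, innerHeight);
document.body.appendChild(renderer.domElement);

const ball = new THREE.Group();
const shiny = { metalness: 0.2, roughness: 0.25 };

// Red top and white bottom hemispheres, split via thetaStart/thetaLength.
const top = new THREE.Mesh(
  new THREE.SphereGeometry(1, 64, 32, 0, Math.PI * 2, 0, Math.PI / 2),
  new THREE.MeshStandardMaterial({ color: 0xdd2222, ...shiny })
);
const bottom = new THREE.Mesh(
  new THREE.SphereGeometry(1, 64, 32, 0, Math.PI * 2, Math.PI / 2, Math.PI / 2),
  new THREE.MeshStandardMaterial({ color: 0xf5f5f5, ...shiny })
);

// Black equator band plus the front-facing button.
const band = new THREE.Mesh(
  new THREE.TorusGeometry(1, 0.05, 16, 100),
  new THREE.MeshStandardMaterial({ color: 0x111111, roughness: 0.4 })
);
band.rotation.x = Math.PI / 2;

const button = new THREE.Mesh(
  new THREE.SphereGeometry(0.18, 32, 16),
  new THREE.MeshStandardMaterial({ color: 0xffffff, roughness: 0.1 })
);
button.position.z = 0.98;

ball.add(top, bottom, band, button);
scene.add(ball);

// The "realistic light reflection" mostly comes from a key light plus fill.
const key = new THREE.DirectionalLight(0xffffff, 2);
key.position.set(3, 4, 5);
scene.add(key, new THREE.AmbientLight(0xffffff, 0.4));

renderer.setAnimationLoop(() => {
  ball.rotation.y += 0.005; // slow spin to show off the specular highlights
  renderer.render(scene, camera);
});
```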

[04:16.680]

A standout performance was the creation of a fully functional chessboard with an autoplay feature. The design was slick, using proper SVG pieces instead of simple emojis, and the AI opponent made logical moves. This was one of the best generations for this task I’ve seen in a while.
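
To make the autoplay part concrete: the core of such a feature is a timer loop that picks a legal move, plays it, and redraws the board. Here’s a hedged sketch assuming the chess.js library for move legality; the code GLM-4.7 actually generated may be structured quite differently.

```ts
import { Chess } from "chess.js";

const game = new Chess();

// Pick a "logical" move: prefer captures and checks over a random legal move.
function pickMove(): string {
  const moves = game.moves(); // all legal moves in SAN, e.g. "Nxe5", "Qh5+"
  const forcing = moves.filter((m) => m.includes("x") || m.includes("+"));
  const pool = forcing.length > 0 ? forcing : moves;
  return pool[Math.floor(Math.random() * pool.length)];
}

const timer = setInterval(() => {
  if (game.isGameOver()) {
    clearInterval(timer);
    return;
  }
  game.move(pickMove());
  console.log(game.ascii()); // in the real app, re-render the SVG board here
}, 800);
```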

[05:24.310]

After all tests, GLM-4.7 secured third position on the one-shot King Bench leaderboard, placing it above powerful models like Sonnet 4.5 and GPT-5.2 (XHigh). While it still trails Claude Opus and Gemini 3 Pro, it’s important to note that Gemini’s top score is driven primarily by its strength on one-shot questions; it tends to fall apart on agentic, multi-step tasks.

Excelling in Agentic Benchmarks

Agentic tasks, which require a model to perform a series of steps, are a true test of a coding AI’s practical utility. Here, GLM-4.7, when paired with Kilo Code, also delivers a top-tier performance.

[06:05.748]

[A terminal user interface (TUI) for a calculator, built in Go with the lipgloss and bubbletea libraries.]

For instance, when tasked with creating a Go-based terminal calculator using the lipgloss and bubbletea libraries, the model performed flawlessly. The resulting TUI was visually appealing, functional, and worked “insanely well.” This task was executed using the Z.ai GLM coding API, which is incredibly affordable, with plans starting at just a few dollars.
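
If you want to reproduce this setup, the gist is a single chat-completions call. Below is a minimal sketch against an OpenAI-compatible endpoint; the base URL, model id, and ZAI_API_KEY variable are my assumptions, so check the Z.ai dashboard for the exact values on your plan.

```ts
// Assumed endpoint and model id -- verify against Z.ai's current docs.
async function askGLM(prompt: string): Promise<string> {
  const res = await fetch("https://api.z.ai/api/paas/v4/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.ZAI_API_KEY}`, // hypothetical env var
    },
    body: JSON.stringify({
      model: "glm-4.7", // assumed model id
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // standard OpenAI-style response shape
}

askGLM("Build a Go TUI calculator with bubbletea and lipgloss.").then(console.log);
```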

[06:42.270]

The model also did quite well creating a movie tracker app in Expo. The UI was clean, displaying movies in a scrollable carousel and providing a detailed inner page with a GitHub-style tracker for logging watched movies. For a single-shot generation, this was a very strong result.
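
For scale, here’s roughly what the carousel piece of such an app looks like in Expo/React Native; the component shape and the Movie type are stand-ins of mine, not the generated code.

```tsx
import React from "react";
import { FlatList, Image, Pressable, Text } from "react-native";

type Movie = { id: string; title: string; poster: string; watched: boolean };

export function MovieCarousel(props: {
  movies: Movie[];
  onOpen: (movie: Movie) => void; // navigates to the detail page with the tracker
}) {
  return (
    <FlatList
      horizontal
      showsHorizontalScrollIndicator={false}
      data={props.movies}
      keyExtractor={(m) => m.id}
      renderItem={({ item }) => (
        <Pressable onPress={() => props.onOpen(item)} style={{ marginRight: 12 }}>
          <Image
            source={{ uri: item.poster }}
            style={{ width: 120, height: 180, borderRadius: 8 }}
          />
          <Text numberOfLines={1} style={{ width: 120 }}>
            {item.title}
          </Text>
        </Pressable>
      )}
    />
  );
}
```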

[08:01.990]

On the agentic leaderboard, GLM-4.7 + Kilo Code earned the 5th position, proving it’s a formidable contender for complex, multi-step coding projects. It is fast, the API is cheap, the weights are open, and it’s far better than comparable models like Gemini Flash.

“I think that this is currently the best model if you want to do AI coding.”

Final Verdict

GLM-4.7 represents a significant leap forward for open-source AI coding models. Its combination of strong core coding skills, improved reasoning, impressive visual generation, and excellent performance in agentic tasks makes it a top choice for developers. Its affordability and open-weights nature only add to its appeal. Whether you’re building a complex web application, a terminal tool, or a 3D game, GLM-4.7 has proven itself to be a reliable and highly capable partner.