
TL;DR

A recent Google whitepaper argues that in AI-assisted software engineering, the model itself accounts for only about 10% of an agent’s behavior. The rest depends on the harness, verification, and context engineering, which shifts strategic focus from model choice to system configuration and design.

A new Google whitepaper, “The New SDLC With Vibe Coding,” argues that the most significant shift in software engineering is the move from writing code to expressing intent and trusting machines to translate that intent into working software. Its central claim is that the model accounts for only about 10% of an agent’s behavior, with the remaining 90% determined by the harness, verification, and context engineering. That challenges the conventional focus on model selection and suggests a strategic pivot for AI development teams.

The whitepaper, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, highlights that the dominant factor in AI system performance is how the AI is integrated and configured. It distinguishes between ‘vibe coding’—quick, minimal prompts suitable for prototypes—and ‘agentic engineering,’ which involves formal specifications, automated tests, and human oversight. The authors argue that most failures in AI agents are due to configuration errors, missing tools, or vague rules rather than the AI model itself.

Benchmark evidence cited in the paper shows that changing the harness (prompts, tools, or middleware) can dramatically improve performance with the same underlying model. In one example, an agent moved from outside the top 30 to the top 5 on Terminal Bench 2.0 through harness changes alone. The paper also argues that costs are driven primarily by how the system is set up and maintained, not just by the model’s capabilities.
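The harness claim can be made concrete with a minimal sketch. It assumes a stand-in model (`fake_model`, a plain function rather than a real LLM call), a hypothetical `Harness` dataclass, and a hypothetical `read_file` tool; none of these names come from the whitepaper. The same model runs under two harnesses, and only the one that supplies the needed tool succeeds.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Tool = Callable[[str], str]


def fake_model(prompt: str, tools: Dict[str, Tool]) -> str:
    """Stand-in for an LLM: it can only answer by calling a tool it was given."""
    if "read_file" in tools:
        return tools["read_file"]("config.yaml")
    return "ERROR: no tool available to read config.yaml"


@dataclass
class Harness:
    """Hypothetical harness: everything around the model that the team controls."""
    system_prompt: str
    tools: Dict[str, Tool] = field(default_factory=dict)

    def run(self, model: Callable[[str, Dict[str, Tool]], str], task: str) -> str:
        # Assemble the context, then hand the model the tools it may use.
        prompt = f"{self.system_prompt}\n\nTask: {task}"
        return model(prompt, self.tools)


def read_file(path: str) -> str:
    # Hypothetical tool; a real one would read from a sandboxed workspace.
    return f"contents of {path}"


bare = Harness(system_prompt="Be helpful.")
equipped = Harness(
    system_prompt="Use the provided tools; never guess file contents.",
    tools={"read_file": read_file},
)

task = "Summarize config.yaml"
print(bare.run(fake_model, task))      # fails: a missing tool, not a weak model
print(equipped.run(fake_model, task))  # succeeds with the identical model
```

The failure in the first run is a configuration failure in the paper’s sense: nothing about the model changed between the two calls.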

At a glance
Report
When: published early 2026
The development: A Google whitepaper introduces a new framework for AI-driven software development, highlighting that the AI model itself is only a small part of the system, with most influence coming from harness and verification.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified
Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.
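One way to read “tests verify the deterministic; evals verify the rest” is as a release gate with two conditions. The sketch below is only an illustration under stated assumptions: `slugify`, the eval cases, the keyword-based scorer, and the 0.8 threshold are all hypothetical, and a real eval would typically use a rubric or a grader model rather than keyword matching.

```python
from typing import Callable, List, Tuple


def slugify(title: str) -> str:
    """Deterministic code under test (hypothetical example function)."""
    return "-".join(title.lower().split())


def run_tests() -> bool:
    """Exact-match checks: appropriate when the output is fully determined."""
    return slugify("Hello World") == "hello-world" and slugify("  A  B ") == "a-b"


def keyword_score(output: str, required: List[str]) -> float:
    """Toy eval scorer: fraction of required keywords present in the output."""
    hits = sum(1 for word in required if word in output.lower())
    return hits / len(required)


def run_evals(generate: Callable[[str], str],
              cases: List[Tuple[str, List[str]]]) -> float:
    """Average score over eval cases; wording may vary between runs."""
    scores = [keyword_score(generate(prompt), required) for prompt, required in cases]
    return sum(scores) / len(scores)


def gate(tests_ok: bool, eval_score: float, threshold: float = 0.8) -> bool:
    """CI-style gate: both the deterministic and the fuzzy check must pass."""
    return tests_ok and eval_score >= threshold


def fake_generate(prompt: str) -> str:
    # Stand-in for a model call so the sketch runs offline.
    return "This change adds retry logic and a timeout to the HTTP client."


cases = [
    ("Summarize the diff", ["retry", "timeout"]),
    ("Summarize the diff for release notes", ["retry", "http"]),
]
score = run_evals(fake_generate, cases)
print(run_tests(), score, gate(run_tests(), score))
```

Keeping the two checks separate makes it visible which kind of verification failed when the gate blocks a change.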
The idea worth building your strategy around
Agent = Model + Harness
Model: ~10%
Harness: ~90% (prompts · tools · context · hooks · sandboxes · observability)
~90% is your surface area, not the provider’s.
Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness, with the same model.
“Most agent failures, examined honestly, are configuration failures”: a missing tool, a vague rule, a noisy context.
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding: low CapEx · high OpEx
Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering: high CapEx · low OpEx
Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success, plus intelligent model routing (cheap models for the easy work).
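The “intelligent model routing” lever can be sketched as a small dispatcher. Everything here is assumed for illustration: the model names, the per-token prices, and the difficulty heuristic are hypothetical, not figures or methods from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                 # hypothetical model identifier
    usd_per_1k_tokens: float  # hypothetical price, for illustration only


CHEAP = ModelTier("small-model", 0.0005)
STRONG = ModelTier("large-model", 0.0100)

# Hypothetical heuristic: task kinds considered easy enough for the cheap tier.
EASY_KINDS = {"rename", "format", "docstring", "boilerplate"}


def route(task_kind: str, files_touched: int) -> ModelTier:
    """Send small, mechanical tasks to the cheap tier; everything else to the strong one."""
    if task_kind in EASY_KINDS and files_touched <= 2:
        return CHEAP
    return STRONG


def estimated_cost(tier: ModelTier, tokens: int) -> float:
    """Cost in USD for a given token count at the tier's price."""
    return tier.usd_per_1k_tokens * tokens / 1000


# (task kind, files touched, estimated tokens): made-up workload.
tasks = [("docstring", 1, 2_000), ("refactor", 6, 40_000), ("format", 1, 1_000)]
routed = sum(estimated_cost(route(kind, files), tokens) for kind, files, tokens in tasks)
all_strong = sum(estimated_cost(STRONG, tokens) for _, _, tokens in tasks)
print(f"routed: ${routed:.4f}  all-strong: ${all_strong:.4f}")
```

In practice the routing rule is itself part of the harness, so it would need the same test and eval coverage as the rest of the system.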
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.
thorstenmeyerai.com

Implications for AI Development Strategies

This shift means that organizations should focus less on acquiring the latest AI models and more on building robust, configurable systems. The harness and context engineering become the primary areas for competitive advantage, as they determine the AI’s behavior and reliability. Understanding that the model is only a small part of the system can lead to more cost-effective, scalable, and secure AI deployment strategies, especially given the high costs associated with token consumption and system maintenance.

Evolution of AI-Assisted Software Engineering

Until recently, the industry focused largely on improving models and adopting new frameworks. The 2024-2025 period saw widespread adoption of AI coding agents: the paper cites 85% of developers using them, 51% daily. The whitepaper signals a shift in emphasis, locating the real value in how these models are integrated and managed. ‘Vibe coding’ (rapid, low-structure prompting) became common, but the authors warn that it leads to higher long-term costs and lower reliability. The emerging emphasis on the harness and on verification reflects a move toward disciplined, system-oriented AI engineering.

“The biggest shift in software engineering isn’t a new language or framework; it’s moving from writing code to expressing intent and trusting machines to do the rest.”

— Addy Osmani

Unclear Aspects of System Implementation

While the paper provides strong evidence that harness and configuration are dominant, it does not specify exact best practices for system design across different domains. The precise methods for scaling context engineering and automation in complex systems remain to be fully developed and validated in real-world scenarios. Additionally, the long-term implications for AI model innovation and how this shift might influence future model development are still evolving.

Next Steps for AI System Design and Adoption

Organizations are expected to reevaluate their AI strategies, investing more in system architecture, tooling, and verification processes. Future research and industry efforts will likely focus on developing standardized frameworks for harness design, context management, and cost optimization. Monitoring how companies implement these principles in production environments will be key to understanding the practical impact of this paradigm shift.

Key Questions

Why is the model only 10% of the system’s behavior?

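The whitepaper argues that most of an agent’s behavior is determined by the harness around the model (prompts, tools, context, hooks, sandboxes, and observability) rather than by the model’s raw capabilities. Its headline example is an agent that moved from outside the top 30 to the top 5 on Terminal Bench 2.0 through harness changes alone.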

How does this shift affect AI development costs?

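The paper frames it as CapEx versus OpEx. Investing upfront in specifications, evals, and context raises the initial cost but lowers the cost of each feature afterward, because it cuts token burn from fix-it loops, maintenance work, and security remediation. Vibe coding looks cheap at the start but, by the paper’s figures, ends up costing 3–10× more per feature.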
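The trade-off can be worked through as simple arithmetic. The numbers below are invented for illustration and are not from the paper: cumulative cost is modeled as an upfront investment plus a per-feature cost, and the sketch finds the feature count at which the higher-CapEx approach becomes cheaper.

```python
import math


def cumulative_cost(capex: float, opex_per_feature: float, features: int) -> float:
    """Total spend after shipping `features` features."""
    return capex + opex_per_feature * features


def break_even_features(capex_a: float, opex_a: float,
                        capex_b: float, opex_b: float) -> int:
    """Smallest feature count at which approach B is strictly cheaper than A.

    Assumes B has higher CapEx and lower OpEx than A.
    """
    if not (capex_b > capex_a and opex_b < opex_a):
        raise ValueError("expected B to have higher CapEx and lower OpEx than A")
    exact = (capex_b - capex_a) / (opex_a - opex_b)
    return math.floor(exact) + 1


# Hypothetical cost units, chosen only to make the arithmetic visible.
VIBE = {"capex": 0.0, "opex": 5.0}      # per-feature cost 5x the agentic one
AGENTIC = {"capex": 40.0, "opex": 1.0}

n = break_even_features(VIBE["capex"], VIBE["opex"],
                        AGENTIC["capex"], AGENTIC["opex"])
print(n)  # 11: at 10 features both cost 50; from 11 on, agentic is cheaper
```

With these made-up inputs the upfront investment pays for itself after ten features; the real crossover depends entirely on a team’s own CapEx and per-feature costs.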

What is ‘agentic engineering’?

Agentic engineering pairs formal specifications with automated tests, evals, CI gates, and human oversight, producing disciplined, reliable AI systems rather than relying on quick, vibe-based prompts.

Does this mean AI models are becoming less important?

While models are still essential, their role is now seen as a smaller part of the overall system. The focus shifts toward how models are integrated and managed within the larger architecture.

What should companies do next?

They should invest in system architecture, develop better harnessing strategies, and prioritize verification and context engineering to optimize AI performance and costs.

Source: ThorstenMeyerAI.com
