📊 Full opportunity report: Engineering Is Automated. Research Is the Residual. on ThorstenMeyerAI.com — validation score, market gap, and execution plan.
TL;DR
Recent empirical data indicates that AI systems can now automate the core engineering tasks involved in AI research, with several core AI engineering benchmarks at or near saturation. Whether AI can automate the research process itself remains uncertain, and some aspects likely still require human insight. If the trend holds, it could shorten AI development timelines and reshape the role of human researchers.
Thorsten Meyer's analysis of Jack Clark’s recent work argues that AI has made significant progress in automating the core engineering skills needed for AI development. Six benchmarks measuring AI capabilities, such as research reproduction and Kaggle competition performance, are nearing saturation, with some reaching success rates above 95% within 15 to 16 months. For example, CORE-Bench, which tests the reproduction of research papers, is now considered ‘solved’ at 95.5%, meaning AI can handle dependencies, code execution, and output analysis at a level comparable to experienced researchers. Similarly, scores on MLE-Bench, which evaluates AI in Kaggle competitions, have improved from 16.9% to 64.4%, approaching mid-tier human performance.
These trends suggest that the engineering side of AI research (building models, optimizing kernels, and automating infrastructure) is nearing full automation. Research tasks such as hypothesis generation, creative problem-solving, and novel discovery remain less automated, and the extent of AI’s future role in them is still unclear. The progress across these benchmarks indicates a structural shift: engineering may soon be fully automated, while research could become the residual task requiring human insight.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.

Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yields more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
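The optimistic "shots on goal" reading can be made concrete with a small probability sketch. It assumes each exploration attempt is an independent trial with the same small chance of producing a discovery; the per-attempt yield `P_HIT` and the attempt counts are hypothetical values chosen for illustration, not figures from Clark or this article.

```python
def p_at_least_one(p_hit: float, attempts: int) -> float:
    """Probability of at least one discovery in `attempts` independent tries."""
    if not 0.0 <= p_hit <= 1.0:
        raise ValueError("p_hit must be in [0, 1]")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p_hit) ** attempts


def expected_hits(p_hit: float, attempts: int) -> float:
    """Expected number of discoveries under the same independence assumption."""
    return p_hit * attempts


# Hypothetical per-attempt yield; an assumption for illustration only.
P_HIT = 0.001

for n in (10, 1_000, 100_000):
    print(f"attempts={n:>7}  "
          f"P(at least one)={p_at_least_one(P_HIT, n):.4f}  "
          f"expected={expected_hits(P_HIT, n):.2f}")
```

Under this model, low yield is a volume problem: any nonzero `P_HIT` pays off with enough attempts. The pessimistic reading amounts to saying that for some kinds of insight the per-attempt yield is effectively zero, in which case `p_at_least_one(0.0, n)` stays at zero no matter how large `n` gets.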

Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.

Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
[Figure: the two readings compared. Recoverable labels: "Productivity multiplier years" and "Recursive loop operational".]

Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
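The asymmetry argument above can be laid out as a two-by-two decision table. The sketch below is a minimal illustration: the cost numbers are hypothetical and in arbitrary units, and only their ordering (being caught unprepared costs far more than building capacity that turns out to be merely useful) reflects the argument in the text.

```python
# Hypothetical costs in arbitrary units (lower is better). Illustrative only;
# none of these numbers come from Clark or the article.
COSTS = {
    ("build", "moat_holds"): 1,    # capacity built, still useful: modest spend
    ("build", "moat_closes"): 1,   # capacity built and needed
    ("wait", "moat_holds"): 0,     # nothing spent, nothing lost
    ("wait", "moat_closes"): 10,   # caught unprepared
}
ACTIONS = ("build", "wait")
STATES = ("moat_holds", "moat_closes")


def worst_case(action: str) -> float:
    """Highest cost the action can incur across both states."""
    return max(COSTS[(action, state)] for state in STATES)


def minimax_action() -> str:
    """Action whose worst case is least bad."""
    return min(ACTIONS, key=worst_case)


def expected_cost(action: str, p_closes: float) -> float:
    """Expected cost if the moat closes with probability p_closes."""
    return (COSTS[(action, "moat_holds")] * (1.0 - p_closes)
            + COSTS[(action, "moat_closes")] * p_closes)


def break_even_probability() -> float:
    """p_closes above which building has the lower expected cost."""
    extra_if_holds = COSTS[("build", "moat_holds")] - COSTS[("wait", "moat_holds")]
    saved_if_closes = COSTS[("wait", "moat_closes")] - COSTS[("build", "moat_closes")]
    return extra_if_holds / (extra_if_holds + saved_if_closes)


print("minimax choice:", minimax_action())
print("break-even p(moat closes):", break_even_probability())
```

With these illustrative costs, building is the minimax choice, and it also has the lower expected cost whenever the probability that the moat closes exceeds the break-even value. The wider the gap between the cost of being unprepared and the cost of building, the lower that threshold falls.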
The five audiences: industry, academia, policymakers, investors, and everyone else.
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
Implications for AI Development and Human Role
The rapid automation of AI engineering tasks could significantly accelerate AI development cycles, reduce costs, and shift the human role from engineering to higher-level research and innovation. This transition may influence institutional strategies, funding, and talent allocation in AI labs. However, the uncertainty surrounding AI’s capacity to automate research itself raises questions about the future of AI-driven scientific discovery and the potential need for new frameworks to manage AI-generated research outputs.
Progress of AI Benchmarks and Automation Milestones
Over the past 15 to 16 months, multiple independent benchmarks, including CORE-Bench, MLE-Bench, and kernel design tasks, have shown rapid improvement in AI capabilities relevant to research and engineering. CORE-Bench, which tests the reproduction of scientific papers, has been ‘solved’ at 95.5%, indicating AI can reproduce experimental setups reliably. MLE-Bench scores, which assess performance in Kaggle competitions, have improved from 16.9% to 64.4%, nearing competitive human performance. Meanwhile, advances in kernel design, including automated GPU kernel optimization, are increasingly integrated into production infrastructure. Together these developments suggest rapid saturation across core engineering skills, driven by continual improvements in large language models and automation techniques. Progress was previously considered incremental; the recent data indicates a structural shift toward near-complete automation of engineering tasks in AI research.
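The scale of the reported jumps is easier to judge with the arithmetic written out. This sketch uses only the scores quoted above (16.9% and 64.4% for MLE-Bench, 95.5% for CORE-Bench); the function names are illustrative, and the 100% ceiling is an assumption about how the scores are scaled.

```python
# Scores as quoted in the article, in percent.
MLE_BENCH_START = 16.9
MLE_BENCH_END = 64.4
CORE_BENCH = 95.5


def point_gain(start: float, end: float) -> float:
    """Absolute improvement in percentage points."""
    return end - start


def relative_gain(start: float, end: float) -> float:
    """Multiplicative improvement (end score divided by start score)."""
    return end / start


def headroom(score: float, ceiling: float = 100.0) -> float:
    """Percentage points left before the assumed ceiling."""
    return ceiling - score


print(f"MLE-Bench gain: {point_gain(MLE_BENCH_START, MLE_BENCH_END):.1f} points")
print(f"MLE-Bench ratio: {relative_gain(MLE_BENCH_START, MLE_BENCH_END):.2f}x")
print(f"MLE-Bench headroom: {headroom(MLE_BENCH_END):.1f} points")
print(f"CORE-Bench headroom: {headroom(CORE_BENCH):.1f} points")
```

The MLE-Bench score rose by 47.5 points, roughly 3.8 times its starting value, while CORE-Bench has only 4.5 points left below the ceiling. That difference is why the text treats CORE-Bench as solved and MLE-Bench as still approaching human performance.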
“The pattern across these benchmarks indicates that AI can today automate vast swaths, perhaps the entirety, of AI engineering.”
— Thorsten Meyer
Extent of AI Automation in Scientific Research
It remains unclear how much of the research process—such as hypothesis generation, creative problem-solving, and novel discovery—can be automated. While engineering tasks are nearing full automation, the residual role of human researchers in innovative science is still uncertain. The structural question Clark leaves open is whether research itself is a form of engineering at scale, which could mean the residual closes faster than anticipated. Additionally, institutional responses and the development of AI capabilities beyond current benchmarks are still evolving, making future trajectories uncertain.
Monitoring Benchmark Saturation and AI Research Capabilities
In the coming months, researchers and institutions will closely monitor the progression of benchmarks, especially as some, like MLE-Bench, have paused submissions to develop better measurement standards. Further empirical studies will clarify whether AI can fully automate research tasks or if new challenges emerge. Additionally, development in kernel design and infrastructure automation will likely continue, potentially leading to near-complete automation of engineering tasks within the next 18-24 months. The key focus will be on understanding the residual research capabilities and their implications for scientific discovery.
Key Questions
What are the main benchmarks indicating AI automation progress?
Of the six benchmarks Clark catalogs, this article names three: CORE-Bench, which measures research reproduction; MLE-Bench, which assesses Kaggle competition performance; and kernel design benchmarks, which evaluate automated infrastructure development.
How close is AI to fully automating AI engineering tasks?
Based on recent data, AI has nearly saturated key engineering benchmarks, suggesting near-complete automation of engineering tasks could be achievable within the next 18 to 24 months.
What tasks in AI research remain difficult for AI systems?
Tasks involving hypothesis generation, creative problem-solving, and novel scientific discovery are still less automated and remain challenging for current AI systems.
Could AI eventually automate the entire research process?
This remains uncertain. While engineering is approaching full automation, the residual research tasks may require human insight for the foreseeable future, though the structural possibility exists that research itself becomes a form of engineering at scale.
What does this mean for human researchers and institutions?
Institutions may need to shift focus from engineering to higher-level research and innovation, adapting to a landscape where automation handles most technical tasks, potentially transforming the scientific workforce and research paradigms.
Source: ThorstenMeyerAI.com