LLM Wrapped

2025

A data-driven look at language model development and deployment across the year.

The Enterprise Shift

While industry discourse proclaimed 2025 "the year of the agent," actual deployment tells a more measured story.

99% of enterprise developers exploring AI agents

62% of organizations experimenting with agents

23% scaling agents in production

The gap between experimentation and production-scale deployment remained significant.

Frontier Model Evolution

2025 saw continued investment in large-scale models

Q1-Q2 2025

Claude 3.7 Sonnet (February) - First agent-oriented LLM
Gemini 2.5 Pro (March) - Deep Think reasoning mode
DeepSeek V3-0324 (March) - V3.1, with its hybrid thinking/non-thinking architecture, followed in August

Q2-Q3 2025

Llama 4 (April) - Multimodal with 10M token context
Claude Sonnet 4 & Opus 4 (May) - Tool use, extended thinking
Grok 4 (July) - Competitive benchmark performance
GPT-5 (August) - Model routing, specialized thinking variants

Q4 2025

Gemini 3 Pro (November) - Replaced Ultra tier
GPT-5.1 (November)
Claude Opus 4.5 (November), following Claude Sonnet 4.5 (late September)

First Measurable Killer App

Code generation emerged as AI's first demonstrable commercial success.

42% Claude market share

21% OpenAI market share

$1.9B

Total ecosystem value

This marks the first clear productization beyond general chatbot interfaces.

From Instant to Iterative

2025 marked a shift from single-pass inference to multi-step reasoning architectures.

Chain-of-thought prompting - Moved from research to production
Extended thinking modes - 16K-64K token reasoning chains
Self-verification loops - Reflection and iterative improvement
Production models - OpenAI o1 series, Gemini Deep Think, Claude's thinking mode

These techniques improved performance on complex problem-solving at the cost of increased latency and compute.
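A self-verification loop can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: `refine`, `toy_generate`, and `toy_verify` are hypothetical names, and the two toy functions stand in for real model calls. Each extra round is another generation plus another check, which is where the added latency and compute come from.

```python
from typing import Callable

def refine(task: str,
           generate: Callable[[str, str], str],
           verify: Callable[[str, str], tuple[bool, str]],
           max_rounds: int = 3) -> tuple[str, int]:
    """Generate a draft, check it, and retry with feedback until it passes."""
    feedback = ""
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)      # one model call per round
        ok, feedback = verify(task, draft)    # a second call (or tool) checks it
        if ok:
            return draft, round_no
    return draft, max_rounds                  # give up: return the last draft

# Hypothetical stand-ins for model calls, so the sketch runs on its own.
def toy_generate(task: str, feedback: str) -> str:
    # First attempt is wrong; the retry uses the verifier's feedback.
    return "4" if "recheck" in feedback else "5"

def toy_verify(task: str, draft: str) -> tuple[bool, str]:
    ok = draft == str(2 + 2)
    return ok, "" if ok else "recheck the arithmetic"

answer, rounds = refine("What is 2 + 2?", toy_generate, toy_verify)
print(answer, rounds)
```

In this toy run the first draft fails verification and the second passes, so the loop spends two generation calls and two verification calls on one answer.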

Beyond Text Processing

Native multimodal processing became standard across frontier models.

20 billion visual searches monthly (Google Lens)

20% shopping-related searches

Vision: Image understanding, chart/diagram analysis
Audio: Voice mode with conversational latency
Video: Processing hours of video content (MMCTAgent architecture)
Cross-modal: Unified input/output across text, image, audio, video

The Open Weight Movement

Open source models continued closing the capability gap with proprietary systems.

Meta Llama 4 - Scout: 10M tokens, Maverick: multimodal
DeepSeek R1 - Reasoning models
Mistral Mixtral 8x22B - MoE architecture
OpenAI GPT-OSS - First open-weight release from OpenAI

Open Source Market Share

13% of AI workloads (down from 19% six months prior)

Scaling Input Capacity

Context windows grew by orders of magnitude

10M Llama 4 Scout tokens

1M Gemini 2.5 Pro tokens

Claude tokens

This enables processing entire codebases, lengthy documents, and multi-session conversations without summarization loss.
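As a rough illustration of what these window sizes mean in practice, the sketch below estimates whether a body of text fits in a given window. The 4-characters-per-token ratio is an assumption (a common rule of thumb for English text; real tokenizers vary by model and content), and the function names are hypothetical. The window sizes are the ones cited in this report.

```python
import math

# Window sizes as cited in this report (tokens).
CONTEXT_WINDOWS = {
    "Llama 4 Scout": 10_000_000,
    "Gemini 2.5 Pro": 1_000_000,
}

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer

def estimate_tokens(num_chars: int) -> int:
    """Approximate token count from a character count."""
    return math.ceil(num_chars / CHARS_PER_TOKEN)

def fits(num_chars: int, reserve_tokens: int = 0) -> dict[str, bool]:
    """Check each window, leaving `reserve_tokens` free for the model's output."""
    needed = estimate_tokens(num_chars) + reserve_tokens
    return {name: needed <= window for name, window in CONTEXT_WINDOWS.items()}

# Example: a hypothetical codebase of 12 million characters.
codebase_chars = 12_000_000
print(estimate_tokens(codebase_chars))
print(fits(codebase_chars, reserve_tokens=8_000))
```

Under these assumptions the example codebase comes to about 3 million tokens, which fits the 10M window but not the 1M one.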

Deployment Realities

McKinsey survey data reveals uneven progress

88% Organizations using AI

AI enables innovation

39% Enterprise-level EBIT impact

Transition from proof-of-concept to enterprise-wide deployment continues to be the primary bottleneck.

Economics of Model Training

Training costs showed significant variance

DeepSeek V3

685B parameters (37B active via MoE)
Training cost: $5.5M
2.788M GPU hours on H800s
Competitive performance with models costing 10x more

Architecture choices (MoE, efficient attention) can dramatically reduce training costs while maintaining capability.
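The figures above can be sanity-checked with simple arithmetic. The sketch below uses only the numbers cited in this section. The derived values (implied cost per GPU hour, share of parameters active per token) are back-of-the-envelope results, not figures reported by DeepSeek.

```python
# Figures as cited in this section.
gpu_hours = 2.788e6          # H800 GPU hours
training_cost_usd = 5.5e6    # reported training cost
total_params = 685e9
active_params = 37e9         # parameters active per token via MoE routing

# Implied rental rate: cost divided by GPU hours.
implied_rate = training_cost_usd / gpu_hours

# Share of the model that does work on any single token.
active_fraction = active_params / total_params

print(f"Implied cost per GPU hour: ${implied_rate:.2f}")
print(f"Active parameters per token: {active_fraction:.1%}")
```

The implied rate is just under $2 per GPU hour, and only about 5% of the parameters are active for any given token, which is how MoE keeps per-token compute far below what the total parameter count suggests.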

Standardization Efforts

2025 saw the emergence of protocols for agent interoperability

MCP

Model Context Protocol
Anthropic-initiated standard for AI-to-system integration

A2A

Agent-to-Agent Protocol
Emerging standard for multi-agent coordination

These represent infrastructure necessary for production agentic systems.
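To make "AI-to-system integration" concrete: MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with a `tools/call` request. The sketch below builds such a request and reads a result using only the standard library. The tool name `search_tickets`, its arguments, and the sample response are hypothetical, and the sketch omits the transport and the initialization handshake that a real client needs.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id

def make_tool_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)

def read_result(raw: str) -> dict:
    """Return the result of a JSON-RPC response, raising on a protocol error."""
    message = json.loads(raw)
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "unknown error"))
    return message["result"]

# Hypothetical tool and arguments, for illustration only.
request = make_tool_call("search_tickets", {"query": "login failure", "limit": 5})
print(request)

# A hand-written response in the shape a server might return.
sample_response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "3 tickets found"}]},
})
print(read_result(sample_response)["content"][0]["text"])
```

The value of the standard is that the request shape stays the same regardless of which system sits behind the tool.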

Persistent Issues

Despite advances, fundamental challenges remain

Technical

• "Black box" decision-making
• Hallucination rates improved but not eliminated
• Reasoning still brittle on edge cases
• Energy consumption at scale

Organizational

• ROI measurement remains difficult
• Change management for AI integration
• Data governance and privacy concerns
• Skill gaps in implementation teams

Text-to-Video Progress

Video generation moved from research demos to production tools

OpenAI Sora - General release
Google Veo - Integrated with Gemini
Runway Gen-2 - Commercial deployment
Kling O1 - Unified multimodal creation, with a focus on character consistency

60-second 4K generation, improved temporal consistency, character/scene persistence across frames.

Conversational AI Audio

Voice capabilities advanced beyond text-to-speech

Real-time conversational latency - Sub-second response
Emotional tone modulation - Natural expression
Mid-utterance interruption - Human-like interaction
Multi-turn conversation - Context retention

Applications: Customer support automation, accessibility tools, language learning

Efficiency-Focused Development

2025 saw increased focus on smaller, specialized models

Tokens/second on device

On-device models - Phones, laptops (Gemma 3n, Microsoft Mu)
NPU-optimized inference - Edge computing
Privacy-preserving - Local deployment
Domain-specific - Fine-tuning

A counter-trend to "bigger is better," driven by edge computing, privacy requirements, and cost optimization.

Performance Measurement

New benchmarks emerged to test advanced capabilities

Humanity's Last Exam - Reasoning under open-ended conditions
GPQA Diamond - Complex question accuracy
SWE-bench - Software engineering problem-solving
Video-MME - Multimodal video understanding

As models saturate traditional benchmarks, evaluation frameworks continue evolving to test emerging capabilities.

Hype Calibration

Industry predictions showed typical technology adoption patterns

$5.1B

AI agent market (2024)

$47.1B

Projected (2030)

44.8% CAGR

Gartner placed AI agents at the "Peak of Inflated Expectations" in its 2025 Hype Cycle for Artificial Intelligence.
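The projection cited in the references (from $5.1B in 2024 to $47.1B in 2030) can be checked against its stated growth rate with the standard compound annual growth rate formula. The sketch below is plain arithmetic on those two endpoints; the variable names are illustrative.

```python
start_value = 5.1    # $B, 2024
end_value = 47.1     # $B, 2030 projection
years = 2030 - 2024  # six compounding periods

# CAGR = (end / start) ** (1 / years) - 1
cagr = (end_value / start_value) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")

# Year-by-year path implied by that constant growth rate.
for offset in range(years + 1):
    print(2024 + offset, round(start_value * (1 + cagr) ** offset, 1))
```

The result lands close to the 44.8% figure in the forecast, so the projection amounts to the market growing by nearly half every year for six consecutive years.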

Embodied Intelligence

2025 saw early work on robotics integration

Google Gemini Robotics On-Device - VLA models
Vision-language-action models - Optimized for edge deployment
Real-world sensor data - LIDAR, GPS, video integration

This represents a research direction rather than deployed capability, but signals the next frontier beyond purely digital agents.

What 2025 Demonstrated

Technical Progress

• Reasoning capabilities improved
• Multimodal processing standard
• Context windows scaled
• Code generation reached PMF

Deployment Gap

• Experimentation exceeds deployment
• Enterprise-wide scaling challenging
• Human oversight essential
• ROI measurement evolving

2026 Outlook

Based on current trajectories

Likely

• Agentic workflow refinement
• Further multimodal integration
• More efficient training methods
• Expanded vertical applications

Uncertain

• Enterprise-wide deployment acceleration
• Novel capability breakthroughs
• Regulatory framework development
• Open vs. proprietary dynamics

The industry remains in rapid evolution with uncertain convergence points.

My small take: it's not clear that the industry will converge on a single model or architecture. The more likely outcome is continued diversity, with each approach carrying its own strengths and weaknesses. That is good for experimentation and innovation, but it also makes models harder to compare and their trade-offs harder to understand, which is why we need more benchmarks and more standardized evaluation frameworks.

References

Survey Data & Market Research

IBM and Morning Consult Developer Survey

IBM. (2025). "AI Agents in 2025: Expectations vs. Reality." IBM Think Insights.

https://www.ibm.com/think/insights/ai-agents-2025-expectations-vs-reality

Key data: 99% of enterprise developers exploring AI agents, agent adoption patterns

McKinsey Global Survey on AI

McKinsey & Company. (2025). "The state of AI in 2025: Agents, innovation, and transformation."

https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai

Survey period: June 25 - July 29, 2025 | Sample: 1,993 participants across 105 nations | Key data: 88% AI adoption, 62% experimenting with agents, 23% scaling agents, 39% reporting EBIT impact

Menlo Ventures LLM Market Update

Menlo Ventures. (2025). "2025 Mid-Year LLM Market Update: Foundation Model Landscape + Economics."

https://menlovc.com/perspective/2025-mid-year-llm-market-update/

Survey: 150 technical decision-makers at enterprises and startups (June 30 - July 10, 2025) | Key data: Claude 42% code generation market share, OpenAI 21%, $1.9B coding ecosystem, 13% open-source workload share

Gartner Research

Gartner, Inc. (2025). "Gartner Hype Cycle for Artificial Intelligence, 2025."

https://www.gartner.com/en/newsroom/press-releases/2025-08-05-gartner-hype-cycle-identifies-top-ai-innovations-in-2025

Key insight: AI agents and AI-ready data at "Peak of Inflated Expectations"

MarketsandMarkets Projection

Cited in: Alvarez & Marsal. (2025). "Demystifying AI Agents in 2025: Separating Hype From Reality and Navigating Market Outlook."

https://www.alvarezandmarsal.com/thought-leadership/demystifying-ai-agents-in-2025-separating-hype-from-reality-and-navigating-market-outlook

Projection: AI agent market from $5.1B (2024) to $47.1B (2030), 44.8% CAGR

Capgemini Research

Cited in: Collabnix. (2025). "Agentic AI Trends 2025: The Complete Guide to Autonomous Intelligence Revolution."

https://collabnix.com/agentic-ai-trends-2025-the-complete-guide-to-autonomous-intelligence-revolution/

Key data: 82% of organizations plan AI agent integration by 2026

Model Releases & Technical Documentation

Anthropic - Claude Models

Anthropic. (2024-2025). Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude 4 family releases. Source: Menlo Ventures report references February 2025 Claude 3.7 Sonnet release.

Key features: Extended thinking, MCP integration, agent-first design

OpenAI - GPT-5 Series

Multiple sources reference August 2025 GPT-5 release. InfoQ. (2025). "InfoQ AI, ML and Data Engineering Trends Report - 2025."

https://www.infoq.com/articles/ai-ml-data-engineering-trends-2025/

Google - Gemini Models

TechTarget. (2025). "30 of the best large language models in 2026." Splunk. (2025). "Top LLMs To Use in 2026: Our Best Picks."

https://www.splunk.com/en_us/blog/learn/llms-best-to-use.html

Key data: Gemini 2.5 Pro (March 2025) with Deep Think mode, 1M token context; Gemini 3 Pro (November 2025)

Meta - Llama 4

Shakudo. (2025). "Top 9 Large Language Models as of December 2025."

https://www.shakudo.io/blog/top-9-large-language-models

Key features: Llama 4 Scout with 10M token context window, multimodal capabilities

DeepSeek Models

Multiple sources document DeepSeek V3 and V3.1. Splunk report: DeepSeek-V3-0324 launched March 24, 2025.

Key data: 685B parameters (37B active), $5.5M training cost, 2.788M GPU hours on H800s. DeepSeek V3.1 (August 2025) with hybrid thinking/non-thinking modes

xAI - Grok 4

Medium (2025) report: Grok 4 launched July 2025.

Performance: Comparable SWE-bench to GPT-5 and Claude Opus 4.1

Mistral AI

Shakudo report documents Mixtral 8x22B with Mixture-of-Experts architecture.

Apache 2.0 license, open-source availability

Multimodal & Video Generation

Google Lens Data

Lumar. (2025). "Multimodal Search in 2025: Image, Video, & Voice Search."

https://www.lumar.io/blog/industry-news/multimodal-search-video-image-and-voice-search/

Key data: 20 billion visual searches monthly, 20% shopping-related, fastest growing among 18-24 age group

Kling O1 Video Model

Morningstar. (2025). "Kling O1 Launches as the World's First Unified Multimodal Video Model." Announcement: December 1, 2025.

Features: Unified multimodal creation tool, character consistency

Runway Gen-2

Runway Research. "Gen-2: Generate novel videos with text, images or video clips."

https://runwayml.com/research/gen-2

Capabilities: Text-to-video, image-to-video generation

Microsoft Research - MMCTAgent

Microsoft Research. (2025). "MMCTAgent: Enabling multimodal reasoning over large video and image collections." Published: November 12, 2025.

https://www.microsoft.com/en-us/research/blog/mmctagent-enabling-multimodal-reasoning-over-large-video-and-image-collections/

Architecture: Multi-modal Critical Thinking Agent for long-form video reasoning

General Multimodal AI Overview

Medium. (2025). "Multimodal AI in 2025: Integrating Text, Image, Audio, and Video for Smarter AI." Aya Data. (2025). "Multimodal AI: Breaking Down Barriers Between Text, Image, Audio and Video."

https://www.ayadata.ai/multimodal-ai-breaking-down-barriers-between-text-image-audio-and-video/

Industry Analysis & Trends

Microsoft AI Trends

Microsoft. (2025). "6 AI trends you'll see more of in 2025." Published: May 1, 2025.

https://news.microsoft.com/source/features/ai/6-ai-trends-youll-see-more-of-in-2025/

MIT Technology Review

MIT Technology Review. (2025). "What's next for AI in 2025." Published: January 8, 2025.

https://www.technologyreview.com/2025/01/08/1109188/whats-next-for-ai-in-2025/

Morgan Stanley Technology Analysis

Morgan Stanley. (2025). "5 AI Trends Shaping Innovation and ROI in 2025."

https://www.morganstanley.com/insights/articles/ai-trends-reasoning-frontier-models-2025-tmt

InfoQ Trends Report

InfoQ. (2025). "InfoQ AI, ML and Data Engineering Trends Report - 2025." Published: September 24, 2025.

https://www.infoq.com/articles/ai-ml-data-engineering-trends-2025/

Topics: Reasoning models, AI DevOps, Physical AI, Model Context Protocol

MarkTechPost Analysis

MarkTechPost. (2025). "AI Agent Trends of 2025: A Transformative Landscape." Published: August 10, 2025.

https://www.marktechpost.com/2025/08/10/ai-agent-trends-of-2025-a-transformative-landscape/

AIwire LLM Roundup

AIwire. (2025). "LLM Roundup: A Wave of New Releases Winds Down the Year." Published: November 25, 2025.

https://www.hpcwire.com/aiwire/2025/11/25/llm-roundup-a-wave-of-new-releases-winds-down-the-year/

Open Source & Local Models

Instaclustr Open Source Guide

Instaclustr. (2025). "Top 10 open source LLMs for 2025." Published: October 29, 2025.

https://www.instaclustr.com/education/open-source-ai/top-10-open-source-llms-for-2025/

Pinggy Local LLM Tools

Pinggy. (2025). "Top 5 Local LLM Tools and Models in 2025." Published: June 4, 2025.

https://pinggy.io/blog/top_5_local_llm_tools_and_models_2025/

Coverage: OpenAI GPT-OSS, DeepSeek V3.2-Exp, Qwen models, Llama 4, Gemma 3

Additional Technical Resources

Global Market Insights

Cited in Aya Data article. Multimodal AI market valuation: $1.6B (2024), projected 32.7% CAGR through 2034.

AWS Agent Investment

Cited in Alvarez & Marsal article. Reuters reporting on AWS doubling down on AI agents with new business unit.

Simon Willison - LLM Tool Development

Willison, S. (2025). Various LLM tool release notes.

https://simonwillison.net/series/llm-releases/

Coverage of GPT-5, tool usage, structured outputs

Notes on Data Currency

All data reflects information available through early December 2025. Survey data collection periods are noted where available. Market projections represent industry analyst forecasts and should be interpreted as forward-looking estimates rather than guaranteed outcomes.

The "December 2025" timeframe for this analysis means some Q4 2025 developments may still be emerging, and complete year-end data may not yet be available for all metrics cited.