LLM Wrapped

2025

A data-driven look at language model development and deployment across the year.

The Enterprise Shift

While industry discourse proclaimed 2025 "the year of the agent," actual deployment tells a more measured story.

99%
of enterprise developers exploring AI agents
62%
of organizations experimenting
23%
scaling agents in production

The gap between experimentation and production-scale deployment remained significant.

Frontier Model Evolution

2025 saw continued investment in large-scale models

Q1-Q2 2025
Claude 3.7 Sonnet (February) - First agent-oriented LLM
Gemini 2.5 Pro (March) - Deep Think reasoning mode
DeepSeek-V3-0324 (March) - 685B-parameter MoE (37B active)
Q2-Q3 2025
Llama 4 (April) - Multimodal with 10M token context
Claude Sonnet 4 & Opus 4 (May) - Tool use, extended thinking
Grok 4 (July) - Competitive benchmark performance
GPT-5 (August) - Model routing, specialized thinking variants
DeepSeek V3.1 (August) - Hybrid thinking/non-thinking architecture
Q4 2025
Gemini 3 Pro (November) - Replaced Ultra tier
GPT-5.1 (November)
Claude Sonnet 4.5 & Opus 4.5 (December)

First Measurable Killer App

Code generation emerged as AI's first demonstrable commercial success.

42%
Claude market share
21%
OpenAI market share
$1.9B
Total ecosystem value

This marks the first clear productization beyond general chatbot interfaces.

From Instant to Iterative

2025 marked a shift from single-pass inference to multi-step reasoning architectures.

  • Chain-of-thought prompting

    Moved from research to production

  • Extended thinking modes

    16K-64K token reasoning chains

  • Self-verification loops

    Reflection and iterative improvement

  • Production models

    OpenAI o1 series, Gemini Deep Think, Claude's thinking mode

These techniques improved performance on complex problem-solving at the cost of increased latency and compute.
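A self-verification loop can be sketched in a few lines. The version below is only an illustration under stated assumptions: `refine`, `toy_generate`, and `toy_verify` are hypothetical names, and the two toy functions stand in for model calls that a real system would make to an LLM.

```python
from typing import Callable, Tuple


def refine(
    task: str,
    generate: Callable[[str, str], str],
    verify: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Draft an answer, then revise it until the verifier returns no feedback."""
    feedback = ""
    answer = ""
    for round_no in range(1, max_rounds + 1):
        answer = generate(task, feedback)   # draft, or revise using feedback
        feedback = verify(task, answer)     # empty string means "looks fine"
        if not feedback:
            return answer, round_no
    return answer, max_rounds               # give up after max_rounds


# Toy stand-ins for model calls (hypothetical, for illustration only).
def toy_generate(task: str, feedback: str) -> str:
    return "4" if feedback else "5"         # first draft is wrong on purpose


def toy_verify(task: str, answer: str) -> str:
    return "" if answer == "4" else "2 + 2 is not " + answer


result = refine("What is 2 + 2?", toy_generate, toy_verify)
print(result)
```

Each extra round costs another generation and another verification call, which is where the added latency and compute come from.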

Beyond Text Processing

Native multimodal processing became standard across frontier models.

20B
Visual searches monthly (Google Lens)
20%
Shopping-related searches
  • Vision: Image understanding, chart/diagram analysis
  • Audio: Voice mode with conversational latency
  • Video: Processing hours of video content (MMCTAgent architecture)
  • Cross-modal: Unified input/output across text, image, audio, video

The Open Weight Movement

Open source models continued closing the capability gap with proprietary systems.

  • Meta Llama 4 - Scout: 10M tokens, Maverick: multimodal
  • DeepSeek R1 - Reasoning models
  • Mistral Mixtral 8x22B - MoE architecture
  • OpenAI GPT-OSS - First open-weight release from OpenAI
Open Source Market Share

13% of AI workloads (down from 19% six months prior)

Scaling Input Capacity

Context windows grew by orders of magnitude

10M
Llama 4 Scout tokens
1M
Gemini 2.5 Pro tokens
0K
Claude tokens

This enables processing entire codebases, lengthy documents, and multi-session conversations without summarization loss.
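To get a rough feel for what fits, the sketch below estimates token counts from character counts. It assumes about 4 characters per token, which is a common rule of thumb and not a property of any particular tokenizer. The window sizes are the 10M and 1M figures cited in this document for Llama 4 Scout and Gemini 2.5 Pro, and the 12 MB codebase and 8,000-token output reserve are hypothetical.

```python
from typing import Dict

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, varies by tokenizer and content

# Context window sizes in tokens, as cited in this document.
CONTEXT_WINDOWS = {
    "Llama 4 Scout": 10_000_000,
    "Gemini 2.5 Pro": 1_000_000,
}


def estimate_tokens(num_chars: int) -> int:
    """Estimate the token count with ceiling division."""
    return -(-num_chars // CHARS_PER_TOKEN)


def fits(num_chars: int, reserve_for_output: int = 8_000) -> Dict[str, bool]:
    """Report, per model, whether the input plus an output reserve fits."""
    needed = estimate_tokens(num_chars) + reserve_for_output
    return {name: needed <= window for name, window in CONTEXT_WINDOWS.items()}


codebase_chars = 12_000_000  # hypothetical 12 MB codebase
tokens = estimate_tokens(codebase_chars)
report = fits(codebase_chars)
print(tokens, report)
```

Under these assumptions the codebase needs about 3M tokens, so it fits a 10M window but not a 1M one.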

Deployment Realities

McKinsey survey data reveals uneven progress

Organizations using AI: 88%
AI enables innovation
Enterprise-level EBIT impact: 39%

Transition from proof-of-concept to enterprise-wide deployment continues to be the primary bottleneck.

Economics of Model Training

Training costs showed significant variance

DeepSeek V3
685B parameters (37B active via MoE)
Training cost: $5.5M
2.788M GPU hours on H800s
Competitive performance with models costing 10x more

Architecture choices (MoE, efficient attention) can dramatically reduce training costs while maintaining capability.
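The DeepSeek V3 figures above can be cross-checked with simple arithmetic. This sketch uses only the numbers quoted in this section and derives two values that the section does not state directly: the implied price per GPU hour and the share of parameters active per token.

```python
# Figures quoted in this section.
GPU_HOURS = 2.788e6         # H800 GPU hours
REPORTED_COST_USD = 5.5e6   # reported training cost
TOTAL_PARAMS = 685e9        # total parameters
ACTIVE_PARAMS = 37e9        # parameters active per token via MoE

# Derived values (not stated in the source, computed here).
implied_rate = REPORTED_COST_USD / GPU_HOURS    # USD per GPU hour
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # share of weights used per token

print(f"Implied rate: ${implied_rate:.2f} per GPU hour")
print(f"Active parameters per token: {active_fraction:.1%}")
```

The result is roughly $2 per GPU hour, which suggests the headline cost reflects rented compute for the training run alone. With only about 5% of parameters active per token, the MoE design is one plausible source of the savings.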

Standardization Efforts

2025 saw the emergence of protocols for agent interoperability

MCP
Model Context Protocol
Anthropic-initiated standard
for AI-to-system integration
A2A
Agent-to-Agent Protocol
Emerging standard
for multi-agent coordination

These represent infrastructure necessary for production agentic systems.
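As a rough illustration of what MCP standardizes, the sketch below builds two request messages. It assumes MCP's JSON-RPC 2.0 framing and its `tools/list` and `tools/call` methods. The `jsonrpc_request` helper, the `get_weather` tool, and its arguments are hypothetical, and transport and the initialization handshake are omitted.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # simple incrementing request IDs


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object (hypothetical helper)."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask a server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool by name; the tool and its arguments are made up.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "get_weather", "arguments": {"city": "Berlin"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```

Because every tool call shares this message shape, a client written once can talk to any compliant server.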

Persistent Issues

Despite advances, fundamental challenges remain

Technical
• "Black box" decision-making
• Hallucination rates improved but not eliminated
• Reasoning still brittle on edge cases
• Energy consumption at scale
Organizational
• ROI measurement remains difficult
• Change management for AI integration
• Data governance and privacy concerns
• Skill gaps in implementation teams

Text-to-Video Progress

Video generation moved from research demos to production tools

  • OpenAI Sora - General release
  • Google Veo - Integrated with Gemini
  • Runway Gen-2 - Commercial deployment
  • Kling O1 - Unified multimodal creation with improved character consistency

60-second 4K generation, improved temporal consistency, character/scene persistence across frames.

Conversational AI Audio

Voice capabilities advanced beyond text-to-speech

  • Real-time conversational latency - Sub-second response
  • Emotional tone modulation - Natural expression
  • Mid-utterance interruption - Human-like interaction
  • Multi-turn conversation - Context retention

Applications: Customer support automation, accessibility tools, language learning

Efficiency-Focused Development

2025 saw increased focus on smaller, specialized models

0+
Tokens/second on device
  • On-device models - Phones, laptops (Gemma 3n, Microsoft Mu)
  • NPU-optimized inference - Edge computing
  • Privacy-preserving - Local deployment
  • Domain-specific - Fine-tuning

A counter-trend to "bigger is better," driven by edge computing, privacy requirements, and cost optimization.

Performance Measurement

New benchmarks emerged to test advanced capabilities

  • Humanity's Last Exam - Reasoning under open-ended conditions
  • GPQA Diamond - Complex question accuracy
  • SWE-bench - Software engineering problem-solving
  • Video-MME - Multimodal video understanding

As models saturate traditional benchmarks, evaluation frameworks continue evolving to test emerging capabilities.

Hype Calibration

Industry predictions showed typical technology adoption patterns

$5.1B
AI agent market (2024)
$47.1B
Projected (2030)
44.8%
CAGR

Gartner placed AI agents at the "Peak of Inflated Expectations" in its 2025 Hype Cycle for Artificial Intelligence, the stage at which expectations typically run ahead of deployed results.
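The market projection listed in the references ($5.1B in 2024 growing to $47.1B in 2030 at a 44.8% CAGR) can be checked for internal consistency. The sketch below applies the standard compound annual growth rate formula, assuming six compounding years between 2024 and 2030.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


START_2024 = 5.1    # USD billions, from the cited projection
END_2030 = 47.1     # USD billions, from the cited projection
YEARS = 2030 - 2024

rate = cagr(START_2024, END_2030, YEARS)
projected = START_2024 * (1 + 0.448) ** YEARS  # forward check at the quoted rate

print(f"Implied CAGR: {rate:.1%}")
print(f"2030 value at 44.8% CAGR: ${projected:.1f}B")
```

The two endpoints and the quoted growth rate agree to within rounding, so the figures are at least self-consistent. Whether the market actually compounds at that rate is a separate question.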

Embodied Intelligence

2025 saw early work on robotics integration

  • Google Gemini Robotics On-Device - VLA models
  • Vision-language-action models - Optimized for edge deployment
  • Real-world sensor data - LIDAR, GPS, video integration

This represents a research direction rather than deployed capability, but signals the next frontier beyond purely digital agents.

What 2025 Demonstrated

Technical Progress
• Reasoning capabilities improved
• Multimodal processing standard
• Context windows scaled
• Code generation reached PMF
Deployment Gap
• Experimentation exceeds deployment
• Enterprise-wide scaling challenging
• Human oversight essential
• ROI measurement evolving

2026 Outlook

Based on current trajectories

Likely
• Agentic workflow refinement
• Further multimodal integration
• More efficient training methods
• Expanded vertical applications
Uncertain
• Enterprise-wide deployment acceleration
• Novel capability breakthroughs
• Regulatory framework development
• Open vs. proprietary dynamics

The industry remains in rapid evolution with uncertain convergence points.

My small take: it's not clear that the industry will converge on a single model or architecture. The most likely outcome is continued diversity, with each model and architecture having its own strengths and weaknesses. That is a good thing, since it leaves room for experimentation and innovation, but it also makes models harder to compare and their trade-offs harder to understand. This is why we need more benchmarks and more standardized evaluation frameworks.

References

Survey Data & Market Research

IBM and Morning Consult Developer Survey
IBM. (2025). "AI Agents in 2025: Expectations vs. Reality." IBM Think Insights.
https://www.ibm.com/think/insights/ai-agents-2025-expectations-vs-reality
Key data: 99% of enterprise developers exploring AI agents, agent adoption patterns
McKinsey Global Survey on AI
McKinsey & Company. (2025). "The state of AI in 2025: Agents, innovation, and transformation."
https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
Survey period: June 25 - July 29, 2025 | Sample: 1,993 participants across 105 nations | Key data: 88% AI adoption, 62% experimenting with agents, 23% scaling agents, 39% reporting EBIT impact
Menlo Ventures LLM Market Update
Menlo Ventures. (2025). "2025 Mid-Year LLM Market Update: Foundation Model Landscape + Economics."
https://menlovc.com/perspective/2025-mid-year-llm-market-update/
Survey: 150 technical decision-makers at enterprises and startups (June 30 - July 10, 2025) | Key data: Claude 42% code generation market share, OpenAI 21%, $1.9B coding ecosystem, 13% open-source workload share
Gartner Research
Gartner, Inc. (2025). "Gartner Hype Cycle for Artificial Intelligence, 2025."
https://www.gartner.com/en/newsroom/press-releases/2025-08-05-gartner-hype-cycle-identifies-top-ai-innovations-in-2025
Key insight: AI agents and AI-ready data at "Peak of Inflated Expectations"
MarketsandMarkets Projection
Cited in: Alvarez & Marsal. (2025). "Demystifying AI Agents in 2025: Separating Hype From Reality and Navigating Market Outlook."
https://www.alvarezandmarsal.com/thought-leadership/demystifying-ai-agents-in-2025-separating-hype-from-reality-and-navigating-market-outlook
Projection: AI agent market from $5.1B (2024) to $47.1B (2030), 44.8% CAGR
Capgemini Research
Cited in: Collabnix. (2025). "Agentic AI Trends 2025: The Complete Guide to Autonomous Intelligence Revolution."
https://collabnix.com/agentic-ai-trends-2025-the-complete-guide-to-autonomous-intelligence-revolution/
Key data: 82% of organizations plan AI agent integration by 2026

Model Releases & Technical Documentation

Anthropic - Claude Models
Anthropic. (2024-2025). Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude 4 family releases. Source: Menlo Ventures report references February 2025 Claude 3.7 Sonnet release.
Key features: Extended thinking, MCP integration, agent-first design
OpenAI - GPT-5 Series
Multiple sources reference August 2025 GPT-5 release. InfoQ. (2025). "InfoQ AI, ML and Data Engineering Trends Report - 2025."
https://www.infoq.com/articles/ai-ml-data-engineering-trends-2025/
Google - Gemini Models
TechTarget. (2025). "30 of the best large language models in 2026." Splunk. (2025). "Top LLMs To Use in 2026: Our Best Picks."
https://www.splunk.com/en_us/blog/learn/llms-best-to-use.html
Key data: Gemini 2.5 Pro (March 2025) with Deep Think mode, 1M token context; Gemini 3 Pro (November 2025)
Meta - Llama 4
Shakudo. (2025). "Top 9 Large Language Models as of December 2025."
https://www.shakudo.io/blog/top-9-large-language-models
Key features: Llama 4 Scout with 10M token context window, multimodal capabilities
DeepSeek Models
Multiple sources document DeepSeek V3 and V3.1. Splunk report: DeepSeek-V3-0324 launched March 24, 2025.
Key data: 685B parameters (37B active), $5.5M training cost, 2.788M GPU hours on H800s. DeepSeek V3.1 (August 2025) with hybrid thinking/non-thinking modes
xAI - Grok 4
Medium (2025) report: Grok 4 launched July 2025.
Performance: Comparable SWE-bench to GPT-5 and Claude Opus 4.1
Mistral AI
Shakudo report documents Mixtral 8x22B with Mixture-of-Experts architecture.
Apache 2.0 license, open-source availability

Multimodal & Video Generation

Google Lens Data
Lumar. (2025). "Multimodal Search in 2025: Image, Video, & Voice Search."
https://www.lumar.io/blog/industry-news/multimodal-search-video-image-and-voice-search/
Key data: 20 billion visual searches monthly, 20% shopping-related, fastest growing among 18-24 age group
Kling O1 Video Model
Morningstar. (2025). "Kling O1 Launches as the World's First Unified Multimodal Video Model." Announcement: December 1, 2025.
Features: Unified multimodal creation tool, character consistency
Runway Gen-2
Runway Research. "Gen-2: Generate novel videos with text, images or video clips."
https://runwayml.com/research/gen-2
Capabilities: Text-to-video, image-to-video generation
Microsoft Research - MMCTAgent
Microsoft Research. (2025). "MMCTAgent: Enabling multimodal reasoning over large video and image collections." Published: November 12, 2025.
https://www.microsoft.com/en-us/research/blog/mmctagent-enabling-multimodal-reasoning-over-large-video-and-image-collections/
Architecture: Multi-modal Critical Thinking Agent for long-form video reasoning
General Multimodal AI Overview
Medium. (2025). "Multimodal AI in 2025: Integrating Text, Image, Audio, and Video for Smarter AI." Aya Data. (2025). "Multimodal AI: Breaking Down Barriers Between Text, Image, Audio and Video."
https://www.ayadata.ai/multimodal-ai-breaking-down-barriers-between-text-image-audio-and-video/

Industry Analysis & Trends

Microsoft AI Trends
Microsoft. (2025). "6 AI trends you'll see more of in 2025." Published: May 1, 2025.
https://news.microsoft.com/source/features/ai/6-ai-trends-youll-see-more-of-in-2025/
MIT Technology Review
MIT Technology Review. (2025). "What's next for AI in 2025." Published: January 8, 2025.
https://www.technologyreview.com/2025/01/08/1109188/whats-next-for-ai-in-2025/
Morgan Stanley Technology Analysis
Morgan Stanley. (2025). "5 AI Trends Shaping Innovation and ROI in 2025."
https://www.morganstanley.com/insights/articles/ai-trends-reasoning-frontier-models-2025-tmt
InfoQ Trends Report
InfoQ. (2025). "InfoQ AI, ML and Data Engineering Trends Report - 2025." Published: September 24, 2025.
https://www.infoq.com/articles/ai-ml-data-engineering-trends-2025/
Topics: Reasoning models, AI DevOps, Physical AI, Model Context Protocol
MarkTechPost Analysis
MarkTechPost. (2025). "AI Agent Trends of 2025: A Transformative Landscape." Published: August 10, 2025.
https://www.marktechpost.com/2025/08/10/ai-agent-trends-of-2025-a-transformative-landscape/
AIwire LLM Roundup
AIwire. (2025). "LLM Roundup: A Wave of New Releases Winds Down the Year." Published: November 25, 2025.
https://www.hpcwire.com/aiwire/2025/11/25/llm-roundup-a-wave-of-new-releases-winds-down-the-year/

Open Source & Local Models

Instaclustr Open Source Guide
Instaclustr. (2025). "Top 10 open source LLMs for 2025." Published: October 29, 2025.
https://www.instaclustr.com/education/open-source-ai/top-10-open-source-llms-for-2025/
Pinggy Local LLM Tools
Pinggy. (2025). "Top 5 Local LLM Tools and Models in 2025." Published: June 4, 2025.
https://pinggy.io/blog/top_5_local_llm_tools_and_models_2025/
Coverage: OpenAI GPT-OSS, DeepSeek V3.2-Exp, Qwen models, Llama 4, Gemma 3

Additional Technical Resources

Global Market Insights
Cited in Aya Data article. Multimodal AI market valuation: $1.6B (2024), projected 32.7% CAGR through 2034.
AWS Agent Investment
Cited in Alvarez & Marsal article. Reuters reporting on AWS doubling down on AI agents with new business unit.
Simon Willison - LLM Tool Development
Willison, S. (2025). Various LLM tool release notes.
https://simonwillison.net/series/llm-releases/
Coverage of GPT-5, tool usage, structured outputs
Notes on Data Currency
All data reflects information available through early December 2025. Survey data collection periods are noted where available. Market projections represent industry analyst forecasts and should be interpreted as forward-looking estimates rather than guaranteed outcomes.

The "December 2025" timeframe for this analysis means some Q4 2025 developments may still be emerging, and complete year-end data may not yet be available for all metrics cited.