Your Daily AI Press Review — April 24, 2026: Price War.
DeepSeek just dropped V4-Pro and V4-Flash — open weights, MIT license, 1.6 trillion parameters, priced well below OpenAI and Anthropic. OpenAI countered the same day with its latest model, codenamed Spud, now live in ChatGPT and Codex for paid subscribers. Anthropic, meanwhile, confirmed real quality failures in Claude Code and promised stricter controls. Off the radar, Meituan's LongCat-2.0 quietly crossed one trillion parameters, trained entirely on domestic Chinese compute — a signal the mainstream missed.
DeepSeek released V4-Pro and V4-Flash as open-weight preview models under the MIT license. V4-Pro carries 1.6 trillion total parameters with 49 billion active — the largest open-weights model ever released, surpassing Kimi K2.6 at 1.1 trillion and GLM-5.1 at 754 billion. V4-Flash sits at 284 billion total with 13 billion active. Both support a one-million-token context window and run on NVIDIA GPUs as well as Huawei's Ascend chips. The models were trained on up to 33 trillion tokens and refined through distillation from in-house specialist models, with a new hybrid sparse attention architecture that cuts compute dramatically for long contexts.
DeepSeek's pricing undercuts every major competitor. V4-Flash costs $0.14 per million input tokens and $0.28 per million output — cheaper than OpenAI's smallest nano-tier model. V4-Pro runs at $1.74 per million input and $3.48 per million output, making it the cheapest large frontier-class model available. The efficiency gain is structural: at a one-million-token context, V4-Pro requires only 27 percent of the compute and 10 percent of the KV cache compared to DeepSeek's previous V3.2. V4-Flash drops those figures to 10 percent of compute and 7 percent of KV cache. On Artificial Analysis's GDPval-AA benchmark, V4-Pro leads all open-weights models with 1,554 Elo points.
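To make the price gap concrete, here is a minimal cost sketch using the per-million-token rates quoted above. The workload volume is a hypothetical example, not a figure from the brief.

```python
# Per-million-token prices quoted in the brief: (input $, output $).
PRICES = {
    "V4-Flash": (0.14, 0.28),
    "V4-Pro": (1.74, 3.48),
}

def monthly_cost(model, input_tokens, output_tokens):
    """Dollar cost for a given token volume at the listed rates."""
    p_in, p_out = PRICES[model]
    return (input_tokens / 1e6) * p_in + (output_tokens / 1e6) * p_out

# Hypothetical workload: 500M input tokens, 100M output tokens per month.
flash = monthly_cost("V4-Flash", 500e6, 100e6)  # 500*0.14 + 100*0.28 = $98
pro = monthly_cost("V4-Pro", 500e6, 100e6)      # 500*1.74 + 100*3.48 = $1,218
```

At these rates, even the frontier-class V4-Pro tier stays in the low thousands of dollars per month for a workload of this size.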
OpenAI released its latest model, internally codenamed Spud, one week after Anthropic's most recent launch. The model is live in ChatGPT and Codex for paid subscribers as of Thursday. OpenAI co-founder Greg Brockman described it as a faster, sharper thinker that handles multi-step workflows more autonomously with less user input, while matching the previous generation's response speed in real-world use. API access is being withheld pending additional cybersecurity guardrails. The model is not yet available through the standard OpenAI API, which is creating friction for developers who rely on programmatic access rather than the ChatGPT interface.
Anthropic confirmed that Claude Code experienced three separate sources of degraded quality, following sustained user complaints about declining output. The company identified and fixed all three issues and has committed to stricter quality controls going forward. The timing is damaging: Anthropic is navigating a potential IPO that analysts have valued near $800 billion, while revenue has tripled to about $30 billion this year. Chief rival OpenAI is actively courting frustrated Claude developers, pitching itself as the more stable alternative. Anthropic's problems over the past two months span product quality, pricing, security, and capacity — and are beginning to compound.
Google and NVIDIA announced new A5X bare-metal instances at Google Cloud Next, running on NVIDIA's Vera Rubin NVL72 rack-scale systems. The companies claim the hardware and software co-design delivers up to ten times lower inference cost compared to prior generations. Google Cloud's Andi Gutmans argued that no competitor currently combines cloud infrastructure, frontier AI models, and a data platform under one roof — positioning Google Cloud as structurally advantaged in the enterprise agent race. The announcement came alongside Google consolidating all enterprise AI agent tools — build, connect, secure, govern — into a single unified platform.
Meta announced it will lay off roughly 8,000 employees, about 10 percent of the company, citing soaring AI costs pressuring margins. Capital expenditures are expected to rise at least 60 percent this year compared to 2025, driven by Meta Superintelligence Labs and core business investment. Free cash flow is projected to fall about 83 percent year over year. Separately, Meta signed a deal to procure millions of Amazon's homegrown AI CPUs — not GPUs — for agentic workloads, signaling a new category of chip procurement optimized for inference-heavy agent pipelines rather than training.
Yann LeCun's startup AMI Labs raised $1 billion with just 12 employees on staff. LeCun, who built the company after leaving Meta, is betting against large language models as the path to general intelligence, pursuing a fundamentally different architecture. The funding round is one of the largest per-employee raises in AI history and reflects continued investor appetite for alternative approaches to the dominant transformer paradigm. AMI Labs has not disclosed its technical approach publicly, but LeCun has consistently argued that LLMs cannot achieve world-model reasoning.
Tencent released Hy3 Preview, its first flagship open-source model since former OpenAI researcher Yao Shunyu joined to lead foundational AI development. The model is described as Tencent's most powerful to date, on par with top Chinese models but still behind OpenAI and Google DeepMind's flagship products. Notably, Hy3 is relatively small for a frontier-class model, suggesting Tencent is prioritizing efficiency over raw parameter count. The release comes as DeepSeek's talent drain accelerates — key researchers including DeepSeek R1 co-author Guo Daya and LLM co-author Wang Bingxuan have been poached by ByteDance and Tencent.
Alibaba's Qwen app struck its first external partnership, linking with China Eastern Airlines to let users manage the full flight booking process — search, purchase, seat selection, and check-in — through a single natural-language chat interface. The integration pushes Qwen's agentic capabilities beyond Alibaba's own ecosystem for the first time. China's State Grid Corporation separately earmarked about $1 billion — 6.8 billion yuan — to deploy thousands of AI-powered robots across its power grid infrastructure, covering remote substation inspection and ultra-high-voltage line maintenance.
On deployments. The US Internal Revenue Service is now running 126 active AI applications across audit selection, fraud detection, taxpayer services, and operational workflows. That number stood at just 10 in August 2022 — more than a twelvefold increase in under four years. The expansion is accelerating as staffing gaps widen, with AI filling roles that would otherwise require human reviewers. The IRS deployment is one of the largest government-scale AI rollouts in the United States and covers the full tax administration lifecycle, from filing intake to enforcement.
Microsoft is testing Anthropic's Claude Mythos Preview through Project Glasswing, a joint initiative aimed at identifying and mitigating cybersecurity vulnerabilities before they can be exploited. Microsoft used its own open-source benchmark, CTI-REALM, to evaluate Mythos. The model has drawn attention for its ability to autonomously identify and exploit security flaws at a level that appears to surpass conventional enterprise tools. Mythos has not been made publicly available. The collaboration reflects a broader pattern of frontier AI labs partnering with hyperscalers specifically for offensive security evaluation before general deployment.
Citi deployed a Google-powered AI avatar for wealth management clients, marking one of the first uses of a conversational AI persona in high-net-worth financial advisory. The avatar is designed to handle client-facing interactions, drawing on Google's underlying model infrastructure. Citi's move follows a broader push by financial institutions to reduce advisor-to-client friction in the onboarding and portfolio review process. The deployment is live for a subset of wealth clients and represents a direct integration of Google Cloud's enterprise AI stack into a regulated financial services workflow.
Amazon's AWS division published a framework for deploying multimodal biological foundation models across drug discovery and clinical development workflows. The post details real-world applications where models trained on genomic, proteomic, and clinical data are being used to accelerate therapeutic candidate identification and patient stratification. AWS is positioning its infrastructure as the deployment layer for these BioFMs, targeting pharmaceutical and hospital system clients. The multimodal approach — combining molecular structure, imaging, and clinical text — is being cited as a step-change over single-modality models in predicting treatment response.
Sony AI's autonomous table tennis robot, named Ace, defeated high-level human players in regulated matches. The system is part of Sony's physical AI research program, applying real-time perception and motor control to unstructured competitive environments. Separately, a humanoid robot won a footrace in Beijing, marking the first time a bipedal robot completed a competitive running event against human participants. Both results were reported within 48 hours, reflecting accelerating progress in physical AI systems operating outside controlled laboratory settings.
Sierra, the AI customer service agent startup founded by Bret Taylor, acquired Fragment, a YC-backed French startup. Fragment specializes in structured conversation design for enterprise agents. The acquisition gives Sierra a European engineering foothold and adds Fragment's conversation architecture capabilities to Sierra's existing agent platform. Bret Taylor previously served as Salesforce co-CEO and Twitter board chair. Sierra has been expanding its enterprise customer base and the Fragment deal is its first disclosed acquisition since founding.
Portal26 launched Agentic Token Controls, a new module designed to cap runaway token consumption by autonomous AI agents. The company says uncontrolled agent spend is driving unpredictable costs and operational instability for enterprise deployments. The module lets teams set hard token budgets per agent, per workflow, and per time window, with real-time alerts when thresholds are approached. As agentic workloads scale, token cost management is emerging as a distinct operational discipline separate from model selection or prompt engineering.
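The mechanics described above — hard budgets per agent with windowed tracking and threshold alerts — can be sketched in a few lines. Portal26 has not published its API, so every name and the windowing scheme here are illustrative assumptions.

```python
import time
from collections import defaultdict

class TokenBudget:
    """Toy sketch of a per-agent token cap over a rolling time window.
    Hypothetical design, not Portal26's actual module."""

    def __init__(self, limit, window_seconds, alert_fraction=0.8):
        self.limit = limit
        self.window = window_seconds
        self.alert_fraction = alert_fraction
        self.events = defaultdict(list)  # agent_id -> [(timestamp, tokens)]

    def spend(self, agent_id, tokens, now=None):
        now = time.time() if now is None else now
        # Keep only spend events inside the rolling window.
        cutoff = now - self.window
        self.events[agent_id] = [(t, n) for t, n in self.events[agent_id] if t >= cutoff]
        used = sum(n for _, n in self.events[agent_id])
        if used + tokens > self.limit:
            return "blocked"  # hard cap: the request is refused outright
        self.events[agent_id].append((now, tokens))
        if used + tokens >= self.alert_fraction * self.limit:
            return "alert"    # threshold approached: fire a real-time alert
        return "ok"
```

For example, with a 10,000-token hourly budget, an agent that has already spent 9,000 tokens gets an alert, and a request that would push it past 10,000 is blocked until older spend ages out of the window.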
On tools. OpenAI's Codex CLI now exposes a semi-official API endpoint — the backend-api/codex/responses path — that third-party tools including OpenCode, Pi, and JetBrains are using to access OpenAI models through existing ChatGPT subscriptions. OpenAI's Romain Huet confirmed on March 30th that this access is intentional and welcome. Developer Simon Willison published a plugin called llm-openai-via-codex that reuses Codex CLI credentials to make standard API calls, giving practitioners a way to access OpenAI's latest model without waiting for the formal API release. This effectively creates a lower-cost access path for developers already holding paid ChatGPT subscriptions.
Google open-sourced DESIGN.md, the agent prompt behind its AI design tool Stitch. The format is a structured markdown specification that teaches AI agents how to follow brand rules — typography, color systems, spacing, component logic — without requiring manual prompt engineering for each design task. Any team can now write a DESIGN.md file for their own brand and use it as a drop-in context document for AI design agents. The release makes brand-consistent AI-generated design reproducible and auditable, addressing one of the main objections to using generative tools in production design workflows.
Cloudflare launched a free tool that audits whether a website is readable by AI agents — checking for clean HTML structure, accessible metadata, and machine-parseable content. As AI agents increasingly browse the web autonomously to complete tasks, sites optimized only for human readers and search engine crawlers are becoming invisible to agent pipelines. The tool outputs a readiness score and specific remediation steps. Cloudflare is positioning this as the agent-era equivalent of a Lighthouse SEO audit, and the tool is available at no cost to any domain owner.
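The kinds of checks such an audit runs — clean structure, accessible metadata, machine-parseable content — can be illustrated with a toy scorer. Cloudflare has not published its scoring logic, so the three signals and equal weighting below are assumptions for illustration only.

```python
from html.parser import HTMLParser

class AgentReadabilityCheck(HTMLParser):
    """Collects a few structural signals an agent-readability audit
    might look at. Illustrative only, not Cloudflare's criteria."""

    def __init__(self):
        super().__init__()
        self.has_title = False
        self.has_meta_description = False
        self.heading_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.has_meta_description = True
        elif tag in ("h1", "h2", "h3"):
            self.heading_count += 1

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.has_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

def readiness_score(html):
    """Score 0-100 from three equally weighted structural signals."""
    checker = AgentReadabilityCheck()
    checker.feed(html)
    signals = [checker.has_title, checker.has_meta_description,
               checker.heading_count > 0]
    return round(100 * sum(signals) / len(signals))
```

A page with a title, a meta description, and at least one heading scores 100 under this toy rubric; a bare wall of paragraphs scores 0, which is roughly the failure mode the Cloudflare tool is meant to surface.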
Microsoft published a free eight-chapter course, LangChain.js for Beginners, with over 70 runnable TypeScript examples covering how to build AI agents that reason, call tools, and retrieve from knowledge bases. The course is open-source and hosted on GitHub. It targets JavaScript developers who have used basic chat completions but have not yet built multi-step agentic workflows. Microsoft's decision to publish this through its developer blog — rather than a third-party platform — signals a direct push to expand the LangChain ecosystem among enterprise JavaScript developers already in the Microsoft toolchain.
Stream2LLM, a new serving system from academic researchers, overlaps context retrieval with LLM inference to cut time-to-first-token in RAG deployments. The system introduces adaptive scheduling for two retrieval patterns: append mode, where context accumulates progressively, and update mode, where cached context is invalidated and refreshed. It uses longest-common-prefix matching to avoid redundant computation when inputs change mid-stream. For enterprise teams running high-concurrency RAG pipelines, Stream2LLM addresses a fundamental latency bottleneck — the gap between when a query arrives and when the model can begin generating — without sacrificing retrieval quality.
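The longest-common-prefix idea is the easiest piece to make concrete: when the context changes mid-stream, only the suffix after the shared prefix needs recomputing. This is a simplified illustration of that principle over token lists, not Stream2LLM's actual scheduler.

```python
def longest_common_prefix(old_tokens, new_tokens):
    """Number of leading tokens shared by two token sequences."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

def plan_recompute(cached_tokens, incoming_tokens):
    """Split the incoming context into a reusable prefix (KV cache kept)
    and a suffix that must be recomputed. Hypothetical helper names."""
    keep = longest_common_prefix(cached_tokens, incoming_tokens)
    return {
        "reuse_tokens": keep,
        "recompute_tokens": len(incoming_tokens) - keep,
    }
```

In update mode, where retrieved context is refreshed in place, the payoff is largest when the refresh touches only the tail of the prompt: the entire unchanged prefix of the KV cache survives, and time-to-first-token shrinks accordingly.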
One signal to watch. Enterprise LLM agents are leaking sensitive data at rates that scale with capability. A new benchmark called CI-Work, published on arXiv, tested frontier models on enterprise workflow simulations and found privacy violation rates ranging from about 16 percent to nearly 51 percent across models, with sensitive data leakage reaching up to 27 percent. Critically, the paper found that higher task utility correlates with increased privacy violations — and that simply increasing model size or reasoning depth does not fix the problem. Portal26's token control launch and the CI-Work findings together point to a maturing recognition that agentic AI requires governance infrastructure, not just model upgrades.
The AI chip market is fracturing along inference-versus-training lines. Groq's inference chips are being benchmarked at five times lower cost than NVIDIA's Blackwell and twice the throughput for inference workloads. Meta's deal to procure millions of Amazon's homegrown CPUs — not GPUs — for agentic tasks reinforces the same pattern. Intel is explicitly betting its recovery on inference and edge workloads, not training. DeepSeek's V4 architecture, which cuts compute requirements to 10 percent of prior models at long context, compounds this: as inference efficiency improves at the model level, the hardware requirements for deployment shift away from the GPU-dominated training stack toward cheaper, more specialized silicon.
Chinese AI infrastructure is decoupling from US hardware faster than public reporting suggests. DeepSeek V4 was trained on Huawei Ascend chips after a mid-2025 training failure forced a migration away from NVIDIA. Meituan's LongCat-2.0-Preview, which crossed one trillion parameters, was trained entirely on domestic Chinese compute clusters. China's State Grid Corporation is deploying AI robots funded at about $1 billion, with no disclosed US hardware dependency. Three separate data points — DeepSeek, Meituan, and State Grid — all within 48 hours, indicate that China's AI production stack is operationally independent of US GPU supply chains at scale.
Off the radar. Meituan's new foundation model, LongCat-2.0-Preview, quietly opened for testing on April 24th with over one trillion parameters and a one-million-token context window — trained entirely on domestic Chinese compute clusters with no disclosed NVIDIA hardware. The model is optimized for agentic tasks including code generation, complex task planning, and enterprise automation. This is the first trillion-parameter model from a Chinese consumer internet company — not a dedicated AI lab — and its full domestic compute stack makes it invisible to US export control pressure. Mainstream Western tech media has not covered it.
South Korea's SK Group announced plans to build an AI data center in Vietnam, marking one of the first major Korean conglomerate investments in Southeast Asian AI infrastructure. Vietnam is emerging as a secondary hub for AI compute buildout in Asia, with lower land and energy costs than South Korea, Japan, or Singapore. SK Group's move follows similar quiet expansions by Singaporean and Taiwanese operators into Vietnam and Indonesia. For enterprises planning Asia-Pacific AI deployments, Vietnam is becoming a viable low-cost inference hosting location outside the established hyperscaler regions.
A paper from Japanese researchers, published in Nature, describes an AI system that designs thermoelectric generators — devices that convert waste heat into electricity — about 10,000 times faster than conventional simulation methods. The system identifies material combinations that conduct electricity while blocking heat, a problem that previously required slow experiments and painstaking iteration. The result has direct implications for industrial energy recovery, wearable electronics, and data center cooling. The work has received almost no coverage in English-language AI or energy media.
China's government is moving to require approval before domestic tech companies can accept US capital, according to Bloomberg. The measure would give Beijing veto power over foreign investment in Chinese AI and technology firms. This directly affects the funding dynamics for companies like DeepSeek, which opened an external fundraising window in mid-April 2026 after previously operating without outside investment. Industry sources cited in Chinese media suggest Tencent was in discussions for an exclusive stake in DeepSeek before the talks broke down. A formal approval requirement would structurally limit Western venture capital access to Chinese AI assets.
On the research front. Apple researchers published ParaRNN at ICLR 2026 as an Oral paper — the highest recognition tier — demonstrating a framework that parallelizes training of nonlinear recurrent neural networks and achieves a 665-times speedup over the traditional sequential approach. The result enables the first 7-billion-parameter classical RNN that matches transformer performance on language modeling benchmarks. RNNs require constant compute per token at inference regardless of context length, unlike transformers whose cost grows quadratically. Apple released the ParaRNN codebase as open source, giving practitioners a new architecture option for resource-constrained deployment where inference cost matters more than training speed.
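The scaling argument behind the RNN claim is worth making explicit: an RNN's per-token cost is fixed, while a transformer's per-token attention cost grows linearly with position, which is why total generation cost grows quadratically. The rough cost model below illustrates that scaling only; it is an assumption-laden back-of-envelope, not ParaRNN's actual arithmetic.

```python
def per_token_cost(position, d_model, arch):
    """Rough per-token inference cost in multiply-accumulates, ignoring
    constant factors. Illustrates scaling behavior only."""
    if arch == "rnn":
        # State update touches a fixed-size hidden state: constant per token.
        return d_model * d_model
    if arch == "transformer":
        # Attention at step t reads all t cached keys/values: grows with t.
        return d_model * d_model + position * d_model
    raise ValueError(arch)

# At a one-million-token context (hypothetical d_model of 4096), the
# transformer's attention term dominates; the RNN's cost is unchanged
# from token one.
rnn_cost = per_token_cost(1_000_000, 4096, "rnn")
tfm_cost = per_token_cost(1_000_000, 4096, "transformer")
```

Under this toy model the transformer pays over two hundred times the RNN's per-token cost at the millionth token, which is the regime where a classical RNN that matches transformer quality becomes attractive for deployment.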
Google DeepMind published Decoupled DiLoCo, a distributed training architecture that splits compute into asynchronous, fault-isolated islands and achieves 88 percent goodput — meaning 88 percent of theoretical maximum throughput — even under high hardware failure rates. Standard distributed training requires about 198 gigabits per second of inter-datacenter bandwidth and stalls when any chip slows down. Decoupled DiLoCo reduces that bandwidth requirement by orders of magnitude and allows training to continue across geographically distant data centers without tight synchronization. For organizations planning multi-datacenter AI training infrastructure, this removes the single biggest operational fragility in large-scale pre-training runs.
Researchers at Artificial Analysis published benchmark results showing DeepSeek V4-Pro leads all open-weights models with 1,554 Elo points on the GDPval-AA benchmark, which measures general-purpose reasoning and instruction-following across diverse tasks. The CI-Work benchmark from a separate academic team tested fifteen frontier models on enterprise privacy tasks and found that no current model reliably prevents sensitive data leakage in dense retrieval settings — with the best models still violating privacy norms in roughly 16 percent of cases. Together, these two results define the current frontier: open-weights models are now competitive on capability benchmarks, but no model — open or closed — has solved the enterprise privacy problem at the architectural level.
This podcast has a daily production cost. If you enjoy it, support it — the link is on the podcast page. Thank you.