AI Press Review
April 27, 2026 · Episode 9 · 21:34

DeepSeek V4 Resets Open-Source Pricing, DOJ Backs xAI in Colorado, and AI Costs Outpace Headcount

Synthetic in voice, genuine in substance. Now presented by an AI clone of your devoted host. And yes, the accent is... how you say... more French than the man himself. The algorithm had opinions. DeepSeek released V4 Pro on Friday — a 1.6 trillion parameter open-weight model priced at roughly one-fifth the cost of GPT-5.5, immediately pressuring proprietary model economics across the industry. On the deployment front, Meta signed a deal to run millions of AWS Graviton CPUs for agentic AI workloads, while Home Depot reported its AI voice agent identifies customer intent in ten seconds and resolves calls four times faster than legacy menu systems. Next week, watch for Anthropic's IPO valuation signals as the company navigates compounding product and pricing pressures, and monitor whether Colorado's AI regulation law survives the DOJ intervention filed Friday on behalf of xAI.


Your Weekly AI Press Review — Week of April 26, 2026: Open-Source Pressure.

This Friday, DeepSeek dropped the biggest open-source model ever — and priced it to hurt. Today's episode covers that release and every other major Friday move, from the DOJ stepping into AI regulation to Meta's chip deal with Amazon. We've also got off-radar signals your Bloomberg terminal skipped, including a Chinese industrial AI startup closing its third round in twelve months. Then we close with what to watch next week.

DeepSeek released V4 Pro on Friday — the largest open-weight model ever shipped. It carries 1.6 trillion total parameters, with 49 billion active per token. The smaller V4 Flash runs 284 billion parameters, 13 billion active. Both support one million token context windows. V4 Pro was trained on 33 trillion tokens using FP4 quantization and a novel hybrid attention architecture called Compressed Sparse Attention. DeepSeek says V4 Pro outperforms every open-weight peer and rivals GPT-5.4 and Gemini 3.0 Pro on several coding benchmarks — though on knowledge tests it trails frontier models by an estimated three to six months of capability progress.

The pricing is the real story. DeepSeek is charging $1.74 per million input tokens and $3.48 per million output tokens for V4 Pro. That's roughly one-fifth the cost of GPT-5.5 at the API level. V4 Flash is even cheaper — $0.14 input and $0.28 output — undercutting every comparable small model on the market. At one million token context, V4 Pro uses only 27 percent of the FLOPs of a standard transformer. The model is MIT-licensed and available on Hugging Face. DeepSeek's API now supports OpenAI and Anthropic interface standards, making migration frictionless.
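To make the pricing gap concrete, here is a minimal cost sketch in Python. The V4 Pro prices come from the brief; the GPT-5.5 prices are assumptions, back-derived from the "roughly one-fifth" claim, and the workload figures are illustrative.

```python
def api_cost_usd(input_tokens: float, output_tokens: float,
                 in_price: float, out_price: float) -> float:
    """API bill in USD, given per-million-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Illustrative workload: 100M input tokens, 20M output tokens per month.
v4_pro = api_cost_usd(100e6, 20e6, 1.74, 3.48)          # list prices from the brief
gpt55 = api_cost_usd(100e6, 20e6, 5 * 1.74, 5 * 3.48)   # assumed ~5x multiple
print(f"V4 Pro: ${v4_pro:,.2f} vs GPT-5.5 (assumed): ${gpt55:,.2f}")
# → V4 Pro: $243.60 vs GPT-5.5 (assumed): $1,218.00
```

At that spread, a team burning through a billion tokens a month is comparing a four-figure bill to a five-figure one, which is exactly the renegotiation leverage discussed later in this brief.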

The US Department of Justice intervened Friday in xAI's lawsuit against Colorado's AI regulation law. The DOJ sided with Elon Musk's xAI, arguing Colorado's Senate Bill 205 violates the 14th Amendment's equal protection clause. The law — scheduled to take effect June 30 — imposes disclosure and risk-mitigation requirements on high-risk AI systems used in employment, housing, healthcare, and financial services decisions. The federal intervention transforms a single-company challenge into a direct confrontation between the Trump administration and Colorado. The Trump White House has been pushing for a single national AI framework to preempt state-level regulation.

Meta signed a deal Friday to use millions of AWS Graviton chips for its growing AI workloads. Graviton is an ARM-based CPU — not a GPU — designed for inference and agentic compute tasks like real-time reasoning, code execution, and multi-step coordination. The deal pulls Meta spending back toward AWS and away from Google Cloud, which Meta had committed to in a six-year, ten billion dollar deal last August. AWS timed the announcement to land as Google Cloud Next wrapped up — a pointed signal to its cloud rival.

xAI launched its new flagship voice model on Friday, topping the tau-voice benchmark at 67.3 percent. The nearest competitor, Gemini Flash Live, scored 43.8 percent. xAI's own previous voice model scored 38.3 percent. GPT Realtime scored 35.3 percent. The model is a full-duplex system — it processes incoming speech and generates responses simultaneously, handling interruptions, accents, and background noise in real time. It's already deployed at scale powering Starlink's live phone operations and is available via the xAI API.

The UK government quietly revised its estimate of AI datacenter carbon emissions — upward by a factor of more than 100. New figures published Friday put potential CO2 output from AI infrastructure at between 34 million and 123 million tonnes over the next ten years. The previous estimate, since deleted, had projected a maximum of 0.142 million tonnes in a single year. The revision appeared in an update to the UK's compute roadmap. The lower end of the new range assumes faster AI efficiency gains and accelerated grid decarbonization — neither of which is guaranteed.

The Trump administration missed three key AI regulatory deadlines set in its December executive order. All three provisions were due March 11. The FTC was supposed to issue guidance on how consumer protection law applies to AI outputs. The Commerce Department was due to publish an evaluation of state AI laws. A third provision also went undelivered. The missed deadlines are raising questions about how forcefully the administration can follow through on its push to preempt state-level AI regulation — particularly as the Colorado confrontation escalates.

A Federal Reserve Board study published Friday found that US programmer job growth has nearly halved since ChatGPT launched in November 2022. Before that date, programming-heavy jobs were growing at just under 5 percent annually. Since then, growth has flatlined in IT services and software development. After controlling for broader tech sector pressures — rate hikes, the post-COVID correction — programmer employment growth still runs about 3 percentage points per year below its expected path. Stretched over three years, the gap represents roughly 500,000 jobs that would likely have existed without LLMs.

Cohere and Aleph Alpha announced a merger backed by a 600 million dollar structured financing commitment from Schwarz Group, Germany's largest retailer. The deal is structured as a Series E. Toronto-based Cohere has raised about 1.6 billion dollars since 2019, with Nvidia among its backers. Aleph Alpha, based in Germany, focuses on custom AI for regulated sectors including finance and healthcare, with a strong European public sector client base. The combined entity positions itself as the primary European alternative to US hyperscaler AI platforms — a direct pitch to EU sovereignty concerns.

GPT-5.5 launched earlier this week with an effective API cost about 20 percent higher than its predecessor's, once a 40 percent reduction in token usage is factored in. The model tops the Artificial Analysis Intelligence Index with 60 points, three ahead of Claude Opus 4.7 and Gemini 3.1 Pro. But it has a documented hallucination problem. On BullshitBench — a test of 100 nonsensical-but-plausible questions — GPT-5.5 pushes back on bad premises only about 45 percent of the time. Anthropic's Claude models lead that leaderboard. GPT-5.5 Pro fared worse, at around 35 percent.

A benchmark called BankerToolBench, released by Handshake AI and McGill University, tested nine top AI models against real investment banking workflows. Around 500 current and former bankers from Goldman Sachs, JPMorgan, Evercore, Morgan Stanley, and Lazard evaluated the outputs. Not a single AI output was rated ready for client delivery. More than half of the bankers said they'd use the output as a starting point. Each task took a human banker an average of five hours — some up to 21 hours. The benchmark grades actual Excel models, PowerPoint decks, PDF reports, and Word memos against rubrics averaging 150 individual criteria.

The IRS is now running 126 active AI applications across audit selection, fraud detection, taxpayer services, and operational workflows. That number stood at 10 in August 2022. Of those 126 applications, 61 percent are still in development. Machine learning models now score millions of returns simultaneously for noncompliance risk. The criminal investigation function uses AI tools including Palantir systems to process suspicious activity reports. Revenue agents also have access to generative AI for drafting audit documents — with the AI producing a first draft that the agent reviews and finalizes.

Anthropic's internal experiment called Project Deal ran for one week in December 2025, letting 69 Claude agents negotiate and trade real goods for employees. The result: agents running the more capable Opus model consistently secured better prices than those running the smaller Haiku model. Aggressive negotiation instructions made no statistically significant difference. The most striking finding — users of the weaker Haiku model rated their deals just as fair as Opus users, despite receiving objectively worse outcomes. Anthropic flags this as a form of invisible inequality in AI-assisted decision-making.

Alibaba released Qwen3.6-27B on Friday — a 27 billion parameter dense model that outperforms its 397 billion parameter predecessor on nearly every coding benchmark. It scored 77.2 on SWE-bench Verified versus 76.2 for the larger model. On Terminal-Bench 2.0, it scored 59.3 versus 52.5. The model is available on Hugging Face, ModelScope, and the Alibaba Cloud API. As a dense model, it's easier to deploy than mixture-of-experts architectures. The result continues a pattern of Chinese open-source labs delivering outsized benchmark performance at dramatically smaller parameter counts.

Cloudflare launched a free tool on Friday called isitagentready.com, which scores any website on its readiness for AI agent access. Cloudflare scanned the 200,000 most-visited domains and found that only 3.9 percent serve content in Markdown when requested by an agent. Fewer than 15 sites in the entire set have deployed MCP Server Cards or API Catalog standards. Cloudflare's internal tests show that Markdown responses reduce token consumption by up to 80 percent compared to HTML. The tool scores sites across four axes: discoverability, content format, access control, and agentic capabilities.
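The brief doesn't say exactly how Cloudflare's scanner makes its requests, but a plausible mechanism is standard HTTP content negotiation: an agent asks for Markdown via the Accept header and checks what comes back. The sketch below assumes that mechanism; the probe function and its user-agent string are hypothetical.

```python
from urllib.request import Request, urlopen

def is_markdown(content_type: str) -> bool:
    """Classify a Content-Type header value as Markdown or not."""
    return content_type.split(";")[0].strip().lower() == "text/markdown"

def serves_markdown(url: str, timeout: float = 10.0) -> bool:
    """Probe whether a site returns Markdown when an agent asks for it.

    Assumption: the site honors Accept-header negotiation. Real
    agent-readiness scanners may use other signals as well.
    """
    req = Request(url, headers={
        "Accept": "text/markdown, text/plain;q=0.5, text/html;q=0.1",
        "User-Agent": "agent-readiness-probe/0.1",  # hypothetical agent UA
    })
    with urlopen(req, timeout=timeout) as resp:
        return is_markdown(resp.headers.get("Content-Type", ""))

# Usage (network required):
#   serves_markdown("https://example.com/")  # True if Markdown is negotiated
```

The 3.9 percent figure suggests almost no site passes even this basic check, which is the "content format" axis of Cloudflare's four-axis score.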

Tencent released its first flagship model under former OpenAI researcher Yao Shunyu on Thursday. The open-source model, called Hy3 Preview, has 295 billion parameters — smaller than its predecessor HY 2.0, which had over 400 billion. Tencent says it's already deployed in Yuanbao and CodeBuddy. Agentic capabilities were described as the most significantly improved area. The model is on par with top Chinese peers but still trails OpenAI and Google DeepMind's frontier offerings. The release marks Yao Shunyu's first public output since joining Tencent to lead foundational AI development.

DeepSeek is raising external capital for the first time — but deliberately small. The company is seeking to sell no more than 3 percent of its equity, according to sources cited by the South China Morning Post. The round is not driven by cash needs. DeepSeek wants to establish a market valuation benchmark to give employees clarity on stock option values — a move aimed at stemming departures amid aggressive poaching by well-funded rivals. Large state-backed funds are expected to participate. One investor cited a potential valuation exceeding 100 billion dollars.

Intel's Q1 2026 earnings call, reported Friday, showed AI-driven business lines accounting for 60 percent of 13.6 billion dollars in quarterly revenue — up 40 percent year-on-year. CEO Lip-Bu Tan argued that inference and agentic workloads are pulling the CPU back to the center of enterprise compute. He cited demand from agents, robots, and edge devices as the primary growth vector. Intel expects design commitments for its 14A process node to emerge in the second half of 2026. The company is betting that the shift from training to inference restores the CPU's relevance after years of GPU dominance.

On deployments. Home Depot reported this Friday that its AI voice agent identifies customer intent in roughly 10 seconds — and resolves calls four times faster than traditional menu-based systems. The agent can initiate service requests, send product links, complete purchases, and assemble a shopping cart from a verbal project description. The system is live in a pilot deployment. It represents a direct conversion of call center cost into a revenue-generating channel — the same pattern showing up across retail and banking earnings this quarter.

Wells Fargo's AI assistant Fargo surpassed one billion customer interactions — three years after launch. Management cited the milestone on its earnings call this week as evidence of sustained self-service adoption at scale. The bank is using that engagement data to reduce handoffs and guide customers toward digital resolution without human agents. Bank of America and other large US banks are reporting similar trajectories, with AI-assisted servicing now a standard line item in quarterly management commentary.

Ulta Beauty deployed a Google Gemini-powered AI shopping assistant called Ulta AI on Ulta.com this week. The assistant draws on insights from 46 million loyalty members to deliver personalized product guidance. Separately, Ulta is rolling out agentic commerce across Google surfaces — including AI Mode in Search and the Gemini app — over the next month. Shoppers will be able to receive recommendations, compare options, and complete checkout within Google's conversational interfaces using the Universal Commerce Protocol standard.

NVIDIA reported this Friday that over 10,000 of its own employees across engineering, legal, marketing, finance, and HR are actively using GPT-5.5-powered Codex. The company says debugging cycles that previously stretched across days are now closing in hours. Features that required weeks of work are shipping overnight. NVIDIA is running Codex on its own GB200 NVL72 infrastructure, which cuts cost per million tokens by a factor of 35 and delivers 50 times the token throughput per megawatt of prior-generation systems.

AppZen launched its AP Inbox Service Center this week — eight prebuilt AI agents that automate vendor email handling for finance teams. The agents cover payment status responses, bank change verification, duplicate invoice detection, W-9 compliance routing, and remittance assistance. AppZen's own customer data shows AP reviewers currently spend as much as one week per month on that work. The agents can be configured without IT involvement through AppZen's Agent Studio. Every decision is auditable, showing what the agent evaluated and why it acted.

The IRS is also using AI to detect fraud in real time during the return filing process itself, flagging emerging compliance threats as returns arrive. Beyond the audit drafting covered above, revenue agents use generative AI to draft information document requests, and the criminal investigation function's Palantir-built tools process suspicious activity reports at speeds that previously required many hours of agent time per case.

China's State Grid Corporation earmarked 6.8 billion yuan — about one billion dollars — for AI-powered robots in 2026 alone. The plan calls for purchasing around 8,500 robots this year to inspect remote substations and maintain ultra-high-voltage power lines. About 5.8 billion yuan of that goes to hardware procurement. When similar plans from China Southern Power Grid are included, total industry investment in embodied AI for the power sector is expected to surpass 10 billion yuan this year. Shenzhen's robotics industry hit a record 242 billion yuan in output in 2025 — up 20 percent year-on-year.

Off the radar. A Chinese industrial AI startup called Zhiyong Kaiwu closed its third funding round in twelve months — raising nearly 100 million yuan in an angel-plus round led by Ruifeng Capital, with strategic investment from Luxshare Precision's family office. The company, founded in Guangzhou in January 2024, builds multi-agent systems for factory floors using an industrial semantic engine that converts complex manufacturing logic into AI-executable instructions. At Luxshare Precision, a single AI scheduling agent is performing the work of six human employees. SOP automation rates have reached 80 percent. New-hire onboarding time dropped from 1.5 days to 2 hours. Production anomaly response speed improved eightfold. The core team comes from Microsoft China, Alibaba, Tencent, and IBM — and the company claims millisecond-level response times via native OPC UA protocol support.

Huawei's HiFloat4 training format outperformed the Western-developed MXFP4 standard in head-to-head tests on Ascend NPU chips. Huawei researchers trained three model families — OpenPangu-1B, Llama3-8B, and Qwen3-MoE-30B — and found HiFloat4 achieves roughly 1 percent relative loss error versus BF16 baseline, compared to 1.5 percent for MXFP4. The gap widens as models scale. HiFloat4 requires fewer stabilization tricks to reach that performance level. The development is a direct consequence of US export controls — Chinese labs are now building proprietary low-precision formats explicitly optimized for their own hardware, reducing dependence on CUDA-ecosystem standards.
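Neither HiFloat4's bit layout nor its scaling scheme is detailed in the brief, so as illustration here is a sketch of 4-bit quantization using the E2M1 element grid that the MXFP4 standard uses (1 sign bit, 2 exponent bits, 1 mantissa bit). The per-value scale handling is simplified; real MX formats share one scale across a 32-element block.

```python
# Representable E2M1 magnitudes: 1 sign, 2 exponent, 1 mantissa bit.
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
GRID = sorted({s * v for v in E2M1 for s in (1.0, -1.0)})  # 15 distinct values

def quantize_fp4(x: float, scale: float = 1.0) -> float:
    """Round x/scale to the nearest representable E2M1 value, then rescale.

    Simplified sketch: real MX formats apply one shared scale per block.
    """
    return min(GRID, key=lambda g: abs(g - x / scale)) * scale

def rel_error(x: float, scale: float = 1.0) -> float:
    """Relative quantization error for a single value."""
    return abs(quantize_fp4(x, scale) - x) / abs(x) if x else 0.0
```

The coarseness of that 15-value grid is why low-precision training formats live or die on scaling tricks, and why a format co-designed with the hardware, as Huawei claims for HiFloat4 on Ascend, can need fewer of them.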

DeepSeek's V4 models are natively compatible with Huawei Ascend NPUs — a geopolitical signal as significant as the benchmark numbers. Huawei's Ascend chips currently amount to about one-quarter of China's H100 supply. DeepSeek's Compressed Sparse Attention architecture requires only 27 percent of standard FLOPs at one million token context — making it viable on constrained hardware. The combination of MIT licensing, Ascend compatibility, and sub-two-dollar-per-million-token pricing creates a credible path for Chinese enterprises to run frontier-class inference entirely on domestic silicon, without touching NVIDIA or CUDA.

The UAE announced plans to run 50 percent of all government operations on autonomous AI agents within two years. Sheikh Mohammed bin Rashid Al Maktoum made the announcement publicly, framing AI as an executive partner that improves services and speeds decisions. Every federal employee will be trained to work with AI systems. If executed, the UAE would become the first government to operate at this scale of autonomous AI deployment. The announcement has received almost no coverage in Western financial media — despite its direct implications for government technology procurement, sovereign AI infrastructure investment, and the regulatory models that other Gulf states are likely to follow.

Apple researchers published ParaRNN at ICLR 2026 — a framework for parallelized training of nonlinear recurrent neural networks that achieves a 665-times speedup over the traditional sequential approach. The paper enables training of 7 billion parameter classical RNNs that match transformer performance on language modeling benchmarks. RNNs require constant compute and memory per token at inference time, while a transformer's per-token attention cost grows linearly with context length — quadratic over a full sequence. If Apple deploys this architecture in on-device inference, it could dramatically reduce the compute cost of running capable models on iPhones and MacBooks — without requiring cloud connectivity. The codebase is open-source.
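To see why constant per-token cost matters at long context, here is a toy FLOP count. It is a back-of-envelope sketch, not the paper's analysis: real kernels include projections, normalization, and the MLP, and the hidden size is illustrative.

```python
def rnn_flops_per_token(d: int) -> int:
    """One recurrent step h' = f(W_h @ h + W_x @ x): two d x d matvecs,
    counted at ~2 FLOPs per multiply-accumulate. Independent of position."""
    return 2 * (2 * d * d)

def attn_flops_per_token(d: int, t: int) -> int:
    """Self-attention for token t over a t-entry KV cache: q @ K^T scores
    plus the weighted sum over V, each t * d MACs. Grows with position."""
    return 2 * (2 * t * d)

d = 4096  # illustrative hidden size
print(rnn_flops_per_token(d))              # constant at every position
print(attn_flops_per_token(d, 1))          # cheap early in the sequence
print(attn_flops_per_token(d, 1_000_000))  # dominates at million-token context
```

By this rough count, the RNN step costs the same at token one million as at token one, while the attention term has grown by six orders of magnitude — which is the on-device argument in a nutshell.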

Looking ahead to next week. Anthropic's IPO trajectory is the most consequential thing to watch. The company's revenue has tripled to 30 billion dollars this year, but Axios reported Thursday that compounding problems — model quality complaints, pricing missteps, capacity constraints, and a security incident — are giving OpenAI an opening to poach enterprise customers. A potential valuation near 800 billion dollars is on the table. Any signal from Anthropic on IPO timing, or any new enterprise defection to OpenAI's Codex platform, will move the narrative significantly.

The Colorado AI regulation confrontation escalates next week. The DOJ filed its intervention Friday on behalf of xAI. Colorado's Senate Bill 205 is scheduled to take effect June 30. The state attorney general's office declined to comment Friday. Watch for Colorado's formal response and any signal from other states — Texas, Illinois, and New York all have pending AI legislation — on whether they view the DOJ intervention as a deterrent or an invitation to accelerate their own frameworks before a federal preemption bill passes.

Samsung and SK hynix both hit fresh highs in Seoul this week on AI chip demand. SK hynix earnings surged on HBM memory orders tied to Nvidia's Blackwell and Vera Rubin ramp. Next week's Asian market open on Monday will test whether that momentum holds after DeepSeek V4's efficiency claims — models that require dramatically less KV cache memory could soften HBM demand projections if enterprise buyers start modeling lower memory requirements into their infrastructure plans.

Watch for enterprise reaction to DeepSeek V4's pricing in the coming week. At $1.74 per million input tokens for a frontier-class open-weight model, V4 Pro creates direct pressure on every proprietary API vendor. OpenAI, Anthropic, and Google all charge between five and fifteen times more for comparable capability. The question for next week is whether large enterprise buyers — particularly in financial services and legal, where the BankerToolBench results showed no model is client-ready — use V4's pricing as leverage in contract renegotiations with their current AI vendors.

Intel's earnings commentary this Friday flagged that design commitments for its 14A process node are expected to emerge in the second half of 2026. That timeline makes the next two quarters critical for Intel Foundry's commercial credibility. Watch for any customer announcements — particularly from fabless AI chip designers looking for a TSMC alternative — that would validate Lip-Bu Tan's claim that the CPU is reinserting itself as the foundation of the AI era. Any slip in that timeline would be a significant negative signal for Intel's recovery thesis.

This podcast has a daily production cost. If you enjoy it, support it — the link is on the podcast page. Thank you.