星火 SparkCN

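Pain point analysis published 2026/05/26

The pain point is a preliminary AI-generated distillation of the upstream raw evidence; it does not include any additional China-market research.

Pain point

This newsletter issue, titled “Some ideas for what comes next, May 2026”, comes from Interconnects AI and discusses where the AI field is headed next. Readers (likely AI practitioners or researchers) need to pick out, from a large number of opinions, the trends relevant to their own work, but the original provides only a summary and a title, with no structured analysis or actionable recommendations. As a result, readers must digest, organize, and judge for themselves which ideas are worth following up, which easily leads to information overload and delayed decisions. Because there is no comment or reply data, specific reader feedback cannot be confirmed; however, content of this kind typically targets people who need to stay current with the frontier, and the pain point is that distilling useful information from fragmented opinions is time-consuming and inefficient.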

Article

Newsletter article

As the years of AI progress go by, it’s been accompanied by a slowly rising tide of consequence. Models are getting more capable, how we work is changing quickly, economics of AI are becoming real, just as real-world risks come to the forefront. 2026 is the first year where I don’t think there’ll be any breaks from this. The hard part to prepare for is that there’s a good chance things just continue to ratchet up from here – more disruption, more surprises, more stakes.

On my end, there’s been a growing list of topics that are very fateful to how I see the current state of AI, but I haven’t even gotten to write about them (at least not from all the angles I want to)! All of these are closely related to the implications of different models reaching new capability levels and how I use that to infer what may come next.

1. Open models haven’t had their true agent moment like Opus 4.5

The time gap between open and closed models is very often discussed, but the reality is that we have a nice time-gating that’s independent of debatable benchmarks – if open-weight models do or do not become super useful in agentic harnesses. The Opus 4.5 in Claude Code moment of December 2025 was so loud and obvious, that if open models hit this performance level for price points as low as $5/month, there will be an explosion in usage.

Right now we are about 5-6 months in with no equivalent open model. I suspect the robustness of the best closed frontier models that I write about could make this moment take a good amount longer, say closer to 12+ months. In this time, Claude Code and Codex may seem like different categories of products. In the standard flurry of new, state-of-the-art open models from a variety of labs, benchmarks will definitely keep climbing, but the open-closed gap should become more interpretable as real-world use becomes the real litmus test.

2. Gemini still doesn’t have a meaningful competitor for Claude Code and Codex

The best exclamation point I can offer to reinforce my prediction that open models are further behind than the benchmarks claim is that even the mighty Google doesn’t have a clear competitor for Claude Code and Codex. I’m sure the Gemini team is pushing very hard on this.

I still need to do a lot more testing on Gemini 3.5 Flash, but reading reviews makes it clear that it’s not a substitute for how I’m working today. It’s maybe not the Gemini team explicitly specializing for Google’s existing products (search, YouTube, etc.), but the model seems to suit them. If Google doesn’t have a powerful tool here soon, I don’t expect the open model labs to either. The open models are going to be used more for automated, enterprise agents and low-cost domains, rather than being the driving tool of modern knowledge work. This will feed directly into the economic engine of funding future models, where the agents like Claude Code and Codex are the current best path to massive AI revenue growth.

I discussed how the current environment is quietly driving labs in China to specialize on AI Proem with Grace Shao and this is central to my expectations of open models specializing over the next few years instead of competing with OpenAI, Anthropic, and Google.

3. I don’t expect an open-weights Mythos this year

While I don’t think Mythos is a general “god model” that will crush the competition in every domain, I do think it’s a remarkable technical achievement in software engineering and cybersecurity. Mythos is obviously a watershed moment for those fields. Having spoken to most of the Chinese labs – particularly those with the most prominent, large, open MoE models like Kimi, Z.ai, DeepSeek, and Qwen – I think they’re heavily resource limited and don’t have an immediate path to scaling up training processes like the big labs in the U.S. For the labs which are more corporate, which comes with more resources, such as Alibaba and Bytedance, they also have more conservative stances on safety and security.

Mythos is a bellwether of the massive acceleration in training and research compute available to the largest American companies.

Epoch AI recently had a nice piece on the compute available to various labs (~Google 25%, Meta 11%, OpenAI 11%, Anthropic 6%). All of these numbers are vastly higher than any Chinese lab.

4. American open models are slowly gaining steam

Nvidia with Nemotron, Google with Gemma, Arcee AI and others are slowly stabilizing the open model ecosystem in the U.S. There’s a lot that’s hard to measure here, especially in the rise of local agents like OpenClaw and Hermes, but there are adoption numbers of American models that we haven’t seen since Llama 3.

Gemma 4’s models are all tying or outperforming the equivalently sized Qwen 3.5/3.6 models – where Qwen has for years now been the default open model at these sizes. These Qwen 3.5/3.6 models have been tricky to get working in a lot of post-training research, partially due to architecture/tooling and partially likely due to modeling (i.e. the model is not easy to finetune for some training decision). I’ve heard few complaints about Gemma, but it also could be because Gemma is not yet the researcher default.

There's a simple reality that we've seen recently with models like GPT-OSS, Nemotron 3, and now Gemma 4, that if a model is in the right range of benchmarks and released by an American lab with a truly permissive license, it'll get a large amount of adoption (in this cycle, recall that Gemma 4 adopted the Apache 2.0 License, changing from one with use-case restrictions on earlier Gemmas). This early phase of American growth in open models is establishing key brands directly with developers. The consensus is that more neolabs like Reflection and Thinking Machines are likely to participate in this space, but being too patient will lose the time when new agentic workflows and enterprise relationships are built.

5. Anthropic and OpenAI are just getting up to speed in model iterations

I expect the rest of this year to be a ruthless competition between these two flagship companies. I’m at an interesting balance where I think GPT 5.5 is a bit smarter of a model and I love the Codex App, so I’m structuring much of my work to be possible there. At the same time, for a lot of writing-related and broader surface area tasks I really still love Claude. These models are rapidly changing how we work, I run Codex from my phone while doing other things, am setting up automated open model analysis jobs on the back of agents, and expect to be able to scale the research side of Interconnects widely.

AI is beginning to drive companies to the two extremes in the scaling era. The biggest companies will be way bigger than ever, using resources and mass talent to have sustained progress at the frontier of raw AI capabilities. On the other side, tiny businesses like Interconnects thrive by using agents to refine, present, and sell niche expertise. The mass social job displacement that’ll come is going to reduce employability for various knowledge workers that don’t fit into either of these extremes for the raw technical side (big or small companies), while sustaining and maybe even amplifying careers that interface directly with humans (e.g. doctors) or other power structures with means to sustain themselves (law/government).

6. More existing power structures will assert themselves on AI

Just in the last few days while writing this, we had the Pope release an over 40,000 word document on where AI is going and China expand personnel movement restrictions on top AI researchers across industry. At the same time, the U.S. has designated Anthropic a supply chain risk and continues to use its models for national security. The list of news like this is only going to grow. Existing power structures are realizing there’s a finite time window for them to exert themselves in the AI dynamic – an intuition that could be mapped to influence going down as AI models get more powerful. This intuition is potentially dangerous, as it sets up meaningful conflict in who controls the technology (as I discussed with Dean Ball after the Anthropic-DoW spat).

Next: Where technical becomes social

These largely technical and power trends accelerating are going to put more pressure on the social and political anti-AI sentiments within the U.S. This is currently the most obvious barrier to continued AI development and beneficial diffusion. Reflecting on this, many people in the tech discourse get too focused on the details, where yes a lot of data-center-detractors are making genuinely wrong factual claims in defense of their position.

The real position that a large swath of Americans has is that they have a voice in saying no to the current trend – by not granting permission to build data centers. This is a voice that they haven’t been granted by the tech industry that changed the face of the global economy and power structures in the last few decades.

This is setting us up for a challenging year ahead for the industry. The labs are aggregating and concentrating talent to peak levels. There are few neutral messengers to communicate the reality of AI to the public. The frontier labs leadership is largely gearing up to IPO and stay ahead in the capabilities race. With the status quo, there are few actions to unwind this path toward social conflict.

It takes individuals in the AI ecosystem to zag and go against the groupthink of needing to make your wealth today, of needing to be at a lab to do impactful work, and so on. I’m personally continuing to bet on this, by trying to make a vibrant and diverse open model ecosystem supported by clear, unbiased information. If you agree with this and have been watching from the sidelines, it’s a good time to get involved, before the situation spirals into something uncontrollable.

§ Dossier

Feed context

Feed title: Interconnects AI
Feed URL: https://www.interconnects.ai/feed
Author: Nathan Lambert
Published: 2026/05/26
Enclosure: https://substackcdn.com/image/fetch/$s_!-711!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png

Source data · Raw Archive

source: Newsletter
upstream_source: newsletter_rss
upstream_item_id: interconnects:582196977da5e72aaff8c93b
daily_ranking_item_id: 6bf3b3f0-97a3-44e4-bb33-c7f4abde9947
rank_date: 2026-05-27
rank: 3
name: Some ideas for what comes next, May 2026
tagline: Interconnects AI
description: Gemini Flash 3.5, Mythos, open-closed balance, America's open-source surge, emerging power struggles and more.
votes_count: 0
comments_count: 0
created_at_on_source: 2026-05-26T15:39:02.000Z
source_url: https://www.interconnects.ai/p/some-ideas-for-what-comes-next-may
website_url: https://www.interconnects.ai

topics

Interconnects AI

media / source-specific data

{
  "newsletter_rss": {
    "author": "Nathan Lambert",
    "feed_id": "interconnects",
    "feed_url": "https://www.interconnects.ai/feed",
    "categories": [],
    "feed_title": "Interconnects AI",
    "published_at": "2026-05-26T15:39:02.000Z"
  }
}

raw_payload

{
  "link": "https://www.interconnects.ai/p/some-ideas-for-what-comes-next-may",
  "title": "Some ideas for what comes next, May 2026",
  "author": "Nathan Lambert",
  "feed_id": "interconnects",
  "entry_id": "https://www.interconnects.ai/p/some-ideas-for-what-comes-next-may",
  "feed_url": "https://www.interconnects.ai/feed",
  "categories": [],
  "feed_title": "Interconnects AI",
  "fetched_at": "2026-05-26T22:02:28.634Z",
  "raw_excerpt": "Gemini Flash 3.5, Mythos, open-closed balance, America's open-source surge, emerging power struggles and more.",
  "summary_raw": "Gemini Flash 3.5, Mythos, open-closed balance, America's open-source surge, emerging power struggles and more.",
  "published_at": "2026-05-26T15:39:02.000Z",
  "feed_site_url": "https://www.interconnects.ai",
  "content_excerpt": "Gemini Flash 3.5, Mythos, open-closed balance, America's open-source surge, emerging power struggles and more.",
  "feed_description": "The cutting edge of AI, from inside the frontier AI labs, minus the hype. The border between high-level and technical thinking. Read by leading engineers, researchers, and investors.",
  "snapshot_version": "newsletter_rss_entry_v1",
  "content_raw_excerpt": "As the years of AI progress go by, it&#8217;s been accompanied by a slowly rising tide of consequence. Models are getting more capable, how we work is changing quickly, economics of AI are becoming real, just as real-world risks come to the forefront. 2026 is the first year where I don&#8217;t think there&#8217;ll be any breaks from this. The hard part to prepare for is that there&#8217;s a good chance things just continue to ratchet up from here &#8211; more disruption, more surprises, more stakes. On my end, there&#8217;s been a growing list of topics that are very fateful to how I see the current state of AI, but I haven&#8217;t even gotten to write about them (at least not from all the angles I want to)! All of these are closely related to the implications of different models reaching "
}

source_raw_snapshot

{
  "id": "e6085e3a-b5e8-416c-aaf8-e364030d9227",
  "daily_ranking_item_id": "6bf3b3f0-97a3-44e4-bb33-c7f4abde9947",
  "source": "newsletter_rss",
  "external_id": "interconnects:582196977da5e72aaff8c93b",
  "feed_id": "interconnects",
  "feed_url": "https://www.interconnects.ai/feed",
  "fetched_at": "2026-05-26T22:02:28.634Z",
  "feed_raw": {
    "rss": {
      "channel": {
        "item": [
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/some-ideas-for-what-comes-next-may",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/some-ideas-for-what-comes-next-may",
            "title": "Some ideas for what comes next, May 2026",
            "pubDate": "Tue, 26 May 2026 15:39:02 GMT",
            "enclosure": {
              "@_url": "https://substackcdn.com/image/fetch/$s_!-711!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "Gemini Flash 3.5, Mythos, open-closed balance, America's open-source surge, emerging power struggles and more.",
            "content:encoded": "<p>As the years of AI progress go by, it&#8217;s been accompanied by a slowly rising tide of consequence. Models are getting more capable, how we work is changing quickly, economics of AI are becoming real, just as real-world risks come to the forefront. 2026 is the first year where I don&#8217;t think there&#8217;ll be any breaks from this. The hard part to prepare for is that there&#8217;s a good chance things just continue to ratchet up from here &#8211; more disruption, more surprises, more stakes.</p><p>On my end, there&#8217;s been a growing list of topics that are very fateful to how I see the current state of AI, but I haven&#8217;t even gotten to write about them (at least not from all the angles I want to)! All of these are closely related to the implications of different models reaching new capability levels and how I use that to infer what may come next.</p><p class=\"button-wrapper\" data-attrs=\"{&quot;url&quot;:&quot;https://www.interconnects.ai/p/some-ideas-for-what-comes-next-may?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}\" data-component-name=\"ButtonCreateButton\"><a class=\"button primary\" href=\"https://www.interconnects.ai/p/some-ideas-for-what-comes-next-may?utm_source=substack&utm_medium=email&utm_content=share&action=share\"><span>Share</span></a></p><h3>1. Open models haven&#8217;t had their true agent moment like Opus 4.5</h3><p>The time gap between open and closed models is very often discussed, but the reality is that we have a nice time-gating that&#8217;s independent of debatable benchmarks &#8211; if open-weight models do or do not become super useful in agentic harnesses. 
The <a href=\"https://www.interconnects.ai/p/claude-code-hits-different\">Opus 4.5 in Claude Code moment</a> of December 2025 was so loud and obvious, that if open models hit this performance level for price points as low as $5/month, there will be an explosion in usage.</p><p>Right now we are about 5-6 months in with no equivalent open model. I suspect the robustness of the best closed frontier models that I write about could make this moment take a good amount longer, say closer to 12+ months. In this time, Claude Code and Codex may seem like different categories of products. In the standard flurry of new, state-of-the-art open models from a variety of labs, benchmarks will definitely keep climbing, but the open-closed gap should become more interpretable as real-world use becomes the real litmus test.</p><h3>2. Gemini still doesn&#8217;t have a meaningful competitor for Claude Code and Codex</h3><p>The best exclamation point I can offer to reinforce my prediction that open models are further behind than the benchmarks claim is that even the mighty Google doesn&#8217;t have a clear competitor for Claude Code and Codex. I&#8217;m sure the Gemini team is pushing very hard on this.</p><p>I still need to do a lot more testing on Gemini 3.5 Flash, but reading reviews makes it clear that it&#8217;s not a substitute for how I&#8217;m working today. It&#8217;s maybe not the Gemini team explicitly specializing for Google&#8217;s existing products (search, YouTube, etc.), but the model seems to suit them. If Google doesn&#8217;t have a powerful tool here soon, I don&#8217;t expect the open model labs to either. The open models are going to be used more for automated, enterprise agents and low-cost domains, rather than being the driving tool of modern knowledge work. 
This will feed directly into the economic engine of funding future models, where the agents like Claude Code and Codex are the current best path to massive AI revenue growth.</p><p><em>I discussed how the current environment is quietly driving labs in China to specialize on <a href=\"https://aiproem.substack.com/p/nathan-lambert-reflects-on-chinas\">AI Proem</a> with Grace Shao and this is central to my <a href=\"https://www.interconnects.ai/p/the-next-phase-of-open-models\">expectations of open models specializing</a> over the next few years instead of competing with OpenAI, Anthropic, and Google.</em></p><div class=\"subscription-widget-wrap-editor\" data-attrs=\"{&quot;url&quot;:&quot;https://www.interconnects.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}\" data-component-name=\"SubscribeWidgetToDOM\"><div class=\"subscription-widget show-subscribe\"><div class=\"preamble\"><p class=\"cta-caption\">Interconnects AI is a reader-supported publication. Consider becoming a subscriber.</p></div><form class=\"subscription-widget-subscribe\"><input type=\"email\" class=\"email-input\" name=\"email\" placeholder=\"Type your email&#8230;\" tabindex=\"-1\"><input type=\"submit\" class=\"button primary\" value=\"Subscribe\"><div class=\"fake-input-wrapper\"><div class=\"fake-input\"></div><div class=\"fake-button\"></div></div></form></div></div><h3>3. I don&#8217;t expect an open-weights Mythos this year</h3><p>While I don&#8217;t think Mythos is a general &#8220;god model&#8221; that will crush the competition in every domain, I do think it&#8217;s a remarkable technical achievement in software engineering and cybersecurity. Mythos is obviously a watershed moment for those fields. 
Having spoken to most of the Chinese labs &#8211; particularly those with the most prominent, large, open MoE models like Kimi, Z.ai, DeepSeek, and Qwen &#8211; I think they&#8217;re heavily resource limited and don&#8217;t have an immediate path to scaling up training processes like the big labs in the U.S. For the labs which are more corporate, which comes with more resources, such as Alibaba and Bytedance, they also have more conservative stances on safety and security.<br><br>Mythos is a bellwether of the massive acceleration in training and research compute available to the largest American companies.</p><p><em>Epoch AI recently had a nice <a href=\"https://epoch.ai/gradient-updates/frontier-labs-dont-use-most-ai-compute\">piece</a> on the compute available to various labs (~Google 25%, Meta 11%, OpenAI 11%, Anthropic 6%). All of these numbers are vastly higher than any Chinese lab.</em></p><h3>4. American open models are slowly gaining steam</h3><p>Nvidia with Nemotron, Google with Gemma, Arcee AI and others are slowly stabilizing the open model ecosystem in the U.S. There&#8217;s a lot that&#8217;s hard to measure here, especially in the rise of local agents like OpenClaw and Hermes, but there are adoption numbers of American models that we haven&#8217;t seen since Llama 3.<br><br>Gemma 4&#8217;s models are all tying or outperforming the equivalently sized Qwen 3.5/3.6 models &#8212; where Qwen has for years now been the default open model at these sizes. These Qwen 3.5/3.6 models have been tricky to get working in a lot of post-training research, partially due to architecture/tooling and partially likely due to modeling (i.e. the model is not easy to finetune for some training decision). 
I&#8217;ve heard few complaints about Gemma, but it also could be because Gemma is not yet the <em>researcher</em> default.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!-711!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!-711!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png 424w, https://substackcdn.com/image/fetch/$s_!-711!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png 848w, https://substackcdn.com/image/fetch/$s_!-711!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!-711!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!-711!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png\" width=\"1456\" height=\"1008\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1008,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:207124,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/199119723?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!-711!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png 424w, https://substackcdn.com/image/fetch/$s_!-711!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png 848w, https://substackcdn.com/image/fetch/$s_!-711!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png 1272w, https://substackcdn.com/image/fetch/$s_!-711!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture><div class=\"image-link-expand\"><div class=\"pencraft pc-display-flex pc-gap-8 pc-reset\"><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container restack-image\"><svg 
role=\"img\" width=\"20\" height=\"20\" viewBox=\"0 0 20 20\" fill=\"none\" stroke-width=\"1.5\" stroke=\"var(--color-fg-primary)\" stroke-linecap=\"round\" stroke-linejoin=\"round\" xmlns=\"http://www.w3.org/2000/svg\"><g><title></title><path d=\"M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882\"></path></g></svg></button><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container view-image\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"20\" height=\"20\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"2\" stroke-linecap=\"round\" stroke-linejoin=\"round\" class=\"lucide lucide-maximize2 lucide-maximize-2\"><polyline points=\"15 3 21 3 21 9\"></polyline><polyline points=\"9 21 3 21 3 15\"></polyline><line x1=\"21\" x2=\"14\" y1=\"3\" y2=\"10\"></line><line x1=\"3\" x2=\"10\" y1=\"21\" y2=\"14\"></line></svg></button></div></div></div></a></figure></div><p>There's a simple reality that we've seen recently with models like GPT-OSS, Nemotron 3, and now Gemma 4, that if a model is in the right range of benchmarks and released by an American lab with a truly permissive license, it'll get a large amount of adoption (in this cycle, recall that Gemma 4 adopted the Apache 2.0 License, changing from one with use-case restrictions on earlier Gemmas). This early phase of American growth in open models is establishing key brands directly with developers. The consensus is that more neolabs like Reflection and Thinking Machines are likely to participate in this space, but being too patient will lose the time when new agentic workflows and enterprise relationships are built.</p><h3>5. 
Anthropic and OpenAI are just getting up to speed in model iterations</h3><p>I expect the rest of this year to be a ruthless competition between these two flagship companies. I&#8217;m at an interesting balance where I think GPT 5.5 is a bit smarter of a model and I love the Codex App, so I&#8217;m structuring much of my work to be possible there. At the same time, for a lot of writing-related and broader surface area tasks I really still love Claude. These models are rapidly changing how we work, I run Codex from my phone while doing other things, am setting up automated open model analysis jobs on the back of agents, and expect to be able to scale the research side of Interconnects widely.</p><p>AI is beginning to drive companies to the two extremes in the scaling era. The biggest companies will be way bigger than ever, using resources and mass talent to have sustained progress at the frontier of raw AI capabilities. On the other side, tiny businesses like Interconnects thrive by using agents to refine, present, and sell niche expertise. The mass social job displacement that&#8217;ll come is going to reduce employability for various knowledge workers that don&#8217;t fit into either of these extremes for the raw technical side (big or small companies), while sustaining and maybe even amplifying careers that interface directly with humans (e.g. doctors) or other power structures with means to sustain themselves (law/government).</p><h3>6. 
More existing power structures will assert themselves on AI</h3><p>Just in the last few days while writing this, we had the Pope release <a href=\"https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html\">an over 40,000 word document</a> on where AI is going<em><strong> </strong></em>and <a href=\"https://www.bloomberg.com/news/articles/2026-05-26/china-expands-travel-curbs-to-top-ai-talent-at-private-firms\">China expand personnel movement restrictions</a> on top AI researchers across industry. At the same time, the U.S. has <a href=\"https://www.axios.com/2026/04/19/nsa-anthropic-mythos-pentagon\">designated Anthropic a supply chain risk and continues to use its models for national security</a>. The list of news like this is only going to grow. Existing power structures are realizing there&#8217;s a finite time window for them to exert themselves in the AI dynamic &#8212; an intuition that could be mapped to influence going down as AI models get more powerful. This intuition is potentially dangerous, as it sets up meaningful conflict in who controls the technology (as I <a href=\"https://www.interconnects.ai/p/how-anthropic-vs-dow-impacts-open\">discussed</a> with Dean Ball after the Anthropic-DoW spat).</p><div><hr></div><h3>Next: Where technical becomes social</h3><p>These largely technical and <em>power</em> trends accelerating are going to put more pressure on the social and political anti-AI sentiments within the U.S. This is currently the most obvious barrier to continued AI development and beneficial diffusion. Reflecting on this, many people in the tech discourse get too focused on the details, where yes a lot of data-center-detractors are making genuinely wrong factual claims in defense of their position. </p><p>The real position that a large swath of Americans has is that they have a voice in saying no to the current trend &#8212; by not granting permission to build data centers. 
This is a voice that they haven&#8217;t been granted by the tech industry that changed the face of the global economy and power structures in the last few decades. </p><p>This is setting us up for a challenging year ahead for the industry. The labs are aggregating and concentrating talent to peak levels. There are few neutral messengers to communicate the reality of AI to the public. The frontier labs leadership is largely gearing up to IPO and stay ahead in the capabilities race. With the status quo, there are few actions to unwind this <a href=\"https://jasmi.news/p/warning-shots\">path toward social conflict</a>. </p><p>It takes individuals in the AI ecosystem to zag and go against the groupthink of needing to make your wealth today, of needing to be at a lab to do impactful work, and so on. I&#8217;m personally continuing to bet on this, by trying to make a vibrant and diverse open model ecosystem supported by clear, unbiased information. If you agree with this and have been watching from the sidelines, it&#8217;s a good time to get involved, before the situation spirals into something uncontrollable.</p>"
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/latest-open-artifacts-21-open-model",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/latest-open-artifacts-21-open-model",
            "title": "Latest open artifacts (#21): Open model bonanza! Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others. On CAISI's V4 assessment.",
            "pubDate": "Sat, 16 May 2026 17:00:11 GMT",
            "enclosure": {
              "@_url": "https://substackcdn.com/image/fetch/$s_!S79s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff60acac9-5993-474e-84f9-8805792bddef_1024x576.jpeg",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Florian Brand",
            "description": "An eventful month with one flagship release after another",
            "content:encoded": "<p>This month was packed, with all open frontier labs, including DeepSeek, releasing new models. The latter prompted an evaluation by the <a href=\"https://www.nist.gov/news-events/news/2026/05/caisi-evaluation-deepseek-v4-pro\">Center for AI Standards and Innovation (CAISI)</a>, which has evaluated open models and their risks in the past. Their result is that open models lag behind the American frontier, with the gap becoming wider over time:</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!4DPW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d954e79-7538-48b7-bd3c-c6cd21421329_2800x1856.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!4DPW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d954e79-7538-48b7-bd3c-c6cd21421329_2800x1856.png 424w, https://substackcdn.com/image/fetch/$s_!4DPW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d954e79-7538-48b7-bd3c-c6cd21421329_2800x1856.png 848w, https://substackcdn.com/image/fetch/$s_!4DPW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d954e79-7538-48b7-bd3c-c6cd21421329_2800x1856.png 1272w, https://substackcdn.com/image/fetch/$s_!4DPW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d954e79-7538-48b7-bd3c-c6cd21421329_2800x1856.png 1456w\" sizes=\"100vw\"><img 
src=\"https://substackcdn.com/image/fetch/$s_!4DPW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d954e79-7538-48b7-bd3c-c6cd21421329_2800x1856.png\" width=\"1456\" height=\"965\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9d954e79-7538-48b7-bd3c-c6cd21421329_2800x1856.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:965,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Comparison of aggregate capabilities over time of the most capable publicly released U.S. and PRC models according to a suite of benchmarks covering five domains.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"Comparison of aggregate capabilities over time of the most capable publicly released U.S. and PRC models according to a suite of benchmarks covering five domains.\" title=\"Comparison of aggregate capabilities over time of the most capable publicly released U.S. 
and PRC models according to a suite of benchmarks covering five domains.\" srcset=\"https://substackcdn.com/image/fetch/$s_!4DPW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d954e79-7538-48b7-bd3c-c6cd21421329_2800x1856.png 424w, https://substackcdn.com/image/fetch/$s_!4DPW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d954e79-7538-48b7-bd3c-c6cd21421329_2800x1856.png 848w, https://substackcdn.com/image/fetch/$s_!4DPW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d954e79-7538-48b7-bd3c-c6cd21421329_2800x1856.png 1272w, https://substackcdn.com/image/fetch/$s_!4DPW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d954e79-7538-48b7-bd3c-c6cd21421329_2800x1856.png 1456w\" sizes=\"100vw\" fetchpriority=\"high\"></picture></div></a></figure></div><p>For the report, they calculate an Elo score based on <a href=\"https://en.wikipedia.org/wiki/Item_response_theory\">Item Response Theory</a>, which is commonly used to compare different models, even when they were tested on a different set of benchmarks. For V4, CAISI used nine different benchmarks:</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!cVg4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206a5c61-cc5b-423e-8db6-28f5f209291c_1546x1284.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!cVg4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206a5c61-cc5b-423e-8db6-28f5f209291c_1546x1284.png 424w, https://substackcdn.com/image/fetch/$s_!cVg4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206a5c61-cc5b-423e-8db6-28f5f209291c_1546x1284.png 848w, https://substackcdn.com/image/fetch/$s_!cVg4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206a5c61-cc5b-423e-8db6-28f5f209291c_1546x1284.png 1272w, 
https://substackcdn.com/image/fetch/$s_!cVg4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206a5c61-cc5b-423e-8db6-28f5f209291c_1546x1284.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!cVg4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206a5c61-cc5b-423e-8db6-28f5f209291c_1546x1284.png\" width=\"1456\" height=\"1209\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/206a5c61-cc5b-423e-8db6-28f5f209291c_1546x1284.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1209,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:215444,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/197676648?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206a5c61-cc5b-423e-8db6-28f5f209291c_1546x1284.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!cVg4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206a5c61-cc5b-423e-8db6-28f5f209291c_1546x1284.png 424w, https://substackcdn.com/image/fetch/$s_!cVg4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206a5c61-cc5b-423e-8db6-28f5f209291c_1546x1284.png 848w, 
https://substackcdn.com/image/fetch/$s_!cVg4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206a5c61-cc5b-423e-8db6-28f5f209291c_1546x1284.png 1272w, https://substackcdn.com/image/fetch/$s_!cVg4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F206a5c61-cc5b-423e-8db6-28f5f209291c_1546x1284.png 1456w\" sizes=\"100vw\"></picture></div></a></figure></div><p>The huge Elo difference is explained by DeepSeek V4&#8217;s poor scores on 
CTF-Archive-Diamond (which was run on only a subset of the benchmark and extrapolated with IRT for V4), PortBench (a CAISI-private benchmark), and ARC-AGI-2 (scored with a different method than the public leaderboards). Differences on these benchmarks have a huge impact on the overall Elo, which can exaggerate the difference in capabilities. </p><p>Using <a href=\"https://epoch.ai/eci\">Epoch AI&#8217;s ECI</a>, which also applies IRT over a set of different benchmarks, we see that the gap has stayed roughly between 3 and 7 months since R1:</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!qx4F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608dea3a-c43c-4bf0-a5c2-f5b9bb7e63d3_2400x1500.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!qx4F!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608dea3a-c43c-4bf0-a5c2-f5b9bb7e63d3_2400x1500.png 424w, https://substackcdn.com/image/fetch/$s_!qx4F!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608dea3a-c43c-4bf0-a5c2-f5b9bb7e63d3_2400x1500.png 848w, https://substackcdn.com/image/fetch/$s_!qx4F!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608dea3a-c43c-4bf0-a5c2-f5b9bb7e63d3_2400x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!qx4F!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608dea3a-c43c-4bf0-a5c2-f5b9bb7e63d3_2400x1500.png 1456w\" sizes=\"100vw\"><img 
src=\"https://substackcdn.com/image/fetch/$s_!qx4F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608dea3a-c43c-4bf0-a5c2-f5b9bb7e63d3_2400x1500.png\" width=\"1456\" height=\"910\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/608dea3a-c43c-4bf0-a5c2-f5b9bb7e63d3_2400x1500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2762591,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/197676648?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608dea3a-c43c-4bf0-a5c2-f5b9bb7e63d3_2400x1500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!qx4F!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608dea3a-c43c-4bf0-a5c2-f5b9bb7e63d3_2400x1500.png 424w, https://substackcdn.com/image/fetch/$s_!qx4F!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608dea3a-c43c-4bf0-a5c2-f5b9bb7e63d3_2400x1500.png 848w, https://substackcdn.com/image/fetch/$s_!qx4F!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608dea3a-c43c-4bf0-a5c2-f5b9bb7e63d3_2400x1500.png 1272w, 
https://substackcdn.com/image/fetch/$s_!qx4F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608dea3a-c43c-4bf0-a5c2-f5b9bb7e63d3_2400x1500.png 1456w\" sizes=\"100vw\"></picture></div></a><figcaption class=\"image-caption\">The open&lt;&gt;closed gap in ECI (from https://mcnair.center/china/)</figcaption></figure></div><p>However, both CAISI and ECI paint an incomplete picture, as both use standardized (and simple) setups to compare the capabilities of models. 
To be more concrete: coding tasks are evaluated by giving the model access to bash inside a simple for-loop with a fixed token budget, not with a harness such as Claude Code or OpenCode, which is what models are trained in! These setups lead to benchmarks claiming that porting applications to another language is <a href=\"https://programbench.com/\">currently not possible</a>, while <a href=\"https://github.com/oven-sh/bun/pull/30412\">Bun has been ported from Zig to Rust with 1 million LOC changes</a><a class=\"footnote-anchor\" data-component-name=\"FootnoteAnchorToDOM\" id=\"footnote-anchor-1\" href=\"#footnote-1\" target=\"_self\">1</a>.</p><p>Therefore, we would argue that a frontier comparison of open and closed models also needs to elicit the capabilities of every model better, which means using each model&#8217;s preferred harness as well as model-specific prompting.</p><p>This section was written primarily by Florian. An interesting dynamic within Interconnects is that Florian believes open frontier models are closer to closed alternatives in true performance. Nathan also thinks the benchmarks are imperfect, but thinks the closed models are ahead by more. 
We&#8217;re going to continue to unpack this in our future content.</p><h3><strong>Our Picks</strong></h3><ul><li><p><strong><a href=\"https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro\">MiMo-V2.5-Pro</a></strong> by <a href=\"https://huggingface.co/XiaomiMiMo\">XiaomiMiMo</a>: Avid Artifacts readers know that Xiaomi has been working on open models for a while; its debut was <a href=\"https://www.interconnects.ai/p/latest-open-artifacts-10-new-deepseek\">exactly one year ago</a>. 
The progress of its releases is remarkable, with 2.5 Pro (released under Apache 2.0) being neck and neck with other flagship models such as Kimi K2.6 and GLM-5.1 in both benchmarks and <a href=\"https://x.com/Designarena/status/2054776484833952000?s=20\">real-world usage</a>.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!25fp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34228788-0da1-4aea-840e-5eae68bbfa17_1200x786.jpeg\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!25fp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34228788-0da1-4aea-840e-5eae68bbfa17_1200x786.jpeg 424w, https://substackcdn.com/image/fetch/$s_!25fp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34228788-0da1-4aea-840e-5eae68bbfa17_1200x786.jpeg 848w, https://substackcdn.com/image/fetch/$s_!25fp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34228788-0da1-4aea-840e-5eae68bbfa17_1200x786.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!25fp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34228788-0da1-4aea-840e-5eae68bbfa17_1200x786.jpeg 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!25fp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34228788-0da1-4aea-840e-5eae68bbfa17_1200x786.jpeg\" width=\"1200\" height=\"786\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/34228788-0da1-4aea-840e-5eae68bbfa17_1200x786.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:786,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"Image\" title=\"Image\" srcset=\"https://substackcdn.com/image/fetch/$s_!25fp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34228788-0da1-4aea-840e-5eae68bbfa17_1200x786.jpeg 424w, https://substackcdn.com/image/fetch/$s_!25fp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34228788-0da1-4aea-840e-5eae68bbfa17_1200x786.jpeg 848w, https://substackcdn.com/image/fetch/$s_!25fp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34228788-0da1-4aea-840e-5eae68bbfa17_1200x786.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!25fp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34228788-0da1-4aea-840e-5eae68bbfa17_1200x786.jpeg 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture></div></a></figure></div></li><li><p><strong><a href=\"https://huggingface.co/google/gemma-4-26B-A4B-it\">gemma-4-26B-A4B-it</a></strong> by <a href=\"https://huggingface.co/google\">google</a> (full Interconnects post <a href=\"https://www.interconnects.ai/p/gemma-4-and-what-makes-an-open-model\">here</a>): The long-awaited update to the Gemma series, featuring multiple sizes: <a href=\"https://huggingface.co/google/gemma-4-E2B-it\">4B</a>, <a href=\"https://huggingface.co/google/gemma-4-E4B-it\">9B</a>, and <a href=\"https://huggingface.co/google/gemma-4-31B-it\">31B</a> dense models, as well as a 26B-A4B MoE. 
Even more importantly, with Gemma 4, Google has decided to use Apache 2.0 as its license, which removes the uncertainty and legal challenges around interpreting custom licenses.</p></li><li><p><strong><a href=\"https://huggingface.co/moonshotai/Kimi-K2.6\">Kimi-K2.6</a></strong> by <a href=\"https://huggingface.co/moonshotai\">moonshotai</a>: An update to the Kimi series, delivering stronger performance across the board and making it one of the best open models out there yet again. The release also focuses on long-horizon performance, showing that open models are capable of running for hours to complete tasks or optimize performance. Given everyone&#8217;s focus on building <a href=\"https://github.com/karpathy/autoresearch\">autoresearch</a>-like systems, seeing open models catch up is important.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!K_mJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe08af2e3-9ce3-4ace-bec7-e8aecce5e120_10188x6520.webp\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!K_mJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe08af2e3-9ce3-4ace-bec7-e8aecce5e120_10188x6520.webp 424w, https://substackcdn.com/image/fetch/$s_!K_mJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe08af2e3-9ce3-4ace-bec7-e8aecce5e120_10188x6520.webp 848w, https://substackcdn.com/image/fetch/$s_!K_mJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe08af2e3-9ce3-4ace-bec7-e8aecce5e120_10188x6520.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!K_mJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe08af2e3-9ce3-4ace-bec7-e8aecce5e120_10188x6520.webp 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!K_mJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe08af2e3-9ce3-4ace-bec7-e8aecce5e120_10188x6520.webp\" width=\"1456\" height=\"932\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e08af2e3-9ce3-4ace-bec7-e8aecce5e120_10188x6520.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:932,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;K2.6 Qwen3.5-0.8B Mac inference optimization case&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"K2.6 Qwen3.5-0.8B Mac inference optimization case\" title=\"K2.6 Qwen3.5-0.8B Mac inference optimization case\" srcset=\"https://substackcdn.com/image/fetch/$s_!K_mJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe08af2e3-9ce3-4ace-bec7-e8aecce5e120_10188x6520.webp 424w, https://substackcdn.com/image/fetch/$s_!K_mJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe08af2e3-9ce3-4ace-bec7-e8aecce5e120_10188x6520.webp 848w, 
https://substackcdn.com/image/fetch/$s_!K_mJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe08af2e3-9ce3-4ace-bec7-e8aecce5e120_10188x6520.webp 1272w, https://substackcdn.com/image/fetch/$s_!K_mJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe08af2e3-9ce3-4ace-bec7-e8aecce5e120_10188x6520.webp 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture></div></a></figure></div></li><li><p><strong><a 
href=\"https://huggingface.co/poolside/Laguna-XS.2\">Laguna-XS.2</a></strong> by <a href=\"https://huggingface.co/poolside\">poolside</a>: Poolside AI has released its first public coding-focused models, including the open-weight XS.2. Its size (33B-A3B) makes it attractive for local use, with performance on par with other models in that size range. The accompanying <a href=\"https://poolside.ai/blog/laguna-a-deeper-dive\">blog post</a> is worth a read, as is <a href=\"https://poolside.ai/blog/through-the-looking-glass\">the deep dive</a> into reward hacking during coding evaluations.</p></li><li><p><strong><a href=\"https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash\">DeepSeek-V4-Flash</a></strong> by <a href=\"https://huggingface.co/deepseek-ai\">deepseek-ai</a>: DeepSeek has finally released its successor to the V3 series, which it kept updating for months. It comes in two sizes: Pro, which is a 1.6T-A49B MoE, and Flash, a 284B-A13B model. Based on others&#8217; experience, Flash seems to be the real star of the show, as its performance is relatively strong, while Pro seems to underdeliver relative to its size. 
The <a href=\"https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf\">tech report</a> goes into great detail, including the architectural changes used to achieve better and cheaper long-context performance.</p></li></ul><h3><strong>Models</strong></h3><h4>General Purpose</h4><ul><li><p><strong><a href=\"https://huggingface.co/Qwen/Qwen3.6-35B-A3B\">Qwen3.6-35B-A3B</a></strong> by <a href=\"https://huggingface.co/Qwen\">Qwen</a>: An update to the Qwen 3.5 series targeting one of the most widely used sizes.</p></li><li><p><strong><a href=\"https://huggingface.co/LiquidAI/LFM2.5-350M\">LFM2.5-350M</a></strong> by <a href=\"https://huggingface.co/LiquidAI\">LiquidAI</a>: With 28T tokens for 350M parameters, this model might be the most overtrained model out there.</p></li><li><p><strong><a href=\"https://huggingface.co/arcee-ai/Trinity-Large-Thinking\">Trinity-Large-Thinking</a></strong> by <a href=\"https://huggingface.co/arcee-ai\">arcee-ai</a>: The reasoning version of Trinity, one of the best Western open models. It has topped the OpenRouter charts for a while and can power agentic applications such as OpenClaw.</p></li><li><p><strong><a href=\"https://huggingface.co/zai-org/GLM-5.1\">GLM-5.1</a></strong> by <a href=\"https://huggingface.co/zai-org\">zai-org</a>: An update to GLM-5, improving scores across the board. The focus for this update is on long-horizon tasks.</p></li></ul>\n      <p>\n          <a href=\"https://www.interconnects.ai/p/latest-open-artifacts-21-open-model\">\n              Read more\n          </a>\n      </p>\n   "
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/how-open-model-ecosystems-compound",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/how-open-model-ecosystems-compound",
            "title": "How open model ecosystems compound",
            "pubDate": "Tue, 12 May 2026 15:54:47 GMT",
            "enclosure": {
              "@_url": "https://substack-post-media.s3.amazonaws.com/public/images/39c7cc76-02ac-4c38-bbab-7d89dca53d0b_3182x1790.png",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "Further reflections on China's high-participation, open-first AI ecosystem.",
            "content:encoded": "<h5>Note: Voice-overs for paywalled posts are available for paid subscribers in podcast apps if you click on settings on Interconnects, then manage your description. Thanks for listening!</h5><p>Most of the compute to build a leading frontier model comes from R&amp;D costs, rather than the compute to train the final, big model end-to-end. In an ecosystem like China&#8217;s, where all the leading players are open, this creates a potentially meaningful advantage in cost structures that&#8217;ll let labs keep building longer than outside observers would expect.</p><p>There are two recent pieces of research, one from Ai2 <a href=\"https://arxiv.org/abs/2605.01158\">documenting the development of Olmo 3</a> and one from Epoch AI <a href=\"https://epoch.ai/gradient-updates/r-and-d-vs-training-compute\">studying public documentation of costs from various frontier labs</a>, that put the estimate of compute spent on R&amp;D rather than the final model at about 80% (with meaningful error bars).</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!VBXD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e980106-980c-4092-bbf1-f472323b8da0_1026x1283.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!VBXD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e980106-980c-4092-bbf1-f472323b8da0_1026x1283.png 424w, https://substackcdn.com/image/fetch/$s_!VBXD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e980106-980c-4092-bbf1-f472323b8da0_1026x1283.png 848w, 
https://substackcdn.com/image/fetch/$s_!VBXD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e980106-980c-4092-bbf1-f472323b8da0_1026x1283.png 1272w, https://substackcdn.com/image/fetch/$s_!VBXD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e980106-980c-4092-bbf1-f472323b8da0_1026x1283.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!VBXD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e980106-980c-4092-bbf1-f472323b8da0_1026x1283.png\" width=\"1026\" height=\"1283\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0e980106-980c-4092-bbf1-f472323b8da0_1026x1283.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1283,&quot;width&quot;:1026,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" title=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!VBXD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e980106-980c-4092-bbf1-f472323b8da0_1026x1283.png 424w, https://substackcdn.com/image/fetch/$s_!VBXD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e980106-980c-4092-bbf1-f472323b8da0_1026x1283.png 848w, 
https://substackcdn.com/image/fetch/$s_!VBXD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e980106-980c-4092-bbf1-f472323b8da0_1026x1283.png 1272w, https://substackcdn.com/image/fetch/$s_!VBXD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e980106-980c-4092-bbf1-f472323b8da0_1026x1283.png 1456w\" sizes=\"100vw\" fetchpriority=\"high\"></picture><div class=\"image-link-expand\"><div class=\"pencraft pc-display-flex pc-gap-8 pc-reset\"><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container restack-image\"><svg role=\"img\" width=\"20\" height=\"20\" viewBox=\"0 0 20 20\" fill=\"none\" stroke-width=\"1.5\" stroke=\"var(--color-fg-primary)\" stroke-linecap=\"round\" stroke-linejoin=\"round\" xmlns=\"http://www.w3.org/2000/svg\"><g><title></title><path d=\"M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882\"></path></g></svg></button><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container view-image\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"20\" height=\"20\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"2\" stroke-linecap=\"round\" stroke-linejoin=\"round\" class=\"lucide lucide-maximize2 lucide-maximize-2\"><polyline points=\"15 3 21 3 21 9\"></polyline><polyline points=\"9 21 3 21 3 15\"></polyline><line x1=\"21\" x2=\"14\" y1=\"3\" y2=\"10\"></line><line x1=\"3\" x2=\"10\" y1=\"21\" y2=\"14\"></line></svg></button></div></div></div></a></figure></div><p>In a world where research and development 
is most of the compute, the Chinese system is designed around quickly learning from your peers and avoiding double-spending research compute &#8212; or infra effort.  It&#8217;s far from perfect, but it&#8217;s the closest analog to the OSS ecosystem that one can get for building LLMs. The public discussion of AI has always emphasized that the <em>models</em> are expensive in a way that naturally lets passive readers think this is compute just dedicated to the artifact &#8212; <a href=\"https://www.interconnects.ai/p/deepseek-v3-and-the-actual-cost-of\">as we saw with DeepSeek V3</a>.</p><p class=\"button-wrapper\" data-attrs=\"{&quot;url&quot;:&quot;https://www.interconnects.ai/p/how-open-model-ecosystems-compound?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}\" data-component-name=\"ButtonCreateButton\"><a class=\"button primary\" href=\"https://www.interconnects.ai/p/how-open-model-ecosystems-compound?utm_source=substack&utm_medium=email&utm_content=share&action=share\"><span>Share</span></a></p><p>This had me revisiting the core issue of open-source AI, and how it doesn&#8217;t have the feedback loops akin to open-source software (OSS) users back to the creation itself, that creates immense value following <a href=\"https://en.wikipedia.org/wiki/Linus%27s_law\">Linus&#8217;s law</a> of &#8220;given enough eyeballs, all bugs are shallow&#8221;. This self-reinforcement of OSS makes deployment at scale the cheapest possible outcome &#8212; all the users together share the costs of fixing bugs and adding features.</p><p>Within open-source AI, almost all the cost falls on the model developer. 
At the same time, releasing a model openly does bring large cost benefits, but they reduce only <em>future</em> development and deployment costs, both for the creator and, more importantly, for the wider ecosystem.</p><p>Open AI models, tools, infrastructure, and everything in between reduce the cost of development; they are not a plug-and-play cost reduction on apples-to-apples solutions or products. If someone is just going to use AI off the shelf with minimal iteration or internal development, open models will almost always be more expensive. Closed, integrated, hosted solutions achieve low price points through economies of scale across general usage.</p><p>The open-source ecosystem can only try to mirror the OSS-style financial and performance gains across successive iterations. The Chinese labs, through incredibly thorough technical reports and intentional knowledge sharing across labs, are effectively de-risking ideas so that their peer companies do not need to invest as many resources in them.</p><p>For this to work, the current norm where AI companies <em>fork</em> open-source tools and evolve them into internal-only versions will likely need to fade out. It&#8217;s too common a trope for open-source AI companies to make better performance via enterprise agreements or internal tools their selling point, while the fully open tools that people start with fall behind in accessibility. A prime example is at-scale RL training of MoE models: no truly open recipe exists. It&#8217;s unclear if open-supporting but partially closed tools like Thinking Machines&#8217; <a href=\"https://thinkingmachines.ai/tinker/\">Tinker</a> and Prime Intellect&#8217;s <a href=\"https://www.primeintellect.ai/blog/lab\">Lab</a> can be open enough for the advantages of an open ecosystem to sustain themselves. 
The more open the stack is, and the more information is shared, the more costs are reduced in future iterations.</p><p>The same reasoning that causes companies to fork open-source tools to make internal versions explains why there isn&#8217;t a shared, single foundation model that everyone builds on. Building the best model today becomes an art of integrating your hardware, data, and infrastructure, while evolving all of them at a relatively high rate that lets you keep up with the frontier of performance. Given that all signs point to LLMs continuing their steady march in performance improvements for years, it seems unlikely that this equilibrium will change in the near term. This is exactly why I wrote my post on the <a href=\"https://www.interconnects.ai/p/the-inevitable-need-for-an-open-model\">inevitable need for an open model consortium</a> &#8211; this shared resource is far more efficient and may become the only financially viable way to compete at the future frontier scale with open models.</p><p>It&#8217;s worth noting that, of course, the closed labs also see the investigations of the open frontier model companies and can benefit from them, but assuming that the closed labs are <a href=\"https://www.interconnects.ai/p/reading-todays-open-closed-performance\">some months ahead in the development tree</a>, they naturally stand to benefit less from the shared insights. The stronger the open-source community is, the more cost incentive there is for the various companies to be relatively close together on the same Pareto curve of performance.</p><p>This realization, that the savings come from <em>development</em> costs (a process-focused technology) rather than from some shared foundation that all the labs build on directly, was downstream of a question I got in feedback to <a href=\"https://www.interconnects.ai/p/notes-from-inside-chinas-ai-labs\">my recent China trip summary</a>. 
The question was: &#8220;Was there any chance of the Chinese ecosystem converging on a single base model to save costs?&#8221; The follow-up to this question was whether any of the open-weight companies in China are using open-source in strategically meaningful ways. There are many more useful questions to ask here, especially when trying to understand the different operational patterns of the ecosystems.</p><h2>China&#8217;s foundation model development model</h2><p>I found the following interview conducted by Bill Gurley with Dan Wang, author of Breakneck, and Patrick McGee, author of Apple in China (both books I strongly recommend &#8211; must reads), very thought-provoking on the biggest differences between technology cultures in the U.S. and China.</p><div id=\"youtube2-XpyqKn_1ZP4\" class=\"youtube-wrap\" data-attrs=\"{&quot;videoId&quot;:&quot;XpyqKn_1ZP4&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}\" data-component-name=\"Youtube2ToDOM\"><div class=\"youtube-inner\"><iframe src=\"https://www.youtube-nocookie.com/embed/XpyqKn_1ZP4?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0\" frameborder=\"0\" loading=\"lazy\" gesture=\"media\" allow=\"autoplay; fullscreen\" allowautoplay=\"true\" allowfullscreen=\"true\" width=\"728\" height=\"409\"></iframe></div></div><p>I get a lot of exposure to these differences at this point in my open-source AI arc. There&#8217;s a deep yearning to influence Western audiences and thinking that has bubbled up out of the Chinese AI ecosystem in the last year. This was obviously a strong pretext for why the <a href=\"https://readsail.com/\">SAIL</a> group got such access on our recent trip &#8211; it&#8217;s not a given that anyone in the AI ecosystem will talk to senior leadership at so many companies.</p>\n      <p>\n          <a href=\"https://www.interconnects.ai/p/how-open-model-ecosystems-compound\">\n              Read more\n          </a>\n      </p>\n   "
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/notes-from-inside-chinas-ai-labs",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/notes-from-inside-chinas-ai-labs",
            "title": "Notes from inside China's AI labs",
            "pubDate": "Thu, 07 May 2026 15:42:43 GMT",
            "enclosure": {
              "@_url": "https://substack-post-media.s3.amazonaws.com/public/images/2b353f46-1b83-4750-9dc0-72877a402f19_1024x768.jpeg",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "Lessons from my trip to talk to most of the leading AI labs in China.",
            "content:encoded": "<p>Staring out the window on a new, high-speed train from Hangzhou to Shanghai I&#8217;m gifted with views of dramatic ridgelines speckled with wind turbines that are silhouetted against the setting sun. The mountains cast a backdrop to a mix of spanning fields and clustered skyscrapers. I&#8217;m returning from China with great humility. It&#8217;s a very warming, human experience to go somewhere so foreign and be so welcomed. I had the honor of meeting so many people in the AI ecosystem who I knew from afar, and they greeted me with big smiles and cheer, reminding me how global my work and the AI ecosystem is.</p><div class=\"subscription-widget-wrap-editor\" data-attrs=\"{&quot;url&quot;:&quot;https://www.interconnects.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}\" data-component-name=\"SubscribeWidgetToDOM\"><div class=\"subscription-widget show-subscribe\"><div class=\"preamble\"><p class=\"cta-caption\">Interconnects AI is a reader-supported publication. Consider becoming a subscriber.</p></div><form class=\"subscription-widget-subscribe\"><input type=\"email\" class=\"email-input\" name=\"email\" placeholder=\"Type your email&#8230;\" tabindex=\"-1\"><input type=\"submit\" class=\"button primary\" value=\"Subscribe\"><div class=\"fake-input-wrapper\"><div class=\"fake-input\"></div><div class=\"fake-button\"></div></div></form></div></div><h2>The mentality of Chinese researchers</h2><p>The Chinese companies building language models are set up as the perfect fast-followers for the technology, building on long-standing cultural traditions in education and work, along with subtly different approaches to building technology companies. When you look at the outputs, the latest, biggest models enabling agentic workflows, and the ingredients, excellent scientists, large-scale data, and accelerated computing, the Chinese and American labs look largely similar. 
The lasting differences emerge in how these are organized and conditioned.</p><p>I&#8217;ve long thought that one reason the Chinese labs are so good at catching up and keeping up with the frontier is that they&#8217;re culturally aligned for this task, but without talking to people directly I felt like it wasn&#8217;t my place to put much weight on this hunch. Speaking with many wonderful, humble, and open scientists at the leading Chinese labs has crystallized a lot of my beliefs.</p><p>So much of building the best LLMs today comes down to meticulous work across the entire stack, from data to architecture details and RL algorithm implementations. Every part of the model can yield some improvement, and fitting them together is a complex process where the work of some brilliant individuals needs to get shelved in favor of the overall model maximizing a multi-objective optimization.</p><p>While American researchers are obviously also brilliant at solving the individual components, there&#8217;s more of a culture of speaking up for yourself in the U.S. As a scientist, you&#8217;re more successful when you speak up for your work, and modern culture is pushing the new path to fame of &#8220;leading AI scientists&#8221;. This results in direct conflict. The Llama organization is heavily rumored to have collapsed under the political weight of these interests embedding themselves in a hierarchical organization. I&#8217;ve heard of other labs saying that it can be necessary to pay off a top researcher to get them to stop complaining about their idea not making it into the final model. Whether or not that&#8217;s exactly true, the idea is clear. Ego and desires for career advancement do get in the way of making the best models. A small, directional shift in this sort of culture between the U.S. and China can have a meaningful impact on the final outputs.</p><p>Some of this has to do with who is building the models in China. 
There&#8217;s an immediate reality at all of the labs that a large proportion of the core contributors are active students. The labs are quite young, and it reminds me of our setup at Ai2, where students are seen as peers and directly integrated in the LLM team. This is incredibly different from the top labs in the US, where the likes of OpenAI, Anthropic, Cursor, etc. simply don&#8217;t offer internships. Other companies like Google nominally have internships related to Gemini, but there&#8217;s a lot of concern about whether your internship will be siloed and away from anything real.</p><p>To summarize how the slight change in culture can improve the ability to build models:</p><ul><li><p>More willingness to do non-flashy work in order to improve the final model,</p></li><li><p>People new to building AI can be free of prior phases of AI hype cycles, allowing them to adapt to new, modern techniques faster (in fact, one of the Chinese scientists I talked to explicitly pointed to this as a strength),</p></li><li><p>Less ego, enabling org charts to scale slightly better, as there&#8217;s less gaming of the system, and</p></li><li><p>Abundant talent well-suited to solving problems that already have a proof of concept elsewhere, etc.</p></li></ul><p>This slight inclination towards skills that complement building today&#8217;s language models stands in contrast to a known stereotype that Chinese researchers tend to produce less of the creative, field-spawning, 0-to-1 academic-style research. Among the more academic lab visits on our trip, many leaders talked about cultivating this more ambitious research culture. At the same time, some technical leaders we talked to were skeptical about whether such a rewiring in the approach to science is likely in the near term, because it&#8217;ll take a redesign of the education and incentive systems that is too big to happen within the current economic equilibrium. 
This culture seems to be training students and engineers who are excellent at the LLM-building game. They are also, of course, extremely abundant.</p><p>These students told me about a similar brain drain happening in China as in the U.S., where many who previously considered academic paths now intend to stay in industry. The funniest quote was from a researcher who was interested in being a professor to be close to the education system, but remarked that education is solved with LLMs &#8211; &#8220;why would a student talk to me!&#8221;</p><p>The students have the benefit of coming at LLMs with fresh eyes. Over the last few years we&#8217;ve seen the key paradigm of LLMs shift from scaling MoEs, to scaling RL, to enabling agents. Doing any of these well involves absorbing an insane amount of context quickly, both from the broader literature and the technical stack at your company. Students are used to doing this and excited to humbly drop all presumptions about what should work. They dive in head first and dedicate their lives to getting the chance to improve the models.</p><p>These students are also so magically direct and free of some of the philosophical chatter that can distract scientists. When asked how they feel about the economics or long-term social risks of models, far fewer Chinese researchers have sophisticated opinions and a drive to influence this. Their role is to build the best model.</p><p>This difference is subtle, and easy to deny, but it is best felt in long conversations with an elegant, brilliant researcher who clearly communicates well in English: basic questions on more philosophical aspects of AI hang in the air with a simple confusion. It&#8217;s a category error to them. When I probed in these areas, one researcher even quoted the famous Dan Wang premise of China being run by engineers, relative to the lawyers of the U.S., to emphasize their desire to build. 
There&#8217;s no track in China that systematically enables the growth of star power for Chinese scientists, akin to mega mainstream podcasts like Dwarkesh or Lex.</p><p>Attempts to get Chinese scientists to comment on the coming economic uncertainty fueled by AI, questions beyond the capabilities of simple AGI, or moral debates on how models should behave all served to capture the upbringing and education of these scientists (edited<a class=\"footnote-anchor\" data-component-name=\"FootnoteAnchorToDOM\" id=\"footnote-anchor-1\" href=\"#footnote-1\" target=\"_self\">1</a>). They are extremely dedicated to their work, but have grown up in a system where debates and opinions on how society should be structured and changed are not encouraged. </p><p>Zooming out, Beijing especially felt much like the Bay Area, where a competitive lab is a short walk or Uber away. I got off a flight and stopped by Alibaba&#8217;s Beijing campus on the way to the hotel. Then, in 36 hours we went to all of Z.ai, Moonshot AI, Tsinghua University, Meituan, Xiaomi, and 01.ai. Travel by Didi is easy, and if you select an XL in China you&#8217;re often paired with electric minivans that have massage chairs. We asked the researchers about the talent wars, and they said it&#8217;s very similar to what we&#8217;re experiencing in the U.S. It&#8217;s normal for researchers to bounce around, and much of where people choose to go is based on the best current vibes.</p><p>In China, the LLM community feels far more like an ecosystem than battling tribes. Across many off-the-record conversations, it&#8217;s nothing but respect for peers. All of the Chinese labs fear ByteDance, with its popular Doubao model, which is the only closed frontier lab in China. At the same time, all of the labs have massive respect for DeepSeek as the lab with the best research taste in execution. 
When you meet with lab members off the record in the States, sparks fly quickly.</p><p>The most striking part of the humility of Chinese researchers is how they also often shrug on the business side, saying it&#8217;s not their problem, where everyone in the U.S. seems to be obsessed with various ecosystem-level industrial trends, from data sellers to compute or fundraising.</p><p class=\"button-wrapper\" data-attrs=\"{&quot;url&quot;:&quot;https://www.interconnects.ai/p/notes-from-inside-chinas-ai-labs?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}\" data-component-name=\"ButtonCreateButton\"><a class=\"button primary\" href=\"https://www.interconnects.ai/p/notes-from-inside-chinas-ai-labs?utm_source=substack&utm_medium=email&utm_content=share&action=share\"><span>Share</span></a></p><h2>Where China&#8217;s AI industry differs (and matches) the Western labs</h2><p>The thing that makes building an AI model today so interesting is that it&#8217;s not just about getting a group of great researchers in one building together to produce an engineering marvel. It used to be this, but to sustain AI businesses, the LLMs are becoming a mix of building, deploying, funding, and getting adoption for this creation. The leading AI companies exist in complex ecosystems that supply money, compute, data and more in order to keep pushing the frontier. </p><p>The integration of these various inputs to creating and sustaining LLMs is fairly well conceptualized and mapped for the Western ecosystem, as typified by Anthropic and OpenAI, so finding big differences in how the Chinese labs think about it points at where the different companies can be making meaningfully different bets on the future. 
Of course, these futures can be heavily dictated by the constraints on funding and/or compute.</p><p>I&#8217;ve documented the biggest &#8220;AI Industry&#8221; level takeaways from talking to these labs:</p><ol><li><p><strong>Early signs of domestic AI demand.</strong> There&#8217;s a much-touted hypothesis that the Chinese AI market will be smaller because Chinese companies don&#8217;t tend to pay for software &#8211; thus, never unlocking a giant inference market supporting labs. This is only true for software spend that maps to the SaaS ecosystem, which is historically tiny in China; on the other hand, there is obviously still a large cloud market there. A crucial unanswered question &#8211; one which the Chinese labs themselves debate &#8211; is whether enterprise spending on AI tracks the SaaS market (small) or the cloud market (fundamental). On net, it feels like AI is trending closer to the cloud, and no one was actively worried about whether a market would grow around the new tools.</p></li><li><p><strong>Most developers are Claude-pilled.</strong> Most of the AI developers in China are obsessed with Claude and how it&#8217;s changed how they build software, despite Claude nominally being banned in China. China&#8217;s historical hesitance to buy software does <em>not</em> give me the impression that there won&#8217;t be a massive surge in inference demand. Chinese technical staff are so practical, humble, and motivated &#8211; a fact that seems stronger than any commitment to previous habits of not spending.<br><br>Some Chinese researchers mentioned building with their own tools, such as the Kimi or GLM CLIs, but <em>all</em> of them mentioned building with Claude. 
There were also surprisingly few mentions of Codex, which is definitely surging in popularity in the Bay Area.</p></li><li><p><strong>Chinese companies have a technology ownership mentality.</strong> The Chinese culture is combining with a roaring economic engine to create unpredictable outcomes. I&#8217;m left with a lasting feeling that the numerous AI models reflect a practical, current equilibrium of the many technology businesses here. There&#8217;s no master plan. The industry is defined by a respect for ByteDance and Alibaba, the incumbents expected to win large portions of all markets with their substantial resources. DeepSeek is the respected technical leader, but far from a market leader. They set the direction, but aren&#8217;t set up to win economically.<br><br>This leaves companies like <a href=\"https://huggingface.co/meituan-longcat/LongCat-Next\">Meituan</a> or <a href=\"https://huggingface.co/inclusionAI/Ling-2.6-1T\">Ant Group</a>, which people in the West can be surprised to see building these models. In reality, they obviously see LLMs as central to future technology products, so they need a strong base. Feedback from the open community on the strong, general-purpose model hardens their stack, and they can keep internal, fine-tuned versions of the model for their products. The &#8220;open-first&#8221; mentality in the industry is largely defined by practicality: it gets their models strong feedback, it gives back to the open-source community, and it empowers their mission.</p></li><li><p><strong>Government aid is real, but unclear how big.</strong> It&#8217;s often asserted that the Chinese government is actively helping with the open LLM race. This is a government that&#8217;s decentralized across many levels, none of which has a clear playbook for what exactly it does. Neighborhoods in Beijing compete for tech companies to house their offices there. 
The &#8220;help&#8221; offered to these companies almost certainly involved removing bureaucratic red tape like permits, but how far does it go? Can levels of the government help attract talent? Can they help smuggle chips? Across the visit, there were many mentions of government interest or help, but far too little detail to report anything assertively or to form a confident worldview of how government can bend the trajectory of AI in China. <br><br>There were certainly no hints of the top levels of the Chinese government influencing any technical decisions in the models.</p></li><li><p><strong>The data industry is far less developed. </strong>Having heard so much about the likes of Anthropic or OpenAI spending $10M+ for single environments, with cumulative spend on the order of hundreds of millions per year to push the frontier of RL, we were eager to know if Chinese labs are either buying the same environments from companies in the U.S. or supported by a mirrored domestic ecosystem. The answer was not quite that there&#8217;s <em>no</em> data industry, but rather that, in their experience, the data industry is of relatively poor quality and it is often better to build the environments or data in-house. Researchers themselves spend meaningful time making the RL training environments, and some of the bigger companies like ByteDance and Alibaba can have in-house data labelling teams to support this. This all mirrors the build-not-buy mentality from the technology ownership point above.</p></li><li><p><strong>Desperation for more Nvidia chips. </strong>Nvidia compute is the gold standard for training and everyone is limited in progress by not having more of it. If supply were there, it is obvious that they would buy it. Other accelerators, including but not limited to Huawei, were spoken of positively for inference. 
Countless labs have access to Huawei chips.</p></li></ol><p>These points paint a very different picture of an AI ecosystem, where quickly mapping how Western labs operate onto their Chinese counterparts will often result in a category error. The crucial question is whether these different ecosystems will produce meaningfully different types of models, or whether the Chinese models will always be explained as being similar to the U.S. frontier models of 3-9 months ago.</p><h2>Conclusion: The global equilibrium</h2><p>I knew so little about China going into the trip and came out with the feeling of just starting to learn. China isn&#8217;t a place that can be expressed by rules or recipes, but one with very different dynamics and chemistry. The culture is so old, so deep, and still completely intertwined with how domestic technology is built. I have much more learning ahead.</p><p>Many of the current power structures in the US use their current worldviews of China as crucial mental devices for decision making. Having talked, in person, either formally or informally to pretty much every leading AI lab in China, I found a lot of qualities and instincts in China that&#8217;ll be very hard to model with Western decision making. Even after asking directly about <em>why</em> these labs release their top models openly, the intersection between ownership mentality and genuine ecosystem support is hard for me to connect the dots on. 
</p><p>The labs here are practical and not necessarily absolutists around open-source, where every model they build would be released openly, but there&#8217;s a deep intentionality in supporting developers, the ecosystem, and using it as a way to learn more about their models.</p><p>Almost every major Chinese technology company is building its own general purpose LLMs, as we see with the likes of Meituan (delivery service) and Xiaomi (broad consumer technology company) releasing open weight models. The equivalent companies in the U.S. would just buy services. These companies aren&#8217;t building LLMs out of a race to be relevant with the hot new thing, but out of a deep fundamental yearning to control their own stack and develop the most important technologies of the day. When I look up from my laptop and always see bunches of cranes on the horizon, it obviously fits in with the broader culture and energy around building in China.</p><p>The humanity, charm, and genuine warmth of Chinese researchers is extremely humanizing. At a personal level, the cut-throat geopolitical conversation we&#8217;re used to in the U.S. hasn&#8217;t permeated them at all. The world can use more of this simple positivity. As a citizen of the AI community, I currently worry more about the fissures appearing within members and groups around labels of nationality. </p><p>I&#8217;d be lying if I said I didn&#8217;t want US labs to be clear leaders in every part of the AI stack, especially with open models where I spend my time. I&#8217;m American, and that&#8217;s an honest preference. With this, I want the open ecosystem itself to thrive globally, as this can create safer, more accessible, and more useful AI for the world, and right now the question is whether American labs will take the steps to own that leadership position. 
</p><p>As of finishing this piece, more <a href=\"https://x.com/andrewcurran_/status/2052023542582292855?s=46\">rumors</a> are swirling of executive orders influencing open models, which can further complicate this synergy between American leadership and the global ecosystem &#8212; it doesn&#8217;t fill me with confidence.</p><p>Thank you to all the wonderful people I got to talk to at Moonshot, Zhipu, Meituan, Xiaomi, Qwen, Ant Ling, 01.ai, and others. Everyone has been so welcoming and gracious with their time. I&#8217;ll keep sharing my thoughts on China as they crystallize, across culture generally and AI specifically. It is obvious that this knowledge will be directly relevant to the story unfolding at the frontier of AI development.</p><div class=\"footnote\" data-component-name=\"FootnoteToDOM\"><a id=\"footnote-1\" href=\"#footnote-anchor-1\" class=\"footnote-number\" contenteditable=\"false\" target=\"_self\">1</a><div class=\"footnote-content\"><p>Edit 05/07: In this paragraph in the original I misattributed an unwillingness to speak on broader issues to humility, which can of course play a part, but this habit is also shaped by the system which they were trained and raised, a system they are successful in and adept at navigating.</p><p>What I removed: &#8230; capture the upbringing and education of these scientists extreme humility of these scientists. It&#8217;s more than just being dedicated to <em>their</em> work, but they don&#8217;t want to comment on issues they&#8217;re not informed on.&#8230;</p></div></div>"
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/the-distillation-panic",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/the-distillation-panic",
            "title": "The distillation panic",
            "pubDate": "Mon, 04 May 2026 15:56:44 GMT",
            "enclosure": {
              "@_url": "https://substack-post-media.s3.amazonaws.com/public/images/b94fcb01-2fef-44ce-9d1d-4b0d9b7a737e_3182x1790.png",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "&#8216;Distillation attacks&#8217; is a horrible term for what is happening right now.",
            "content:encoded": "<p>&#8216;Distillation attacks&#8217; is a horrible term for what is happening right now. Yes, some Chinese labs are hacking or jailbreaking APIs to attempt to extract more signal from model APIs &#8212; stopping this is important to maintain the U.S.&#8217;s lead in AI capabilities. Referring to this as distillation attack is going to irrevocably associate all distillation with this behavior, and distillation generally is a core technique needed to diffuse AI capabilities broadly through academic and economic activities.</p><p>We went through this sort of language transition with the open source vs open weight debate. All the terms just reduced to open models &#8211; very few people in the large AI community know exactly how open-source differs from open-weights. And terminology matters, as the less informed people who still care about &#8212; and influence &#8212; the technology are bound by different terms they use. If we&#8217;re not careful with the discourse around distillation, many people could associate this broad technique used for research and development of new models as an act at the boundary of corporate manipulation and crime.</p><p class=\"button-wrapper\" data-attrs=\"{&quot;url&quot;:&quot;https://www.interconnects.ai/p/the-distillation-panic?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}\" data-component-name=\"ButtonCreateButton\"><a class=\"button primary\" href=\"https://www.interconnects.ai/p/the-distillation-panic?utm_source=substack&utm_medium=email&utm_content=share&action=share\"><span>Share</span></a></p><p>I&#8217;ve recently written a more <a href=\"https://www.interconnects.ai/p/how-much-does-distillation-really\">technical piece</a> on estimating how impactful state-of-the-art distillation methods are on leading Chinese models, and this piece follows to push for caution in any hasty actions to 
target the methods with policy. To set the stage, recall Anthropic&#8217;s recent blog post where they <a href=\"https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks\">detailed &#8220;distillation attacks&#8221; made by 3 Chinese labs</a>.</p><blockquote><p>These labs used a technique called &#8220;distillation,&#8221; which involves training a less capable model on the outputs of a stronger one. Distillation is a widely used and legitimate training method. For example, frontier AI labs routinely distill their own models to create smaller, cheaper versions for their customers. But distillation can also be used for illicit purposes: competitors can use it to acquire powerful capabilities from other labs in a fraction of the time, and at a fraction of the cost, that it would take to develop them independently.</p></blockquote><p>This is a clever paragraph, where they normalize distillation generally and explain how a few people can use it illicitly, without detailing how illicit use often involves other more explicit behavior like jailbreaking, hacking, or identity spoofing of the API.</p><p>Distillation itself is an industry standard. It&#8217;s used extensively, primarily in post-training, by smaller players to create specialized or smaller models. In my <a href=\"https://rlhfbook.com/c/12-synthetic-data\">book</a> coming this summer, I describe it as follows:</p><blockquote><p>The term distillation has been the most powerful form of discussion around the role of synthetic data in language models. 
Distillation as a term comes from a technical definition of teacher-student knowledge distillation from the deep learning literature.</p><p>Distillation colloquially refers to using the outputs from a stronger model to train a smaller model.</p><p>In post-training, this general notion of distillation takes two common forms:</p><ol><li><p>As a data engine to use across wide swaths of the post-training process: Completions for instructions, preference data (or Constitutional AI), or verification for RL.</p></li><li><p>To transfer specific skills from a stronger model to a weaker model, which is often done for specific skills such as mathematical reasoning or coding.</p></li></ol></blockquote><p>With this definition, it&#8217;s easy to see how distillation takes many forms. Of course, if you just take the outputs from GPT-5.5 and train a recent open-weight base model with them to host a competitive product, that&#8217;s one thing. But, a lot of the things that fall under the bucket of distillation are complex, multi-stage processes that muddle the exact impact of the model you distilled from.</p><p>Modern LLM processes could look like using a GPT API to build an initial batch of synthetic data to build a specialized small data-processing model. A good example is a model like olmOCR (or many other models in this category) that are trained to convert PDFs to clean text. This specialized model would be used to create large amounts of data. Finally, you train another model (often from scratch) with the new data you created. Is this final model distilled from GPT?</p><p>When done via a closed, API-based model, distillation sits in the grey area of the terms of service that you agree to when signing up to the Claude or GPT platform. They generally forbid the use of the API to create competing language model products, but this term has largely gone unenforced. 
The open-source community used to worry deeply about being cut off from these cutting-edge APIs for doing research or creating public datasets, but to date only <a href=\"https://www.theverge.com/2023/12/15/24003151/bytedance-china-openai-microsoft-competitor-llm\">one prominent case of corporate accounts being restricted exists</a> (at least until the recent cases with Chinese companies).</p><p>This is all to say that distillation is an industry standard technique, and the use of closed APIs to perform distillation has always been a grey area. Nvidia&#8217;s latest Nemotron models, among the only models with open post-training datasets, are technically in large part distilled from Chinese, open-weight models. The Olmo models we&#8217;ve built at Ai2 are distilled from a mix of open and closed models. This grey area was brought to the forefront again when it turned out that xAI has been distilling from OpenAI. Quoting from the recent trial <a href=\"https://x.com/MTSlive/status/2049886679876632724\">proceedings</a> between Elon and OpenAI:</p><blockquote><p>OpenAI&#8217;s counsel asked Musk whether xAI has ever &#8220;distilled&#8221; technology from OpenAI.</p><p>Musk: &#8220;Generally AI companies distill other AI companies.&#8221;</p><p>&#8220;Is that a yes?&#8221; Savitt asked.</p><p>Musk: &#8220;Partly.&#8221;</p></blockquote><p>xAI is likely the largest and most successful AI company willing to tread the grey area that is distillation from its competitors. 
On the other side, the majority of startups and research groups with fewer resources than xAI have very likely engaged in distillation in some capacity from Claude, GPT, or Gemini models.</p><p>In the above Anthropic blog post, the problem with the distillation attacks by a few Chinese labs is less the distillation and more the means of attack. It is documented that Chinese labs are actively working to get around the intended use of the API, e.g. to extract additional reasoning data that is very useful for training.</p><p>Of course no one should be able to access information from a model that a developer didn&#8217;t intend to reveal in their APIs (e.g., reasoning traces which would be helpful for training). 
Associating all of distillation, which is to date an industry standard for post-training with open and closed models alike, with these attacks will be a massive own goal.</p><p>What these few labs are doing should be referred to as jailbreaking or abuse, rather than distillation.</p><p>The discourse around these actions is marching towards a mix of regulatory capture and regulatory exuberance that is likely to harm the U.S. ecosystem more than China&#8217;s. Even if we ban this type of API abuse, most likely through potential legal action and other penalties, the Chinese companies will likely still do it. We&#8217;ve seen this playbook with Chinese multimedia models taking a flexible view of copyrighted content that no U.S. player is willing to take the risk on.</p><p>This distillation discussion has quickly snowballed, with a <a href=\"https://www.congress.gov/bill/119th-congress/house-bill/8283/text\">bill moving out of a committee in Congress</a>, an <a href=\"https://whitehouse.gov/wp-content/uploads/2026/04/NSTM-4.pdf\">executive order</a> pushing for action, and <a href=\"https://www.semafor.com/article/04/29/2026/house-committee-probes-cursor-parent-airbnb-over-chinese-ai\">congressional oversight</a> targeting U.S. companies building on Chinese models (which are downstream of distillation). This multi-pronged regulatory environment could yield truly horrible outcomes, such as figuring out a way to effectively ban open-weight models in the U.S. that are built in China by groups abusing closed LLM APIs.</p><p>It is obvious that no bill will literally ban open models, but bills can create grey areas that expose entities to unwanted risk or require provisions that are bureaucratically very challenging to fulfill, squashing small open-source contributors.</p><p>In that scenario, the groups who lose are Western academics and smaller companies building models for the long-tail of AI uses. 
The ecosystem here could be made permanently irrelevant with the removal of nearly all Chinese open-weight models. There is no immediate substitute and building new models with meaningful community adoption has a lead time measured in 6+ months. In the time it takes to build a new domestic open-source ecosystem, countless researchers would&#8217;ve moved onto closed training platforms or into new areas.</p><p>Altogether, I&#8217;m hoping this flurry of discussion around distillation becomes a nothing-burger and not a hasty, multi-pronged policy push. We need to avoid two things:</p><ol><li><p>A wholesale negative connotation of the word distillation, which is used extensively across the AI ecosystem.</p></li><li><p>A domestic ban of the open-weight models built by organizations engaged in some portion of distillation.</p></li></ol><p>In addition to this, I want the leading U.S. AI companies to be able to provide their APIs without having their IP leak. They should share more information on why it is hard for them to secure their APIs, but that&#8217;s an issue out of scope for my expertise.</p><p>I&#8217;ll conclude with a proposal from my friend Kevin Xu at <a href=\"https://www.interconnectedcapital.com/\">Interconnected Capital</a> (and great <a href=\"https://interconnect.substack.com/\">Substack</a>) on why this current distillation dynamic may actually be good for the leading labs.</p><p>If all the Chinese companies are addicted to distillation as a way of getting close to the frontier, then they&#8217;ll never actually learn the techniques needed to take an outright lead. If we cut off the Chinese&#8217;s obvious crutch in model building, we&#8217;ll gain a short-term lead in AI, but in the long-term that may be what they needed to get on a more competitive long-term trajectory. </p><p>This is the same debate we&#8217;re having with other technologies where the U.S. currently has a lead, e.g. with advanced semiconductor technologies. 
So I understand the trade-offs, but we should not crack down on all of distillation.</p>"
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/reading-todays-open-closed-performance",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/reading-todays-open-closed-performance",
            "title": "Reading today's open-closed performance gap",
            "pubDate": "Mon, 20 Apr 2026 18:25:02 GMT",
            "enclosure": {
              "@_url": "https://substack-post-media.s3.amazonaws.com/public/images/707c69ea-fa59-4eba-8034-25b0af9b5443_3182x1790.png",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "The complex factors that determine the single evaluation number so many focus on. Plus, how this changes in the future.",
            "content:encoded": "<p>It&#8217;s a clear, current equilibrium that open models will be in <a href=\"https://www.interconnects.ai/p/open-models-in-perpetual-catch-up\">perpetual catch-up of closed models</a>, but this gap being viewed as a single number, a &#8220;distance&#8221;, covers up a nuanced and crucial dynamic at what capabilities the models are covering. The most popular benchmark to comment on this gap is the <a href=\"https://artificialanalysis.ai/evaluations/artificial-analysis-intelligence-index\">Artificial Analysis Intelligence Index</a> &#8212; a composite benchmark of ~10 sub-evals that they maintain over time to capture the &#8220;frontier&#8221; of current language model capabilities. </p><p>Particularly, I spend a lot of time understanding how dynamics that <em>feed into</em> that index are misunderstood by the natural tendency to reduce performance and trends to one number. Examples include:</p><ul><li><p>How benchmarks evolve over time, becoming more or less correlated with how people actually use models,</p></li><li><p>How different models&#8217; real-world performance relates to their benchmark rankings, and</p></li><li><p>How training regimes evolve over time to move said benchmarks.</p></li></ul><p>Agentic benchmarks are in a decent place, but benchmarks are no longer as trusted as a correlate to real-world performance. A key example to this gray area is Gemini 3&#8217;s incredible benchmarks and remarkable irrelevance in where AI tools currently are being tested and deployed (agents). 
These trends point to obvious and lasting flaws in our measurements.</p><p>At the root of this dynamic, the dance of correlating real-world model performance with benchmark scores, is the constant shift of the industry. As all the models, both open and closed, evolve over time, the topics of focus for benchmarking shift about every 12 to 18 months. All of the domains of interest have very different training domains associated with them, especially in post-training. The longer a single paradigm goes on, the better the industry gets at measuring performance. In a new era of rapid post-training improvements, I&#8217;m at a relative minimum in my personal confidence in benchmarks.</p><h3>Task evolution and LLM paradigms</h3><p>Right after ChatGPT the focus was a mix of chat, math, and simple code. Instruction tuning and RLHF dominated. Chat capabilities saturated and faded quickly, then mathematics became less focal. Through 2025 and to today, especially once reasoning models became the default, the focus shifted to more complex coding and other simpler agentic tasks. We&#8217;re at the tail end of this first era. 
Recent training recipes are all dominated by reinforcement learning with verifiable rewards (RLVR), but the domains it is applied in have shifted dramatically from basic question-answer checking to complex environments.</p><p>What we&#8217;re seeing is that the closed, frontier labs are investing astounding sums of money in mastering these current foci &#8212; i.e. code, terminal tasks, etc. &#8212; while starting to push into more diverse knowledge work tasks. These newer tasks encompass specialized domains, such as accounting, law, healthcare, etc. They are still agentic, but require more expertise and often integrations with existing software or domain-specific tools.</p><p>We have very limited evidence on the true balance of capabilities of these newer domains, but these are the areas I&#8217;m focusing on when I say open models will struggle to keep up. The problem is that evaluating <em>complex</em> language model workflows is also a challenging research problem in itself. </p><p>The tasks are getting harder and the data needed to hillclimb on them is getting more private (relative to code, which has swaths of code on GitHub). Leading open model labs are helped by dynamics happening in the data industry that are economically similar to building chip fabs. The few, leading labs in the U.S. pay astronomical sums to buy new environments and datasets, then the fast-following labs (often in China), buy these later at a steep discount. </p><p>This is a key missed point &#8212; that the levers non-frontier labs pull to keep up constantly shift over time. A focus on distillation as the key lever to Chinese models&#8217; progress reflects a blind-spot to the importance of RL environments to current training regimes. If an environment can be built either as a single evaluation in the Artificial Analysis Index, or to mirror it, currently the Chinese labs will be able to keep up. 
</p><h3>Economic pressure to reinvent &#8220;the frontier&#8221;</h3><p>The question worth dwelling on is: how crucial is the current set of tasks (again, coding and terminal tasks), where the likes of OpenAI and Anthropic have a massive business-adoption advantage over leading open-weight models (and even Google), to maintaining revenue numbers? To maintain these record growth numbers and trajectories, there needs to remain a meaningful edge in performance. Many companies would love to reduce their token expenditure if they could swap in a far cheaper, open model equivalent. </p><p>If agentic coding abilities saturate and the &#8220;frontier&#8221; of AI performance moves elsewhere, a large amount of the enterprise revenue could be reliant on well-formed customer relationships, inertia, and better product development, rather than the models being leaps and bounds better.</p><p>This precarious position is what I describe as the frontier labs needing to constantly reinvent themselves, and the field&#8217;s prospects, for monetizing the vast buildout of AI infrastructure. 
I still tend to fall on the side that the buildout will be worth it, and Anthropic and OpenAI will be astronomically profitable businesses, so I take it on faith both that they will keep unlocking compelling, new, valuable use-cases for the models, and that the benchmarks the open models are closing in on are <em>not a complete signal</em>. </p><p>I operate with a sort of presumption that the leading open model labs from China are focused <em>slightly</em> more on benchmarks than the leading closed labs in the U.S. They&#8217;re incentivized to do so: they want to present the image of constantly being on the heels of the best closed models. Saying the Chinese labs are only in this narrative because they&#8217;re overfitting to benchmarks would be incredibly naive and incorrect. They&#8217;re genuinely strong models, and these dynamics of overselling and real innovation are a fine balance.</p><p>There are a few out-of-distribution benchmarks where open-weight models are very far behind, such as <a href=\"https://htihle.github.io/weirdml.html\">WeirdML</a> or <a href=\"https://epoch.ai/benchmarks/arc-agi-2/\">ARC AGI 2</a>, but there are countless random benchmarks that show these open models as being unexpectedly strong. When you use the models, you can pick up on a lack of robustness (e.g. in long-context capabilities, and needing to reset your agent context more often than Claude/Codex), but they&#8217;re not a category error in the sense that they&#8217;re fundamentally different classes of models. They&#8217;re far closer than many would&#8217;ve expected.</p><h3>How long can open models keep up?</h3>\n      <p>\n          <a href=\"https://www.interconnects.ai/p/reading-todays-open-closed-performance\">\n              Read more\n          </a>\n      </p>\n   "
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/my-bets-on-open-models-mid-2026",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/my-bets-on-open-models-mid-2026",
            "title": "My bets on open models, mid-2026",
            "pubDate": "Wed, 15 Apr 2026 18:20:00 GMT",
            "enclosure": {
              "@_url": "https://substack-post-media.s3.amazonaws.com/public/images/8be08b29-d70a-43f3-8422-6b952816ddab_3182x1790.png",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "What I expect to come next and why, focused on the open-closed gap.",
            "content:encoded": "<p>We&#8217;re living through the period of time when we&#8217;ll learn if open models can keep up with closed labs. The obvious answer is that no, they won&#8217;t. This answer is a form of saying they won&#8217;t keep up in <em>every area</em>. This framing closes off a popular prediction where the open models completely <em>catch up</em>, as in all models saturate and open and closed models only become increasingly similar. In living through this, it&#8217;s evidently very unclear when the longer-term stable balance of capabilities will solidify. </p><p>This is a very complex dynamic, where the core point we monitor is a <a href=\"https://www.interconnects.ai/p/open-models-in-perpetual-catch-up\">capability gap between models</a>. At the same time, this gap is intertwined with evolving dynamics in the funding of open models, who builds open models, how techniques like distillation that enable fast-following translate through new application domains, potential regulation hampering the open-source AI ecosystem, and of course who actually uses open models. </p><p>The capabilities gap is one signal in a complex sea of forces, pushing supply and demand into different shapes. In many cases the demand &#8212; where obviously tons of individuals, organizations, and sovereigns want, or need, open models &#8212; is largely separated from supply. Supply is fully dictated by economics. The question of &#8220;which business strategies support releasing open models&#8221; is still at stake.</p><div class=\"subscription-widget-wrap-editor\" data-attrs=\"{&quot;url&quot;:&quot;https://www.interconnects.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}\" data-component-name=\"SubscribeWidgetToDOM\"><div class=\"subscription-widget show-subscribe\"><div class=\"preamble\"><p class=\"cta-caption\">Interconnects AI is a reader-supported publication. 
To receive new posts and support my work, consider becoming a subscriber.</p></div><form class=\"subscription-widget-subscribe\"><input type=\"email\" class=\"email-input\" name=\"email\" placeholder=\"Type your email&#8230;\" tabindex=\"-1\"><input type=\"submit\" class=\"button primary\" value=\"Subscribe\"><div class=\"fake-input-wrapper\"><div class=\"fake-input\"></div><div class=\"fake-button\"></div></div></form></div></div><p>With this complexity, I wanted to distill my key beliefs down into a clear list. These are downstream of 10+ pieces I&#8217;ve written or recorded on open models this spring (which are linked throughout).</p><ol><li><p><strong>It&#8217;s surprising that the top closed models did </strong><em><strong>not</strong></em><strong> show a growing capability margin over open models</strong>, based on compute differences for training and research, especially in the second half of 2025 and through today.</p></li><li><p><strong>Open model labs are technically very strong</strong> at keeping pace on well-established benchmarks. This will continue and reflects a balance of abundant talent and sufficient computing power. </p></li><li><p><strong>Chinese open-weight labs focus </strong><em><strong>slightly</strong></em><strong> more on benchmark scores</strong> than comparable closed labs in the U.S. <a href=\"https://www.interconnects.ai/p/how-much-does-distillation-really\">Distillation</a> helps the Chinese LLM companies do so, but it&#8217;s not a panacea. Changes in the distillation dynamic (e.g. regulation) will not be a determining factor on the balance of capabilities. This increase in focus is a natural evolution of their incentives in keeping the narrative on keeping up with the frontier alive, which is crucial to fundraising and adoption.</p></li><li><p><strong>To date, closed models tend to be more robust and generally useful than similarly scoring open models</strong>. 
Closed models have certain hard-to-measure qualities that are not well captured in current or past benchmarks. This will be key to enabling closed models to dominate in markets where an individual user constantly presents new challenges, i.e. supporting knowledge workers as a direct assistant.</p></li><li><p><strong>The open vs. closed model race, as monitored through benchmarks, will largely be a game of economic staying power</strong> and fast-following, until the market structure constricts. I expect Chinese open-weight labs to face funding difficulties first, as soon as later this year. Funding difficulties will show up in capability trajectories 3-9 months later.</p></li><li><p><strong>The RL-dominated training era has increased the relevance of distribution to real-world use-cases as a key factor in continued capabilities improvements</strong>. These are tasks where users directly use tools like Claude Code or Codex to solve problems in their job with agents. This is the first clear technical area where closed labs can dominate open-weight models on capabilities, potentially <a href=\"https://cursor.com/blog/real-time-rl-for-composer\">leveraging online RL directly</a> based on user feedback.</p></li><li><p><strong>Open models will be increasingly adopted in repetitive automation tasks</strong> across the ecosystem, as measured by their relative share of the API market. This takes the form of many new AI-native applications, business backend automation, etc. The success of this will <a href=\"https://www.interconnects.ai/p/the-next-phase-of-open-models\">drive more investment in domain-specific, efficient open models</a>.</p></li></ol><p>This is a complex picture, where the long-term trajectory is more of an economics question than an ability one. 
Many other outlets can paint a far more simplistic narrative that &#8220;<a href=\"https://www.nytimes.com/2026/04/13/opinion/china-ai-america-chipmakers.html\">China will assuredly catch us in AI</a>&#8221; and get more distribution because it is a simple story. The reality is complex. Only real AI revenue begets more investment, and eventually that&#8217;ll be linked to the ability to keep improving models at a rapid rate. Economic realities have not yet impacted scaling open models, as a general category.</p><p>This economics-focused angle relates to my positions on the open model ecosystem more broadly.</p><ol start=\"8\"><li><p><strong>Recurring calls to ban certain types of open models will continue to come but are in practice impossible to implement.</strong> Training strong AI models (i.e. near but not at the frontier) is a relatively small cost compared to large-scale deployments. For example, if the U.S. bans open models over a certain compute threshold, another sovereign entity will eventually train them and release them publicly, with the models entering the U.S. market with less oversight.</p></li><li><p><strong>The second derivative of influence on open models has shifted, and the U.S. will slowly regain ground in <a href=\"https://www.interconnects.ai/p/8-plots-that-explain-the-state-of\">adoption metrics</a></strong> of open models starting in early 2027 (it takes a long time for China&#8217;s velocity to slow, then flip). 
Examples include Google&#8217;s <a href=\"https://www.interconnects.ai/p/gemma-4-and-what-makes-an-open-model\">Gemma 4</a> (a wild success), <a href=\"https://www.interconnects.ai/p/why-nvidia-builds-open-models-with\">Nvidia&#8217;s Nemotron</a>, and <a href=\"https://www.interconnects.ai/p/arcee-ai-goes-all-in-on-open-models\">Arcee AI</a>.</p></li><li><p>As ever-stronger closed models are built, previewed, and released, there will be more <strong>safety shocks claiming that open-weight versions of the strongest AI models can never be allowed to exist</strong>, similar to reactions to <a href=\"https://www.interconnects.ai/p/claude-mythos-and-misguided-open\">Claude Mythos</a>. These can spur burdensome regulation on open models.</p></li><li><p>With the above, there will also be <strong>increased long-term interest in open models</strong>, as sovereign entities and existing power structures realize the coming, super powerful AI tools <a href=\"https://www.interconnects.ai/p/how-anthropic-vs-dow-impacts-open\">cannot land in the hands of only one or a few companies</a>. These entities will see open models as a different governance paradigm.</p></li><li><p><strong>New funding structures for open models will emerge</strong>, as many stakeholders realize <a href=\"https://www.interconnects.ai/p/the-inevitable-need-for-an-open-model\">dependencies on single, for-profit companies for access to intelligence are unreliable</a>.</p></li><li><p><strong>Local agents, OpenClaw, and other personal agents represent a large and, to date, mostly ignored market for open model usage</strong>. 
It is a sort of dark matter, with pervasive, massive potential for influence on the balance of open-to-closed models.</p></li></ol><p>A single word governs this post and is intentionally repeated: complex.</p><p>This complex reality has been driving me to think more deeply about how to clearly describe the open model gap, and how I can expect American closed labs to clearly draw ahead despite the fairly unequivocal evidence in support of the capabilities of recent open-weight models. More on the nuance in the open-closed gap in another piece coming soon, so <a href=\"https://www.interconnects.ai/subscribe\">please subscribe</a>!</p><p>Let me know of any positions that I missed.</p>"
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/what-ive-been-building-atom-report",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/what-ive-been-building-atom-report",
            "title": "What I’ve been building: ATOM Report, post-training course, finishing my book, and ongoing research",
            "pubDate": "Tue, 14 Apr 2026 20:41:12 GMT",
            "enclosure": {
              "@_url": "https://substackcdn.com/image/fetch/$s_!Bv0Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d8ba64-922d-4000-9d57-12cb5524a238_1200x675.jpeg",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "What I've been up to!",
            "content:encoded": "<p>This post is a roundup of my recent efforts that did not warrant a standalone Interconnects post, why I&#8217;m spending time on them, and what they accomplished.</p><ol><li><p><a href=\"https://www.interconnects.ai/i/194224428/1-the-atom-report-measuring-the-open-language-model-ecosystem\">The ATOM Report: Measuring the Open Language Model Ecosystem</a></p></li><li><p><a href=\"https://www.interconnects.ai/i/194224428/2-rlhf-book-is-done-and-ready-for-pre-order\">RLHF Book is done &amp; ready for pre-order!</a></p></li><li><p><a href=\"https://www.interconnects.ai/i/194224428/3-a-post-training-course-im-making\">A post-training course I&#8217;m making</a></p></li><li><p><a href=\"https://www.interconnects.ai/i/194224428/4-recent-technical-research\">Recent technical research</a></p></li></ol><p class=\"button-wrapper\" data-attrs=\"{&quot;url&quot;:&quot;https://www.interconnects.ai/p/what-ive-been-building-atom-report?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}\" data-component-name=\"ButtonCreateButton\"><a class=\"button primary\" href=\"https://www.interconnects.ai/p/what-ive-been-building-atom-report?utm_source=substack&utm_medium=email&utm_content=share&action=share\"><span>Share</span></a></p><h2>1. The ATOM Report: Measuring the Open Language Model Ecosystem</h2><p><a href=\"https://arxiv.org/abs/2604.07190\">https://arxiv.org/abs/2604.07190</a></p><p>To accompany The ATOM Project <a href=\"https://atomproject.ai/\">memo</a>, arguably a manifesto, making the case for investment in open models in the U.S. &#8211; originally launched in August 2025 &#8211; we&#8217;ve released an updated technical report with our latest data, analysis, and storytelling within the open language model ecosystem. The ATOM Report is dense with the methods Florian and I use to keep track of the open ecosystem. 
It covers GPT-OSS&#8217;s rise, inference market share, the influence of China&#8217;s mid-tier players like Moonshot, Z.ai, &amp; MiniMax, signs of the U.S.&#8217;s progress on open models, and much more.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!JZNn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b0ff17-1243-46dd-a81e-96c975f20a7b_2582x1992.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!JZNn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b0ff17-1243-46dd-a81e-96c975f20a7b_2582x1992.png 424w, https://substackcdn.com/image/fetch/$s_!JZNn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b0ff17-1243-46dd-a81e-96c975f20a7b_2582x1992.png 848w, https://substackcdn.com/image/fetch/$s_!JZNn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b0ff17-1243-46dd-a81e-96c975f20a7b_2582x1992.png 1272w, https://substackcdn.com/image/fetch/$s_!JZNn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b0ff17-1243-46dd-a81e-96c975f20a7b_2582x1992.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!JZNn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b0ff17-1243-46dd-a81e-96c975f20a7b_2582x1992.png\" width=\"1456\" height=\"1123\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/89b0ff17-1243-46dd-a81e-96c975f20a7b_2582x1992.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1123,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:330823,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/194224428?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b0ff17-1243-46dd-a81e-96c975f20a7b_2582x1992.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!JZNn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b0ff17-1243-46dd-a81e-96c975f20a7b_2582x1992.png 424w, https://substackcdn.com/image/fetch/$s_!JZNn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b0ff17-1243-46dd-a81e-96c975f20a7b_2582x1992.png 848w, https://substackcdn.com/image/fetch/$s_!JZNn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b0ff17-1243-46dd-a81e-96c975f20a7b_2582x1992.png 1272w, https://substackcdn.com/image/fetch/$s_!JZNn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b0ff17-1243-46dd-a81e-96c975f20a7b_2582x1992.png 1456w\" sizes=\"100vw\" fetchpriority=\"high\"></picture></div></a></figure></div><p>In particular, the paper details our updates to the <a href=\"https://atomproject.ai/relative-adoption-metric\">Relative Adoption Metric (RAM)</a>, which we use to evaluate the adoption of recent models in a time-varying and size-normalized manner. Here&#8217;s a sampling of recent, primarily Chinese, models on the RAM score. The RAM score is designed so that a score &gt;1 indicates a model is, at that point in time, on track to be a top 10 most downloaded model of its size category, ever. 
It reduces a messy landscape to one, easily interpretable number!</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!TeBR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ef64b7a-04f2-4ed8-9cc4-966b775e9f59_1918x1336.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!TeBR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ef64b7a-04f2-4ed8-9cc4-966b775e9f59_1918x1336.png 424w, https://substackcdn.com/image/fetch/$s_!TeBR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ef64b7a-04f2-4ed8-9cc4-966b775e9f59_1918x1336.png 848w, https://substackcdn.com/image/fetch/$s_!TeBR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ef64b7a-04f2-4ed8-9cc4-966b775e9f59_1918x1336.png 1272w, https://substackcdn.com/image/fetch/$s_!TeBR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ef64b7a-04f2-4ed8-9cc4-966b775e9f59_1918x1336.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!TeBR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ef64b7a-04f2-4ed8-9cc4-966b775e9f59_1918x1336.png\" width=\"1456\" height=\"1014\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ef64b7a-04f2-4ed8-9cc4-966b775e9f59_1918x1336.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1014,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:323626,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/194224428?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ef64b7a-04f2-4ed8-9cc4-966b775e9f59_1918x1336.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!TeBR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ef64b7a-04f2-4ed8-9cc4-966b775e9f59_1918x1336.png 424w, https://substackcdn.com/image/fetch/$s_!TeBR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ef64b7a-04f2-4ed8-9cc4-966b775e9f59_1918x1336.png 848w, https://substackcdn.com/image/fetch/$s_!TeBR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ef64b7a-04f2-4ed8-9cc4-966b775e9f59_1918x1336.png 1272w, https://substackcdn.com/image/fetch/$s_!TeBR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ef64b7a-04f2-4ed8-9cc4-966b775e9f59_1918x1336.png 1456w\" sizes=\"100vw\"></picture></div></a></figure></div><p>We used the data to also analyze the recent <a href=\"https://www.interconnects.ai/p/gemma-4-and-what-makes-an-open-model\">Gemma 4</a> release, which is showing incredible early adoption numbers. 
We&#8217;ll stay tuned on it!</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!u86h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2abfe-443e-48e8-bfda-bb5855dee388_1936x1056.jpeg\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!u86h!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2abfe-443e-48e8-bfda-bb5855dee388_1936x1056.jpeg 424w, https://substackcdn.com/image/fetch/$s_!u86h!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2abfe-443e-48e8-bfda-bb5855dee388_1936x1056.jpeg 848w, https://substackcdn.com/image/fetch/$s_!u86h!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2abfe-443e-48e8-bfda-bb5855dee388_1936x1056.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!u86h!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2abfe-443e-48e8-bfda-bb5855dee388_1936x1056.jpeg 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!u86h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2abfe-443e-48e8-bfda-bb5855dee388_1936x1056.jpeg\" width=\"1456\" height=\"794\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9e2abfe-443e-48e8-bfda-bb5855dee388_1936x1056.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"Image\" title=\"Image\" srcset=\"https://substackcdn.com/image/fetch/$s_!u86h!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2abfe-443e-48e8-bfda-bb5855dee388_1936x1056.jpeg 424w, https://substackcdn.com/image/fetch/$s_!u86h!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2abfe-443e-48e8-bfda-bb5855dee388_1936x1056.jpeg 848w, https://substackcdn.com/image/fetch/$s_!u86h!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2abfe-443e-48e8-bfda-bb5855dee388_1936x1056.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!u86h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9e2abfe-443e-48e8-bfda-bb5855dee388_1936x1056.jpeg 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture></div></a></figure></div><p>Subscribe to the (infrequent) <a href=\"https://atomproject.substack.com/\">ATOM Project Substack</a> for more updates like this!</p><h2>2. RLHF Book is done &amp; ready for pre-order!</h2><p><a href=\"http://rlhfbook.com/\">http://rlhfbook.com/</a></p><p>The goal of this book was to write the book I wished I had when I was getting started in post-training language models. This project has been on my mind for a long time. I bought the domain rlhfbook.com and started to take it more seriously on May 20th, 2024. Here we are!</p><p>Last week, it was sent to production with the Manning team. This means content edits are done, and it&#8217;ll be sent to print in ~2 months. 
In the meantime, I&#8217;m spending my time developing the accompanying code and course (more on that below).</p><p>You can preorder on <a href=\"https://amzn.to/4cwCDJQ\">Amazon</a> or <a href=\"https://www.manning.com/books/the-rlhf-book\">Manning</a> (currently cheaper).</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!Bv0Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d8ba64-922d-4000-9d57-12cb5524a238_1200x675.jpeg\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!Bv0Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d8ba64-922d-4000-9d57-12cb5524a238_1200x675.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Bv0Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d8ba64-922d-4000-9d57-12cb5524a238_1200x675.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Bv0Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d8ba64-922d-4000-9d57-12cb5524a238_1200x675.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Bv0Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d8ba64-922d-4000-9d57-12cb5524a238_1200x675.jpeg 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!Bv0Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d8ba64-922d-4000-9d57-12cb5524a238_1200x675.jpeg\" width=\"1200\" height=\"675\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2d8ba64-922d-4000-9d57-12cb5524a238_1200x675.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"Image\" title=\"Image\" srcset=\"https://substackcdn.com/image/fetch/$s_!Bv0Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d8ba64-922d-4000-9d57-12cb5524a238_1200x675.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Bv0Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d8ba64-922d-4000-9d57-12cb5524a238_1200x675.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Bv0Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d8ba64-922d-4000-9d57-12cb5524a238_1200x675.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Bv0Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d8ba64-922d-4000-9d57-12cb5524a238_1200x675.jpeg 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture></div></a></figure></div><h2>3. A post-training course I&#8217;m making</h2><p><a href=\"https://rlhfbook.com/course\">https://rlhfbook.com/course</a></p><p>The goal of my book is for it to be the central resource for people looking to transition from beginner to expert in post-training. It&#8217;s not necessarily an entry-level book, but as AI models become stronger, it needs to be a <em>community</em>-building effort as well. The first step I&#8217;ve made to expand the scope from just a book to a complete learning experience is building a lecture series. The lectures will be freely available on YouTube and incorporate community questions &amp; answers (as standalone videos in between lectures).</p><p>You can watch the first batch of videos below, and subscribe on YouTube for future ones. 
I&#8217;m going to build on the book platform more this summer, as I develop the book <a href=\"https://rlhfbook.com/code\">codebases</a> and host in-person events.</p><ul><li><p><a href=\"https://www.youtube.com/watch?v=jQPiH-KB4B0&amp;list=PLL1tdVxB1CpVpEtMHxwuR4uI4Lxjw00_y&amp;index=3\">Welcome video &amp; YouTube playlist</a></p></li><li><p><a href=\"https://youtu.be/o6l6tJQgUg4\">RLHF and Post-training Overview | RLHF Book Course, Lecture 1</a></p></li><li><p><a href=\"https://youtu.be/4gIwiSPmQkU\">RLHF Foundations, IFT, Reward Modeling, Rejection Sampling | RLHF Course Lecture 2</a></p></li><li><p><a href=\"https://youtu.be/K_Sj_-1BUMM\">Understanding Policy Gradient Algorithms for RL on LLMs | RLHF Course Lecture 3</a></p></li><li><p><a href=\"https://youtu.be/i-AIMpZHgeg\">Implementing RL Algorithms for LLMs | RLHF Course Lecture 4</a></p></li></ul><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!VS0r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb238c68c-d7f4-4b2b-97fa-9a3cf773e72b_1280x720.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!VS0r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb238c68c-d7f4-4b2b-97fa-9a3cf773e72b_1280x720.png 424w, https://substackcdn.com/image/fetch/$s_!VS0r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb238c68c-d7f4-4b2b-97fa-9a3cf773e72b_1280x720.png 848w, https://substackcdn.com/image/fetch/$s_!VS0r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb238c68c-d7f4-4b2b-97fa-9a3cf773e72b_1280x720.png 1272w, 
https://substackcdn.com/image/fetch/$s_!VS0r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb238c68c-d7f4-4b2b-97fa-9a3cf773e72b_1280x720.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!VS0r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb238c68c-d7f4-4b2b-97fa-9a3cf773e72b_1280x720.png\" width=\"1280\" height=\"720\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b238c68c-d7f4-4b2b-97fa-9a3cf773e72b_1280x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1243661,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/194224428?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb238c68c-d7f4-4b2b-97fa-9a3cf773e72b_1280x720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!VS0r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb238c68c-d7f4-4b2b-97fa-9a3cf773e72b_1280x720.png 424w, https://substackcdn.com/image/fetch/$s_!VS0r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb238c68c-d7f4-4b2b-97fa-9a3cf773e72b_1280x720.png 848w, 
https://substackcdn.com/image/fetch/$s_!VS0r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb238c68c-d7f4-4b2b-97fa-9a3cf773e72b_1280x720.png 1272w, https://substackcdn.com/image/fetch/$s_!VS0r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb238c68c-d7f4-4b2b-97fa-9a3cf773e72b_1280x720.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture></div></a></figure></div><h2>4. 
Recent technical research</h2><p>Long-time followers of Interconnects know that this blog has its roots in explaining fundamental research in the field. This has immense value in two ways. First, as AI moves incredibly fast, far more people need to be able to parse research to make the right bets on the technology. Research is the only early warning of some big changes coming. Second, it helps uplift the careers of my collaborators &#8211; the people I spend my life with! On that note, check out the two papers below that I had the privilege of being part of.</p><p><a href=\"https://arxiv.org/abs/2603.16759\">https://arxiv.org/abs/2603.16759</a> - <em>TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities</em>, Graf et al. 2026</p><p>This work explores the strengths of various models in multi-turn dialogue settings, how to create training data to improve multi-turn performance, and other quirks in post-training. My interests here have fully shifted to agents, where I see multi-turn interactions as a very important user interface problem: what information do I show to the user to solve the task as soon as possible without cutting corners?</p><p><a href=\"https://arxiv.org/abs/2603.11327\">https://arxiv.org/abs/2603.11327</a> - <em>Meta-Reinforcement Learning with Self-Reflection for Agentic Search</em>, Xiao et al. 2026</p><p>This paper frames solving hard problems with RLVR as a meta-learning problem, where context from previous attempts should be used to inform future rollouts. In some ways it&#8217;s a very obvious idea, since most RL for LLMs is still very on-policy but naive: the models learn from recent trials in their parameters, but not in context. This research feeds into a ton of other recent work on ways that RL can be formulated to solve different forms of continual learning. 
Another great related paper is <em><a href=\"https://arxiv.org/abs/2601.16175\">Learning to Discover at Test Time</a>.</em></p><div><hr></div><p>I&#8217;m off to China (and then hopefully DC) in the next couple of months to learn even more about how the world sees progress in AI. I&#8217;m excited to talk to a broader range of people than I tend to in my focused technical job. Thanks for reading, as always!</p>"
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/the-inevitable-need-for-an-open-model",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/the-inevitable-need-for-an-open-model",
            "title": "The inevitable need for an open model consortium",
            "pubDate": "Sat, 11 Apr 2026 13:02:06 GMT",
            "enclosure": {
              "@_url": "https://substack-post-media.s3.amazonaws.com/public/images/18174b65-ddde-40ad-a82b-55467fecbc10_3182x1790.png",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "And yes, I hate consortia too.",
            "content:encoded": "<p>Recently, I was talking with <a href=\"https://cs.stanford.edu/~pliang/\">Percy Liang</a>, Stanford professor and lead of the <a href=\"https://marin.community/\">Marin</a> project (another fully-open model lab), and it sank in for me that there will eventually be a consortium of companies funding a foundational set of open models used across industry. It&#8217;s not clear when this&#8217;ll emerge, and Nemotron (<a href=\"https://nvidianews.nvidia.com/news/nvidia-launches-nemotron-coalition-of-leading-global-ai-labs-to-advance-open-frontier-models\">Coalition</a>) is Nvidia&#8217;s attempt to bankroll and bootstrap this approach within a single wealthy company, but a consortium is the only long-term stable path to well-funded, near-frontier open models.</p><p>In recent months, we&#8217;ve seen a lot of turnover in <a href=\"https://www.reuters.com/world/asia-pacific/head-alibabas-qwen-ai-division-resigns-2026-03-04/\">open</a> <a href=\"https://www.geekwire.com/2026/allen-institute-for-ai-ceo-ali-farhadi-steps-down-as-nonprofit-navigates-shifting-ai-landscape/\">model</a> labs, with high-profile departures at Qwen and Ai2 (<a href=\"https://x.com/natolambert/status/2037911242820796883\">my comment</a>). This shouldn&#8217;t be super surprising to followers of the ecosystem: it&#8217;s happened before with Meta <a href=\"https://www.meta.com/superintelligence/?srsltid=AfmBOopu-zIovrbgd9Q-G1StOW3gC8s0mf_iNDqD_2oa3l6qldcNHLXl\">shifting its focus away from Llama</a>, and it&#8217;ll happen more as the cost of trying to keep pace at the frontier of AI keeps increasing. The other leading labs with models available today include Chinese startups such as Moonshot AI, MiniMax, and Z.ai, all of which look precarious in their ability to fund continued growth in the cost of training or R&amp;D. 
Releasing one&#8217;s strongest models openly today is in active tension with the option of spending focus and resources on AI products that can currently generate meaningful revenue (and profits).</p><p>We&#8217;re going to see business models emerge around releasing <em>some</em>, or even many, models openly, but these will largely be smaller models that enable a long tail of functionality, rather than models at the absolute frontier. The companies that&#8217;ll release many strong, fine-tunable models will include the likes of <a href=\"https://www.interconnects.ai/p/arcee-ai-goes-all-in-on-open-models\">Arcee AI</a>, Thinking Machines, OpenAI, Google with Gemma, and more. The cost of releasing the best models, and the relative advantage of keeping them closed, are too high in a business environment with many opportunities for revenue. To summarize: there will be an ever increasing number of companies releasing models that are good for creating a lively niche of smaller, custom models, but an ever decreasing number of companies willing to release fully open, near-frontier models. </p><p>This is the core thesis of why I&#8217;m pushing hard for more people to do more research on how these smaller models can complement the best closed agents, the science of finetunability, etc. 
See my post below; it&#8217;s about creating a sustainable open model ecosystem, whether or not the frontier of open keeps pace with closed:</p><p><a href=\"https://www.interconnects.ai/p/the-next-phase-of-open-models\">What comes next with open models</a></p><p>It&#8217;ll take years for this equilibrium to become more obvious, seen through the lens of more open model families coming and going. This year, it seems likely we&#8217;ll see Nvidia&#8217;s Nemotron reach new heights, Reflection AI challenge some of the Chinese models with a strong, large MoE, maybe Meta release a new open-weight model, and so on. True pressure to change strategy will only come when the capital environment punishes the less efficient spend on resources (e.g. giving away the competitive advantage of having an in-house model). This pressure will likely hit Chinese startups training these models first.</p>
<p>All of Moonshot AI, MiniMax, and Zhipu AI will show signs of financial challenge in the coming years if they retain their strategy, on top of their models falling further behind the best open models in terms of generality. This is inevitable pressure to evolve open models to areas that are profitable and complementary to the frontier of AI.</p><p>Nvidia, which is best positioned to support the open ecosystem in the near term as a way to support its core GPU business, could face many pressures to pull back its open model efforts. It could:</p><ul><li><p>Realize that it competes too directly with its biggest customers if Nemotron succeeds too much,</p></li><li><p>Fall to competition on its core business and lose the free cash flow buffer needed to fund this (e.g. it&#8217;s 2031 and OpenAI, Anthropic, Google, and the other frontier labs are worth so much they build their own chips),<a class=\"footnote-anchor\" data-component-name=\"FootnoteAnchorToDOM\" id=\"footnote-anchor-1\" href=\"#footnote-1\" target=\"_self\">1</a> or</p></li><li><p>Start succeeding beyond its initial goals and keep the chips to build ASI itself, as a closed-weight model.</p></li></ul><p>The pressures for new funding mechanisms for open models are based on the assumption of continued, substantive progress on the capabilities of frontier models. Mechanisms such as <a href=\"https://www.interconnects.ai/p/lossy-self-improvement\">self-improvement</a> and scaling all stages of the training pipeline are underway. This progress of capabilities will only increase the potential profit in selling models as products and inside products, rather than giving them away. 
The scale of investment required has already begun to push non-profits away from the game of making truly frontier-scale models.<a class=\"footnote-anchor\" data-component-name=\"FootnoteAnchorToDOM\" id=\"footnote-anchor-2\" href=\"#footnote-2\" target=\"_self\">2</a> Capitalism is designed to make companies ruthless and chase down leads on profitability, not donate technology as charity.</p><p>As the economic environment shifts companies away from releasing the strongest models openly, more companies that rely on these models will look for a way to secure model access into the future. This is going to be compounded by a growing group of companies who come to rely on open-weight models for their workflows. </p><p>These points loop back into how model training is getting more expensive: while the desire to have the models will go up, the ability to procure them will go down for many players. There are x-factors that could multiply the demand for institutions to ensure the existence of open models, such as the best frontier models not even being available via API (for example, if <a href=\"https://www.interconnects.ai/p/claude-mythos-and-misguided-open\">Claude Mythos</a> never reaches general access).</p><p>As training relevant models is shifting to cost billions of dollars, rather than millions, few companies will be able to afford it. Many companies will bite at paying 1/10th of the cost to train a frontier model, or if the consortium works, 1/50th. The upside for companies will be some mechanism to steer development (e.g. 
model sizes) or getting early access to develop internal and open-source tooling for the model. </p><p>It is in my nature to, by default, say this idea will fail, as training models is inherently a complex and high-focus endeavor, one that requires integration of every part of the stack and focusing specifically on your own vision and needs, rather than trying to serve every possible user. Eventually the need for open intelligence &#8212; and economic pressure to build it &#8212; will make a model consortium inevitable.</p><div class=\"footnote\" data-component-name=\"FootnoteToDOM\"><a id=\"footnote-1\" href=\"#footnote-anchor-1\" class=\"footnote-number\" contenteditable=\"false\" target=\"_self\">1</a><div class=\"footnote-content\"><p>There&#8217;s a meaningful chance in my estimates that Anthropic, OpenAI, and Google are the most valuable companies in the world in the 2030s by owning frontier intelligence.</p></div></div><div class=\"footnote\" data-component-name=\"FootnoteToDOM\"><a id=\"footnote-2\" href=\"#footnote-anchor-2\" class=\"footnote-number\" contenteditable=\"false\" target=\"_self\">2</a><div class=\"footnote-content\"><p>Truly open is a prospect for safety research and long-term innovation, which suits both the narratives of AI risk and AI optimism. We need it for both. Mech interp is one of the heaviest users of Olmo models. <s>If we don&#8217;t find what&#8217;s after the transformer, there may not be enough benefit to AI models. </s> (edit, I had published that as a half baked thought, it&#8217;s about how fully-open models operate in the ecosystem differently) All of these are largely orthogonal to the point of the post.</p><p></p></div></div>"
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/claude-mythos-and-misguided-open",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/claude-mythos-and-misguided-open",
            "title": "Claude Mythos and misguided open-weight fearmongering",
            "pubDate": "Thu, 09 Apr 2026 21:28:39 GMT",
            "enclosure": {
              "@_url": "https://substack-post-media.s3.amazonaws.com/public/images/18fbf26c-4d1b-42a4-94e3-369522619514_3182x1790.png",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "Another dance around fears of open-source.",
            "content:encoded": "<p>With the announcement of the Claude Mythos model this week and the admittedly very strong stated abilities, especially in cybersecurity, a <a href=\"https://x.com/tenobrus/status/2041636593040236745\">new</a> <a href=\"https://x.com/stalkermustang/status/2041689727540023524\">wave</a> of <a href=\"https://x.com/mckaywrigley/status/2041651309758275609\">anti</a> open-weight AI model narratives surged. The TL;DR of the argument is that our digital infrastructure will not be ready in time for an open-weight version of this model, which will allow attacks to be conducted by numerous parties.</p><p>The backlash against open models in the wake of the Mythos news conflates too many general unknowns into a simple, broad policy recommendation that could actually further weaken cybersecurity readiness.</p><p>We&#8217;ve been here before &#8211; open-weight models were discussed as being extremely dangerous when OpenAI withheld GPT-2 weights in 2019, and when OpenAI released GPT-4 in 2023. Both of these waves came and went. The core mistake that is being made is the composition of two issues: 1) the acceptance of the open-closed model gap being static in time and 2) linking open-weight viability generally to specific issues.</p><p>I&#8217;ve written at length recently on how I think that the best, frontier-level open weight models <a href=\"https://www.interconnects.ai/p/open-models-in-perpetual-catch-up\">are going to fall behind the best closed models in </a><em><a href=\"https://www.interconnects.ai/p/open-models-in-perpetual-catch-up\">overall</a></em><a href=\"https://www.interconnects.ai/p/open-models-in-perpetual-catch-up\"> capabilities</a> in the near future. I&#8217;ve also written about <a href=\"https://www.interconnects.ai/p/the-next-phase-of-open-models\">how the open-weight ecosystem needs to adapt</a> to accept this reality. 
This is one of the times for the AI industry where I will repeat that it&#8217;s a total blessing to have the 6-18 month delay from when a certain capability is available within a closed lab to it being reproduced in the open. It&#8217;s a good balance of safety and monitoring the frontier of AI systems while allowing a useful open-source ecosystem to exist and thrive.</p><p>The core argument I&#8217;ve focused on in the open-closed model time gap has been about <em>general</em> capabilities &#8211; i.e. for general purpose, frontier models such as Claude Opus 4.X or GPT Thinking 5.X. The ability of these closed models to robustly solve problems and work in diverse situations as agents remains out of reach of the best open-weight models. What the open-weight models have tended to be better at is quickly keeping pace on key benchmarks (which admittedly is helped to some extent, but <a href=\"https://www.interconnects.ai/p/how-much-does-distillation-really\">not necessarily substantially, by distillation</a>). This discussion is entirely different: it is about whether open-weight models can keep pace on the specific skills related to cybersecurity, and when we could expect an open version of this model to be available to the world.</p><p>The case of a Claude Mythos level open-weight model is admittedly more nuanced to me than the previous few anti-open-weight narratives the community has experienced. Where GPT-4 was about a more hypothetical risk, especially in areas like bio-risk, the clear and present reality of cyber infrastructure being prone to attack is far more tangible. Still, much of this nuance in the moment comes down to not knowing the full details of what the system can actually do (i.e. Mythos), and the state of the environment it would act in (i.e. our digital infrastructure).</p><p>To properly assess this risk, we need to know what it takes to build and deploy a Claude Mythos scale model. 
This entails three pieces: 1) training and releasing the weights, 2) the harness that gives the model effective tools it knows how to use, and 3) the inference compute and software.</p><p><em>(Below I make some model size &amp; price estimates to show my thinking; these should not be taken as ground truth.)</em></p><p>Current estimates put the size ranges of leading models like Claude Opus 4.6 or GPT 5.4 as being around 3-5T parameters. Currently, the <a href=\"https://huggingface.co/inclusionAI/Ling-2.5-1T\">largest open-source models</a>, which have been coming from Chinese labs, are around 1T parameters. Claude Mythos&#8217;s preview pricing is 5X Opus, which could come from a simple multiplicative increase in active parameters (with the same serving system design), far higher inference-time scaling, more complex harnesses that make inference less efficient, lower utilization expectations, and so on. The simplest guess is that it&#8217;s a mix of all of the above, something like 2X bigger in parameters and much less efficient to serve. That&#8217;s a huge model, likely something similar to <a href=\"https://www.interconnects.ai/p/gpt-45-not-a-frontier-model\">GPT 4.5</a>, but actually post-trained well (GPT 4.5 was ahead of its time, infra-wise).</p><p>With size comes the challenge of actually training the model, as bigger models always come with new technical problems that must be solved to unlock the capabilities. 
For the case of cybersecurity, my guess is that most of the capabilities can be learned by training a model to be superhuman on coding. Unlike some capabilities such as knowledge work, medicine, law, etc., coding can be studied and improved substantially with public data like GitHub. I&#8217;m far more optimistic about open-weight models staying fairly close to the frontier in narrow domains of code execution and processing, but I don&#8217;t understand the full scope of skills needed to be superhuman in cybersecurity understanding. How much expert knowledge and special sauce went into training Claude Mythos? That&#8217;s a substantial source of my error bars on the impact.</p><p>Second, we know nothing about how the model works under the hood. Today, models are complex systems that entail far more than just weights. They require complex tools and infrastructure to run, of which Claude Code is the one we are most used to. Mythos very likely has its own innovations here.</p><p>My estimate for how many GPUs you&#8217;d need to serve an 8T parameter, modern MoE is something like O(100) H100 GPUs, which cost something like $10K a day (and this may be very slow in terms of tok/s). Heck, the <a href=\"https://www.nvidia.com/en-us/data-center/gb200-nvl72/\">official marketing copy</a> of the Nvidia GB200 NVL72 system is &#8220;Unlocking Real-Time Trillion-Parameter Models&#8221; on the rack. Does Mythos fit on one rack? The point isn&#8217;t to rely on my specific estimate as a policy reference, but to repeat that running leading AI systems is very expensive and not something you can just do on a laptop or through a self-service cloud portal.</p><p>There are far fewer actors who can get their hands on these resources, relative to those who can download the model. Of course, there are still many, but it&#8217;s important to flesh out all the details of what it would take to proliferate the capabilities of a Mythos-like model. 
In summary, tools like Mythos will give the best attackers more powerful tools of the trade, but they won&#8217;t be handing a nuke to every teenager connected to the internet.</p><p>Personally, I do acknowledge there&#8217;s a chance that cybersecurity abuse is a red line that makes releasing open-weight text models above a certain capability threshold morally grey. Many people thought this red line would come far earlier, somewhere in between GPT-2 and GPT-4, through the harm axis of mis/disinformation, but that had different bottlenecks. For image generation models, we&#8217;re well past the first red line, which is enabling non-consensual AI deepfakes with readily available open-weight models. We&#8217;re balancing the reality of these fears having come and gone before with a technology that&#8217;s becoming increasingly capable.</p><p>So, my second large source of error bars is &#8220;how bad is it actually&#8221; with respect to the state of cybersecurity. How much can humans clean up in the most important software with months of private access to a model like Claude Mythos? 
What will never get fixed?</p><p>For example, if we get open-weight models that are close to the capabilities of Claude Mythos, could those be fine-tuned by organizations to <em>harden</em> the security of their tools?</p><p>Currently, it&#8217;s too soon to call this a general reason to stop progress in open models. When access to Claude Mythos is limited to so few partners, in some ways having strong open models <em>close</em> to the threshold makes assessing the danger easier. Having to rely fully on a single private company to determine the security of essential, international infrastructure is not a tenable equilibrium.</p><p>So, in conclusion, I urge people to further study three things:</p><ol><li><p>How do we measure cybersecurity-related capabilities across open and closed models? With this, are open models truly keeping up at a 6-9 month lag, or are they only maintaining performance relevance in other areas of coding?</p></li><li><p>How do we independently measure the true impact of Claude Mythos and Project Glasswing on existing cybersecurity concerns?</p></li><li><p>If it is the case that the models are keeping up and the defensive capabilities of Claude Mythos are weak, how do we better monitor (and if needed, try to regulate) the targeted capabilities of open-weight models in narrow domains?</p></li></ol><p>The goal is to encourage fears about open models to remain very specific. Any general ban on open models in a nation will immediately and likely irrevocably remove that entity&#8217;s ability to influence a crucial and amorphous technology. If we stop building the best open models in the U.S., then another country will do this and become the center of the technology. There&#8217;s no way to fully kill open models, only ways to influence, understand, and steer them.</p>"
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/gemma-4-and-what-makes-an-open-model",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/gemma-4-and-what-makes-an-open-model",
            "title": "Gemma 4 and what makes an open model succeed",
            "pubDate": "Fri, 03 Apr 2026 16:57:36 GMT",
            "enclosure": {
              "@_url": "https://substack-post-media.s3.amazonaws.com/public/images/c03578d7-2c0a-47cd-988e-c0e29008cc06_3182x1790.png",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "Hint: it's not benchmark scores.",
            "content:encoded": "<p>Having written a lot of model release blog posts, there&#8217;s something much harder about reviewing open models when they drop relative to closed models, especially in 2026. In recent years, there were so few open models, so when <a href=\"https://www.interconnects.ai/p/llama-3-and-scaling-open-llms\">Llama 3</a> was released most people were still doing research on Llama 2 and super happy to get an update. When <a href=\"https://www.interconnects.ai/p/qwen-3-the-new-open-standard\">Qwen 3</a> was released, the <a href=\"https://www.interconnects.ai/p/llama-4\">Llama 4 fiasco</a> had just gone down, and a whole research community was <a href=\"https://www.interconnects.ai/p/rl-backlog-openais-many-rls-clarifying\">emerging to study RL on Qwen 2.5</a> &#8212; it was a no brainer to upgrade. </p><p>Today, when an open model releases, it&#8217;s competing with Qwen 3.5, Kimi K2.5, GLM 5, MiniMax M2.5, GPT-OSS, Arcee Large, Nemotron 3, Olmo 3, and others. The space is populated, but still feels full of hidden opportunity. The potential  of open models feels like a dark matter, a potential we know is huge, but few clear recipes and examples for how to unlock it are out there. Agentic AI, OpenClaw, and everything brewing in that space is going to spur mass experimentation in open models to <a href=\"https://www.interconnects.ai/p/the-next-phase-of-open-models\">complement the likes of Claude and Codex</a>, not replace them.</p><p>Especially with open models, the benchmarks at release are an extremely incomplete story. In some ways this is exciting, as new open models have a much higher variance and ability to surprise, but it also points at some structural reasons that make building businesses and great AI experiences around open models harder than the closed alternatives. When a new Claude Opus or GPT drops, spending a few hours with them in my agentic workflows is genuinely a good vibe test. 
For open models, putting them through this test is a category error.</p><p>Something else to be said about open models in the era of agents is that they get out of the debate of integration, harnesses, and tools and let us see, close to the ground, exactly what the model alone can do. Of course, we can&#8217;t test some things like search abilities without some tool, but being able to measure exactly the pace of progress of the model alone is a welcome simplification to a systematically opaque AI space.</p><p>The list of factors I&#8217;d use to assess a new open-weight model I&#8217;m considering investing in includes:</p><ol><li><p><strong>Model performance</strong> (and size) &#8212; how this model performs on benchmarks I care about and how it compares to other models of a similar size.</p></li><li><p><strong>Country of origin</strong> &#8212; some businesses care deeply about provenance, and whether a model was built in China.</p></li><li><p><strong>Model license</strong> &#8212; if a model needs legal approval for use, uptake will be slower at mid-sized and large companies.</p></li><li><p><strong>Tooling at release</strong> &#8212; many models release with half-broken, or at least substantially slower, implementations in popular software like vLLM, Transformers, SGLang, etc., because they push the envelope of architectures or tools.</p></li><li><p><strong>Model fine-tunability</strong> &#8212; how easy or hard it is to modify the 
given model to your use-case when you actually try to use it.</p></li></ol><p>The core problem is that some of these are immediately available at release, e.g. general performance, license, and origin, but others, such as tooling, take days to weeks to stabilize, and still others are open research questions &#8212; with no group systematically monitoring fine-tunability. </p><p>In the early era of open models, the days of Llama 2 or 3 and Qwen before v3.5, the architectures were fairly simple and the models tended to work out of the box. Some of this was due to the extremely hard work of the Llama, Qwen, Mistral, etc. developer teams. Some of the difference today is due to the new models being genuinely harder to work with. When it comes to something like Qwen 3.5 or Nemotron 3, which are hybrid models (with either Gated DeltaNet or Mamba layers), the tooling is very rough at release. Things you would expect to &#8220;just work&#8221; often don&#8217;t.</p><p>I&#8217;ve been following this area closely since we released <a href=\"https://www.interconnects.ai/p/olmo-hybrid-and-future-llm-architectures\">Olmo Hybrid</a> with a similar architecture, and Qwen 3.5 is just starting to work well in the various open-source tools that all need to play nice together for RL research. That&#8217;s 1.5 months after the release date! And that is just the point where we can start really investing in understanding the behavior of the models. Of course, others started working on these models sooner by investing more engineering resources or relying on partially closed software. The fully open and distributed ecosystem takes a long time to get going on some new models.</p><p>All of this is lead-in for the most important question for open models &#8212; how easy are they to adapt to specific use-cases? This is a different problem for different model sizes. Large MoE open-weight models may be used by entities like Cursor who need complex capabilities in their domain, e.g. 
<a href=\"https://cursor.com/blog/composer-2\">Composer 2</a> trained on Kimi K2.5. Other applications can be built on much smaller models, such as Chroma&#8217;s <a href=\"https://huggingface.co/chromadb/context-1\">Context-1</a> model for agentic search, built on GPT-OSS 20B. </p><p>The answer to &#8220;which models are fine-tunable&#8221; is largely tacit knowledge held by engineers across the industry. There should be a thriving research area here to support the open ecosystem model. The first step is to understand the characteristics of different base and post-trained models. The second step is to tune pretraining recipes for open models so they&#8217;re more flexible. </p><p>For <a href=\"https://atomproject.ai/\">The ATOM Project</a> and other Interconnects endeavors, we&#8217;ve put substantial effort into measuring adoption trends in the open ecosystem. Everything takes a long time to unfold after a model is first publicly available &#8212; and adaptability is why. 
What we know for sure now, when Qwen has been going from strength to strength with its releases, is that technical staff across the industry has gotten comfortable working with Qwen models. Countless research methods and datasets were made to work with Qwen. It&#8217;ll take patience for any other model family to get to this point &#8212; a patience I&#8217;m not sure many open model builders have.</p><p>This takes us to <strong><a href=\"https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/\">Gemma 4</a>, Google&#8217;s latest open models</strong>. Gemma 3 was released more than a year ago, in March of 2025, and is a bit underrated. Gemma 4 comes in 4 sizes for now, with a bigger, MoE model of over 100B total parameters rumored but not released yet. The <a href=\"https://huggingface.co/collections/google/gemma-4\">models</a> we have today come in sizes of ~5B dense, 8B dense, 26B total 4B active MoE, and 31B dense. </p><p>I&#8217;m most excited that they&#8217;re finally adopting a standard Apache 2.0 open source license. This&#8217;ll massively boost adoption. The standard of better licenses for strong open-weight LLMs was set by mostly Chinese open model labs in the last 1-2 years, and now U.S. companies are following suit. I will personally be so happy if the horrible <a href=\"https://www.llama.com/llama3/license/\">Llama licenses</a> and <a href=\"https://ai.google.dev/gemma/terms\">Gemma terms of service</a> were an ~18-month transient dynamic of the industry being nervous about releasing strong open models.</p><p>The Gemma 4 scores look very solid, the small models have incredible benchmark scores (especially in general domains like <a href=\"https://x.com/demishassabis/status/2040067244349063326\">LMArena</a>) and the 31B model rivals the recent Qwen 3.5 27B, which is the leading member of that class. 
The ~30B size range is an important one, as it&#8217;s accessible both to researchers and to enterprises looking to deploy the model in real use-cases. Where the 7B model scale is the default for tinkering and research, a 30B model is the default for seeing if an open model can unlock substantial value in your specific workflow &#8212; a good mix of intelligence, low price, tractability for downstream training, etc.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!TDMh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208e9bab-a2f8-4e5b-bed0-2db600993c41_4200x2400.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!TDMh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208e9bab-a2f8-4e5b-bed0-2db600993c41_4200x2400.png 424w, https://substackcdn.com/image/fetch/$s_!TDMh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208e9bab-a2f8-4e5b-bed0-2db600993c41_4200x2400.png 848w, https://substackcdn.com/image/fetch/$s_!TDMh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208e9bab-a2f8-4e5b-bed0-2db600993c41_4200x2400.png 1272w, https://substackcdn.com/image/fetch/$s_!TDMh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208e9bab-a2f8-4e5b-bed0-2db600993c41_4200x2400.png 1456w\" sizes=\"100vw\"><img 
src=\"https://substackcdn.com/image/fetch/$s_!TDMh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208e9bab-a2f8-4e5b-bed0-2db600993c41_4200x2400.png\" width=\"1456\" height=\"832\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/208e9bab-a2f8-4e5b-bed0-2db600993c41_4200x2400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!TDMh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208e9bab-a2f8-4e5b-bed0-2db600993c41_4200x2400.png 424w, https://substackcdn.com/image/fetch/$s_!TDMh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208e9bab-a2f8-4e5b-bed0-2db600993c41_4200x2400.png 848w, https://substackcdn.com/image/fetch/$s_!TDMh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208e9bab-a2f8-4e5b-bed0-2db600993c41_4200x2400.png 1272w, https://substackcdn.com/image/fetch/$s_!TDMh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208e9bab-a2f8-4e5b-bed0-2db600993c41_4200x2400.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture>
</div></a><figcaption class=\"image-caption\">Source: Sebastian Raschka, <a href=\"https://magazine.sebastianraschka.com/i/168650848/23-gemma-4\">Ahead of AI</a></figcaption></figure></div><p>This takes us back to the above adoption criteria I mentioned for open models and the bigger question &#8212; do I think Gemma 4 will be an overwhelming success? Previous Gemma models have been <a href=\"https://chatgpt.com/share/69cfe648-bc88-83e8-a0b3-d23091d66ae8\">plagued</a> by tooling issues and poorer performance when being finetuned. </p><p>Gemma 4&#8217;s success is going to be entirely determined by ease of use, to a point where a 5-10% swing on benchmarks wouldn&#8217;t matter at all. 
It&#8217;s strong enough, small enough, with the right license, and from the U.S., so many companies are going to slot it in.</p><p>I&#8217;m cautiously optimistic that Gemma 4 is going to work better here. Winds are shifting for open models built in America. We saw GPT-OSS go through a bumpy launch to become an overwhelming success. There&#8217;s a collective energy around the likes of Reflection, Arcee, Nemotron, Gemma, Olmo, and peers that shows substantial demand for building new stacks around open models. There&#8217;s capital to be spent on AI stacks across the economy by those who want more ownership of everything, including the model. </p><p>Now, 240 days after launching The ATOM Project, the conversation is shifting into the next stage. Summer of 2025 was a crisis moment where the U.S. AI scene realized it can&#8217;t wait and figure out open models after building AGI. The two markets will capture different areas and proceed in parallel. Now that more companies in the U.S. are releasing strong models, we need to improve the ecosystem so that these models are easy to use, understand, and build value around. It&#8217;s hard work to build another inflection point in these adoption plots I&#8217;ve been updating consistently, but that&#8217;s the work to be done. Join me in it. </p><p><em>More data coming soon! 
Here&#8217;s a sneak peek:</em></p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!-scL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c147024-01bb-46d2-b54f-d5a88edd64fe_1716x766.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!-scL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c147024-01bb-46d2-b54f-d5a88edd64fe_1716x766.png 424w, https://substackcdn.com/image/fetch/$s_!-scL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c147024-01bb-46d2-b54f-d5a88edd64fe_1716x766.png 848w, https://substackcdn.com/image/fetch/$s_!-scL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c147024-01bb-46d2-b54f-d5a88edd64fe_1716x766.png 1272w, https://substackcdn.com/image/fetch/$s_!-scL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c147024-01bb-46d2-b54f-d5a88edd64fe_1716x766.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!-scL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c147024-01bb-46d2-b54f-d5a88edd64fe_1716x766.png\" width=\"1716\" height=\"766\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c147024-01bb-46d2-b54f-d5a88edd64fe_1716x766.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a12427df-5ca6-4365-8c25-20f6b996d4c7_1716x1326.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:766,&quot;width&quot;:1716,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:121631,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/193022426?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa12427df-5ca6-4365-8c25-20f6b996d4c7_1716x1326.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!-scL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c147024-01bb-46d2-b54f-d5a88edd64fe_1716x766.png 424w, https://substackcdn.com/image/fetch/$s_!-scL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c147024-01bb-46d2-b54f-d5a88edd64fe_1716x766.png 848w, https://substackcdn.com/image/fetch/$s_!-scL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c147024-01bb-46d2-b54f-d5a88edd64fe_1716x766.png 1272w, https://substackcdn.com/image/fetch/$s_!-scL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c147024-01bb-46d2-b54f-d5a88edd64fe_1716x766.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture>
</div></a></figure></div>"
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/latest-open-artifacts-20-new-orgs",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/latest-open-artifacts-20-new-orgs",
            "title": "Latest open artifacts (#20): New orgs! New types of models! With Nemotron Super, Sarvam, Cohere Transcribe, & others",
            "pubDate": "Mon, 30 Mar 2026 13:02:45 GMT",
            "enclosure": {
              "@_url": "https://substackcdn.com/image/fetch/$s_!uD-D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b1a559-a0f5-4757-962c-1d9a21c17835_1024x576.png",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Florian Brand",
            "description": "New orgs! New types of models! With Nemotron Super, Sarvam, Cohere Transcribe, & others",
            "content:encoded": "<p>This Artifacts Log post is unusual in how many diverse, quirky models there are across use-cases and modalities. Normally these model roundups are dominated by big models from the likes of Qwen, DeepSeek, Kimi, etc. There are models for all sorts of different use-cases in this post, from optical character recognition (OCR), RAG search, audio transcription, computer-use, code-editing, math theorem proving, and more. The artifacts covered this month also come from a much broader list of open model builders.</p><p>This gives us a lot of hope for the future of open models, where we see <a href=\"https://www.interconnects.ai/i/190338833/the-balance-of-power-in-open-vs-closed-models\">the need for domain-specific, cheap models</a> as being crucial tools to complement the strongest, closed agents. When the top few models get the headlines, this vast, industry-scale tinkering can easily be forgotten. Reading this post gives a technically grounded, broad coverage of the many directions the industry is pushing specific models for. Expect more like this!</p><p class=\"button-wrapper\" data-attrs=\"{&quot;url&quot;:&quot;https://www.interconnects.ai/p/latest-open-artifacts-20-new-orgs?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}\" data-component-name=\"ButtonCreateButton\"><a class=\"button primary\" href=\"https://www.interconnects.ai/p/latest-open-artifacts-20-new-orgs?utm_source=substack&utm_medium=email&utm_content=share&action=share\"><span>Share</span></a></p><p>To encourage people to take a look at the diversity of models in this issue, the core part of the update is not paywalled. 
An otherwise quiet month at the top end of open models really delivered.</p><h1>Artifacts Log</h1><h3><strong>Our Picks</strong></h3><ul><li><p><strong><a href=\"https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4\">NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4</a></strong> by <a href=\"https://huggingface.co/nvidia\">nvidia</a>: The long-awaited mid-sized model from NVIDIA is finally here: 120B total params with 12B active, a 1M context window, and support for multiple popular languages. Furthermore, the model is based on LatentMoE and uses NVFP4 during pre-training, which is a first for open models. Like other things from NVIDIA, it comes with an in-depth <a href=\"https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf\">tech report</a> plus <a href=\"https://huggingface.co/collections/nvidia/nemotron-pre-training-datasets\">pre-training</a> and <a href=\"https://huggingface.co/collections/nvidia/nemotron-post-training-v3\">post-training</a> datasets, with the vast majority of the data being openly released.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!4nWL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b56bd55-79e6-483a-ab8d-c3b193e89b84_1152x432.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!4nWL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b56bd55-79e6-483a-ab8d-c3b193e89b84_1152x432.png 424w, https://substackcdn.com/image/fetch/$s_!4nWL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b56bd55-79e6-483a-ab8d-c3b193e89b84_1152x432.png 848w, 
https://substackcdn.com/image/fetch/$s_!4nWL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b56bd55-79e6-483a-ab8d-c3b193e89b84_1152x432.png 1272w, https://substackcdn.com/image/fetch/$s_!4nWL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b56bd55-79e6-483a-ab8d-c3b193e89b84_1152x432.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!4nWL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b56bd55-79e6-483a-ab8d-c3b193e89b84_1152x432.png\" width=\"1152\" height=\"432\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9b56bd55-79e6-483a-ab8d-c3b193e89b84_1152x432.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:432,&quot;width&quot;:1152,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!4nWL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b56bd55-79e6-483a-ab8d-c3b193e89b84_1152x432.png 424w, https://substackcdn.com/image/fetch/$s_!4nWL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b56bd55-79e6-483a-ab8d-c3b193e89b84_1152x432.png 848w, 
https://substackcdn.com/image/fetch/$s_!4nWL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b56bd55-79e6-483a-ab8d-c3b193e89b84_1152x432.png 1272w, https://substackcdn.com/image/fetch/$s_!4nWL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b56bd55-79e6-483a-ab8d-c3b193e89b84_1152x432.png 1456w\" sizes=\"100vw\" fetchpriority=\"high\"></picture></div></a></figure></div></li><li><p><strong><a 
href=\"https://huggingface.co/CohereLabs/cohere-transcribe-03-2026\">cohere-transcribe-03-2026</a></strong> by <a href=\"https://huggingface.co/CohereLabs\">CohereLabs</a>: A speech-to-text model by Cohere based on the <a href=\"https://arxiv.org/abs/2005.08100\">Conformer architecture</a>, similar to NVIDIA&#8217;s Parakeet. It supports 14 languages, including some APAC languages and Arabic. Performance-wise, Cohere claims it beats similarly sized open and closed models. To top it all off: The model is released under Apache 2.0! Previous open models by Cohere were released under a non-commercial license.</p></li><li><p><strong><a href=\"https://huggingface.co/sarvamai/sarvam-105b\">sarvam-105b</a></strong> by <a href=\"https://huggingface.co/sarvamai\">sarvamai</a>: The Indian startup Sarvam, which trained open models in the past, has scaled up everything for its new flagship models in terms of dataset size (12-16T tokens) and model size (<a href=\"https://huggingface.co/sarvamai/sarvam-30b\">30B-A2B</a>, 105B-A10B). As a result, they come close to or even surpass a lot of open models with similar sizes. 
The release also shows why sovereign AI is so important, something that few other countries have internalized yet: In comparison with SOTA open models, the Sarvam models are vastly more preferred in Indic languages.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!YsFz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fccf5cd5d-dbf1-4483-be6e-9bdc70554af3_2010x814.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!YsFz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fccf5cd5d-dbf1-4483-be6e-9bdc70554af3_2010x814.png 424w, https://substackcdn.com/image/fetch/$s_!YsFz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fccf5cd5d-dbf1-4483-be6e-9bdc70554af3_2010x814.png 848w, https://substackcdn.com/image/fetch/$s_!YsFz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fccf5cd5d-dbf1-4483-be6e-9bdc70554af3_2010x814.png 1272w, https://substackcdn.com/image/fetch/$s_!YsFz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fccf5cd5d-dbf1-4483-be6e-9bdc70554af3_2010x814.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!YsFz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fccf5cd5d-dbf1-4483-be6e-9bdc70554af3_2010x814.png\" width=\"681\" height=\"275.9546703296703\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ccf5cd5d-dbf1-4483-be6e-9bdc70554af3_2010x814.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:590,&quot;width&quot;:1456,&quot;resizeWidth&quot;:681,&quot;bytes&quot;:102217,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/192515214?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fccf5cd5d-dbf1-4483-be6e-9bdc70554af3_2010x814.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!YsFz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fccf5cd5d-dbf1-4483-be6e-9bdc70554af3_2010x814.png 424w, https://substackcdn.com/image/fetch/$s_!YsFz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fccf5cd5d-dbf1-4483-be6e-9bdc70554af3_2010x814.png 848w, https://substackcdn.com/image/fetch/$s_!YsFz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fccf5cd5d-dbf1-4483-be6e-9bdc70554af3_2010x814.png 1272w, https://substackcdn.com/image/fetch/$s_!YsFz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fccf5cd5d-dbf1-4483-be6e-9bdc70554af3_2010x814.png 1456w\" sizes=\"100vw\" fetchpriority=\"high\"></picture>
</div></a></figure></div></li><li><p><strong><a href=\"https://huggingface.co/mistralai/Mistral-Small-4-119B-2603\">Mistral-Small-4-119B-2603</a></strong> by <a href=\"https://huggingface.co/mistralai\">mistralai</a>: A 119B-A7B model by Mistral, combining their previous model generations into one as a hybrid reasoning model with coding abilities.</p></li><li><p><strong><a href=\"https://huggingface.co/zed-industries/zeta-2\">zeta-2</a></strong> by <a href=\"https://huggingface.co/zed-industries\">zed-industries</a>: The open source code editor Zed has released their edit prediction model openly in the past, which we featured <a href=\"https://www.interconnects.ai/p/artifacts-7\">a year ago</a>. 
While the previous version was based on open data, the new version, based on Seed-Coder-8B, is trained on open source code by users who explicitly opted into data collection.</p></li></ul><h3><strong>Models</strong></h3><h4>General Purpose</h4><ul><li><p><strong><a href=\"https://huggingface.co/nvidia/gpt-oss-puzzle-88B\">gpt-oss-puzzle-88B</a></strong> by <a href=\"https://huggingface.co/nvidia\">nvidia</a>: A pruned expert version of GPT OSS 120B. It also replaces some global attention layers with window attention. Puzzle is &#8220;a post-training neural architecture search (NAS) framework, with the goal of significantly improving inference efficiency for reasoning-heavy workloads while maintaining or improving accuracy across reasoning budgets.&#8221;</p></li><li><p><strong><a href=\"https://huggingface.co/allenai/Olmo-Hybrid-7B\">Olmo-Hybrid-7B</a></strong> by <a href=\"https://huggingface.co/allenai\">allenai</a>: A hybrid attention + GDN (gated DeltaNet) model. See <a href=\"https://www.interconnects.ai/p/olmo-hybrid-and-future-llm-architectures\">our blog post</a> for more insights about the architecture and its challenges.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!IgMs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!IgMs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 424w, 
https://substackcdn.com/image/fetch/$s_!IgMs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 848w, https://substackcdn.com/image/fetch/$s_!IgMs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 1272w, https://substackcdn.com/image/fetch/$s_!IgMs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!IgMs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png\" width=\"1456\" height=\"1187\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1187,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!IgMs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 424w, 
https://substackcdn.com/image/fetch/$s_!IgMs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 848w, https://substackcdn.com/image/fetch/$s_!IgMs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 1272w, https://substackcdn.com/image/fetch/$s_!IgMs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture>
</div></a></figure></div></li></ul><ul><li><p><strong><a href=\"https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16\">NVIDIA-Nemotron-3-Nano-4B-BF16</a></strong> by <a href=\"https://huggingface.co/nvidia\">nvidia</a>: A compressed version of NVIDIA-Nemotron-Nano-9B-v2, which itself is a compressed version of NVIDIA-Nemotron-Nano-12B-v2. Nvidia has been pushing this direction more than anyone else with open models.</p></li></ul><h4>Multimodal</h4><ul><li><p><strong><a href=\"https://huggingface.co/YuanLabAI/Yuan3.0-Ultra\">Yuan3.0-Ultra</a></strong> by <a href=\"https://huggingface.co/YuanLabAI\">YuanLabAI</a>: A 1T multimodal model by the relatively unknown Yuan Lab. They pre-trained a 1.5T model on 2.2T tokens and subsequently pruned experts with a new technique, outlined in the <a href=\"https://github.com/Yuan-lab-LLM/Yuan3.0-Ultra/blob/main/Docs/Yuan3.0_Ultra%20Paper.pdf\">tech report</a>.</p></li><li><p><strong><a href=\"https://huggingface.co/meituan-longcat/LongCat-Next\">LongCat-Next</a></strong> by <a href=\"https://huggingface.co/meituan-longcat\">meituan-longcat</a>: A multimodal model which can process text, vision, and audio as both inputs and outputs.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!cxaI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda76903-ad4c-4999-9cb6-c6a9cf2babe8_3437x1929.jpeg\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" 
srcset=\"https://substackcdn.com/image/fetch/$s_!cxaI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda76903-ad4c-4999-9cb6-c6a9cf2babe8_3437x1929.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cxaI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda76903-ad4c-4999-9cb6-c6a9cf2babe8_3437x1929.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cxaI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda76903-ad4c-4999-9cb6-c6a9cf2babe8_3437x1929.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!cxaI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda76903-ad4c-4999-9cb6-c6a9cf2babe8_3437x1929.jpeg 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!cxaI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda76903-ad4c-4999-9cb6-c6a9cf2babe8_3437x1929.jpeg\" width=\"677\" height=\"379.88255494505495\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dda76903-ad4c-4999-9cb6-c6a9cf2babe8_3437x1929.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:817,&quot;width&quot;:1456,&quot;resizeWidth&quot;:677,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;evaluation&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"evaluation\" title=\"evaluation\" 
srcset=\"https://substackcdn.com/image/fetch/$s_!cxaI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda76903-ad4c-4999-9cb6-c6a9cf2babe8_3437x1929.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cxaI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda76903-ad4c-4999-9cb6-c6a9cf2babe8_3437x1929.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cxaI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda76903-ad4c-4999-9cb6-c6a9cf2babe8_3437x1929.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!cxaI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda76903-ad4c-4999-9cb6-c6a9cf2babe8_3437x1929.jpeg 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture>
</div></a></figure></div></li><li><p><strong><a href=\"https://huggingface.co/ibm-granite/granite-4.0-1b-speech\">granite-4.0-1b-speech</a></strong> by <a href=\"https://huggingface.co/ibm-granite\">ibm-granite</a>: A small speech-to-text model supporting six languages. It also supports the generation of English audio for translation.</p></li><li><p><strong><a href=\"https://huggingface.co/microsoft/Phi-4-reasoning-vision-15B\">Phi-4-reasoning-vision-15B</a></strong> by <a href=\"https://huggingface.co/microsoft\">microsoft</a>: A Phi model which uses the SigLIP-2 vision encoder.</p></li></ul><h4>Special Purpose</h4><ul><li><p><strong><a href=\"https://huggingface.co/miromind-ai/MiroThinker-1.7\">MiroThinker-1.7</a></strong> by <a href=\"https://huggingface.co/miromind-ai\">miromind-ai</a>: A fine-tuned version of Qwen 235B for agentic workflows, especially research.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!6raY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f79c7-bbeb-4a80-b457-5a9c497c363e_2852x1352.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!6raY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f79c7-bbeb-4a80-b457-5a9c497c363e_2852x1352.png 424w, 
https://substackcdn.com/image/fetch/$s_!6raY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f79c7-bbeb-4a80-b457-5a9c497c363e_2852x1352.png 848w, https://substackcdn.com/image/fetch/$s_!6raY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f79c7-bbeb-4a80-b457-5a9c497c363e_2852x1352.png 1272w, https://substackcdn.com/image/fetch/$s_!6raY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f79c7-bbeb-4a80-b457-5a9c497c363e_2852x1352.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!6raY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f79c7-bbeb-4a80-b457-5a9c497c363e_2852x1352.png\" width=\"684\" height=\"324.14835164835165\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a1f79c7-bbeb-4a80-b457-5a9c497c363e_2852x1352.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:690,&quot;width&quot;:1456,&quot;resizeWidth&quot;:684,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"image\" title=\"image\" srcset=\"https://substackcdn.com/image/fetch/$s_!6raY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f79c7-bbeb-4a80-b457-5a9c497c363e_2852x1352.png 424w, 
https://substackcdn.com/image/fetch/$s_!6raY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f79c7-bbeb-4a80-b457-5a9c497c363e_2852x1352.png 848w, https://substackcdn.com/image/fetch/$s_!6raY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f79c7-bbeb-4a80-b457-5a9c497c363e_2852x1352.png 1272w, https://substackcdn.com/image/fetch/$s_!6raY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a1f79c7-bbeb-4a80-b457-5a9c497c363e_2852x1352.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture>
</div></a></figure></div></li><li><p><strong><a href=\"https://huggingface.co/Prior-Labs/tabpfn_2_6\">tabpfn_2_6</a></strong> by <a href=\"https://huggingface.co/Prior-Labs\">Prior-Labs</a>: An update to the popular tabular prediction model, which is slightly larger than its predecessor. Its license allows research and internal evaluation only.</p></li><li><p><strong><a href=\"https://huggingface.co/facebook/sam3.1\">sam3.1</a></strong> by <a href=\"https://huggingface.co/facebook\">facebook</a>: An update to SAM 3, carrying the same restrictive license.</p></li><li><p><strong><a href=\"https://huggingface.co/Hcompany/Holotron-12B\">Holotron-12B</a></strong> by <a href=\"https://huggingface.co/Hcompany\">Hcompany</a>: A policy model for CUA agents.</p></li><li><p><strong><a href=\"https://huggingface.co/meituan-longcat/LongCat-Flash-Prover\">LongCat-Flash-Prover</a></strong> by <a href=\"https://huggingface.co/meituan-longcat\">meituan-longcat</a>: A Lean4 fine-tune of the large LongCat model.</p></li><li><p><strong><a href=\"https://huggingface.co/mistralai/Leanstral-2603\">Leanstral-2603</a></strong> by <a href=\"https://huggingface.co/mistralai\">mistralai</a>: A Lean4 fine-tune of the new Mistral Small 4.</p></li><li><p><strong><a href=\"https://huggingface.co/RekaAI/reka-edge-2603\">reka-edge-2603</a></strong> by <a href=\"https://huggingface.co/RekaAI\">RekaAI</a>: A model for robotics, beating models such as Cosmos-Reason2. Its noncommercial license converts into Apache 2.0 after two years.</p></li></ul><h4>RAG</h4><ul><li><p><strong><a href=\"https://huggingface.co/baidu/Qianfan-OCR\">Qianfan-OCR</a></strong> by <a href=\"https://huggingface.co/baidu\">baidu</a>: There have been a lot of great OCR models lately. 
This one is from Baidu and is licensed under Apache 2.0.</p></li><li><p><strong><a href=\"https://huggingface.co/datalab-to/chandra-ocr-2\">chandra-ocr-2</a></strong> by <a href=\"https://huggingface.co/datalab-to\">datalab-to</a>: An update to the Chandra OCR model, released under a restrictive license.</p></li><li><p><strong><a href=\"https://huggingface.co/lightonai/Reason-ModernColBERT\">Reason-ModernColBERT</a></strong> by <a href=\"https://huggingface.co/lightonai\">lightonai</a>: A SOTA retrieval model released under a non-commercial license. However, the release also includes code to regenerate the data, which allows training a commercially viable version.</p></li><li><p><strong><a href=\"https://huggingface.co/chromadb/context-1\">context-1</a></strong> by <a href=\"https://huggingface.co/chromadb\">chromadb</a>: A fine-tuned version of GPT-OSS for agentic search with an in-depth <a href=\"https://www.trychroma.com/research/context-1\">tech report</a>. It also marks Chroma&#8217;s debut in the open model space. 
Trained with Thinking Machine&#8217;s <a href=\"https://thinkingmachines.ai/tinker/\">Tinker</a>.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!_sEq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F646905bb-983f-4b3c-88e4-3eaa805613a4_3250x1640.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!_sEq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F646905bb-983f-4b3c-88e4-3eaa805613a4_3250x1640.png 424w, https://substackcdn.com/image/fetch/$s_!_sEq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F646905bb-983f-4b3c-88e4-3eaa805613a4_3250x1640.png 848w, https://substackcdn.com/image/fetch/$s_!_sEq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F646905bb-983f-4b3c-88e4-3eaa805613a4_3250x1640.png 1272w, https://substackcdn.com/image/fetch/$s_!_sEq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F646905bb-983f-4b3c-88e4-3eaa805613a4_3250x1640.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!_sEq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F646905bb-983f-4b3c-88e4-3eaa805613a4_3250x1640.png\" width=\"1456\" height=\"735\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/646905bb-983f-4b3c-88e4-3eaa805613a4_3250x1640.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:735,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Chroma Context-1: Training a Self-Editing Search Agent&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"Chroma Context-1: Training a Self-Editing Search Agent\" title=\"Chroma Context-1: Training a Self-Editing Search Agent\" srcset=\"https://substackcdn.com/image/fetch/$s_!_sEq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F646905bb-983f-4b3c-88e4-3eaa805613a4_3250x1640.png 424w, https://substackcdn.com/image/fetch/$s_!_sEq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F646905bb-983f-4b3c-88e4-3eaa805613a4_3250x1640.png 848w, https://substackcdn.com/image/fetch/$s_!_sEq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F646905bb-983f-4b3c-88e4-3eaa805613a4_3250x1640.png 1272w, https://substackcdn.com/image/fetch/$s_!_sEq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F646905bb-983f-4b3c-88e4-3eaa805613a4_3250x1640.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture>
</div></a></figure></div></li><li><p><strong><a href=\"https://huggingface.co/rednote-hilab/dots.mocr\">dots.mocr</a></strong> by <a href=\"https://huggingface.co/rednote-hilab\">rednote-hilab</a>: The beloved dots.ocr model has been updated and supports SVG outputs. However, on top of the general MIT license, the model comes with additional usage restrictions, just like its predecessor.</p></li></ul>\n      <p>\n          <a href=\"https://www.interconnects.ai/p/latest-open-artifacts-20-new-orgs\">\n              Read more\n          </a>\n      </p>\n   "
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/lossy-self-improvement",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/lossy-self-improvement",
            "title": "Lossy self-improvement",
            "pubDate": "Sun, 22 Mar 2026 19:39:40 GMT",
            "enclosure": {
              "@_url": "https://substack-post-media.s3.amazonaws.com/public/images/720ffbb3-46c3-4ebe-9b0d-62985c025698_3182x1790.png",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "The case for why self-improvement is real but it doesn't lead to fast takeoff.",
            "content:encoded": "<p>Fast takeoff, the singularity, and recursive self-improvement (RSI) are all top of mind in AI circles these days. There are elements of truth to them in what&#8217;s happening in the AI industry. Two, maybe three, labs are consolidating as an oligopoly with access to the best AI models (and the resources to build the next ones). The AI tools of today are abruptly transforming engineering and research jobs.</p><p>AI research is becoming much easier in many ways. The technical problems that need to be solved to scale training large language models even further are formidable. Super-human coding assistants making these approachable is breaking a lot of former claims of what building these things entailed. Together this is setting us up for a year (or more) of rapid progress at the cutting edge of AI.</p><p>We&#8217;re also at a time where language models are already extremely good. They&#8217;re in fact good enough for plenty of extremely valuable knowledge-work tasks. Language models taking another big step is hard to imagine &#8212; it&#8217;s unclear which tasks they&#8217;re going to master this year outside of code and CLI-based computer-use. There will be some new ones! These capabilities unlock new styles of working that&#8217;ll send more ripples through the economy.</p><p>These dramatic changes almost make it seem like a foregone conclusion that language models can then just keep accelerating progress on their own. The popular language for this is a recursive self-improvement loop. 
Early writing on the topic dates back to the 2000s, such as this 2008 <a href=\"https://www.lesswrong.com/posts/JBadX7rwdcRFzGuju/recursive-self-improvement\">blog post</a> devoted entirely to it: </p><blockquote><p>Recursion is the sort of thing that happens when you hand the AI the object-level problem of &#8220;redesign your own cognitive algorithms&#8221;.</p></blockquote><p>And slightly earlier, in 2007, Yudkowsky also defined the related idea of a Seed AI in <em><a href=\"https://intelligence.org/files/LOGI.pdf\">Levels of Organization in General Intelligence</a>:</em></p><blockquote><p>A seed AI is an AI designed for self-understanding, self-modification, and recursive self-improvement. This has implications both for the functional architectures needed to achieve primitive intelligence, and for the later development of the AI if and when its holonic self-understanding begins to improve. Seed AI is not a workaround that avoids the challenge of general intelligence by bootstrapping from an unintelligent core; seed AI only begins to yield benefits once there is some degree of available intelligence to be utilized. The later consequences of seed AI (such as true recursive self-improvement) only show up after the AI has achieved significant holonic understanding and general intelligence.</p></blockquote><p>Given how general and useful today&#8217;s models are, it&#8217;s reasonable to think we&#8217;re at the start of this.</p><p>Generally, RSI can be summarized as follows: when AI can improve itself, the improved version can improve even more efficiently, creating a closed amplification loop that leads to an intelligence explosion, often referred to as the singularity. There are a few assumptions in this. For RSI to occur, the following must hold:</p><ol><li><p>The loop is closed. Models can keep improving on themselves and beget more models.</p></li><li><p>The loop is self-amplifying. 
The next models will yield even bigger improvements than the current ones.</p></li><li><p>The loop continues to run without losing efficiency. No added friction kneecaps the exponential into an early sigmoid.</p></li></ol><p>While I agree that momentous, socially destabilizing changes are coming in the next few years from sustained AI improvements, I expect the trend line of progress to be more linear than exponential when we look back. Instead of recursive self-improvement, it will be <strong>lossy self-improvement</strong> (LSI) &#8211; the models become core to the development loop but friction breaks down all the core assumptions of RSI. The more compute and agents you throw at a problem, the more loss and repetition show up.</p><p>I&#8217;m still a believer that the complexity brake on advanced systems will be a strong counterbalance to the reality that AI models are getting substantially better at every narrow task we need to compose together in making a leading AI model. 
I quoted this previously in <a href=\"https://www.interconnects.ai/p/brakes-on-an-intelligence-explosion?open=false#%C2%A72-current-ai-is-broad-not-narrow-intelligence\">April of 2025 in response to AI 2027</a>.</p><blockquote><p>Microsoft co-founder Paul Allen argued the opposite of accelerating returns, the <strong>complexity brake:</strong> the more progress science makes towards understanding intelligence, the more difficult it becomes to make additional progress. A study of the number of patents shows that human creativity does not show accelerating returns, but in fact, as suggested by Joseph Tainter in his The Collapse of Complex Societies, a law of diminishing returns. The number of patents per thousand peaked in the period from 1850 to 1900, and has been declining since. The growth of complexity eventually becomes self-limiting, and leads to a widespread &#8220;general systems collapse&#8221;.</p></blockquote><p>There are plenty of examples in how models are already trained, the deep intuitions we need to get them right, and the organizations that build them that show where the losses will come from. Building leading language models is incredibly complex, and only becoming more so. There are a few core frictions in my mind.</p><p><strong>1. Automatable research is too narrow</strong></p><p>First, it is clear that language models this year will already be useful tools for optimizing localized tasks like lowering the test loss of a model. Andrej Karpathy recently launched his <a href=\"https://github.com/karpathy/autoresearch\">autoresearch</a> project, which popularized doing just this. This allows AI agents to play directly on GPUs to target tasks like lowering the loss on the test set. This approach works in narrow domains, i.e. one general test loss or one overall reward. The problem is that there&#8217;s a long-standing gap between a model that is more accurate on paper and models that users find more productive. 
The most provocative case is for pretraining, which was discussed more at length around scaling laws. Scaling laws show us that the loss will continue going down, but <a href=\"https://www.interconnects.ai/p/scaling-realities\">we don&#8217;t know if that&#8217;ll be economically more</a> valuable.</p><p>In post-training, reinforcement learning algorithms are at least more directly tied to <em>specific</em> performance gains as most RL training environments can be used directly as an evaluation.  Still, I worry about generalization and tying back to models that are better at the specific task of improving themselves. It&#8217;s a big leap from models get better at some things to that necessarily translating to models that are better at building themselves and designing experiments. We&#8217;ve seen many AI capabilities sort of saturate at certain levels of human taste, such as writing quality. AI research is a bit different here, as there is a very high ceiling to climb up to. Where models mostly saturate on writing because there&#8217;s inherent tension in preferences, models will saturate on research because the search space and optimization target is too wide.</p><p>The <a href=\"https://arxiv.org/abs/2603.08640\">early benchmarks</a> for measuring this sort of ability all fall prey to the same problem &#8211; narrow scope. Agents will do well at optimizing single metrics, but the leap required to navigate many metrics at once is a very different skill set. That is actually what the best researchers do &#8212; they make many scalable ideas work <em>together</em>.</p><p>The most related benchmark we have to measure this is PostTrainBench, which is quite fun, but progress will very rapidly get distorted on this. Over 90% of the challenge in doing post-training well is getting the last 1-3% of performance, especially without cooking the model in out-of-domain tasks. Post-training a general, leading model is extremely complex, and only getting more complex. 
</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!Hrz3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab92b9f-776f-4bbe-98fc-36ca74c25dfd_2004x844.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!Hrz3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab92b9f-776f-4bbe-98fc-36ca74c25dfd_2004x844.png 424w, https://substackcdn.com/image/fetch/$s_!Hrz3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab92b9f-776f-4bbe-98fc-36ca74c25dfd_2004x844.png 848w, https://substackcdn.com/image/fetch/$s_!Hrz3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab92b9f-776f-4bbe-98fc-36ca74c25dfd_2004x844.png 1272w, https://substackcdn.com/image/fetch/$s_!Hrz3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab92b9f-776f-4bbe-98fc-36ca74c25dfd_2004x844.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!Hrz3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab92b9f-776f-4bbe-98fc-36ca74c25dfd_2004x844.png\" width=\"1456\" height=\"613\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eab92b9f-776f-4bbe-98fc-36ca74c25dfd_2004x844.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:613,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:251287,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/191707266?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab92b9f-776f-4bbe-98fc-36ca74c25dfd_2004x844.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!Hrz3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab92b9f-776f-4bbe-98fc-36ca74c25dfd_2004x844.png 424w, https://substackcdn.com/image/fetch/$s_!Hrz3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab92b9f-776f-4bbe-98fc-36ca74c25dfd_2004x844.png 848w, https://substackcdn.com/image/fetch/$s_!Hrz3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab92b9f-776f-4bbe-98fc-36ca74c25dfd_2004x844.png 1272w, https://substackcdn.com/image/fetch/$s_!Hrz3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feab92b9f-776f-4bbe-98fc-36ca74c25dfd_2004x844.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture><div class=\"image-link-expand\"><div class=\"pencraft pc-display-flex pc-gap-8 pc-reset\"><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container restack-image\"><svg 
role=\"img\" width=\"20\" height=\"20\" viewBox=\"0 0 20 20\" fill=\"none\" stroke-width=\"1.5\" stroke=\"var(--color-fg-primary)\" stroke-linecap=\"round\" stroke-linejoin=\"round\" xmlns=\"http://www.w3.org/2000/svg\"><g><title></title><path d=\"M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882\"></path></g></svg></button><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container view-image\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"20\" height=\"20\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"2\" stroke-linecap=\"round\" stroke-linejoin=\"round\" class=\"lucide lucide-maximize2 lucide-maximize-2\"><polyline points=\"15 3 21 3 21 9\"></polyline><polyline points=\"9 21 3 21 3 15\"></polyline><line x1=\"21\" x2=\"14\" y1=\"3\" y2=\"10\"></line><line x1=\"3\" x2=\"10\" y1=\"21\" y2=\"14\"></line></svg></button></div></div></div></a></figure></div><p>I could go on and on about this. Another example is from during my Ph.D. (2017-2022), when there was immense hype around a field called &#8220;<a href=\"https://www.automl.org/automl/\">AutoML</a>&#8221; which aimed to use techniques like Bayesian Optimization to find new architectures and parameters for models. The hype never translated into changing my job. Language models will do more than this, but not enough to take jobs away from top AI researchers any time soon. The core currency of researchers is still intuition and managing complexity, rather than specific optimization and implementation. </p><p><strong>2. 
Diminishing returns of more AI agents in parallel</strong></p><p>The biggest problem for rapid improvement in AI is that even though we&#8217;ll have 10,000 remote workers in a datacenter, it&#8217;ll be nearly impossible to channel all of them at one problem. Inherently, especially when the models are still so similar, they&#8217;re sampling from the same distribution of solutions and capabilities while being bottlenecked by human supervision. Adding more agents will have a strict saturation in the amount of marginal performance that can be added &#8211; the intuition of the best few researchers (and time to run experiments) will be the final bottleneck.</p><p>A common idea to illustrate this is <a href=\"https://en.wikipedia.org/wiki/Amdahl%27s_law\">Amdahl&#8217;s law</a>, which is taken from computer architecture and shows that a given task can only generate a fixed speedup proportional to how much can be parallelized and how many parallel workers exist. An illustration is below:</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!lX1X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7157f5b9-2b12-45ab-8bff-0771fa034c6d_3840x3000.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!lX1X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7157f5b9-2b12-45ab-8bff-0771fa034c6d_3840x3000.png 424w, https://substackcdn.com/image/fetch/$s_!lX1X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7157f5b9-2b12-45ab-8bff-0771fa034c6d_3840x3000.png 848w, 
https://substackcdn.com/image/fetch/$s_!lX1X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7157f5b9-2b12-45ab-8bff-0771fa034c6d_3840x3000.png 1272w, https://substackcdn.com/image/fetch/$s_!lX1X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7157f5b9-2b12-45ab-8bff-0771fa034c6d_3840x3000.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!lX1X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7157f5b9-2b12-45ab-8bff-0771fa034c6d_3840x3000.png\" width=\"1456\" height=\"1138\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7157f5b9-2b12-45ab-8bff-0771fa034c6d_3840x3000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1138,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:304903,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/191707266?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7157f5b9-2b12-45ab-8bff-0771fa034c6d_3840x3000.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!lX1X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7157f5b9-2b12-45ab-8bff-0771fa034c6d_3840x3000.png 424w, 
https://substackcdn.com/image/fetch/$s_!lX1X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7157f5b9-2b12-45ab-8bff-0771fa034c6d_3840x3000.png 848w, https://substackcdn.com/image/fetch/$s_!lX1X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7157f5b9-2b12-45ab-8bff-0771fa034c6d_3840x3000.png 1272w, https://substackcdn.com/image/fetch/$s_!lX1X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7157f5b9-2b12-45ab-8bff-0771fa034c6d_3840x3000.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture><div class=\"image-link-expand\"><div class=\"pencraft pc-display-flex pc-gap-8 pc-reset\"><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container restack-image\"><svg role=\"img\" width=\"20\" height=\"20\" viewBox=\"0 0 20 20\" fill=\"none\" stroke-width=\"1.5\" stroke=\"var(--color-fg-primary)\" stroke-linecap=\"round\" stroke-linejoin=\"round\" xmlns=\"http://www.w3.org/2000/svg\"><g><title></title><path d=\"M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882\"></path></g></svg></button><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container view-image\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"20\" height=\"20\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"2\" stroke-linecap=\"round\" stroke-linejoin=\"round\" class=\"lucide lucide-maximize2 lucide-maximize-2\"><polyline points=\"15 3 21 3 21 9\"></polyline><polyline points=\"9 21 3 
21 3 15\"></polyline><line x1=\"21\" x2=\"14\" y1=\"3\" y2=\"10\"></line><line x1=\"3\" x2=\"10\" y1=\"21\" y2=\"14\"></line></svg></button></div></div></div></a></figure></div><p>In AI this should be relatively easier to convey, as the low-level operating details of computers are fairly mysterious. Consider an AI researcher on the transition from writing code by hand to using AI autocomplete assistance to now using autonomous coding agents. These are all massive gains. Let us continue. Now this researcher uses 3-4 agents working on different sub-tasks or approaches to the problem at hand. This is still a large gain. Now consider this single researcher trying to organize 30-40 agents with tasks to do every day. Some people can get more value out of this scale, but not many.</p><p>How many people do you think could come up with 300-400 tasks for AI agents every day? Not many. This problem will hit the AI models soon enough as well.</p><p><strong>3. Resource bottlenecks and politics</strong></p><p>Fundamentally, all the AI companies are walking a fine line of acquiring substantial capital, converting new compute resources to revenue via sufficient demand, and repeating the process all-the-while spending an extreme amount on research. With the scale of resources here, there will always be political bottlenecks on who gets resources and what gets bet on. In this layer, research leadership sits above the AIs and the researchers. Even as models continue to improve, this source of friction will never get removed. It isn&#8217;t a substantial friction, but the AI models are fundamentally operating in organizations where humans are the bottleneck on resources. </p><p>The early scale of improvements with language models is local optimizations, where the resources used cost &lt;$1M per day. 
With my other views on the frictions of AI, this is on its own a very minor impact on the rate of improvement, but for those with worries of fast takeoff, RSI, and loss of control to AIs, it should be obvious that billions of dollars of compute resources for research are unlikely to be totally isolated for end-to-end experimentation of AI models. </p><div><hr></div><p>The conclusion here is that because we&#8217;re at the early stages of using AI assistance autonomously and at scale for AI development, we&#8217;re collectively discovering the ways that AI can help us massively. We&#8217;re all applying these tools to capture the low-hanging fruit we see, and our jobs are literally changing to be higher paced and more productive. The problem is that all of these axes have clear human, political, or technical complexity bottlenecks.</p><p>The bottom of every sigmoid feels like an exponential. We&#8217;ve ridden multiple exponentials in the era of language models. In 2023 we scaled to huge models and GPT-4 felt like magic. By 2025 we had added inference-time scaling with o1 and reasoning models, which let us &#8220;solve&#8221; math and coding. Now we&#8217;re going to take a big step by polishing the entire AI workflow (all the while scaling training compute massively). 
2026 will feel like a huge step, but it doesn&#8217;t have a fundamental change convincing me that progress will begin to take off.</p><p>This could still cross the colloquial threshold for AGI, which is a drop-in replacement for most remote workers, which would be an incredible milestone. Much of the challenge in the debate of if we hit AGI in the coming years is that AI models are jagged and smart in different ways than humans, so they won&#8217;t look like drop-in replacements for remote workers, but in many cases just using AI will be far more effective than trying to work with a human. It&#8217;s reshaping what jobs are.</p><p>Let us consider the scenarios we&#8217;re working through.</p><ol><li><p>Engineering is becoming automated today. Humans are way more productive, models can scale through complex infrastructure deployments much faster, run with higher GPU utilization, etc. Infrastructure gains become fixed improvements in the rate and scale of experimentation, the fundamental units of progress in AI.</p></li><li><p>Basic AI model research and optimization will be automated. The AI models are expanding in scope &#8211; they transition from writing kernels to deciding on architectures. This is moving from improving the experimentation toolkit to running minor experiments themselves. Configs, hyperparameters, etc. become the domain of the AI assistants.</p></li></ol><p>These are both real. The problem is that a third era doesn&#8217;t have a simple scale to jump to. Where the AI models can create knowledge by synthesis and execution, the next jump requires harnessing thousands of agents or having models make more novel discoveries &#8211; like unlocking the next paradigm after inference time scaling. 
The improvements downstream of AI are going to make the industry supercharged at hill climbing, but I worry that this won&#8217;t bring paradigm shifts that are needed for new categories of AI &#8211; continual learning, world models, whatever your drug of choice is.</p><p>All together, the models are becoming core to the development loop and that&#8217;s worth being excited (and worried) about. The models <em>are </em>performing self-improvement. They&#8217;re not transforming the approach. We <em>are</em> scaling up the compute we spend on our own research practices and tools. There are diminishing returns. Agents <em>are</em> going to start being autonomous entities we work with. They feel like a cross between a genius and a 5 year old. We will be in this era of lossy self-improvement (LSI) for a few years, but it is not enough for a fast takeoff. </p>"
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/gpt-54-is-a-big-step-for-codex",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/gpt-54-is-a-big-step-for-codex",
            "title": "GPT 5.4 is a big step for Codex",
            "pubDate": "Wed, 18 Mar 2026 13:02:54 GMT",
            "enclosure": {
              "@_url": "https://substackcdn.com/image/fetch/$s_!49G2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce867601-79e5-4519-9e6d-8ae221c08f0b_2400x1800.webp",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "On evaluating and understanding the frontier of agents, and why I still turn to Claude.",
            "content:encoded": "<p>I&#8217;m a little late to this model review, but that has given me more time to think about the axes that matter for agents. Traditional benchmarks reduce model performance to a single score of correctness &#8211; they always have because that was simple, easy to quickly use to gauge performance, and so on. This is also advice that I give to people trying to build great benchmarks &#8211; it needs to reduce to one number that is interpretable. This is likely still going to be true in a year or two, and benchmarks for agents will be better, but for the time being it doesn&#8217;t really map to what we feel because agentic tasks are all about a mix of correctness, ease of use, speed, and cost. Eventually benchmarks will individually address these.</p><p>Where GPT 5.4 feels like another incremental model on some on-paper benchmarks, in practice it feels like a meaningful step in all four of those traits. GPT 5.4 in Codex, always on fast mode and high or extra-high effort, is the first OpenAI agent that feels like it can do a lot of random things you can throw at it.</p><div class=\"subscription-widget-wrap-editor\" data-attrs=\"{&quot;url&quot;:&quot;https://www.interconnects.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}\" data-component-name=\"SubscribeWidgetToDOM\"><div class=\"subscription-widget show-subscribe\"><div class=\"preamble\"><p class=\"cta-caption\">Interconnects AI is a reader-supported publication. 
Consider becoming a subscriber.</p></div><form class=\"subscription-widget-subscribe\"><input type=\"email\" class=\"email-input\" name=\"email\" placeholder=\"Type your email&#8230;\" tabindex=\"-1\"><input type=\"submit\" class=\"button primary\" value=\"Subscribe\"><div class=\"fake-input-wrapper\"><div class=\"fake-input\"></div><div class=\"fake-button\"></div></div></form></div></div><p>I haven&#8217;t been particularly deep in software engineering over the last few months, so most of my working with agents has been smaller projects (not totally one-off, but small enough where I&#8217;ve built the entire thing and manage the design over weeks), data analysis, and research tasks. When you embrace being agent-native, this style of work entails a lot of regular APIs, background packages (like installing and managing LateX binaries, ffmpeg, multimedia conversion tools, etc), git operations, file management, search etc. Prior to GPT 5.4, I always churned off of OpenAI&#8217;s agents due to a death by a thousand cuts. It felt like rage quits. I&#8217;d feel like I was getting into GPT 5.2 Codex, but it would fail on a git operation and have me (or Claude) need to reset it. Those hard edges are no longer there.</p><p>The other subtle change in GPT 5.4&#8217;s approachability &#8211; the biggest reason I think OpenAI is much more back in the agent wars &#8211; is that it just feels a bit more &#8220;right.&#8221; I classify this differently to the routine tasks I discussed above, and it has to do with how the product (i.e. the model harness) presents the model outputs, requests, and all that to you the user. It has to do with how easy it is to dive in. This has always been Claude&#8217;s biggest strength in its astronomical growth. Not only has Claude been immensely useful, but it has a charm and entertainment value to it that&#8217;ll make new people stick around. 
GPT 5.4 has a bit of that, but the underlying model strengths of Claude still leave it feeling warmer.</p><p>Where Claude is a super smart model, with character, a turn of phrase in a debate, and the occasional forgotten detail, OpenAI&#8217;s models in Codex feel meticulous, slightly cold, and deeply mechanical. I&#8217;d use Claude for things I need more of an opinion on and GPT 5.4 to churn through an overwhelmingly specific TODO list. The instruction following of GPT 5.4 is so precise that I need to learn to interact with the models differently after spending so much time with Claude. In some domains, you come to see that Claude has an excellent model of your intent. GPT 5.4 just does what you say to do. These are very different philosophies of &#8220;what will make the best model for an agent&#8221;. Claude will likely appeal to the newcomers, but GPT 5.4 will likely appeal to the master agent coordinator that wants to unleash their AI army on distributed tasks.</p><p>Outside of charm, and dare I say taste, a lot of the usability factors are actually better on OpenAI&#8217;s half of the world. The Codex app is compelling &#8211; I don&#8217;t always use it, but sometimes I totally love it. I suspect substantial innovation is coming in what these apps look like. 
Personally, I expect them to eventually look like Slack (when multiple agents need to talk to each other, under my watch).</p><p>OpenAI also natively offers fast mode for their models with a subscription and very large rate limits. I&#8217;ve been on the $100/month Claude plan and $200/month ChatGPT plan for quite some time. I&#8217;ve never been remotely close to my Codex limits with fast mode and xhigh reasoning effort, whereas I hit my Claude limits from time to time. There&#8217;s definitely a modeling reason for this &#8211; most of OpenAI&#8217;s release blogs showcase each iterative model being substantially more concise in the number of tokens it takes to get peak benchmark performance. This is a measure of reasoning efficiency. This 2D (or more) benchmark picture is exactly where the world is going.</p><p>Here&#8217;s a <a href=\"https://cursor.com/blog/cursorbench\">plot from Cursor</a>, which sadly doesn&#8217;t have all the GPT 5.4 reasoning efforts, but it confirms this point in a third-party evaluation. 
What is missing across model families is the <em>speed</em> and price (a proxy for total compute used) to get there.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!49G2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce867601-79e5-4519-9e6d-8ae221c08f0b_2400x1800.webp\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><img src=\"https://substackcdn.com/image/fetch/$s_!49G2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce867601-79e5-4519-9e6d-8ae221c08f0b_2400x1800.webp\" width=\"1456\" height=\"1092\" class=\"sizing-normal\" alt=\"\" loading=\"lazy\"></picture></div></a></figure></div><p>The final benefit of GPT 5.4, and OpenAI&#8217;s agentic models in general for that matter, is much better context management. In using them regularly now I feel like I&#8217;ve never hit the context wall or context anxiety point. The reasoning efficiency I suspect is the case above just lets the model do way more with its initially empty context window. Then, when GPT 5.4 does compact, it&#8217;s been less noticeable.</p><p>The one problem I&#8217;ve been having with both Claude Opus 4.6 and GPT 5.4 is a light forgetfulness. If you give the models multiple TODOs in a single message outside of planning mode, I find them often dropping them. 
Sometimes it feels like the models glitch and try to solve a previous problem rather than the most recent one. I&#8217;m not sure what in the model or the harness is the exact cause. I sometimes like to queue up a few messages to refine the task while I watch the model work on something, but currently this is pretty risky except in the simplest use-cases.</p><p>These days I&#8217;ve been using both GPT and Claude extensively, mostly based on my mood, and have been getting more done than ever. Having a GPT 5.4 Pro integration directly with Codex, e.g. like \\ultrathink, would be a big differentiator for OpenAI. Those models have been incredible.</p><p>All in, I see GPT 5.4 as an agentic model that brings a ton more simple usability and &#8220;agentness&#8221; to the very strong software foundation of GPT 5.3 Codex. It&#8217;s a big step, and I&#8217;m unbelievably excited to see which of these two companies releases an update next. On paper, GPT 5.4 has the longer list of strengths: better top-end coding performance, better speed, better context management, better rate limits. Yet I genuinely still <em>enjoy</em> Claude a bit more in ways that&#8217;ll never show up on benchmarks, which is a testament to how nuanced choosing a model is. This makes me type <code>claude</code> into my terminal at the start of my day, rather than <code>codex</code>.</p>"
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/the-next-phase-of-open-models",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/the-next-phase-of-open-models",
            "title": "What comes next with open models",
            "pubDate": "Mon, 16 Mar 2026 13:00:51 GMT",
            "enclosure": {
              "@_url": "https://substack-post-media.s3.amazonaws.com/public/images/07ccf41a-ab0e-4cb6-b24b-234ec18c39a7_3182x1790.png",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "Markets, capabilities, cope, and bewilderment in the industrialization of language models.",
            "content:encoded": "<p>2025 was the year where a lot of companies started to take open models seriously as a path to influence in the extremely valuable AI ecosystem &#8212; the adoption of a strategy that was massively accelerated downstream of <a href=\"https://www.interconnects.ai/p/deepseek-r1-recipe-for-o1\">DeepSeek R1&#8217;s</a> breakout success. Most of this is being done as a mission of hope, principle, or generosity. </p><p>Very few businesses have a real monetary reason to build open models. Well-cited reasons, such as <a href=\"https://gwern.net/complement\">commoditizing one&#8217;s complements</a> for Meta&#8217;s Llama, are hard to follow up on when the cost of participating well is billions of dollars. Still, AI is in such an early phase of technological development, mostly defined by large-scale industrialization and massive scale-out of infrastructure, that having any sort of influence at the cutting edge of AI is seen as a path to immense potential value. </p><p>Open models are a very fast way to achieve this, you can obtain substantial usage and mindshare with no enterprise agreements or marketing campaigns &#8212; just releasing one good model. Many companies in AI have raised a ton of money built on less. </p><p>The hype of open models is simultaneously amplified by the mix of cope, disruptive anticipation, and science fiction that hopes for the world where open models do truly surpass the closed labs. This goal could be an economically catastrophic success for the AI ecosystem, where profits and revenue plummet but the broader balance of <a href=\"https://www.interconnects.ai/p/how-anthropic-vs-dow-impacts-open\">power and control of AI models</a> is long-term more stable.</p><p>There&#8217;s a small chance open models win in absolute performance, but it would only be on the back of either a true scientific breakthrough that is somehow kept hidden from the leading labs or the models truly hitting a wall in performance. 
Both of them are definitely possible, but very unlikely. </p><p>It is important to remind yourself that there have been no walls in progress to date and all the top AI researchers we discuss this with constantly explain the low-hanging fruit they see on progress. It may not be recursive self-improvement to the singularity (more on that in a separate post), but large technology companies are on a direct path to building definitionally transformative tools. They are coming.</p><h2>The balance of power in open vs. closed models</h2><p>The fair assessment of the open-closed gap is that <a href=\"https://www.interconnects.ai/p/open-models-in-perpetual-catch-up\">open models have always been 6-18 months behind the best closed models</a>. It is a remarkable testament to the open labs, operating on far smaller budgets, that this has stayed so stable. Many top analysts like myself are bewildered by the way the gap isn&#8217;t bigger. Distillation helps a bit in quality, benchmaxing more than closed labs helps perceptions, but the progress of the leading open models is flat out remarkable. </p><p>The reality is that the open-closed model gap is more likely to grow than shrink. The top few labs are improving as fast as ever, <a href=\"https://www.interconnects.ai/p/opus-46-vs-codex-53\">releasing many great new models</a>, with more on the docket. 
Many of the most impressive frontier model improvements relative to their open counterparts feel totally unmeasured on public benchmarks. </p><p>In a new era of coding agents, the popular method to &#8220;copy&#8221; performance from closed models, <a href=\"https://www.interconnects.ai/p/how-much-does-distillation-really\">distillation</a>, requires more creativity to extract performance. Previously, you could use the entire completion from the model to train your student, but now the most important parts are the complex RL environments and the prompts that place your agents in them. These are much easier to hide, and all the while the Chinese labs leading in open models keep complaining about compute restrictions. </p><p>As the leading AI models move into longer-horizon and more specialized tasks, mediated by complex and expensive gate-keepers in the U.S. economy (e.g. legal or healthcare systems), I expect large gaps in performance to appear. Coding can largely be &#8220;solved&#8221; with careful data processes, scraping GitHub, and clever environments. The economies of scale and foci of training are moving into domains that are not on the public web, so they are far harder to replicate than early language models. </p><p>Developing frontier AI models today is more defined by stacking medium to small wins, unlocked by infrastructure, across time. This rewards organizations that can expand scope while maintaining quality, which is extremely expensive.</p><p>All of these dynamics together create a business landscape for open models that is hard to parse. Through 2026, closed models are going to take leaps and bounds in performance in directions that open models are unlikely to follow. This sets us up for a world where we need to consider, fund, use, and discuss open models differently. This piece lays out how open models are changing. 
It is a future that&#8217;ll be clearly defined by three classes of models.</p><ol><li><p><strong>True (closed) frontier models.</strong> These will drive the strongest knowledge work and coding agents. They will be truly remarkable tools that force us to reconsider our relationship to work.</p></li><li><p><strong>Open frontier models.</strong> These will be the best open-weight, large models that are attempting to compete on the same directions as above. There will be plenty of use-cases that they don&#8217;t work for relative to the best models, but countless use-cases where they work remarkably well. For many use-cases, even ones as valuable as some subsets of coding, these will work great. <br><br>The AI ecosystem will still take years to understand what it means to have intelligence of this magnitude served in private, at the marginal cost of electricity for individuals, as assistants, coaches, companions, and more. OpenClaw provided a glimpse behind the mirror that will expand and grow. The class of models around GPT-OSS 120B, <a href=\"https://developer.nvidia.com/blog/introducing-nemotron-3-super-an-open-hybrid-mamba-transformer-moe-for-agentic-reasoning/\">Nvidia Nemotron 3 Super</a>, or <a href=\"https://huggingface.co/MiniMaxAI/MiniMax-M2.5\">MiniMax M2.5</a> are the balance of performance to price that can work as local models.</p></li><li><p><strong>Open, small models as distributed intelligence</strong>. The <em>most successful</em> open models will be complementary tools to closed agents. This is a path for open models to complement and accelerate the frontier of progress.<br><br>AI is slotting in to automate many repetitive, niche tasks across the technology economy. There&#8217;s a huge pressure to shift these tasks off of the best closed models &#8212; which frankly are still better at most of the things, across my conversations with businesses trying to build with open models &#8212; to small, open models that can be 10X faster and 100X cheaper. 
There aren&#8217;t really people building data and fine-tuning engines for economically viable tasks on the smallest models possible. <br><br>These models need to be almost brain-numbingly boring and specific. In a world dominated by coding agents, I want to build open models that Claude Code is <em>desperate</em> to use as a tool, letting its sub agents unlock entirely new areas of work. This is possible, but remarkably under-explored. Small models from the likes of Qwen and co. are still marketed on general-task benchmarks. The hype of &#8220;open models catching the frontier&#8221; distracts the world from this very large area of demand.<br><br>This is the sort of model that moves open models from just a few, crucial static weights to more of an ecosystem. It requires creativity and a new approach. The goal of this piece is to illustrate why and how to build these, with added context on where open models stand today.</p></li></ol><p>All three of these model classes hint at different ways to use agents. It is absolutely definitional to how AI is going to be built going forward that they&#8217;re not just model weights, but rather systems that <a href=\"https://www.interconnects.ai/p/thinking-searching-and-acting\">think, search, and act</a>. The weights only define one portion of those abilities.</p><div class=\"subscription-widget-wrap-editor\" data-attrs=\"{&quot;url&quot;:&quot;https://www.interconnects.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}\" data-component-name=\"SubscribeWidgetToDOM\"><div class=\"subscription-widget show-subscribe\"><div class=\"preamble\"><p class=\"cta-caption\">Interconnects AI is a reader-supported publication. 
Consider becoming a subscriber.</p></div><form class=\"subscription-widget-subscribe\"><input type=\"email\" class=\"email-input\" name=\"email\" placeholder=\"Type your email&#8230;\" tabindex=\"-1\"><input type=\"submit\" class=\"button primary\" value=\"Subscribe\"><div class=\"fake-input-wrapper\"><div class=\"fake-input\"></div><div class=\"fake-button\"></div></div></form></div></div><h2>Open weights as part of an AI system</h2><p>To start, consider what are the most impactful and impressive things that language models can do <em>without</em> a suite of tools at their side. When was the last time that you were blown away by something that was <em>just</em> autoregressive token outputs? Unless you&#8217;re doing a substantial amount of work on mathematical proofs or competition code, it seems like that situation has changed little since GPT-4&#8217;s release in 2023. The AI systems we use today are about far, far more than weights.</p><p>In this world, closed models have a clear advantage. Closed models get to vertically integrate everything from the chips they run on, the inference software, the weights, the tools, and the user interface. Open models on the other hand need to work on every inference setup, with many tools, and in many use-cases. This vertical integration is best expressed today in the joy of using Claude Code with Opus 4.6 or OpenAI&#8217;s Codex with GPT 5.4. Open models haven&#8217;t passed this point. Some are starting to focus on specific interfaces, e.g. OpenCode, but there&#8217;s an inherent tension in making an open model work only in your blessed product roadmap.</p><p>At the same time, this change could point to more about the latest AI systems being open! If you can do less with the weights alone, maybe more labs will release them.</p><p>The way to think about AI systems today is as a mix of weights, tools, and harnesses. The weights portion is familiar. 
The tools are the deeply integrated environments the models act in <em>at deployment time</em> &#8212; best typified by search and code sandboxes &#8212; and the harness is how these two fit together with a product that the user sees.</p><p>In this world, there are two things to consider: 1) Is there an equivalent, open system to the closed products that people are using today &#8212; I mean truly equivalent, where every level of the stack can be modified and controlled (more on this later), and 2) How does this system&#8217;s view impact different future decisions in the open ecosystem?</p><h2>Still looking for open model business strategies</h2><p>To understand how the business and practicality of open models will evolve, let me take a tour back in time to foundational writing on the role of open-source in modern technology companies. The first is a Google blog post, <a href=\"https://googleblog.blogspot.com/2009/12/meaning-of-open.html\">The Meaning of Open</a>, which originally was an internal memo by Jonathan Rosenberg, which sparked an intense internal debate that later resulted in it becoming public. To start, here&#8217;s a basic assessment of how open systems can work:</p><blockquote><p>Open systems have the potential to spawn industries. They harness the intellect of the general population and spur businesses to compete, innovate, and win based on the merits of their products and not just the brilliance of their business tactics.</p></blockquote><p>I&#8217;ve long believed that the company who will benefit most from the ecosystem of open models is the one who understands it best. This entails being deeply involved with open research and experimentation in how to use the models. So far, most of the open model company business models are not this. Rosenberg expands on this in his 2009 post, comparing the dynamics of open systems to closed products:</p><blockquote><p>[Open systems] are competitive and far more dynamic. 
In an open system, a competitive advantage doesn&#8217;t derive from locking in customers, but rather from understanding the fast-moving system better than anyone else and using that knowledge to generate better, more innovative products. The successful company in an open system is both a fast innovator and a thought leader; the brand value of thought leadership attracts customers and then fast innovation keeps them. This isn&#8217;t easy &#8212; far from it &#8212; but fast companies have nothing to fear, and when they are successful they can generate great shareholder value.</p></blockquote><p>We&#8217;ve known for some time that open-weight models alone are not enough to constitute a product. Models become a product only once they have tools and harnesses, so we don&#8217;t actually have fully open systems; we have systems that are partially open and partially closed, which makes moats messy. vLLM and a model like GLM 5 are pieces of a system, but it still takes more to deploy them: expensive private GPUs and some tools with local business data.</p><p>It may turn out that AI is too complex and expensive to have any open system analogous to those of previous generations of technology. If there were a fully open system, it would win by default, as many historical generations of technology have shown us. This fully open analog does not yet exist, so we have constant debates on the role of open-source AI.</p><p>Bill Gurley recounts how Google&#8217;s free products have exemplified the open or free strategies across technology. Gurley <a href=\"https://abovethecrowd.com/2011/03/24/freight-train-that-is-android/\">wrote</a> on the open-source operating system, Android, and the free browser, Chrome, in 2011:</p><blockquote><p>So here is the kicker. Android, as well as Chrome and Chrome OS for that matter, are not &#8220;products&#8221; in the classic business sense. 
They have no plan to become their own &#8220;economic castles.&#8221; Rather they are very expensive and very aggressive &#8220;moats,&#8221; funded by the height and magnitude of Google&#8217;s castle. Google&#8217;s aim is defensive not offensive. They are not trying to make a profit on Android or Chrome. They want to take any layer that lives between themselves and the consumer and make it free (or even <a href=\"https://abovethecrowd.com/2009/10/29/google-redefines-disruption-the-%E2%80%9Cless-than-free%E2%80%9D-business-model/\">less than free</a>).</p><p>Because these layers are basically software products with no variable costs, this is a very viable defensive strategy. In essence, they are not just building a moat; Google is also scorching the earth for 250 miles around the outside of the castle to ensure no one can approach it.</p></blockquote><p>In the same post, Gurley reflects on the limits of Google&#8217;s openness:</p><blockquote><p>In this open manifesto, Jonathan opines over and over again that open systems unquestionably result in the very best solutions for end customers. That is with one exception. &#8220;In many cases, most notably our search and ads products, opening up the code would not contribute to these goals and would actually hurt users.&#8221; As Rodney Dangerfield said in Caddyshack, &#8220;It looks good on you, though.&#8221;</p></blockquote><p>Essentially, Google open-sourced so much, and in fact <em>paid</em> people to use its products (e.g. paying phone makers to use Android), to keep the funnel leading to the search profit center. This is the virtuous loop that the search business still funds to this day.</p><p>AI is still nothing like this, but signs of change are emerging. The default belief on the value of models to these companies is that <em>the model is the product</em>. 
This is obvious with products like hosted APIs, where releasing the model weights would be business suicide, but this is softening as interfaces like Claude Code, Codex, Cursor, etc. get vastly popular. It could be a path to more openness, at least in parts of the stack. We can see this with the coding plans offered by Moonshot and Z.ai &#8212; where the demand is very high for the businesses, even though the model is open. Most people will just use the cheap interface with inference, instead of figuring out how to use the model themselves (as long as the business is mostly consumer or per-head services).</p><p>All of this doesn&#8217;t leave me optimistic on the direction of companies becoming more open in the coming years. I&#8217;d expect the opposite still. <a href=\"https://www.interconnects.ai/p/why-nvidia-builds-open-models-with\">Nvidia has the one great reason to be open</a> &#8212; to sell more GPUs to people building on open models and understand what they need to build next, but there&#8217;s no one else obvious on this list. 
Until there are more specific economic reasons to build open models, the companies building these at the frontier will have fewer resources to spend on the models and face a consolidation to the best few.</p><p>In the face of consolidation at the open frontier, the investment in the models <em>should</em> shift to areas where the models can have more differentiated upside relative to the best closed frontier models.</p><h2>Open models that are specific, cheap, fast, and ubiquitous</h2><p>There&#8217;s too much obsession with the best companies building open models to try and compete at the frontier. There&#8217;s a vastly underserved market of enterprises that want cheap, reliable models for repetitive use-cases in their systems. Picture this, one small model with a series of LoRA adapters that specialize the model to internal skills. This can be deployed very cheaply as tools and a complement to the frontier closed models that are orchestrating agents. </p><p>Every task that a frontier agentic model does tens to hundreds of times can potentially be outsourced to a small model. There are ancillary benefits to this, e.g. privacy of a local model reading your files and summarizing to Claude, but almost no one is pushing hard in this direction. The leading model family of capable, customizable small models to date is Qwen, but that&#8217;s now <a href=\"https://interconnect.substack.com/p/alibabas-ai-drama\">shrouded in uncertainty with the departures of key personnel</a>. Gemma, Phi, Olmo, etc. 
are all major steps down in quality, and therefore potential for modification.</p><p>There are a few obvious examples why this can be scaled up. There was a recent <a href=\"https://x.com/awnihannun/status/2030024849570288080\">thread</a> and <a href=\"https://x.com/N8Programs/status/2030386417566613707\">discussion</a> on how the new Qwen 3.5 4B model arguably bests the original ChatGPT model. On the research side, there are already <a href=\"https://arxiv.org/abs/2601.20789\">recipes</a> for finetuning open models on specific code-bases to match performance of much bigger models. <a href=\"https://moondream.ai/\">Moondream.ai</a> is a startup made by a friend of mine Vik, who builds some of the best, small multimodal models on a tiny budget &#8212; they compete with Qwen and Llama on real world tasks. This is the tip of an iceberg. </p><p>Intelligence compression hasn&#8217;t been explored with nearly as much depth (or resources) because it is less exciting than keeping track of the progress of the best few models. Investigating these areas is the standard technological diffusion process that is slow and why we&#8217;re still early in understanding how people will build with AI. My contention is that too many people building open models are slightly deluded in their perception of their competitiveness. The best few models will win on general capabilities and there are still plenty of underserved niches elsewhere.</p><p>Taking this to the next level involves releasing open models that are scoped to be truly excellent at 1-3 tasks, as I hinted at the beginning of this piece. Too many people try to compete with Qwen and show that their small model does great on frontier AI benchmarks. The right benchmark here is savings in compute and time.</p><p>It&#8217;ll take years for this transition to slowly become reality. 
Part of why I am so excited about it is that it is driving innovation on open models being more about diversity, specialization, and curiosity, rather than the standard &#8220;one model to rule them all&#8221; that the frontier models presume.</p><h2>Models vs. ecosystems.<br>Consolidation vs. creativity.</h2><p>So long as the open source ecosystem for AI is defined by a bunch of model providers trying to chase after the closed labs, it will largely lose. It will face pain on funding and substantive adoption. The same consolidation that will come for closed AI companies will come for open model builders &#8212; likely even sooner. </p><p>Open systems at their best allow many people to participate and many approaches to flourish.</p><p>The world of open models needs to be more of an ecosystem. I&#8217;ve discussed in the past how <a href=\"https://www.interconnects.ai/p/on-chinas-open-source-ai-trajectory\">China is </a><em><a href=\"https://www.interconnects.ai/p/on-chinas-open-source-ai-trajectory\">closer</a></em><a href=\"https://www.interconnects.ai/p/on-chinas-open-source-ai-trajectory\"> to this type of environment</a> by having a variety of companies, but the variety in approaches is still too low.</p><p>Ecosystems are self-reinforcing, whereas individual models are static artifacts in time. Ecosystems showcase clear, constant opportunities for what&#8217;s next that have growing value propositions. 
</p><p>The path forward for open models is to solve different problems than the frontier labs, to find places where open models are effectively free alternatives, to show ways of using specialized models that the closed labs cannot offer. The world of open models needs to embrace creativity, before building powerful AI systems grows too expensive and prices out many of the prized open labs of today.</p>"
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/how-anthropic-vs-dow-impacts-open",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/how-anthropic-vs-dow-impacts-open",
            "title": "Dean Ball on open models and government control ",
            "pubDate": "Fri, 06 Mar 2026 14:03:27 GMT",
            "enclosure": {
              "@_url": "https://api.substack.com/feed/podcast/190039178/7266d8486e4b1ea88e5e56dd28dfe2de.mp3",
              "@_type": "audio/mpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "Subtle precedents on the future of open models set by the unfolding Anthropic v. Department of War case.",
            "content:encoded": "<p>Watching history unfold between Anthropic and the Department of War (DoW) it has been obvious to me that this could be a major turning point in perspectives on open models, but one that&#8217;ll take years to be obvious. As AI becomes more powerful, existing power structures will grapple with their roles relative to existing companies. Some in open models frame this as &#8220;<a href=\"https://x.com/ClementDelangue/status/2027196053989052608\">not your weights, not your brain</a>,&#8221; but it points to a much bigger problem when governments realize this. </p><p>If AI is the most powerful technology, why would any global entity let a single U.S. company (or government) control their relationship to it?</p><p>I got <span class=\"mention-wrap\" data-attrs=\"{&quot;name&quot;:&quot;Dean W. Ball&quot;,&quot;id&quot;:5925551,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!mLaj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49371abf-2579-47be-8114-3e0ca580af8b_1024x1024.png&quot;,&quot;uuid&quot;:&quot;d7aada77-d1d0-4b6b-a494-84ada0eb1b13&quot;}\" data-component-name=\"MentionToDOM\"></span> of the great <span class=\"mention-wrap\" data-attrs=\"{&quot;name&quot;:&quot;Hyperdimensional&quot;,&quot;id&quot;:2244049,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/hyperdimensional&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f70956b-24b6-432b-81c4-dcfa4095ead7_1024x1024.png&quot;,&quot;uuid&quot;:&quot;4290c33f-dceb-4f81-bd46-c8be97e8385e&quot;}\" data-component-name=\"MentionToDOM\"></span> newsletter onto the <span class=\"mention-wrap\" data-attrs=\"{&quot;name&quot;:&quot;SAIL 
Media&quot;,&quot;id&quot;:392441355,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/26b97e53-7767-4ade-b482-6b4265aef09b_586x586.png&quot;,&quot;uuid&quot;:&quot;9f7f0186-6c7f-48f8-bae4-809e7135a64e&quot;}\" data-component-name=\"MentionToDOM\"></span> weekly Substack live to discuss this. In the end, we agree that the recent actions by the DoW &#8212; especially the designation of Anthropic as a supply chain risk (which Dean and I both vehemently disagree with) &#8212; point to open models being the 5-10 year stable equilibrium for power centers. </p><p>The point of this discussion is:</p><ul><li><p>Why do open models avoid some of the power struggles we&#8217;ve seen play out last week?</p></li><li><p>How do we bridge short-term headwinds for open models towards long-term strength?</p></li><li><p>The general balance of capabilities between open and closed models.</p></li></ul><p>Personally, I feel the need to build open models more than ever and am happy to see more constituencies wake up to it. What I don&#8217;t know is <em>how</em> to fund and organize that. <a href=\"https://gwern.net/complement\">Commoditizing one&#8217;s complements</a> is a valid strategy, but it starts to break down when AI models cost closer to a trillion dollars than a hundred million. With open models being very hard to monetize, there&#8217;s a bumpy road ahead for figuring out <em>who</em> builds these models in the face of real business growth elsewhere in the AI stack.</p><p>Enjoy and please share any feedback you have on this tricky topic! </p><p>Listen on <a href=\"https://podcasts.apple.com/us/podcast/interconnects-audio/id1719552353\">Apple Podcasts</a>, <a href=\"https://open.spotify.com/show/6XNzfJULeVxR7SneeesDUs\">Spotify</a>, and <a href=\"https://www.interconnects.ai/podcast\">wherever you get your podcasts</a>. 
For other Interconnects interviews, <a href=\"https://www.interconnects.ai/t/interviews\">go here</a>.</p><h3>Chapters</h3><ul><li><p>00:00 Intro: is the Anthropic supply chain risk good or bad for open models?</p></li><li><p>04:03 Funding open models and the widening frontier gap</p></li><li><p>12:33 Sovereign AI and global demand for alternatives</p></li><li><p>20:55 Open model ecosystem: Qwen, usability, and short-term outlook</p></li><li><p>28:20 Government power, nationalization risk, and financializing compute</p></li></ul><h3>Transcript</h3><p><strong>00:00:00 Nathan Lambert:</strong> Okay. We are live and people will start joining. I&#8217;m very happy to catch up with Dean. I think as we were setting this up, the news has been breaking that the official supply chain risk designation was filed. This is not a live reaction to that. If we get any really, really interesting news, we&#8217;ll talk about it. I think one of the undercurrents that I&#8217;ve felt that this week where everything happened is gonna touch on is open models, but there&#8217;s not an obvious angle. I think I will frame this to Dean to start, which is how does-- Like, there&#8217;s two sides of open models. 
One is that there&#8217;s the kind of cliche like, not my weights, not your weights, not your mind, where like somebody could take it away if not an open model, which people are boosting like, &#8220;Oh, like Anthropic&#8217;s gonna take away their intelligence.&#8221; But the other side is people worried about open models existing that the Department of War can just take and use for any purpose that it wants. And I feel like both of these are a little cliche. And the core question is like, is this type of event where more control is coming towards AI and more multi-party interest, like is that gonna be good or bad for the open weight model ecosystem?</p><p><strong>00:01:12 Dean Ball:</strong> My guess is that in the long run, this is probably profoundly good for open weight AI. And like the whole reason I got in, like, so I became interested in frontier AI governance. I did something totally different with my time before. I wrote about different kinds of policy and studied different kinds of policy. And the reason I got into this was because it immediately occurred to me that the government was gonna... I was like, okay, let&#8217;s assume we&#8217;re building super intelligence soon or whatever, like very advanced AI that seems like really important and powerful. That&#8217;s gonna be something that I depend on, like for my day-to-day life. I&#8217;m gonna need it for all kinds of things. It&#8217;s gonna profoundly implicate my freedom of expression as an American and my exercise of my liberty and all that. And yet it&#8217;s also gonna profoundly implicate national security. And so the government&#8217;s gonna have its hands all over it, and they also might not like me using it because I might use it, and others might use it to challenge the status quo in various ways, to challenge the existing power structures which the government is a part of. 
So we have a political problem on our hands here, in my view.</p><p><strong>00:02:36 Dean Ball:</strong> It immediately occurred to me that we&#8217;re gonna have this huge problem of like, this is gonna be a conflict because this is something that&#8217;s gonna enormously implicate American speech and liberty, and also it&#8217;s gonna have legitimate national security issues, and also the government&#8217;s gonna want it because of bad power-seeking reasons. And so that&#8217;s always a part of the picture. And my view was this is just a fight that&#8217;s gonna play out over the coming decades, and I wanna be a part of this fight. But number two, in that fight, you have to have an insurance policy, and open weight is the insurance policy. Open weight is the way we can always say yes, but we can build the open ecosystem. We can do that. And so I think in the fullness of time, this is gonna be beneficial, but the problem is there&#8217;s a lot of coordination and economic problems that have to be solved here. It&#8217;s not just a matter of hoping that Google and Meta or whomever else, or the Chinese companies, by virtue, out of the goodness of their hearts continue to open-source things. That&#8217;s not scalable. There has to be a reason to do it. So what are the institutional dynamics open weight gonna look like in the long term? I don&#8217;t really know, but it feels deeply under theorized.</p><p><strong>00:04:03 Nathan Lambert:</strong> I think it&#8217;s hard to fund is the thing. I mean, we saw Qwen had their turmoil this week, which is timely, and I&#8217;m not that surprised because the stakes for these companies is so high, and they all are trying to make sure their companies win in it. And people will say like, &#8220;Oh, Meta should commoditize their complements and release open models.&#8221; But no one&#8217;s ever commoditized their complements with something that costs a trillion dollars to make. Like, that&#8217;s a line item. 
Like, is Apple gonna commoditize... Apple commoditizing their complement would be them doing the... They could spend just as much as all the other tech companies are on CapEx and spend hundreds of billions of dollars, but they&#8217;re choosing not to. And I just like, I agree that long term it should be better, but if we never bridge that gap, does it actually materialize? Like, the crank is being turned of these models getting better and better. GPT 5.4 released today, excited to try it.</p><p><strong>00:05:02 Nathan Lambert:</strong> But like, where does it go? Like, what I&#8217;m working on is totally falling behind the frontier. We&#8217;re the foundation of research, but it&#8217;s like I see it already slipping.</p><p><strong>00:05:13 Dean Ball:</strong> So I kinda think, yeah, I mean, look, I think it&#8217;s gonna get bad in the short term, it&#8217;s gonna be bleak, right? There&#8217;s just no doubt about that in my view. Because we&#8217;re in this period, like I think the pace of frontier progress is gonna continue. My own view is that, like, just &#8216;cause I peer in and use the open weight Chinese models on a fairly regular basis, and I kinda just feel as though the gap has widened between the US frontier and the open frontier. Unfortunately, it&#8217;s so sad that US frontier and open frontier are increasingly distinct things. But I do feel as though that probably is true. And that&#8217;s probably gonna continue because in the next, like, in the early stages of a new technology, you would expect for the vertically integrated players to be the ones who do the best. And over time, the modular players can win, and part of that is &#8216;cause eventually you do get to good enough, right? Like, eventually, I think most people think the iPhone is good enough now. 
There was a time when every year the iPhone upgrade was like, &#8220;Oh my God, this is so much better.&#8221; Intelligence is maybe different, but maybe not for a lot of things.</p><p><strong>00:06:37 Nathan Lambert:</strong> Well, like, there&#8217;s no iPhone that you can buy from anyone. Nothing you can buy from anyone but Apple is nearly as good. That&#8217;s the concern. It&#8217;s like, is it gonna be Anthropic that like, yeah, it stopped getting better, but you can&#8217;t rebuild it. Like, you can&#8217;t make the open source version.</p><p><strong>00:06:51 Nathan Lambert:</strong> I also think I had a later question, which is like, the weights are so much less of a concern for me. So like, somebody dropping a two-trillion-parameter model that&#8217;s open weights and way better than anything else that somebody has built and released in the open, it almost doesn&#8217;t matter if you don&#8217;t understand the harness and the tools and the setup you need to make it into a Claude-like system. Like, you need what, eighty nodes of H100s that cost a hundred thousand dollars a day to run and expertise to make it a system. It&#8217;s like the shifting away from weights is also happening. I don&#8217;t think it&#8217;s happening in this open versus closed ecosystem at the surface level of the discussion. So that&#8217;s why I&#8217;m just like, I don&#8217;t know if it&#8217;s gonna exist. The thing that I could see happening is that open weights models are niche, and they help these Claude-like models, but there&#8217;s not an alternative in that universe. So it&#8217;s like, is the government capable of actually making this alternative exist? I don&#8217;t know. 
Like, I don&#8217;t know if you can Manhattan Project this, and I wouldn&#8217;t advocate for it.</p><p><strong>00:07:53 Dean Ball:</strong> I actually think about it from the opposite perspective, because I think that what happens if the government follows through on what they&#8217;ve threatened with Anthropic, which is to make it so that basically any military contractor cannot have any commercial relations with Anthropic, which means NVIDIA can&#8217;t sell GPUs to them for anything. Amazon can&#8217;t sell cloud services to them. Amazon and NVIDIA also can&#8217;t be invested in them, by the way, if you take any commercial relations at its face value. Now, that&#8217;s not a power the government actually has, but nonetheless, if this harassment campaign continues, I think what it probably does... You know, I spend a lot of time in international policy, dealing, talking to foreign governments and civil society in foreign countries, and they already have major trust issues with respect to the US closed source models because they think the US government is gonna come in and disable the models. Like, the American president will get mad at Brazil, say, and in addition to putting tariffs or sanctions, the US president will say, &#8220;Yeah, we&#8217;re also gonna turn off all your public services that are dependent upon American closed source models.&#8221; Right? So people view that as this profound threat, and people are legitimately scared of that in other countries.</p><p><strong>00:10:00 Dean Ball:</strong> I think this turns that fear up another meaningful degree, and probably not incorrectly, by the way, probably rightfully so. And so I kinda look at this and I think, well, now a lot of American companies might also have that concern, and so you certainly have a demand side of people who are gonna be like, &#8220;I get this. It is a risk to use anything where I have a commercial relationship. 
&#8216;Cause once I have a commercial relationship, the government can regulate that. Can I find some way of getting out of it?&#8221; I think there&#8217;s gonna be demand for that. Whether or not that demand produces supply, I think will depend on... It might just not be possible, that&#8217;s true. But I think you&#8217;ve never had a more favorable demand picture, and I suspect that on the margin, this probably will favor open in the longer run.</p><p><strong>00:10:44 Nathan Lambert:</strong> Yeah. So there&#8217;s a few ways that I think about this. I have this thing, like ATOM Project and all this other stuff I do, and it&#8217;s like, how do I meaningfully advocate for this? I think there&#8217;s something, like I work at AI2, and AI2 has budgets of order of a hundred million dollars and can train decent models. But if I wanted to redo an AI2, like my method for getting that type of money, it&#8217;s mostly gonna be like befriending a billionaire. And it seems like philanthropy dice roll in the near term is a way to get it. But then, like, maybe it really is some long slog of a multi-industrial consortium that takes a couple years off the ground and slowly, like, Google&#8217;s, or all these Netflix and all these five hundred billion dollar smaller companies are gonna give millions of dollars to have somebody else do it because they can&#8217;t get the billion dollars themselves, but they know they need to have it existed.</p><p><strong>00:11:31 Dean Ball:</strong> And sovereign wealth funds. Right. Sovereign wealth funds everywhere can do that, right? There&#8217;s trillions of dollars in sovereign wealth. There&#8217;s pension funds, public employee pension funds. A lot of people can chip into this and it&#8217;s possible. This is like, <a href=\"https://en.wikipedia.org/wiki/Yann_LeCun\">Yann LeCun</a> thinks this is the inevitable outcome. 
He thinks that the future is gonna be that some sort of global consortium gets together and builds this, because no one country is gonna be able to own it, because it&#8217;s gonna be too important. I&#8217;ve always kinda doubted that, and I&#8217;ve always thought that that outcome is probably a bad outcome for the world, honestly.</p><p><strong>00:12:06 Nathan Lambert:</strong> That&#8217;s a bad outcome for how good the AI is.</p><p><strong>00:12:09 Dean Ball:</strong> That&#8217;s correct. It&#8217;s a socialist outcome, you know? It&#8217;s not communism, but it is democratic socialism, and I&#8217;m not a democratic socialist, so I&#8217;m not a super big fan of that. But at the same time, I have to be honest that I kinda think that this probably does increase the odds of that precise outcome coming to bear.</p><p><strong>00:12:33 Nathan Lambert:</strong> I think something that comes sooner is that a lot of these super wealthy countries are gonna realize they can have real... Like, they can do some sort of sovereign AI and make some sort of noise, particularly starting with open models. I think there&#8217;s the Institute for Foundation Models, which is based on the UAE university system. Like, that&#8217;s--</p><p><strong>00:12:53 Dean Ball:</strong> That&#8217;s very UAE-coded, yeah.</p><p><strong>00:12:55 Nathan Lambert:</strong> They&#8217;ve been playing that for years, and they can keep doing this. Their models are gonna be pretty good, and I think there&#8217;s gonna be more people that do this. There&#8217;s the SWISS initiative in EU, which is on one hand doing a good job, on the other hand plagued by the most obvious European limitations of talent cycling and consortium life. I think these things are gonna become more of a thing in the next year, but I don&#8217;t know exactly how they impact the... They don&#8217;t impact the frontier of AI, but maybe they&#8217;re just like how the geopolitics and power of AI evolves. 
And I for some reason feel like open models need to be the thing that they&#8217;re gonna do because if they have a closed model that&#8217;s not as good, it doesn&#8217;t really give them any sort of power. But I don&#8217;t have a good enough world view for what that actually does, and if there&#8217;s more EU models, if India actually has their act together and trains a solid model. I don&#8217;t know what that does, but I feel like it&#8217;s probably gonna happen.</p><p><strong>00:13:54 Dean Ball:</strong> Yeah. I mean, it&#8217;s really super interesting &#8216;cause I think the other thing-- that will be inherently... I mean, it will be a Linux compared to a macOS, you know? It will not be as good of an experience for people. But then it becomes strange. Like, I don&#8217;t think macOS is as appealing of a thing if it&#8217;s viewed to be owned by the US government, right? And in fact, part of the reason I think that Apple is able to make its case quite credibly to consumers and businesses is they have resisted US government pressure to turn things over before. People might remember about a decade ago, there was this shooter in San Bernardino, California, and the FBI tried to force Apple to release iPhone data, and Apple said, &#8220;No, we&#8217;re not gonna expose this information.&#8221; Now, I think the FBI eventually just hacked it anyway, but that&#8217;s a separate issue. It&#8217;s a matter of principle here.</p><p><strong>00:15:01 Dean Ball:</strong> So yeah, I think it&#8217;s an interesting question: do we expect for the gap between the open frontier and the American closed frontier to widen in the near future, especially just because of how much compute they&#8217;re gonna have?</p><p><strong>00:15:30 Nathan Lambert:</strong> A hundred percent. And data and talent. Like, a hundred percent. It&#8217;s happening.</p><p><strong>00:15:34 Dean Ball:</strong> Data, talent. And it&#8217;s compounding, right? I mean, this has always been my view. 
And how much, I&#8217;m not sure, but I think it could be quite significant because these things are compounding benefits. And so if you expect them to just continue compounding, then all of a sudden it gets pretty bleak pretty quickly, would be my fear.</p><p><strong>00:16:00 Nathan Lambert:</strong> One of the... I mean, what&#8217;s your take on this? Why has it not compounded so much faster? Like, I feel like these three companies are spending, I don&#8217;t know, 10X what the Chinese labs are spending, and you only get like a little bit better model. Like, I believed so full-heartedly that Claude and ChatGPT and all these models are much better, and I expect them to become better by increasing margin, but it&#8217;s still confusing why they&#8217;re not already more ahead.</p><p><strong>00:16:29 Dean Ball:</strong> I go back and forth on this. Sometimes I think they are that ahead, and it&#8217;s just difficult to show up in benchmarks for the obvious reasons that benchmarks get chased. And like, I do feel that with the coding agents and with certain use cases, I do just feel like, wow, the American frontier is just way ahead, profoundly ahead of the Chinese frontier there. But there&#8217;s a lot of other things where you do kinda saturate how good you can be. I suspect that a very large fraction of AI usage is essentially glorified Google search. Even though I don&#8217;t think AI is glorified Google search, I suspect that a lot of what people use it for is that, at the consumer level. And it isn&#8217;t obvious to me how much better you can get at things like that. But my guess would be that over the next five years, I would guess the American labs really take off, in part because of compute, data, internal deployments for recursive self-improvement style stuff. And also, it&#8217;s amazing how we talk about that as just a normal thing now.</p><p><strong>00:18:05 Nathan Lambert:</strong> I think there will be a ceiling on it. 
Like, they&#8217;re gonna get a ton of improvement-- The gains are insane. It&#8217;s like, personally, at my job, I&#8217;ve been a lot of a research manager and just chasing shit down to get a model out the door. But now I can take on hard engineering tasks because I&#8217;m like, &#8220;Okay, might as well do this at the same time.&#8221; Like, going from zero to a hundred software engineers at anyone&#8217;s fingertips is worth a lot in terms of exploration. But the next, like, from a hundred to ten thousand is like, people can mess that up type thing. But that&#8217;s a huge gain.</p><p><strong>00:18:37 Dean Ball:</strong> I kind of agree. I think there&#8217;ll be a sigmoid there too. But then the other thing that will happen is, like, what I sort of wonder is will the AI companies, will the current model vendors, will they eventually become more like true infrastructure companies where what they actually do is they have models that design their own chips and models that design their own data centers and models that design their own successors. And so it&#8217;s this hugely vertically integrated thing, and what you&#8217;re really getting access to is not just the model itself, but you&#8217;re getting access to this highly optimized hardware, physical world infrastructure. And again, that&#8217;s kind of already the case, but does that become even more the case? And then that&#8217;s truly insurmountable for any open player. That&#8217;s definitionally insurmountable for an open player, and that becomes scary too. But again, this is why I&#8217;ve always felt so good about the position of the US closed source labs. This is why I&#8217;ve always been pretty bullish on them and have my concerns about open.</p><p><strong>00:20:07 Dean Ball:</strong> But to the extent the US government makes it impossible to trust closed source models, you do provide an advantage to open there. You&#8217;re giving a shot in the arm. 
If you like open source, you should hope that the supply chain risk designation against Anthropic is quite broad.</p><p><strong>00:20:09 Nathan Lambert:</strong> It&#8217;s a rough thing to hope for.</p><p><strong>00:20:09 Dean Ball:</strong> I mean, you shouldn&#8217;t actually hope for it, but I just mean, like, if that&#8217;s the only thing you care about in the world is open source, then--</p><p><strong>00:20:17 Nathan Lambert:</strong> I would say that anyone that only cares about open source probably is not thinking through any of these principles. It just gets really bad if you only have-- Like, AI is not gonna be meaningful lift to the economy and nor sustainable if everything is open. Like, if models are truly commoditized, things look kind of rough out there.</p><p><strong>00:20:36 Dean Ball:</strong> I think a world where models get commoditized is a really bleak world too, actually. And yeah, this is why I&#8217;m very worried about what the US government is doing. But I think that it helps on the margin, though. It probably helps on the margin in terms of waking people up. That still is my view.</p><p><strong>00:20:55 Nathan Lambert:</strong> I am a little surprised by the Qwen stuff, but I think there&#8217;s-- It&#8217;s like, at some point, I knew there was gonna be a year where a lot of the open model efforts just died because they&#8217;re just too expensive and too similar. But at the same time, having a lot of efforts that are somewhat similar but exploring a lot of the minor permutations in modeling space to figure out what works for people who use open models is actually quite good. I&#8217;m very bearish on the reflection style approach, which is build a lab, build an incredible model, drop it, make a bank selling it on-prem. Because on-prem is not that distinct from a business model as having a closed model. You could sell a closed model on-prem with the right IP controls. 
But then the person who actually wins open is by trying a whole bunch of tiny different things, understanding what is actually a meaningful differentiator in private data, in certain deployments and whatever, and then really iterating on that with a community. And that&#8217;s why I was like, Qwen is the closest to doing this by being so close to the community, and it&#8217;s so distinct from what a lot of the other labs are betting on.</p><p><strong>00:22:05 Nathan Lambert:</strong> But I see the pressure going away and kind of reducing diversity onto standards, because standards also make inference more efficient. Using open models is really rough. I think some of the best open models have really had rough launches. I think GPT-OSS had a horrible launch in terms of usability and is now one of the most popular models of all time. Qwen 3.5, it&#8217;s like researchers I work with are like, &#8220;Oh, let&#8217;s see if we can do some basic RL baselines on it,&#8221; and all the software stack is kinda broken. It takes a few weeks to get it going. And this is &#8216;cause all the models change differently, and closed labs just have such an advantage there &#8216;cause they should conceivably ship things on day one that work. I mean, don&#8217;t talk about Claude&#8217;s runtime, but that&#8217;s fine.</p><p><strong>00:22:42 Dean Ball:</strong> And don&#8217;t talk about the GPT-5 auto router either. But yeah, no, totally. I think that&#8217;s right.</p><p><strong>00:22:53 Dean Ball:</strong> I think fullness of time, I&#8217;m bullish on open source in the long run, fairly bearish in the next five years. The next five years are gonna matter quite a bit. And there is a lot of cope in both open source world and also... I don&#8217;t really hear it so much in open source world. I think open source world is actually more honest about this. But where the cope is so bad is in global civil society discourse. 
Like, I was in India for the AI Impact Summit recently, and they are just smoking the copium, being like, &#8220;We are gonna do everything on subfrontier open source models, and we&#8217;re just gonna diffuse those, and that&#8217;s all we&#8217;re gonna need in our economy.&#8221; And I just think that&#8217;s, if you&#8217;re India, that&#8217;s really not the bet you wanna make. I understand these are resource-constrained countries. They have a lot of acute constraints that they face, but nonetheless, I think that&#8217;s probably not a good bet.</p><p><strong>00:24:05 Nathan Lambert:</strong> Well, it&#8217;s even if those long tail models will work like manufacturing has worked, where it&#8217;s like Apple has put hundreds of billions of dollars into the manufacturing ecosystem in China to get absolute fine margins and scale. Like, if you really-- these things are gonna be used so much that that fine margin is actually gonna matter a lot, and it is not cheap to get that fine margin. You can&#8217;t just YOLO a DeepSeek V3 and spend five million dollars in compute and be done. It&#8217;s still gonna be expensive for a long time.</p><p><strong>00:24:34 Dean Ball:</strong> Yeah, it requires-- I think the Chinese approach, in the long run, if China&#8217;s gonna continue its strategy and they want to be competitive with the American frontier, they&#8217;re gonna have to fully socialize that, I think. I don&#8217;t think DeepSeek alone is gonna be able to do this, and I don&#8217;t think even Alibaba alone is gonna be able to do this. I think they&#8217;re going to need some sort of collective effort. Especially because of the export controls, the American export controls. They&#8217;re gonna have to centralize compute. They&#8217;re gonna have to centralize all these things, and talent and data and all that.</p><p><strong>00:25:17 Nathan Lambert:</strong> I don&#8217;t see it happening. 
Like, maybe someone gets officially AGI pilled, and I don&#8217;t know that much about China. But the things I know about China, it seems like that would be a big lift, and it would take a lot of time to actually do it. Like, all the companies would have to give up their biggest... All the cloud companies are like tech companies making a lot of money. They would be like, &#8220;We have to give up what?&#8221;</p><p><strong>00:25:42 Dean Ball:</strong> No, it would be a tough sell. Obviously, if the Chinese government decides they want to do it, they absolutely will. But in total, it will be a tough sell. My experience having had diplomatic engagements of many sorts with Chinese government-- and a lot of Chinese tech policy is actually not directly set by the government. It&#8217;s actually more kind of civil society, academia and civil society adjacent to government. Had a lot of conversations with folks like that, and they&#8217;re definitely... It&#8217;s largely not a very AGI-pilled crew. I think AGI-pilled-ness probably has a rough correlation with GDP per capita, and I think China is about where you would expect based on their GDP per capita, maybe a little bit ahead, but not very so. But if they ever do get AGI pilled, that&#8217;s the kind of thing that they could consider, but then that&#8217;s still a pretty extraordinary outcome because the Chinese government would have to be willing to make these things and then give it away. And I kinda just don&#8217;t think they will.</p><p><strong>00:27:11 Nathan Lambert:</strong> Yeah. I mean, all the politics of control with how everybody thinks AI is so powerful are pointing to very value-destructive actions economically in order to achieve the end state that people determine to be right. It&#8217;s like supporting open source to the extent that you can to avoid situations like Anthropic being labeled a supply chain risk and having interactions like that totally decimating runway of AI productivity. 
Like, if the companies are really gonna commit to open source for other things, then they&#8217;re gonna lose money. And I see this in-- China&#8217;s economy would be taking a gigantic hit doing this. And that&#8217;s kind of a common theme of what we&#8217;re talking about is that the interface of AI in an economic fashion is gonna make the next few years really weird.</p><p><strong>00:28:06 Dean Ball:</strong> I hope so.</p><p><strong>00:28:09 Nathan Lambert:</strong> I think things are gonna be weird, but I haven&#8217;t spent a ton of time thinking about how that interacts with political institutions. I thought about socially weird a lot, but I haven&#8217;t thought about power weird a lot.</p><p><strong>00:28:20 Dean Ball:</strong> Oh, power weird is what I worry about all the time. What I worry about the most is I think it&#8217;s plausible that what we&#8217;re seeing... I&#8217;ve always had this concern. I have this dual problem of-- maybe I&#8217;m talking out of both sides of my mouth. Maybe that&#8217;s just the critique, and it&#8217;s a fair critique. But I routinely complain about how people in government aren&#8217;t really... They pretend to take AI seriously, but they don&#8217;t take it that seriously. And they don&#8217;t really own the implications of advanced, of near term advanced AI and all that. I think we basically have transformative AI right now, but they don&#8217;t own that, because it&#8217;s annoying, it&#8217;s difficult, it&#8217;s conceptually challenging.</p><p><strong>00:29:08 Dean Ball:</strong> But the flip side of that is that if people do start to take it very seriously, there&#8217;s the risk that they sort of lash out, that they get scared, and they lash out and do things that are rash, in a rush. And that actually creates very, very bad, much worse outcomes than you otherwise might have gotten. 
I think that&#8217;s a very fair risk, and I think it&#8217;s possible that you might see things like that happen within the U.S. I don&#8217;t think this particular incident with Anthropic is quite an example of that. But it&#8217;s possible that you do see that in the coming years, and that is in and of itself a pretty scary outcome because if the U.S. government decides that they want to nationalize the frontier labs, I think it could be one of the most tyrannical things we ever see happen in this country.</p><p><strong>00:30:16 Nathan Lambert:</strong> Yeah. It&#8217;s like, I don&#8217;t know how to reply to this. I think things are... It&#8217;s serious times and I see so many... It feels like such a Sisyphean task to make more open models exist, but all the broader trends seem to point to that being a more stable equilibrium in a lot of ways. Like, good enough open models and keeping up with what we all feel happening in the closed model land.</p><p><strong>00:30:50 Nathan Lambert:</strong> So I don&#8217;t know. I stay motivated, but I feel increasingly lost in terms of achieving it.</p><p><strong>00:30:56 Dean Ball:</strong> I don&#8217;t think you should be. I think, look, I suspect the US government will not actually do it, and the best thing about America is that our general sort of-- I don&#8217;t wanna say incompetence, but the general sort of chaos of American institutions and decentralized confusingness of it all, it can often be quite frustrating, and it can sometimes be a detriment, but it can also be really great because we tend to not execute and follow through on our very worst ideas. And so I don&#8217;t think we&#8217;re going to do that. It doesn&#8217;t feel very American to do it. I worry about it because I worry about these rash reactions, and that&#8217;s why I fight as heavily as I do on things like this, despite not insignificant cost to me to do it, politically speaking. But that&#8217;s totally worth it because I care about this. 
I think everything, I think that will probably be fine. But yeah, I do agree. It&#8217;s a major risk. It&#8217;s a major risk, and it&#8217;s a weird world to think about, I&#8217;ll tell you that much.</p><p><strong>00:32:16 Nathan Lambert:</strong> Yeah. I don&#8217;t have a lot more to add. I&#8217;m sure we&#8217;ll continue this discussion. I think it warrants the space of it &#8216;cause that&#8217;s the... It&#8217;s one of the longer term things, but it&#8217;s not in the news cycle whatsoever, at least the open model angle. There&#8217;s just so many layers. People have to talk. Like, send feedback, people listening. I&#8217;ll even send this out as a podcast as well and just like, what do people think? How do we get to the places we want to get to?</p><p><strong>00:32:46 Dean Ball:</strong> Well, one thing I&#8217;m particularly interested in is-- one of the items in the Trump administration action plan, which I worked on for those who don&#8217;t have that context, is this idea of financializing compute, creating a financial market, like basically a commodities market for compute so that you can buy, you know, like really robust. In the same way that you can buy electricity spot, electricity futures and electricity on the spot market and things like this, the wholesale. Could you do something like that for compute? That could really profoundly change the dynamics and the economics of AI production. It&#8217;s not gonna turn them over. It doesn&#8217;t flip them on their head, but it changes it quite meaningfully. And I&#8217;m very excited by that prospect.</p><p><strong>00:33:48 Dean Ball:</strong> And that&#8217;s the kind of thing that I would be increasingly doing if this sort of interference of government into the frontier continues. What I suspect I&#8217;ll do is start developing some of those ideas which I developed earlier. I&#8217;m only one person. If those things start to seem relevant again, I totally will. 
Because anything to make it easier to produce AI for people that don&#8217;t have trillions of dollars will be extremely important.</p><p><strong>00:34:38 Nathan Lambert:</strong> Yeah. I think that... I don&#8217;t know. I&#8217;m happy to leave it there.</p><p><strong>00:34:43 Dean Ball:</strong> Cool.</p><p><strong>00:34:45 Nathan Lambert:</strong> I can let you get on your trip. It&#8217;s good to catch up. I&#8217;m early in the process of potentially coming to DC in a few months, so I will let you know if I do.</p><p><strong>00:34:52 Dean Ball:</strong> Oh, please do. It&#8217;d be great to see you. We can record an episode of my podcast live.</p><p><strong>00:34:58 Nathan Lambert:</strong> Sounds good. Okay. Thanks everybody for listening.</p><p><strong>00:35:03 Dean Ball:</strong> Talk to y&#8217;all later. Bye.</p>"
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/olmo-hybrid-and-future-llm-architectures",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/olmo-hybrid-and-future-llm-architectures",
            "title": "Olmo Hybrid and future LLM architectures",
            "pubDate": "Thu, 05 Mar 2026 16:16:44 GMT",
            "enclosure": {
              "@_url": "https://substackcdn.com/image/fetch/$s_!7CIi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634f5655-f362-4494-80ff-38c095b9caaf_2846x1128.png",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "The latest Olmo model and discussions at the frontier of open-source post training tools.",
            "content:encoded": "<p>So-called hybrid architectures are far from new in open-weight models these days. We now have the recent <a href=\"https://qwen.ai/blog?id=qwen3.5\">Qwen 3.5</a> (previewed by <a href=\"https://qwen.ai/blog?id=e34c4305036ce60d55a0791b170337c2b70ae51d&amp;from=home.latest-research-list\">Qwen3-Next</a>), <a href=\"https://arxiv.org/abs/2510.26692\">Kimi Linear</a> last fall (a smaller release than their <a href=\"https://www.interconnects.ai/p/kimi-k2-thinking-what-it-means\">flagship Kimi K2 models</a>), <a href=\"https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16\">Nvidia&#8217;s Nemotron 3 Nano</a> (with the bigger models expecting to drop soon), <a href=\"https://huggingface.co/ibm-granite/granite-4.0-tiny-preview\">IBM Granite 4</a>, and other less notable models. This is one of those times when a research trend looks like it&#8217;s getting adopted everywhere at once (maybe the Muon optimizer too, soon?).</p><p>To tell this story, we need to go back a few years to December 2023, when <a href=\"https://www.interconnects.ai/p/llms-beyond-attention\">Mamba and Striped Hyena</a> were taking the world by storm<a class=\"footnote-anchor\" data-component-name=\"FootnoteAnchorToDOM\" id=\"footnote-anchor-1\" href=\"#footnote-1\" target=\"_self\">1</a> &#8212; asking the question: Do we need full attention in our models? These early models fizzled out, partially for the same reasons they&#8217;re hard today &#8212; tricky implementations, open-source tool problems, more headaches in training &#8212; but also because the models fell over a bit when scaled up. The hybrid models of the day weren&#8217;t quite good enough yet.</p><p>These models are called hybrid because they mix these new recurrent neural network (RNN) modules with the traditional attention that made the transformer famous. They all work best with this mix of modules. 
The RNN layers keep part of the computation compressed in a hidden state that is used to predict the next token, acting as a summary of all information that came before. This idea has an extremely long lineage in deep learning, going back at least to the <a href=\"https://en.wikipedia.org/wiki/Long_short-term_memory\">LSTM</a>. This setup avoids the quadratic compute cost of attention (i.e. it avoids the KV cache of the attention operator, which grows with every token), and can even assist in solving new problems.</p><p>The models listed at the start of this article use a mix of RNN approaches: some models (Qwen and Kimi) use a newer idea called Gated DeltaNet (GDN) and some still use Mamba layers (Granite and Nemotron). The Olmo Hybrid model we&#8217;re releasing today also falls on the GDN side, based on careful experimentation and on theory showing that GDN is capable of learning features that attention or Mamba layers cannot.</p><h2>Introducing Olmo Hybrid and its pretraining efficiency</h2><p>Olmo Hybrid is a 7B base model, with 3 experimental post-trained checkpoints being released, starting with an Instruct model and with a reasoning model coming soon. It is the best open artifact for studying hybrid models, as it is almost identical to our <a href=\"https://www.interconnects.ai/p/olmo-3-americas-truly-open-reasoning\">Olmo 3 7B model</a> from last fall, just with a change in architecture. 
With the model, we are releasing a paper with substantial theory on <em>why</em> hybrid models can be better than standard transformers. This is a long paper that I&#8217;m still personally working through, but it&#8217;s excellent. </p><p>You can read the paper <a href=\"https://allenai.org/papers/olmo-hybrid\">here</a> and poke around with the checkpoints <a href=\"https://huggingface.co/collections/allenai/olmo-hybrid\">here</a>. This is an incredible, long-term research project led by <a href=\"https://lambdaviking.com/\">Will Merrill</a>. He did a great job.</p><p>To understand the context of why hybrid models can be a strict upgrade on transformers, let me begin with a longer excerpt from the paper&#8217;s introduction, emphasis mine:</p><blockquote><p>Past theoretical work has shown that attention and recurrence have complementary strengths (Merrill et al., 2024; Grazzi et al., 2025), so mixing them is a natural way to construct an architecture with the benefits of both primitives. <strong>We further derive novel theoretical results showing that hybrid models are even more powerful than the sum of their parts</strong>: there are formal problems related to code evaluation that neither transformers nor GDN can express on their own, but which hybrid models can represent theoretically and learn empirically. <strong>But</strong> <strong>this greater expressivity does not immediately imply that hybrid models should be better LMs: thus, we run fully controlled scaling studies comparing hybrid models vs. transformers</strong>, showing rigorously that hybrid models&#8217; expressivity translates to better token efficiency, in agreement with our observations from the Olmo Hybrid pretraining run. 
Finally, we provide a theoretical explanation for why increasing an architecture&#8217;s expressive power should improve language model scaling rooted in the multi-task nature of the language modeling objective.</p><p>Taken together, our results suggest that hybrid models dominate transformers, both theoretically, in their balance of expressivity and parallelism, and empirically, in terms of benchmark performance and long-context abilities. We believe these findings position hybrid models for wider adoption and call on the research community to pursue further architecture research.</p></blockquote><p>Essentially, we show and argue a few things:</p><ol><li><p><strong>Hybrid models are more expressive.</strong> They can represent and learn more types of functions. One intuition for why this is good: in deep learning we want to make the model class as flexible as possible and let the optimizer do the work, rather than imposing constraints on the learner. This sounds a lot like the <a href=\"http://www.incompleteideas.net/IncIdeas/BitterLesson.html\">Bitter Lesson</a>.</p></li><li><p><strong>Why does expressive power help with efficiency?</strong> This is where things are more nuanced. We argue that more expressive models will have better scaling laws, following the <em><a href=\"https://arxiv.org/abs/2303.13506\">quantization model</a></em><a href=\"https://arxiv.org/abs/2303.13506\"> of neural scaling</a>.</p></li></ol><p>All of this theory work is a great way to go deeper, and frankly I have a lot more to learn about it, but the crucial part is that we transition from theory to clear experiments that back it up. In particular, the scaling laws were studied carefully to decide on the final hybrid architecture. 
The final performance is very sensitive to exactly which RNN block is used and in what quantity.</p><p>In scaling experiments, the results showed that for Olmo, the hybrid GDN (3:1 ratio of layers) &gt; pure GDN (all RNN layers) &gt; standard transformer (all attention) &gt; hybrid Mamba2 &gt; pure Mamba2. The crucial point was that these gaps held when scaling to more parameters and compute. A visual summary of the different types of architectures studied is below.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!7CIi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634f5655-f362-4494-80ff-38c095b9caaf_2846x1128.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!7CIi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634f5655-f362-4494-80ff-38c095b9caaf_2846x1128.png 424w, https://substackcdn.com/image/fetch/$s_!7CIi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634f5655-f362-4494-80ff-38c095b9caaf_2846x1128.png 848w, https://substackcdn.com/image/fetch/$s_!7CIi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634f5655-f362-4494-80ff-38c095b9caaf_2846x1128.png 1272w, https://substackcdn.com/image/fetch/$s_!7CIi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634f5655-f362-4494-80ff-38c095b9caaf_2846x1128.png 1456w\" sizes=\"100vw\"><img 
src=\"https://substackcdn.com/image/fetch/$s_!7CIi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634f5655-f362-4494-80ff-38c095b9caaf_2846x1128.png\" width=\"1456\" height=\"577\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/634f5655-f362-4494-80ff-38c095b9caaf_2846x1128.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:577,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:293613,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/189829062?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634f5655-f362-4494-80ff-38c095b9caaf_2846x1128.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!7CIi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634f5655-f362-4494-80ff-38c095b9caaf_2846x1128.png 424w, https://substackcdn.com/image/fetch/$s_!7CIi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634f5655-f362-4494-80ff-38c095b9caaf_2846x1128.png 848w, https://substackcdn.com/image/fetch/$s_!7CIi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634f5655-f362-4494-80ff-38c095b9caaf_2846x1128.png 1272w, 
https://substackcdn.com/image/fetch/$s_!7CIi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F634f5655-f362-4494-80ff-38c095b9caaf_2846x1128.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture></div></a></figure></div><p>In terms of this specific model, the pretraining gains were giant! Relative to Olmo 3 dense, it represents roughly a 2X gain in training efficiency. 
When you look at evaluation performance for pretraining, there was also substantial improvement in performance, particularly after long context extension (the final 2 rows of Table 2 in the paper, highlighted below).</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!IgMs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!IgMs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 424w, https://substackcdn.com/image/fetch/$s_!IgMs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 848w, https://substackcdn.com/image/fetch/$s_!IgMs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 1272w, https://substackcdn.com/image/fetch/$s_!IgMs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!IgMs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png\" width=\"1456\" height=\"1187\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1187,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:904194,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/189829062?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!IgMs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 424w, https://substackcdn.com/image/fetch/$s_!IgMs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 848w, https://substackcdn.com/image/fetch/$s_!IgMs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 1272w, https://substackcdn.com/image/fetch/$s_!IgMs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F072051d6-3788-4ab4-9587-c051f282b3b8_2906x2370.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture><div class=\"image-link-expand\"><div class=\"pencraft pc-display-flex pc-gap-8 pc-reset\"><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container restack-image\"><svg 
role=\"img\" width=\"20\" height=\"20\" viewBox=\"0 0 20 20\" fill=\"none\" stroke-width=\"1.5\" stroke=\"var(--color-fg-primary)\" stroke-linecap=\"round\" stroke-linejoin=\"round\" xmlns=\"http://www.w3.org/2000/svg\"><g><title></title><path d=\"M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882\"></path></g></svg></button><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container view-image\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"20\" height=\"20\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"2\" stroke-linecap=\"round\" stroke-linejoin=\"round\" class=\"lucide lucide-maximize2 lucide-maximize-2\"><polyline points=\"15 3 21 3 21 9\"></polyline><polyline points=\"9 21 3 21 3 15\"></polyline><line x1=\"21\" x2=\"14\" y1=\"3\" y2=\"10\"></line><line x1=\"3\" x2=\"10\" y1=\"21\" y2=\"14\"></line></svg></button></div></div></div></a></figure></div><h2>The journey to post-training Olmo Hybrid</h2><p>Most of the experience in post-training Olmo models has been climbing up a steep curve in base model capabilities with minor tweaks to architecture. Our recipes from <a href=\"https://arxiv.org/abs/2311.10702\">Tulu 2</a>, <a href=\"https://arxiv.org/abs/2411.15124\">Tulu 3</a>, and the Olmo 3 reasoning work (building substantially on <a href=\"https://arxiv.org/abs/2506.04178\">OpenThoughts 3</a>) all worked in a fairly straightforward, off the shelf manner. Olmo Hybrid is our first experience in post-training a substantially different architecture, and the results were mixed. </p><h3>1. 
Benchmark performance</h3><p>Following the Olmo 3 recipe, we got some substantial wins (knowledge) and some substantial losses (extended reasoning) relative to the dense model. All together these still represent a very strong fully open model &#8212; just that the pretraining gains didn&#8217;t translate as obviously. The results are below.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!BSEJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b5485f6-9d57-45c9-a686-a51754acc4cb_3992x1562.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!BSEJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b5485f6-9d57-45c9-a686-a51754acc4cb_3992x1562.png 424w, https://substackcdn.com/image/fetch/$s_!BSEJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b5485f6-9d57-45c9-a686-a51754acc4cb_3992x1562.png 848w, https://substackcdn.com/image/fetch/$s_!BSEJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b5485f6-9d57-45c9-a686-a51754acc4cb_3992x1562.png 1272w, https://substackcdn.com/image/fetch/$s_!BSEJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b5485f6-9d57-45c9-a686-a51754acc4cb_3992x1562.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!BSEJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b5485f6-9d57-45c9-a686-a51754acc4cb_3992x1562.png\" width=\"1456\" height=\"570\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b5485f6-9d57-45c9-a686-a51754acc4cb_3992x1562.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:570,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:829143,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/189829062?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b5485f6-9d57-45c9-a686-a51754acc4cb_3992x1562.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!BSEJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b5485f6-9d57-45c9-a686-a51754acc4cb_3992x1562.png 424w, https://substackcdn.com/image/fetch/$s_!BSEJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b5485f6-9d57-45c9-a686-a51754acc4cb_3992x1562.png 848w, https://substackcdn.com/image/fetch/$s_!BSEJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b5485f6-9d57-45c9-a686-a51754acc4cb_3992x1562.png 1272w, https://substackcdn.com/image/fetch/$s_!BSEJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b5485f6-9d57-45c9-a686-a51754acc4cb_3992x1562.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture><div class=\"image-link-expand\"><div class=\"pencraft pc-display-flex pc-gap-8 pc-reset\"><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container restack-image\"><svg 
role=\"img\" width=\"20\" height=\"20\" viewBox=\"0 0 20 20\" fill=\"none\" stroke-width=\"1.5\" stroke=\"var(--color-fg-primary)\" stroke-linecap=\"round\" stroke-linejoin=\"round\" xmlns=\"http://www.w3.org/2000/svg\"><g><title></title><path d=\"M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882\"></path></g></svg></button><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container view-image\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"20\" height=\"20\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"2\" stroke-linecap=\"round\" stroke-linejoin=\"round\" class=\"lucide lucide-maximize2 lucide-maximize-2\"><polyline points=\"15 3 21 3 21 9\"></polyline><polyline points=\"9 21 3 21 3 15\"></polyline><line x1=\"21\" x2=\"14\" y1=\"3\" y2=\"10\"></line><line x1=\"3\" x2=\"10\" y1=\"21\" y2=\"14\"></line></svg></button></div></div></div></a></figure></div><p>The exact reason why this happens is a research question. Our best guess is that the Olmo Hybrid base model is just a sufficiently different student model, where most of our post training data at early stages is learning from stronger &#8220;teacher&#8221; models (a recap of this method, called <a href=\"https://www.interconnects.ai/p/how-much-does-distillation-really\">distillation</a>, appeared recently in Interconnects). </p><p>There is a lot of other research ongoing in the community around what makes a strong teacher model &#8212; generally, the best overall model is <em>not </em>the best teacher.  
In other words, training on data outputted from the model with the best evaluation scores today is unlikely to unlock the ceiling in performance for your new base model. A second factor, which is even less explored, is how different base models likely need different teachers to learn from. This is why Olmo Hybrid could perform very differently: its behavior is downstream of an architecture-driven change in learning, even though the pretraining data is almost identical.</p><p>There&#8217;s A LOT more work to dig into here, some <a href=\"https://www.openthoughts.ai/blog/agent\">empirical work in generating better data</a> and other work in understanding how <a href=\"https://x.com/_emliu/status/2026359480363913531?s=46&amp;t=0Enn1cSa9nnKjGPrLHWfng\">different training stages fit together</a>. I am confident this Olmo Hybrid base model is solid and more performance can be extracted, but it takes more careful work adapting existing datasets.</p><h3>2. Open-source tooling </h3><p>The frank reality of new architectures for open models is that the open-source software tooling support is horrific. There are the paper cuts that people are familiar with, e.g. random errors in popular libraries (as people experienced with GPT-OSS) that slow adoption, but there are also deeper problems.</p><p>A large part of the potential benefit of hybrid models is the reduction in memory usage for long-context generation, which is crucial for reinforcement learning and agentic tasks. It should be a huge win for post-training! This, unfortunately, is far from the case, and will likely take another 3-6 months to get right for this batch of GDN models.</p><p>The core problem is that the open-source inference tools, e.g. vLLM, rely on far less developed kernels (and other internals) for hybrid models than they do for standard transformers. This comes with two challenges: throughput slowdowns and numerical issues. Numerical issues can be combated with a variety of inference flags. 
Quoting the paper again:</p><blockquote><p>The two key flags in VLLM we needed to get maximum performance with the post-training model were <code>--disable-cascade-attn</code>, which disables cascade attention (an optimization for shared prompt prefixes), and <code>--enforce-eager</code>, which turns off CUDA graphs. These two flags have been used in our RL setup dating back to Olmo 3, but are new additions to evaluations. Scores for the released models drop precipitously without them. We also evaluated our final models with the hybrid model cache in the richer FP32 datatype, to improve stability via <code>--mamba_ssm_cache_dtype</code> following NVIDIA.</p></blockquote><p>Essentially, we used these to make sure the model was numerically stable. The downside is that the inference throughput plummets, so the potential gains in compute efficiency are erased. A comparison of numbers is below.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!00Cf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5554664-f970-4321-9863-c08c8239c17f_3902x2440.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!00Cf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5554664-f970-4321-9863-c08c8239c17f_3902x2440.png 424w, https://substackcdn.com/image/fetch/$s_!00Cf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5554664-f970-4321-9863-c08c8239c17f_3902x2440.png 848w, 
https://substackcdn.com/image/fetch/$s_!00Cf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5554664-f970-4321-9863-c08c8239c17f_3902x2440.png 1272w, https://substackcdn.com/image/fetch/$s_!00Cf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5554664-f970-4321-9863-c08c8239c17f_3902x2440.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!00Cf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5554664-f970-4321-9863-c08c8239c17f_3902x2440.png\" width=\"1456\" height=\"910\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f5554664-f970-4321-9863-c08c8239c17f_3902x2440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:899739,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/189829062?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5554664-f970-4321-9863-c08c8239c17f_3902x2440.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" title=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!00Cf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5554664-f970-4321-9863-c08c8239c17f_3902x2440.png 424w, 
https://substackcdn.com/image/fetch/$s_!00Cf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5554664-f970-4321-9863-c08c8239c17f_3902x2440.png 848w, https://substackcdn.com/image/fetch/$s_!00Cf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5554664-f970-4321-9863-c08c8239c17f_3902x2440.png 1272w, https://substackcdn.com/image/fetch/$s_!00Cf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5554664-f970-4321-9863-c08c8239c17f_3902x2440.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture><div class=\"image-link-expand\"><div class=\"pencraft pc-display-flex pc-gap-8 pc-reset\"><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container restack-image\"><svg role=\"img\" width=\"20\" height=\"20\" viewBox=\"0 0 20 20\" fill=\"none\" stroke-width=\"1.5\" stroke=\"var(--color-fg-primary)\" stroke-linecap=\"round\" stroke-linejoin=\"round\" xmlns=\"http://www.w3.org/2000/svg\"><g><title></title><path d=\"M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882\"></path></g></svg></button><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container view-image\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"20\" height=\"20\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"2\" stroke-linecap=\"round\" stroke-linejoin=\"round\" class=\"lucide lucide-maximize2 lucide-maximize-2\"><polyline points=\"15 3 21 3 21 9\"></polyline><polyline points=\"9 21 3 
21 3 15\"></polyline><line x1=\"21\" x2=\"14\" y1=\"3\" y2=\"10\"></line><line x1=\"3\" x2=\"10\" y1=\"21\" y2=\"14\"></line></svg></button></div></div></div></a><figcaption class=\"image-caption\">Data for this is available <a href=\"https://gist.github.com/natolambert/0a6ad2e9f513d7a72b76d9e3a7b0bbb1\">here</a>.</figcaption></figure></div><p>Effectively, the 7B hybrid model today takes more compute to train with RL than our 7B dense model (that doesn&#8217;t even have a common memory saving technique, GQA). The total compute estimate from the table at different context lengths is below (more visuals in the <a href=\"https://docs.google.com/presentation/d/1K3bM3K7q_CBcXzUCX7a1YvUHAycpvTKZbJElKSOdiok/edit\">slides from my recent CMU talk</a>).</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!GmWW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b42e8e5-d75c-433c-a41a-1ae5aa18b114_3571x2073.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!GmWW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b42e8e5-d75c-433c-a41a-1ae5aa18b114_3571x2073.png 424w, https://substackcdn.com/image/fetch/$s_!GmWW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b42e8e5-d75c-433c-a41a-1ae5aa18b114_3571x2073.png 848w, https://substackcdn.com/image/fetch/$s_!GmWW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b42e8e5-d75c-433c-a41a-1ae5aa18b114_3571x2073.png 1272w, 
https://substackcdn.com/image/fetch/$s_!GmWW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b42e8e5-d75c-433c-a41a-1ae5aa18b114_3571x2073.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!GmWW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b42e8e5-d75c-433c-a41a-1ae5aa18b114_3571x2073.png\" width=\"1456\" height=\"845\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b42e8e5-d75c-433c-a41a-1ae5aa18b114_3571x2073.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:845,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:317359,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/189829062?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b42e8e5-d75c-433c-a41a-1ae5aa18b114_3571x2073.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!GmWW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b42e8e5-d75c-433c-a41a-1ae5aa18b114_3571x2073.png 424w, https://substackcdn.com/image/fetch/$s_!GmWW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b42e8e5-d75c-433c-a41a-1ae5aa18b114_3571x2073.png 848w, 
https://substackcdn.com/image/fetch/$s_!GmWW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b42e8e5-d75c-433c-a41a-1ae5aa18b114_3571x2073.png 1272w, https://substackcdn.com/image/fetch/$s_!GmWW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b42e8e5-d75c-433c-a41a-1ae5aa18b114_3571x2073.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture><div class=\"image-link-expand\"><div class=\"pencraft pc-display-flex pc-gap-8 pc-reset\"><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container restack-image\"><svg role=\"img\" width=\"20\" height=\"20\" viewBox=\"0 0 20 20\" fill=\"none\" stroke-width=\"1.5\" stroke=\"var(--color-fg-primary)\" stroke-linecap=\"round\" stroke-linejoin=\"round\" xmlns=\"http://www.w3.org/2000/svg\"><g><title></title><path d=\"M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882\"></path></g></svg></button><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container view-image\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"20\" height=\"20\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"2\" stroke-linecap=\"round\" stroke-linejoin=\"round\" class=\"lucide lucide-maximize2 lucide-maximize-2\"><polyline points=\"15 3 21 3 21 9\"></polyline><polyline points=\"9 21 3 21 3 15\"></polyline><line x1=\"21\" x2=\"14\" y1=\"3\" y2=\"10\"></line><line x1=\"3\" x2=\"10\" y1=\"21\" y2=\"14\"></line></svg></button></div></div></div></a></figure></div><p>The good news is that these are solvable problems 
(and improving the tooling could even improve benchmark numbers), but it&#8217;s going to take a good bit of time and hard work in the OSS community.</p><p>This leads to my final question. If I&#8217;m optimistic that the open ecosystem will evolve to support these models with ease, motivated by the better fundamental scaling of the architectures and a large cluster of leading open model builders already using them, are closed models like GPT and Claude built like this?</p><p>To be clear, this answer is a total guess (guessing is not something I normally do), but with the evidence I have I&#8217;d put the chance that one of the 3 frontier models is an RNN at around a coin flip. I&#8217;ll let you know if I learn for sure either way. If the scaling advantages hold at frontier scale, the economic case becomes hard to ignore, though the frontier labs could already have architectures that are as efficient as RNNs and come with even more benefits.</p><div><hr></div><p>I&#8217;m going to follow up this post with more architecture discussions, particularly on why Mixture of Experts (MoE) models are a major headache to post-train, so make sure to subscribe if that sounds interesting to you!</p><p><em>Thanks to Will Merrill and Finbarr Timbers for discussions that helped inform this post.</em></p><div class=\"footnote\" data-component-name=\"FootnoteToDOM\"><a id=\"footnote-1\" href=\"#footnote-anchor-1\" class=\"footnote-number\" contenteditable=\"false\" target=\"_self\">1</a><div class=\"footnote-content\"><p>and still my <a 
href=\"https://www.youtube.com/watch?v=OFFHiJzPpCQ&amp;list=PLlp6Ex8YB3QOH0SibhH3oDZucFrqc8K9v&amp;index=17&amp;t=1s&amp;pp=iAQB\">most-viewed interview</a> on YouTube, as the first one I did.</p><p></p></div></div>"
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/latest-open-artifacts-19-qwen-35",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/latest-open-artifacts-19-qwen-35",
            "title": "Latest open artifacts (#19): Qwen 3.5, GLM 5, MiniMax 2.5 — Chinese labs' latest push of the frontier",
            "pubDate": "Tue, 03 Mar 2026 16:30:59 GMT",
            "enclosure": {
              "@_url": "https://substackcdn.com/image/fetch/$s_!e_rH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8c9fc29-d070-40a9-aa6e-548ec4a81714_1792x629.jpeg",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Florian Brand",
            "description": "Welcome to the year of the horse!",
            "content:encoded": "<p>It&#8217;s been a busy month at the top end of open-weights AI &#8212; with new flagship models from all of Qwen, MiniMax, Z.ai, Ant Ling, and StepFun. Still, all eyes are on DeepSeek V4&#8217;s pending release, which rumors continue to accelerate towards. Outside of the large, frontier models, this issue is a bit lighter on the long-tail of niche modalities and model sizes.</p><p class=\"button-wrapper\" data-attrs=\"{&quot;url&quot;:&quot;https://www.interconnects.ai/p/latest-open-artifacts-19-qwen-35?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}\" data-component-name=\"ButtonCreateButton\"><a class=\"button primary\" href=\"https://www.interconnects.ai/p/latest-open-artifacts-19-qwen-35?utm_source=substack&utm_medium=email&utm_content=share&action=share\"><span>Share</span></a></p><p>With all these new releases, we&#8217;re tracking them with our new <a href=\"https://atomproject.ai/relative-adoption-metric\">Relative Adoption Metrics (RAM)</a>, a measurement tool that normalizes model downloads relative to peer models in their size class. This has already been an extremely useful tool for us, highlighting underrated models like GPT-OSS, which is literally off the charts in how downloaded it is &#8212; the most popular American open-weights model since Llama 3.1. A RAM score &gt;1 means the model is on track to be a top 10 all-time downloaded model in its size class. We&#8217;re particularly interested to see how the early adoption of the smaller Qwen 3.5 dense models will go relative to Qwen 3 &#8212; balancing Qwen&#8217;s ever growing brand with a trickier, hybrid model architecture that can push the limits of some open-source tools.</p><p>A summary of the RAM scores for some of the popular models released late in 2025 is below, highlighting Kimi K2 Thinking and some OCR models as clear winners. 
DeepSeek V3.2, and their other recent large models, have wildly underperformed DeepSeek&#8217;s earlier releases in 2025.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!eppK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726fcde9-2645-4c77-9779-03882beb295b_2554x1414.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!eppK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726fcde9-2645-4c77-9779-03882beb295b_2554x1414.png 424w, https://substackcdn.com/image/fetch/$s_!eppK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726fcde9-2645-4c77-9779-03882beb295b_2554x1414.png 848w, https://substackcdn.com/image/fetch/$s_!eppK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726fcde9-2645-4c77-9779-03882beb295b_2554x1414.png 1272w, https://substackcdn.com/image/fetch/$s_!eppK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726fcde9-2645-4c77-9779-03882beb295b_2554x1414.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!eppK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726fcde9-2645-4c77-9779-03882beb295b_2554x1414.png\" width=\"1456\" height=\"806\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/726fcde9-2645-4c77-9779-03882beb295b_2554x1414.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:806,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:285255,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/189756490?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726fcde9-2645-4c77-9779-03882beb295b_2554x1414.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!eppK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726fcde9-2645-4c77-9779-03882beb295b_2554x1414.png 424w, https://substackcdn.com/image/fetch/$s_!eppK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726fcde9-2645-4c77-9779-03882beb295b_2554x1414.png 848w, https://substackcdn.com/image/fetch/$s_!eppK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726fcde9-2645-4c77-9779-03882beb295b_2554x1414.png 1272w, https://substackcdn.com/image/fetch/$s_!eppK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726fcde9-2645-4c77-9779-03882beb295b_2554x1414.png 1456w\" sizes=\"100vw\" fetchpriority=\"high\"></picture><div class=\"image-link-expand\"><div class=\"pencraft pc-display-flex pc-gap-8 pc-reset\"><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container restack-image\"><svg 
role=\"img\" width=\"20\" height=\"20\" viewBox=\"0 0 20 20\" fill=\"none\" stroke-width=\"1.5\" stroke=\"var(--color-fg-primary)\" stroke-linecap=\"round\" stroke-linejoin=\"round\" xmlns=\"http://www.w3.org/2000/svg\"><g><title></title><path d=\"M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882\"></path></g></svg></button><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container view-image\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"20\" height=\"20\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"2\" stroke-linecap=\"round\" stroke-linejoin=\"round\" class=\"lucide lucide-maximize2 lucide-maximize-2\"><polyline points=\"15 3 21 3 21 9\"></polyline><polyline points=\"9 21 3 21 3 15\"></polyline><line x1=\"21\" x2=\"14\" y1=\"3\" y2=\"10\"></line><line x1=\"3\" x2=\"10\" y1=\"21\" y2=\"14\"></line></svg></button></div></div></div></a><figcaption class=\"image-caption\">The time here is days since release.</figcaption></figure></div><h1>Artifacts Log</h1><h3>Our Picks</h3><ul><li><p><strong><a href=\"https://huggingface.co/Qwen/Qwen3.5-397B-A17B\">Qwen3.5-397B-A17B</a></strong> by <a href=\"https://huggingface.co/Qwen\">Qwen</a>: The long-awaited update to Qwen is finally here. It comes in various sizes from 0.8B to 27B (dense) and 35B-A3B to 397B-A17B (MoE), some of them even with base models. 
All of them are multi-modal, use reasoning by default and are based on the Qwen-Next architecture with GDN layers.<br></p><p>We tested these models over the last few days, and they are a clear upgrade over the previous version: There are a lot of substantial improvements across the board, making them perfect workhorses for a wide range of tasks.<br>Their style and instruction-following have improved, and the models are even better at multilingual tasks, covering more languages.<br><br>However, at least the small models (still) tend to overthink. You can turn off reasoning by disabling it in the chat template.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!E9hc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979b5a61-90d3-413e-a2ab-55215d8d3541_17277x8176.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!E9hc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979b5a61-90d3-413e-a2ab-55215d8d3541_17277x8176.png 424w, https://substackcdn.com/image/fetch/$s_!E9hc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979b5a61-90d3-413e-a2ab-55215d8d3541_17277x8176.png 848w, https://substackcdn.com/image/fetch/$s_!E9hc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979b5a61-90d3-413e-a2ab-55215d8d3541_17277x8176.png 1272w, https://substackcdn.com/image/fetch/$s_!E9hc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979b5a61-90d3-413e-a2ab-55215d8d3541_17277x8176.png 1456w\" 
sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!E9hc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979b5a61-90d3-413e-a2ab-55215d8d3541_17277x8176.png\" width=\"1456\" height=\"689\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/979b5a61-90d3-413e-a2ab-55215d8d3541_17277x8176.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:689,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Benchmark Results&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"Benchmark Results\" title=\"Benchmark Results\" srcset=\"https://substackcdn.com/image/fetch/$s_!E9hc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979b5a61-90d3-413e-a2ab-55215d8d3541_17277x8176.png 424w, https://substackcdn.com/image/fetch/$s_!E9hc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979b5a61-90d3-413e-a2ab-55215d8d3541_17277x8176.png 848w, https://substackcdn.com/image/fetch/$s_!E9hc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979b5a61-90d3-413e-a2ab-55215d8d3541_17277x8176.png 1272w, https://substackcdn.com/image/fetch/$s_!E9hc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979b5a61-90d3-413e-a2ab-55215d8d3541_17277x8176.png 1456w\" sizes=\"100vw\"></picture><div class=\"image-link-expand\"><div class=\"pencraft 
pc-display-flex pc-gap-8 pc-reset\"><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container restack-image\"><svg role=\"img\" width=\"20\" height=\"20\" viewBox=\"0 0 20 20\" fill=\"none\" stroke-width=\"1.5\" stroke=\"var(--color-fg-primary)\" stroke-linecap=\"round\" stroke-linejoin=\"round\" xmlns=\"http://www.w3.org/2000/svg\"><g><title></title><path d=\"M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882\"></path></g></svg></button><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container view-image\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"20\" height=\"20\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"2\" stroke-linecap=\"round\" stroke-linejoin=\"round\" class=\"lucide lucide-maximize2 lucide-maximize-2\"><polyline points=\"15 3 21 3 21 9\"></polyline><polyline points=\"9 21 3 21 3 15\"></polyline><line x1=\"21\" x2=\"14\" y1=\"3\" y2=\"10\"></line><line x1=\"3\" x2=\"10\" y1=\"21\" y2=\"14\"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a href=\"https://huggingface.co/stepfun-ai/Step-3.5-Flash\">Step-3.5-Flash</a></strong> by <a href=\"https://huggingface.co/stepfun-ai\">stepfun-ai</a>: StepFun really stepped up its game (no pun intended), releasing a 196B-A11B MoE with strong metrics across the board. 
It is especially strong on math benchmarks, beating models several times its size.</p></li><li><p><strong><a href=\"https://huggingface.co/zai-org/GLM-5\">GLM-5</a></strong> by <a href=\"https://huggingface.co/zai-org\">zai-org</a>: A 744B-A40B release from the Zhipu team, which drove such a large increase in demand that Zhipu <a href=\"https://www.reuters.com/technology/chinese-ai-startup-zhipu-hikes-prices-coding-plan-demand-rises-2026-02-12/\">raised prices</a> for its coding plan. It also comes with an <a href=\"https://arxiv.org/abs/2602.15763\">accompanying tech report</a>.</p></li><li><p><strong><a href=\"https://huggingface.co/MiniMaxAI/MiniMax-M2.5\">MiniMax-M2.5</a></strong> by <a href=\"https://huggingface.co/MiniMaxAI\">MiniMaxAI</a>: Despite its relatively small size, MiniMax-M2.5 can rival models such as GLM-5 and Kimi K2.5 and has quickly become one of the community&#8217;s favorites.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!oDwH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf043d8f-881e-4132-a606-99a25f9b5305_1280x617.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!oDwH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf043d8f-881e-4132-a606-99a25f9b5305_1280x617.png 424w, https://substackcdn.com/image/fetch/$s_!oDwH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf043d8f-881e-4132-a606-99a25f9b5305_1280x617.png 848w, 
https://substackcdn.com/image/fetch/$s_!oDwH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf043d8f-881e-4132-a606-99a25f9b5305_1280x617.png 1272w, https://substackcdn.com/image/fetch/$s_!oDwH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf043d8f-881e-4132-a606-99a25f9b5305_1280x617.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!oDwH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf043d8f-881e-4132-a606-99a25f9b5305_1280x617.png\" width=\"1280\" height=\"617\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf043d8f-881e-4132-a606-99a25f9b5305_1280x617.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:617,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!oDwH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf043d8f-881e-4132-a606-99a25f9b5305_1280x617.png 424w, https://substackcdn.com/image/fetch/$s_!oDwH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf043d8f-881e-4132-a606-99a25f9b5305_1280x617.png 848w, 
https://substackcdn.com/image/fetch/$s_!oDwH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf043d8f-881e-4132-a606-99a25f9b5305_1280x617.png 1272w, https://substackcdn.com/image/fetch/$s_!oDwH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf043d8f-881e-4132-a606-99a25f9b5305_1280x617.png 1456w\" sizes=\"100vw\"></picture><div class=\"image-link-expand\"><div class=\"pencraft pc-display-flex pc-gap-8 pc-reset\"><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container restack-image\"><svg role=\"img\" width=\"20\" height=\"20\" viewBox=\"0 0 20 20\" fill=\"none\" stroke-width=\"1.5\" stroke=\"var(--color-fg-primary)\" stroke-linecap=\"round\" stroke-linejoin=\"round\" xmlns=\"http://www.w3.org/2000/svg\"><g><title></title><path d=\"M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882\"></path></g></svg></button><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container view-image\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"20\" height=\"20\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"2\" stroke-linecap=\"round\" stroke-linejoin=\"round\" class=\"lucide lucide-maximize2 lucide-maximize-2\"><polyline points=\"15 3 21 3 21 9\"></polyline><polyline points=\"9 21 3 21 3 15\"></polyline><line x1=\"21\" x2=\"14\" y1=\"3\" y2=\"10\"></line><line x1=\"3\" x2=\"10\" y1=\"21\" y2=\"14\"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong><a 
href=\"https://huggingface.co/open-thoughts/OpenThinker-Agent-v1\">OpenThinker-Agent-v1</a></strong> by <a href=\"https://huggingface.co/open-thoughts\">open-thoughts</a>: OpenThinkers, known for their open reasoning releases (such as <a href=\"https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M\">OpenThoughts 3</a>) are now tackling agentic reasoning. Their initial release includes <a href=\"https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT\">SFT</a> and <a href=\"https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL\">RL</a> data, as well as a &#8220;lite&#8221; <a href=\"https://huggingface.co/datasets/open-thoughts/OpenThoughts-TBLite\">version</a> of terminal-based tasks to evaluate smaller models.</p></li></ul><p>The subtle differences in architecture of these models are covered in detail in the similar, more technically focused, round-up from <span class=\"mention-wrap\" data-attrs=\"{&quot;name&quot;:&quot;Sebastian Raschka, PhD&quot;,&quot;id&quot;:27393275,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F61f4c017-506f-4e9b-a24f-76340dad0309_800x800.jpeg&quot;,&quot;uuid&quot;:&quot;8d475c1b-577d-4a48-929d-0f5195d8fd33&quot;}\" data-component-name=\"MentionToDOM\"></span> &#8212; it&#8217;s a good complement if you&#8217;re looking to go deeper: </p><div class=\"embedded-post-wrap\" data-attrs=\"{&quot;id&quot;:189051354,&quot;url&quot;:&quot;https://magazine.sebastianraschka.com/p/a-dream-of-spring-for-open-weight&quot;,&quot;publication_id&quot;:1174659,&quot;publication_name&quot;:&quot;Ahead of 
AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!96vs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49f25d0a-212b-4853-8bcb-128d0a3edbbf_1196x1196.png&quot;,&quot;title&quot;:&quot;A Dream of Spring for Open-Weight LLMs: 10 Architectures from Jan-Feb 2026&quot;,&quot;truncated_body_text&quot;:&quot;If you have struggled a bit to keep up with open-weight model releases this month, this article should catch you up on the main themes.&quot;,&quot;date&quot;:&quot;2026-02-25T13:26:56.028Z&quot;,&quot;like_count&quot;:150,&quot;comment_count&quot;:7,&quot;bylines&quot;:[{&quot;id&quot;:27393275,&quot;name&quot;:&quot;Sebastian Raschka, PhD&quot;,&quot;handle&quot;:&quot;rasbt&quot;,&quot;previous_name&quot;:&quot;Sebastian Raschka&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F61f4c017-506f-4e9b-a24f-76340dad0309_800x800.jpeg&quot;,&quot;bio&quot;:&quot;I'm an LLM research engineer 10+ years of experience in artificial intelligence. My expertise lies in AI &amp; LLM research focusing on code-driven implementations. 
I am also the author of \\&quot;Build a Large Language Model From Scratch\\&quot; (amzn.to/4fqvn0D).&quot;,&quot;profile_set_up_at&quot;:&quot;2022-10-09T16:19:59.744Z&quot;,&quot;reader_installed_at&quot;:&quot;2022-11-07T19:56:32.129Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:1127862,&quot;user_id&quot;:27393275,&quot;publication_id&quot;:1174659,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:true,&quot;publication&quot;:{&quot;id&quot;:1174659,&quot;name&quot;:&quot;Ahead of AI&quot;,&quot;subdomain&quot;:&quot;sebastianraschka&quot;,&quot;custom_domain&quot;:&quot;magazine.sebastianraschka.com&quot;,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Ahead of AI focuses on machine learning and AI research and is read by more than 150,000 researchers and practitioners who want to stay ahead in a rapidly evolving field.&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/49f25d0a-212b-4853-8bcb-128d0a3edbbf_1196x1196.png&quot;,&quot;author_id&quot;:27393275,&quot;primary_user_id&quot;:27393275,&quot;theme_var_background_pop&quot;:&quot;#2096FF&quot;,&quot;created_at&quot;:&quot;2022-11-04T18:30:05.218Z&quot;,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Raschka AI Research (RAIR) Lab LLC&quot;,&quot;founding_plan_name&quot;:&quot;Founding 
plan&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;newspaper&quot;,&quot;is_personal_mode&quot;:false}}],&quot;twitter_screen_name&quot;:&quot;rasbt&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:1000,&quot;status&quot;:{&quot;bestsellerTier&quot;:1000,&quot;subscriberTier&quot;:1,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:{&quot;type&quot;:&quot;bestseller&quot;,&quot;tier&quot;:1000},&quot;paidPublicationIds&quot;:[1783977,9873],&quot;subscriber&quot;:null}}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;,&quot;source&quot;:null}\" data-component-name=\"EmbeddedPostToDOM\"><a class=\"embedded-post\" native=\"true\" href=\"https://magazine.sebastianraschka.com/p/a-dream-of-spring-for-open-weight?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web\"><div class=\"embedded-post-header\"><img class=\"embedded-post-publication-logo\" src=\"https://substackcdn.com/image/fetch/$s_!96vs!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49f25d0a-212b-4853-8bcb-128d0a3edbbf_1196x1196.png\" loading=\"lazy\"><span class=\"embedded-post-publication-name\">Ahead of AI</span></div><div class=\"embedded-post-title-wrapper\"><div class=\"embedded-post-title\">A Dream of Spring for Open-Weight LLMs: 10 Architectures from Jan-Feb 2026</div></div><div class=\"embedded-post-body\">If you have struggled a bit to keep up with open-weight model releases this month, this article should catch you up on the main themes&#8230;</div><div class=\"embedded-post-cta-wrapper\"><span class=\"embedded-post-cta\">Read more</span></div><div class=\"embedded-post-meta\">3 months ago &#183; 150 likes &#183; 7 comments &#183; 
Sebastian Raschka, PhD</div></a></div><h3>Models</h3><h4>General Purpose</h4><ul><li><p><strong><a href=\"https://huggingface.co/trillionlabs/Tri-21B-Think\">Tri-21B-Think</a></strong> by <a href=\"https://huggingface.co/trillionlabs\">trillionlabs</a>: The Korean Trillion Labs is a repeated guest at the Artifacts series. This time, they are releasing a 21B reasoning model with support for English, Korean and Japanese.</p></li><li><p><strong><a href=\"https://huggingface.co/openbmb/MiniCPM-SALA\">MiniCPM-SALA</a></strong> by <a href=\"https://huggingface.co/openbmb\">openbmb</a>: An English and Chinese 8B model with sparse attention, supporting a 1M context window.</p></li></ul>\n      <p>\n          <a href=\"https://www.interconnects.ai/p/latest-open-artifacts-19-qwen-35\">\n              Read more\n          </a>\n      </p>\n   "
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/how-much-does-distillation-really",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/how-much-does-distillation-really",
            "title": "How much does distillation really matter for Chinese LLMs?",
            "pubDate": "Tue, 24 Feb 2026 16:06:43 GMT",
            "enclosure": {
              "@_url": "https://substack-post-media.s3.amazonaws.com/public/images/80416abb-1851-41da-97ba-26150f154e3b_3182x1790.png",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "Reacting to Anthropic's post on \"distillation attacks.\"",
            "content:encoded": "<p>Distillation has been one of the most frequent topics of discussion in the broader US-China and technological diffusion story for AI. Distillation is a term with many definitions &#8212; the colloquial one today is using a stronger AI model&#8217;s outputs to teach a weaker model. The word itself is derived from a more technical and specific definition of <em><a href=\"https://arxiv.org/abs/1503.02531\">knowledge distillation</a></em> (Hinton, Vinyals, &amp; Dean 2015), which involves a specific way of learning to match the probability distribution of a teacher model.</p><p>The distillation of today is better described generally as synthetic data. You take outputs from a stronger model, usually via an API, and you train your model to predict those. The technical form of knowledge distillation is not actually possible from API models because they don&#8217;t expose the right information to the user.</p><p>Synthetic data is arguably the single most useful method that an AI researcher today uses to improve the models on a day to day basis. Yes, architecture is crucial, some data still needs exclusively human inputs, and new ideas like reinforcement learning with verifiable rewards at scale can transform the industry, but so much of the day to day life in improving models today is figuring out how to properly capture and scale up synthetic data.</p><p>To flesh out the point from the start of this piece, the argument has repeatedly been that the leading Chinese labs are using distillation for their models to steal  capabilities from the best American API-based counterparts. 
The most prominent case to date surrounded the <a href=\"https://fortune.com/2025/01/29/deepseek-openais-what-is-distillation-david-sacks/\">release</a> <a href=\"https://techcrunch.com/2025/01/29/microsoft-probing-whether-deepseek-improperly-used-openais-api/\">of</a> <a href=\"https://www.scmp.com/tech/big-tech/article/3296827/deepseeks-ai-distillation-theft-openai-seeks-answers-over-chinas-breakthrough\">DeepSeek</a> R1, where <a href=\"https://www.bloomberg.com/news/articles/2026-02-12/openai-accuses-deepseek-of-distilling-us-models-to-gain-an-edge\">OpenAI accused DeepSeek of stealing their reasoning traces</a> by jailbreaking the API (they&#8217;re not exposed by default; for context, a reasoning trace is a colloquial term of art for a model&#8217;s internal reasoning process, such as what open weight reasoning models expose to the user). Fear of distillation is also likely why Gemini quickly flipped from exposing the reasoning traces to users to hiding them. There was even very prominent early <a href=\"https://arxiv.org/abs/2501.19393\">reasoning research that built on Gemini</a>!</p><p>This all leads us to today&#8217;s news, where <a href=\"https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks\">Anthropic named and directly accused a series of Chinese labs</a> of running elaborate distillation campaigns on their Claude models. This is a complex issue. In this post we unpack a series of questions, beginning with the impact and ending with politics. 
The core question is &#8212; how much of a performance benefit do Chinese labs get from distilling from American models.</p><p>To start, let&#8217;s review what Anthropic shared. From the <a href=\"https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks\">blog post</a>, emphasis mine:</p><blockquote><p>We have identified industrial-scale campaigns by three AI laboratories&#8212;DeepSeek, Moonshot, and MiniMax&#8212;to illicitly extract Claude&#8217;s capabilities to improve their own models. These labs generated over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts, in violation of our terms of service and regional access restrictions.</p><p>These labs used a technique called &#8220;distillation,&#8221; which involves training a less capable model on the outputs of a stronger one. <strong>Distillation is a widely used and legitimate training method.</strong> For example, frontier AI labs routinely distill their own models to create smaller, cheaper versions for their customers. 
But distillation can also be used for illicit purposes: competitors can use it to acquire powerful capabilities from other labs in a fraction of the time, and at a fraction of the cost, that it would take to develop them independently.</p></blockquote><p>Much like the models themselves, the benefits of distillation are very jagged. For some capabilities, particularly if you don&#8217;t have a full training pipeline setup for it, quickly distilling some data from the leading frontier model in that area can yield massive performance boosts. This can definitely help the lab distilling from the API catch up much more quickly than they otherwise would. Most distillation is rather benign, using many tokens of an LLM to help process and refine existing data &#8212; putting a lot of compute into getting a few, high quality training tokens out. This sort of raw data processing work can be done on many different APIs, but one tends to be best.</p><p>When we go into what Anthropic says the three Chinese LLM builders actually used the Claude API for &#8212; as an aside, Anthropic didn&#8217;t confirm that the attack was done through the API, the chat app, or Claude Code &#8212; the actual impact of the operations is very mixed. It&#8217;s hard to know how much untracked usage these labs deployed for other projects (or other American models).</p><p>To start, Anthropic puts DeepSeek first in their blog post because they&#8217;re the household name in the US for Chinese AI. 
The extent of their use is actually quite small, showing how this post is more about the big picture than the details:</p><blockquote><p><strong>DeepSeek</strong></p><p><em>Scale: Over 150,000 exchanges</em></p><p>The operation targeted:</p><ul><li><p>Reasoning capabilities across diverse tasks</p></li><li><p>Rubric-based grading tasks that made Claude function as a reward model for reinforcement learning</p></li><li><p>Creating censorship-safe alternatives to policy sensitive queries</p></li></ul></blockquote><p>In the scale of training a language model, 150K samples is only scratching the surface as a substantive experiment. It looks like they were experimenting with some rubrics, which could&#8217;ve been for an online RL run, but that&#8217;s extremely unlikely with how distributed the access was, and then some minor stuff on completions for sensitive queries. This usage of Anthropic&#8217;s API will have a negligible impact on DeepSeek&#8217;s long-rumored V4 model (or whichever model the data here contributed to). This was also very likely a small team at DeepSeek and unknown to much of the broader training organization.</p><p>The other two labs, Moonshot AI (makers of the <a href=\"https://www.interconnects.ai/p/kimi-k2-thinking-what-it-means\">Kimi</a> models) and MiniMax reflected much broader usage.</p><blockquote><p><strong>Moonshot AI</strong></p><p><em>Scale: Over 3.4 million exchanges</em></p><p>The operation targeted:</p><ul><li><p>Agentic reasoning and tool use</p></li><li><p>Coding and data analysis</p></li><li><p>Computer-use agent development</p></li><li><p>Computer vision</p></li></ul><p><strong>MiniMax</strong></p><p><em>Scale: Over 13 million exchanges</em></p><p>The operation targeted:</p><ul><li><p>Agentic coding</p></li><li><p>Tool use and orchestration</p></li></ul></blockquote><p>The role of distillation is constantly changing. 
Distilling from Claude today for its agentic behavior is much more valuable than earlier versions of Claude were as a teacher. Claude Opus 4.6 has well-rounded agentic navigation that none of the other models quite match. Why not try training on some of the model outputs to see if your model absorbs it? Over the next few months, that&#8217;ll be less differentiated. It&#8217;s sort of like how all the models are way better at math today than most people need, so there are plenty of places to distill from.</p><p>Estimates will vary, but if each exchange had 10-25K tokens, the total across these two labs (roughly 16.4 million exchanges, mostly from MiniMax) would be 150-400 billion tokens. This is a substantial amount, which could meaningfully improve a model&#8217;s post-training. For example, in Olmo 3 we had an SFT dataset of 20 billion tokens that could be built like this, and increasing it by 10X would be very reasonable.</p><p>These numbers are just scratching the surface of total synthetic data generation across APIs hosted by US companies. At the same time, quantity is a pretty crude way to measure impact. Just taking the outputs from Claude and figuring out how to add them to your model pipeline isn&#8217;t easy. The research community has seen many cases where taking outputs from a certain teacher model unexpectedly makes the student worse, because subtle interactions between the data make this type of distillation variable and tricky to do. It&#8217;s fundamentally a research problem.</p><p>This is where I&#8217;m sure the Chinese labs are innovating. There&#8217;s an argument that Chinese frontier labs are substantially more efficient than their Western counterparts, but this is misleading.</p><p>The labs operate under different constraints. The Chinese labs are likely slightly more efficient out of necessity, having fewer resources, but overall the picture of talent access is very similar. 
The Chinese labs also approach benchmarks differently, making it appear that they&#8217;re a bit closer than they really are (and <a href=\"https://www.interconnects.ai/p/open-models-in-perpetual-catch-up\">appearing as if they&#8217;re potentially surpassing</a>). This is needed to get momentum and brand recognition in the AI market.</p><p>The Chinese labs likely innovate greatly on distilling from leading API models, due to their restricted access to GPUs. GPUs could be used to construct synthetic data, but for organizations with more funding than they can spend on research compute (being supply limited), using API-based models is one of the few other options for effectively getting more compute. It&#8217;s way easier to figure out getting access to &#8220;banned&#8221; API models than it is to smuggle tens of thousands of physical GPUs and get them set up.</p><p>It&#8217;s not only the Chinese labs that operate like this. Synthetic data from a model you don&#8217;t own is all arguably distillation. Distillation is a shortcut to more compute for anyone. It&#8217;s also a far less risky cost, as having a big cluster for research requires a very large financial commitment, where APIs are pay-as-you-go. 
For example, in <a href=\"https://arxiv.org/abs/2512.13961\">Olmo 3</a> we used millions of GPU hours on the <a href=\"https://en.wikipedia.org/wiki/Frontier_(supercomputer)\">Frontier supercomputer</a> and Azure credits through <a href=\"https://nairrpilot.org/\">NAIRR</a> for synthetic data. We didn&#8217;t have the equivalent in GPUs (or really the cash, thank you research credits!).</p><p>Altogether, it&#8217;s very fair for Anthropic to be concerned about this. I still wouldn&#8217;t say it is a <em>crucial</em> factor in these Chinese labs&#8217; post-training capabilities, especially not one that&#8217;ll be easy to measure as a time gap to matching the model they&#8217;re distilling from, a la the US-China performance lag.</p><p>If we take a step back, there was even a time when Claude Sonnet was the flagship model ahead of Opus (I think this was with Sonnet 3.5). Much of this came from it being <em>well distilled</em> internally from Opus checkpoints. Fast iteration and high-quality data can go very far, letting student models surpass the teacher. Frontier labs use this to their advantage by having internal-only models for generating synthetic data, but saying that Chinese models could never pass the US frontier due to data distillation is like saying that Claude Sonnet could never beat Opus. It&#8217;s unlikely, and it depends a lot on release times, but with AI models making dramatic progress, weirder things like this have already literally happened.</p><p>The biggest factor unaddressed here is how distillation from stronger teacher models is harder in an era when reinforcement learning at scale is needed to train the best models. You can spend compute carefully crafting and filtering prompts, but you still need to train the model yourself with substantial on-policy inference: generation is the majority of the compute cost for RL, and it can&#8217;t be generations from another model. For this reason, I expected this story to die down a bit. 
It&#8217;s clear from their <a href=\"https://arxiv.org/abs/2501.12948\">open</a> <a href=\"https://arxiv.org/abs/2506.13585\">research</a> <a href=\"https://arxiv.org/abs/2507.20534\">that</a> <a href=\"https://arxiv.org/abs/2602.15763\">Chinese</a> <a href=\"https://arxiv.org/abs/2512.02556\">labs</a> have excellent RL infrastructure, despite the compute shortages.</p><p>The reason I expected it to fade is that distilling models for &#8220;competitive purposes&#8221; has violated the terms of service for API models for quite some time. Academics and open model builders in the US used to greatly worry about and debate this (and I&#8217;ve written about it multiple times in <a href=\"https://www.interconnects.ai/p/ml-moats\">2022</a> and <a href=\"https://www.interconnects.ai/p/llm-synthetic-data\">2023</a>). Only later in 2024 did that worry die down in the community (and no action has been taken against any smaller model builders).</p><p>This action from Anthropic represents another step in ratcheting up AI geopolitical tension. Kneecapping model distillation will be far harder than restricting the shipments of physical goods like GPUs. In many ways, fully restricting distillation through distributed access methods seems almost impossible, and restricting GPU sales would be far more impactful.</p><p>Anthropic and the AI industry should choose their battles. When API endpoints are available for the best models, other entities will use them to train variants of said models. This is a natural evolution of AI models. If AI models are so precious that distillation is an extreme risk, then the models will be restricted to first-party products. Anthropic has a choice to do this with their latest models. 
The market for API-based model alternatives may be so competitive that some companies go this path &#8212; likely in part due to Chinese models undercutting on price &#8212; but an API is a fundamental offering that no leading lab will risk walking back from anytime soon.</p>"
          },
          {
            "guid": {
              "#text": "https://www.interconnects.ai/p/open-models-in-perpetual-catch-up",
              "@_isPermaLink": "false"
            },
            "link": "https://www.interconnects.ai/p/open-models-in-perpetual-catch-up",
            "title": "Open models in perpetual catch-up",
            "pubDate": "Tue, 17 Feb 2026 17:27:36 GMT",
            "enclosure": {
              "@_url": "https://substackcdn.com/image/fetch/$s_!oyTU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2cad176-f718-4046-8486-161c1111435e_2680x1366.png",
              "@_type": "image/jpeg",
              "@_length": "0"
            },
            "dc:creator": "Nathan Lambert",
            "description": "The open-closed gap, distillation, innovation timescales, how open models win, specialized models, what&#8217;s missing, etc.",
            "content:encoded": "<p>Every 4-6 months a new open-weights model comes out that causes a clamor of discussion on how open models are closer than they ever have been to the best closed, frontier models. The most recent is Z.ai&#8217;s <a href=\"https://huggingface.co/zai-org/GLM-5\">GLM 5</a> model, which is the latest, leading open weights model from a Chinese company. In the last 12 months the new part of this story is that all of the open models of discussion are coming from China, where previously they were almost always Meta&#8217;s Llamas. These moments of discussion are always reflective for me &#8212; for, despite being one of open models&#8217; biggest advocates, I always find the narrative to be overblown &#8212; open models are not meaningfully accelerating towards matching the best closed models in absolute performance. The ~6month gap is holding steady.</p><p>At the same time, it&#8217;s worth discussing what happens as open models keep getting way better. Open models are staying far closer on the heels of the best closed models than I, and many other experts following the ecosystem, would expect. On paper the top three American labs &#8212; in Anthropic, OpenAI, and Google &#8212; have vastly more resources at play for training in research. In this world, many would have expected a more obviously growing margin between the best open and closed models. Raw research compute, data purchases, user data, etc. all are providing relatively fine margins. Maybe it&#8217;s the scaling laws log-linear relationship from compute to performance coming into play?</p><p>The plot of the day is ArtificialAnalysis Intelligence Index for <a href=\"https://artificialanalysis.ai/models/open-source\">open vs. closed models over time</a>. 
The point of this post isn&#8217;t to nitpick this index&#8217;s many limitations, or any other, but to reflect on what this chart doesn&#8217;t represent and what it means for the AI world for open weights to keep pace year in and year out.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!oyTU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2cad176-f718-4046-8486-161c1111435e_2680x1366.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!oyTU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2cad176-f718-4046-8486-161c1111435e_2680x1366.png 424w, https://substackcdn.com/image/fetch/$s_!oyTU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2cad176-f718-4046-8486-161c1111435e_2680x1366.png 848w, https://substackcdn.com/image/fetch/$s_!oyTU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2cad176-f718-4046-8486-161c1111435e_2680x1366.png 1272w, https://substackcdn.com/image/fetch/$s_!oyTU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2cad176-f718-4046-8486-161c1111435e_2680x1366.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!oyTU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2cad176-f718-4046-8486-161c1111435e_2680x1366.png\" width=\"1456\" height=\"742\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2cad176-f718-4046-8486-161c1111435e_2680x1366.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:742,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:592365,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.interconnects.ai/i/188211391?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2cad176-f718-4046-8486-161c1111435e_2680x1366.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!oyTU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2cad176-f718-4046-8486-161c1111435e_2680x1366.png 424w, https://substackcdn.com/image/fetch/$s_!oyTU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2cad176-f718-4046-8486-161c1111435e_2680x1366.png 848w, https://substackcdn.com/image/fetch/$s_!oyTU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2cad176-f718-4046-8486-161c1111435e_2680x1366.png 1272w, https://substackcdn.com/image/fetch/$s_!oyTU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2cad176-f718-4046-8486-161c1111435e_2680x1366.png 1456w\" sizes=\"100vw\" fetchpriority=\"high\"></picture><div class=\"image-link-expand\"><div class=\"pencraft pc-display-flex pc-gap-8 pc-reset\"><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container restack-image\"><svg 
role=\"img\" width=\"20\" height=\"20\" viewBox=\"0 0 20 20\" fill=\"none\" stroke-width=\"1.5\" stroke=\"var(--color-fg-primary)\" stroke-linecap=\"round\" stroke-linejoin=\"round\" xmlns=\"http://www.w3.org/2000/svg\"><g><title></title><path d=\"M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882\"></path></g></svg></button><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container view-image\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"20\" height=\"20\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"2\" stroke-linecap=\"round\" stroke-linejoin=\"round\" class=\"lucide lucide-maximize2 lucide-maximize-2\"><polyline points=\"15 3 21 3 21 9\"></polyline><polyline points=\"9 21 3 21 3 15\"></polyline><line x1=\"21\" x2=\"14\" y1=\"3\" y2=\"10\"></line><line x1=\"3\" x2=\"10\" y1=\"21\" y2=\"14\"></line></svg></button></div></div></div></a></figure></div><p>The benchmark mixes a ton of factors into 1 score that judges model &#8220;quality.&#8221; This compresses far too many error bars, stories, and weaknesses into one metric. These metrics will always be used to inform policy and help more people understand the high-level trends of AI, but they do a poor job of capturing the <em>frontier</em> of AI progress. </p><p>The frontier of AI has <a href=\"https://www.interconnects.ai/p/opus-46-vs-codex-53\">never been harder to capture in public benchmarks</a>. Building benchmarks is now super expensive and requires extreme knowledge regarding the latest models and what they do and do not excel at. 
Well-known issues like SWE-Bench being almost 3/4 Django or Terminal Bench 2 being crowdsourced and a bit noisy will never be captured here.</p><p>Time and time again it has been shown that the leading frontier labs in the U.S. have a better read on the capabilities that actually matter, and the public benchmarks tend to be a bit easier to overfit to. Qwen&#8217;s recent flagship v3.5 model has again been plagued with numerous complaints of benchmaxing (while some out-of-distribution weirdness is debatably down to implementation errors on Alibaba&#8217;s own API).</p><p>The combination of all these factors has pushed me to advocate for &#8220;no averaging across our evaluation suite&#8221; when communicating the value of our latest Olmo models at Ai2 (see my <a href=\"https://youtu.be/uaZ3yRdYg8A?si=31zxbDFqqqXHwJIR&amp;t=2465\">recent talk</a> on evals). The best models are indeed very close together, but averages can completely hide a dramatically different result on a single eval from an unsuspecting reader.</p><p>Altogether, I&#8217;d bet that the current Artificial Analysis Intelligence Index is a bit unrepresentative of the true frontier, rather than open models being closer to the closed models than ever before (yes, I know, it&#8217;s not like I am offering any obvious ways to improve it). The one domain where I foresee open models staying close behind is coding, where public GitHub data and clever verifiable rewards present a ton of potential performance gains.</p><p>The overall balance in the ecosystem sits between the value of the most intelligent model &#8212; which many people like myself still pay for despite open models&#8217; improvements &#8212; and the incredible cost reductions that come once a given task is achievable by a permissively licensed open model. The best closed models keep unlocking even more valuable tasks, keeping open models in a state of perpetual catch-up. 
The industry continues to reinvent itself at a blistering pace.</p><p>On to the 7 other biggest trends in open models.</p><h3>1. The open model frontier is brutally competitive</h3><p>2025 witnessed a sort of &#8220;Cambrian Explosion&#8221; of open weight models with very impressive benchmark scores. This market is far more crowded than the market for closed, API-based models (where there are 4 substantive providers), so open model adoption is brutally concentrated. Only the most successful models ever get any adoption. This is going to push many small and mid-sized model builders across the ecosystem to shift to a specific niche or a different business plan over the coming months or years.</p><p>As a model builder, this hits very close to home for me. Even though models are fairly sticky (at least stickier than the general coverage would indicate), with many open models set up once if performance is good enough and never replaced, the likelihood that most models even get tried once goes down month over month as the ecosystem gets more competitive.</p><p>In my <a href=\"https://www.interconnects.ai/p/8-plots-that-explain-the-state-of\">post</a> on the state of open models earlier this year, I even learned that Qwen gets dominated on adoption metrics at the biggest scale of models. 
This continues to surprise me!</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!L-lz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a16175-d9e6-4ca9-ae31-46c84f25d693_1872x1156.png\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!L-lz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a16175-d9e6-4ca9-ae31-46c84f25d693_1872x1156.png 424w, https://substackcdn.com/image/fetch/$s_!L-lz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a16175-d9e6-4ca9-ae31-46c84f25d693_1872x1156.png 848w, https://substackcdn.com/image/fetch/$s_!L-lz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a16175-d9e6-4ca9-ae31-46c84f25d693_1872x1156.png 1272w, https://substackcdn.com/image/fetch/$s_!L-lz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a16175-d9e6-4ca9-ae31-46c84f25d693_1872x1156.png 1456w\" sizes=\"100vw\"><img src=\"https://substackcdn.com/image/fetch/$s_!L-lz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a16175-d9e6-4ca9-ae31-46c84f25d693_1872x1156.png\" width=\"1456\" height=\"899\" 
data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/26a16175-d9e6-4ca9-ae31-46c84f25d693_1872x1156.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:899,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"\" srcset=\"https://substackcdn.com/image/fetch/$s_!L-lz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a16175-d9e6-4ca9-ae31-46c84f25d693_1872x1156.png 424w, https://substackcdn.com/image/fetch/$s_!L-lz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a16175-d9e6-4ca9-ae31-46c84f25d693_1872x1156.png 848w, https://substackcdn.com/image/fetch/$s_!L-lz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a16175-d9e6-4ca9-ae31-46c84f25d693_1872x1156.png 1272w, https://substackcdn.com/image/fetch/$s_!L-lz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a16175-d9e6-4ca9-ae31-46c84f25d693_1872x1156.png 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture><div class=\"image-link-expand\"><div class=\"pencraft pc-display-flex pc-gap-8 pc-reset\"><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container restack-image\"><svg role=\"img\" width=\"20\" height=\"20\" viewBox=\"0 0 20 20\" fill=\"none\" stroke-width=\"1.5\" stroke=\"var(--color-fg-primary)\" stroke-linecap=\"round\" stroke-linejoin=\"round\" 
xmlns=\"http://www.w3.org/2000/svg\"><g><title></title><path d=\"M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882\"></path></g></svg></button><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container view-image\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"20\" height=\"20\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"2\" stroke-linecap=\"round\" stroke-linejoin=\"round\" class=\"lucide lucide-maximize2 lucide-maximize-2\"><polyline points=\"15 3 21 3 21 9\"></polyline><polyline points=\"9 21 3 21 3 15\"></polyline><line x1=\"21\" x2=\"14\" y1=\"3\" y2=\"10\"></line><line x1=\"3\" x2=\"10\" y1=\"21\" y2=\"14\"></line></svg></button></div></div></div></a></figure></div><p>The upshot is that competition at the frontier of performance for models is most concentrated in the popular benchmarks of the day, especially with large MoE models &#8212; this will drive exploration and innovation towards other cases where open models can actually win on overall business value.</p><h3>2. Specialized, small, fast, and cheap open models are missing</h3><p>There&#8217;s a large underserved market in specialized models for the enterprise, particularly with tools (maybe GPT OSS&#8217;s success is somewhat related to this). Generally, the idea would be to either release the weights, or the method for creating them, that are excellent in valuable, repetitive tasks. With agents becoming more prominent, these models should be able to perform repetitive, agent sub-tasks at small percentages of the cost of large frontier models, while being faster, private, and directly owned. 
For example, what if one open weight model were deployed with multiple PEFT adapters per skill, allowing high utilization and extensibility?</p><p>I&#8217;ve specifically heard this request from multiple enterprises building agents. While the Qwen models are fantastic at small sizes, open models tend to be very jagged in performance, so multiple options would likely be needed to get this off the ground. It&#8217;s also limited by a general lack of frontier-quality post-training recipes, especially when it comes to adapting a model to a specific domain or set of tasks not covered in academic benchmarks. In this view, most of the domain-specific models of today, like math or biology models, are actually not specialized enough.</p><p>This is one of many major blind spots that I see repeatedly in the open model ecosystem. The biggest reason that the open model ecosystem seems a bit misunderstood externally, or confused in itself, is that open models take a long time to figure out and get into the world.</p><h3>3. Understanding open models is massively under-indexed on</h3><p>There should be more research organizations fully dedicated to understanding how open models work technically and geopolitically. There could be entire think tanks in DC informing the public on what is happening, and uncovering information buried in hackathons and new research labs in San Francisco. For Interconnects and <a href=\"https://atomproject.ai/\">The ATOM Project</a> I&#8217;m at the frontier of this work, which often entails <em>uncovering new raw data</em> on how open models are used. This data is always messy and imperfect, and often flat-out confusing. 
Understanding open models is how we keep track of the direction of global diffusion for the most important technology in decades, and it feels like there is almost no public work doing so.</p><p>Here&#8217;s some new data on open model <em>usage</em> courtesy of <a href=\"http://openrouter.ai/\">OpenRouter</a>, which largely mirrors the adoption trends we&#8217;ve been seeing. While HuggingFace downloads are obviously very noisy, almost every other adoption metric over time looks strongly correlated with them, especially on U.S. vs. China issues.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2 is-viewable-img\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!P0Nw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b13721-7a67-46c9-b83f-1ea16f4cff7c_1806x1218.jpeg\" data-component-name=\"Image2ToDOM\"><div class=\"image2-inset\"><picture><source type=\"image/webp\" srcset=\"https://substackcdn.com/image/fetch/$s_!P0Nw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b13721-7a67-46c9-b83f-1ea16f4cff7c_1806x1218.jpeg 424w, https://substackcdn.com/image/fetch/$s_!P0Nw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b13721-7a67-46c9-b83f-1ea16f4cff7c_1806x1218.jpeg 848w, https://substackcdn.com/image/fetch/$s_!P0Nw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b13721-7a67-46c9-b83f-1ea16f4cff7c_1806x1218.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!P0Nw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b13721-7a67-46c9-b83f-1ea16f4cff7c_1806x1218.jpeg 1456w\" sizes=\"100vw\"><img 
src=\"https://substackcdn.com/image/fetch/$s_!P0Nw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b13721-7a67-46c9-b83f-1ea16f4cff7c_1806x1218.jpeg\" width=\"1456\" height=\"982\" data-attrs=\"{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d2b13721-7a67-46c9-b83f-1ea16f4cff7c_1806x1218.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:982,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" class=\"sizing-normal\" alt=\"Image\" title=\"Image\" srcset=\"https://substackcdn.com/image/fetch/$s_!P0Nw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b13721-7a67-46c9-b83f-1ea16f4cff7c_1806x1218.jpeg 424w, https://substackcdn.com/image/fetch/$s_!P0Nw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b13721-7a67-46c9-b83f-1ea16f4cff7c_1806x1218.jpeg 848w, https://substackcdn.com/image/fetch/$s_!P0Nw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b13721-7a67-46c9-b83f-1ea16f4cff7c_1806x1218.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!P0Nw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b13721-7a67-46c9-b83f-1ea16f4cff7c_1806x1218.jpeg 1456w\" sizes=\"100vw\" loading=\"lazy\"></picture><div class=\"image-link-expand\"><div class=\"pencraft pc-display-flex pc-gap-8 pc-reset\"><button 
tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container restack-image\"><svg role=\"img\" width=\"20\" height=\"20\" viewBox=\"0 0 20 20\" fill=\"none\" stroke-width=\"1.5\" stroke=\"var(--color-fg-primary)\" stroke-linecap=\"round\" stroke-linejoin=\"round\" xmlns=\"http://www.w3.org/2000/svg\"><g><title></title><path d=\"M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882\"></path></g></svg></button><button tabindex=\"0\" type=\"button\" class=\"pencraft pc-reset pencraft icon-container view-image\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"20\" height=\"20\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"2\" stroke-linecap=\"round\" stroke-linejoin=\"round\" class=\"lucide lucide-maximize2 lucide-maximize-2\"><polyline points=\"15 3 21 3 21 9\"></polyline><polyline points=\"9 21 3 21 3 15\"></polyline><line x1=\"21\" x2=\"14\" y1=\"3\" y2=\"10\"></line><line x1=\"3\" x2=\"10\" y1=\"21\" y2=\"14\"></line></svg></button></div></div></div></a></figure></div><p><em>As an aside, if this work monitoring the open ecosystem sounds appealing to you, please reach out or leave a comment &#8212; I&#8217;m thinking about how to scale up our impact in this area!</em></p><div class=\"subscription-widget-wrap-editor\" data-attrs=\"{&quot;url&quot;:&quot;https://www.interconnects.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}\" data-component-name=\"SubscribeWidgetToDOM\"><div class=\"subscription-widget show-subscribe\"><div class=\"preamble\"><p class=\"cta-caption\">Interconnects AI is a reader-supported publication. 
Consider becoming a subscriber.</p></div><form class=\"subscription-widget-subscribe\"><input type=\"email\" class=\"email-input\" name=\"email\" placeholder=\"Type your email&#8230;\" tabindex=\"-1\"><input type=\"submit\" class=\"button primary\" value=\"Subscribe\"><div class=\"fake-input-wrapper\"><div class=\"fake-input\"></div><div class=\"fake-button\"></div></div></form></div></div><h3>4. Nations will turn to open models as the only way to get an initial foothold in sovereign AI (and sovereign AI is the real deal)</h3><p>Sovereign AI has largely been unfolding slowly in the background of frontier AI discussions and the U.S.-China arms race, but it&#8217;ll only become more prevalent as AI becomes more deeply embedded in our technological <em>reality</em>. Every wealthy nation will see AI as a direction for influence in addition to a necessity for national security. Open models will likely be the only way to get this off the ground as a real effort, in order to have the local AI community and economy seamlessly integrate with it.</p><h3>5. Futures where open-source wins the frontier are still possible, but seemingly less likely</h3><p>The most likely (by far) outcome is for the status quo to continue and for the best open models to lag the best closed models by 6-9 months. A large portion of the perpetual catch-up is likely due to the best open model builders constantly distilling their models on the strongest, currently available closed API models, but this direction seems less relevant with the rise of RL. Post-training today is more about the model undergoing <em>experience</em> rather than directly learning from the smartest teacher you can find. The paths to open models winning come through fundamental innovation. This looks like the ability to merge, rotate, and share expert models, a dramatic (100X+) reduction in the cost of training, etc. 
Predicting this before it happens is more sci-fi than science; if I could, I&#8217;d just go build the damn thing.</p><h3>6. China&#8217;s open model &#8220;ecosystem&#8221; makes it the most likely place for a discovery around who wins</h3><p>China has <a href=\"https://www.interconnects.ai/p/2025-open-models-year-in-review\">many labs</a> building models on top of their peers&#8217; innovations. This intentional sharing of ideas provides immense benefits relative to Silicon Valley&#8217;s quid pro quo where it&#8217;s accepted that people go home at the end of their day and chat with some of their friends about the latest technical secrets of their models. The sort of sharing the Chinese companies do, especially considering more of them have closer ties to the nation&#8217;s scientific and academic institutions, is the sort of setup that lets new standards converge much faster and breakthroughs be shared. This is another unknown factor, like potential innovation where open models &#8220;win,&#8221; but it&#8217;s important because China has created its own conditions for potentially massive success, and the U.S. has no answer. This divergence in how the ecosystems operate could be nothing in the long term, but U.S. AI companies cannot do much to compete with it if it takes off.</p><h3>7. 
Open models dictate science and diffusion &#8212; slower trends than the frontier of AI</h3><p>The biggest impact in AI in terms of transforming day-to-day life, and even the world&#8217;s power structures, will obviously come from the most powerful and intelligent models. It is fairly obvious then that the open models that end up in closest proximity to this capture the headlines &#8212; if an open-weights model does, somehow, happen to claim the title of &#8220;the world&#8217;s most powerful model,&#8221; there will be extreme economic consequences.</p><p>In the real world, meaning the scenario with the highest probability of occurring, open models&#8217; biggest influence will be in two very slow-moving sectors: 1) fundamental research/innovation and 2) global technological diffusion. I&#8217;ve personally realized how much of my excitement for open models is a bit misguided &#8212; I&#8217;m trying to understand the frontier of AI through the lens of these models, missing the bigger story in how technology slowly reshapes the world&#8217;s biggest companies.</p><p>Consider that when Llama was the open SOTA model, everyone in the U.S. and China did science on Llama, which then impacted subsequent models &#8212; even if we didn&#8217;t hear directly from Meta on how. Now this default is Qwen. Qwen is the anchor of the Chinese ecosystem. Language model research is proceeding extremely fast, which could make the fundamental improvements made in research labs impact the frontier of the technology much faster than usual.</p><p>At the same time, the global default for using AI outside of the wealthiest few nations will be to use either free applications like ChatGPT or open weight models. ChatGPT doesn&#8217;t fit a lot of business use cases, so open weight models are a melting pot for innovation that we largely have no visibility into. 
When we zoom out to a timeline closer to decades, open models&#8217; global adoption seems like a top trend to follow in AI.</p><h2>Conclusion</h2>\n      <p>\n          <a href=\"https://www.interconnects.ai/p/open-models-in-perpetual-catch-up\">\n              Read more\n          </a>\n      </p>\n   "
          }
        ],
        "link": "https://www.interconnects.ai",
        "image": {
          "url": "https://substackcdn.com/image/fetch/$s_!djof!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc52e8097-8f3d-4f7e-808b-2f4ad37f3b52_720x720.png",
          "link": "https://www.interconnects.ai",
          "title": "Interconnects AI"
        },
        "title": "Interconnects AI",
        "language": "en",
        "atom:link": {
          "@_rel": "self",
          "@_href": "https://www.interconnects.ai/feed",
          "@_type": "application/rss+xml"
        },
        "copyright": "Interconnects AI, LLC",
        "generator": "Substack",
        "webMaster": "mail@interconnects.ai",
        "description": "The cutting edge of AI, from inside the frontier AI labs, minus the hype. The border between high-level and technical thinking. Read by leading engineers, researchers, and investors.",
        "itunes:block": "Yes",
        "itunes:owner": {
          "itunes:name": "Nathan Lambert",
          "itunes:email": "mail@interconnects.ai"
        },
        "itunes:author": "Nathan Lambert",
        "lastBuildDate": "Tue, 26 May 2026 21:05:13 GMT",
        "googleplay:email": "mail@interconnects.ai",
        "googleplay:owner": "mail@interconnects.ai",
        "googleplay:author": "Nathan Lambert"
      },
      "@_version": "2.0",
      "@_xmlns:dc": "http://purl.org/dc/elements/1.1/",
      "@_xmlns:atom": "http://www.w3.org/2005/Atom",
      "@_xmlns:itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd",
      "@_xmlns:content": "http://purl.org/rss/1.0/modules/content/",
      "@_xmlns:googleplay": "http://www.google.com/schemas/play-podcasts/1.0"
    },
    "?xml": {
      "@_version": "1.0",
      "@_encoding": "UTF-8"
    }
  },
  "entry_raw": {
    "guid": {
      "#text": "https://www.interconnects.ai/p/some-ideas-for-what-comes-next-may",
      "@_isPermaLink": "false"
    },
    "link": "https://www.interconnects.ai/p/some-ideas-for-what-comes-next-may",
    "title": "Some ideas for what comes next, May 2026",
    "pubDate": "Tue, 26 May 2026 15:39:02 GMT",
    "enclosure": {
      "@_url": "https://substackcdn.com/image/fetch/$s_!-711!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png",
      "@_type": "image/jpeg",
      "@_length": "0"
    },
    "dc:creator": "Nathan Lambert",
    "description": "Gemini Flash 3.5, Mythos, open-closed balance, America's open-source surge, emerging power struggles and more.",
    "content:encoded": "<p>As the years of AI progress go by, they&#8217;ve been accompanied by a slowly rising tide of consequence. Models are getting more capable, how we work is changing quickly, the economics of AI are becoming real, just as real-world risks come to the forefront. 2026 is the first year where I don&#8217;t think there&#8217;ll be any breaks from this. The hard part to prepare for is that there&#8217;s a good chance things just continue to ratchet up from here &#8211; more disruption, more surprises, higher stakes.</p><p>On my end, there&#8217;s been a growing list of topics that are central to how I see the current state of AI, but I haven&#8217;t even gotten to write about them (at least not from all the angles I want to)! All of these are closely related to the implications of different models reaching new capability levels and how I use that to infer what may come next.</p><h3>1. Open models haven&#8217;t had their true agent moment like Opus 4.5</h3><p>The time gap between open and closed models is very often discussed, but the reality is that we have a nice time-gating that&#8217;s independent of debatable benchmarks &#8211; whether or not open-weight models become super useful in agentic harnesses. 
The <a href=\"https://www.interconnects.ai/p/claude-code-hits-different\">Opus 4.5 in Claude Code moment</a> of December 2025 was so loud and obvious that if open models hit this performance level at price points as low as $5/month, there will be an explosion in usage.</p><p>Right now we are about 5-6 months in with no equivalent open model. I suspect the robustness of the best closed frontier models that I write about could make this moment take a good amount longer, say closer to 12+ months. In this time, Claude Code and Codex may seem like different categories of products. In the standard flurry of new, state-of-the-art open models from a variety of labs, benchmarks will definitely keep climbing, but the open-closed gap should become more interpretable as real-world use becomes the real litmus test.</p><h3>2. Gemini still doesn&#8217;t have a meaningful competitor for Claude Code and Codex</h3><p>The best exclamation point I can offer to reinforce my prediction that open models are further behind than the benchmarks claim is that even the mighty Google doesn&#8217;t have a clear competitor for Claude Code and Codex. I&#8217;m sure the Gemini team is pushing very hard on this.</p><p>I still need to do a lot more testing on Gemini 3.5 Flash, but reading reviews makes it clear that it&#8217;s not a substitute for how I&#8217;m working today. It may not be that the Gemini team is explicitly specializing for Google&#8217;s existing products (search, YouTube, etc.), but the model seems to suit them. If Google doesn&#8217;t have a powerful tool here soon, I don&#8217;t expect the open model labs to either. The open models are going to be used more for automated, enterprise agents and low-cost domains, rather than being the driving tool of modern knowledge work. 
This will feed directly into the economic engine of funding future models, where the agents like Claude Code and Codex are the current best path to massive AI revenue growth.</p><p><em>I discussed how the current environment is quietly driving labs in China to specialize on <a href=\"https://aiproem.substack.com/p/nathan-lambert-reflects-on-chinas\">AI Proem</a> with Grace Shao and this is central to my <a href=\"https://www.interconnects.ai/p/the-next-phase-of-open-models\">expectations of open models specializing</a> over the next few years instead of competing with OpenAI, Anthropic, and Google.</em></p><h3>3. I don&#8217;t expect an open-weights Mythos this year</h3><p>While I don&#8217;t think Mythos is a general &#8220;god model&#8221; that will crush the competition in every domain, I do think it&#8217;s a remarkable technical achievement in software engineering and cybersecurity. Mythos is obviously a watershed moment for those fields. 
Having spoken to most of the Chinese labs &#8211; particularly those with the most prominent, large, open MoE models like Kimi, Z.ai, DeepSeek, and Qwen &#8211; I think they&#8217;re heavily resource-limited and don&#8217;t have an immediate path to scaling up training processes like the big labs in the U.S. The more corporate labs, such as Alibaba and ByteDance, have more resources but also take more conservative stances on safety and security.<br><br>Mythos is a bellwether of the massive acceleration in training and research compute available to the largest American companies.</p><p><em>Epoch AI recently had a nice <a href=\"https://epoch.ai/gradient-updates/frontier-labs-dont-use-most-ai-compute\">piece</a> on the compute available to various labs (roughly Google 25%, Meta 11%, OpenAI 11%, Anthropic 6%). All of these numbers are vastly higher than those of any Chinese lab.</em></p><h3>4. American open models are slowly gaining steam</h3><p>Nvidia with Nemotron, Google with Gemma, Arcee AI and others are slowly stabilizing the open model ecosystem in the U.S. There&#8217;s a lot that&#8217;s hard to measure here, especially in the rise of local agents like OpenClaw and Hermes, but there are adoption numbers of American models that we haven&#8217;t seen since Llama 3.<br><br>Gemma 4&#8217;s models are all tying or outperforming the equivalently sized Qwen 3.5/3.6 models &#8212; where Qwen has for years now been the default open model at these sizes. These Qwen 3.5/3.6 models have been tricky to get working in a lot of post-training research, partly due to architecture/tooling and partly, it seems likely, due to modeling (i.e. some training decision makes the model hard to finetune). 
I&#8217;ve heard few complaints about Gemma, but that could also be because Gemma is not yet the <em>researcher</em> default.</p><div class=\"captioned-image-container\"><figure><a class=\"image-link image2\" target=\"_blank\" href=\"https://substackcdn.com/image/fetch/$s_!-711!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png\"><img src=\"https://substackcdn.com/image/fetch/$s_!-711!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F723ba9cb-c351-4b89-8860-2ac4eda7f335_2288x1584.png\" width=\"1456\" height=\"1008\" class=\"sizing-normal\" alt=\"\" loading=\"lazy\"></a></figure></div><p>A simple reality we've seen recently with models like GPT-OSS, Nemotron 3, and now Gemma 4 is that if a model lands in the right range of benchmarks and is released by an American lab under a truly permissive license, it'll get a large amount of adoption (in this cycle, recall that Gemma 4 adopted the Apache 2.0 License, replacing the use-case-restricted license of earlier Gemmas). This early phase of American growth in open models is establishing key brands directly with developers. The consensus is that more neolabs like Reflection and Thinking Machines are likely to participate in this space, but waiting too long means missing the window when new agentic workflows and enterprise relationships are being built.</p><h3>5. 
Anthropic and OpenAI are just getting up to speed in model iterations</h3><p>I expect the rest of this year to be a ruthless competition between these two flagship companies. I&#8217;m at an interesting balance where I think GPT 5.5 is a slightly smarter model and I love the Codex App, so I&#8217;m structuring much of my work to be possible there. At the same time, for a lot of writing-related and broader surface area tasks I still really love Claude. These models are rapidly changing how we work: I run Codex from my phone while doing other things, am setting up automated open model analysis jobs on the back of agents, and expect to be able to scale the research side of Interconnects widely.</p><p>AI is beginning to drive companies to two extremes in the scaling era. The biggest companies will be far bigger than ever, using resources and mass talent to sustain progress at the frontier of raw AI capabilities. On the other side, tiny businesses like Interconnects thrive by using agents to refine, present, and sell niche expertise. The coming mass job displacement is going to reduce employability for various knowledge workers on the raw technical side who don&#8217;t fit into either of these extremes (big or small companies), while sustaining and maybe even amplifying careers that interface directly with humans (e.g. doctors) or with other power structures that have the means to sustain themselves (law/government).</p><h3>6. 
More existing power structures will assert themselves on AI</h3><p>Just in the last few days while writing this, we had the Pope release <a href=\"https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html\">a document of over 40,000 words</a> on where AI is going and <a href=\"https://www.bloomberg.com/news/articles/2026-05-26/china-expands-travel-curbs-to-top-ai-talent-at-private-firms\">China expand personnel movement restrictions</a> on top AI researchers across industry. At the same time, the U.S. has <a href=\"https://www.axios.com/2026/04/19/nsa-anthropic-mythos-pentagon\">designated Anthropic a supply chain risk and continues to use its models for national security</a>. The list of news like this is only going to grow. Existing power structures are realizing there&#8217;s a finite time window for them to assert themselves in the AI dynamic, an intuition that could be mapped to their influence going down as AI models get more powerful. This intuition is potentially dangerous, as it sets up meaningful conflict over who controls the technology (as I <a href=\"https://www.interconnects.ai/p/how-anthropic-vs-dow-impacts-open\">discussed</a> with Dean Ball after the Anthropic-DoW spat).</p><div><hr></div><h3>Next: Where technical becomes social</h3><p>The acceleration of these largely technical and <em>power</em> trends is going to put more pressure on the social and political anti-AI sentiments within the U.S. This is currently the most obvious barrier to continued AI development and beneficial diffusion. Reflecting on this, many people in the tech discourse get too focused on the details, where, yes, a lot of data center detractors are making genuinely wrong factual claims in defense of their position.</p><p>The real position that a large swath of Americans holds is that they have a voice in saying no to the current trend, by not granting permission to build data centers. 
This is a voice that they haven&#8217;t been granted by the tech industry that changed the face of the global economy and power structures in the last few decades.</p><p>This is setting us up for a challenging year ahead for the industry. The labs are aggregating and concentrating talent to peak levels. There are few neutral messengers to communicate the reality of AI to the public. The frontier labs&#8217; leadership is largely gearing up to IPO and stay ahead in the capabilities race. Under the status quo, few actions are available to unwind this <a href=\"https://jasmi.news/p/warning-shots\">path toward social conflict</a>.</p><p>It takes individuals in the AI ecosystem to zag and go against the groupthink of needing to make your wealth today, of needing to be at a lab to do impactful work, and so on. I&#8217;m personally continuing to bet on this, by trying to make a vibrant and diverse open model ecosystem supported by clear, unbiased information. If you agree with this and have been watching from the sidelines, it&#8217;s a good time to get involved, before the situation spirals into something uncontrollable.</p>"
  },
  "stats_raw": {
    "published_at": "2026-05-26T15:39:02.000Z",
    "categories_count": 0,
    "content_excerpt_length": 110
  },
  "aux_raw": {
    "author": "Nathan Lambert",
    "categories": [],
    "feed_title": "Interconnects AI",
    "raw_excerpt": "Gemini Flash 3.5, Mythos, open-closed balance, America's open-source surge, emerging power struggles and more.",
    "feed_site_url": "https://www.interconnects.ai",
    "content_excerpt": "Gemini Flash 3.5, Mythos, open-closed balance, America's open-source surge, emerging power struggles and more.",
    "feed_description": "The cutting edge of AI, from inside the frontier AI labs, minus the hype. The border between high-level and technical thinking. Read by leading engineers, researchers, and investors."
  },
  "selection_meta": {
    "feed_id": "interconnects",
    "feed_priority": 5,
    "lookback_days": 7,
    "snapshot_version": "newsletter_rss_entry_v1",
    "max_items_per_feed": 5,
    "max_normalized_items": 40
  },
  "created_at": "2026-05-26T22:02:29.463Z",
  "updated_at": "2026-05-26T22:02:29.463Z"
}