Field Dispatch · Hacker News #10 · 2026-05-30

Liquid AI reveals 8B-A1B MoE trained on 38T

www.liquid.ai

Points: 113
Comments: 34
Daily rank: #10
Host: www.liquid.ai
Pain-point analysis (published 2026/05/29)

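The pain point is a preliminary AI-generated distillation of the upstream raw evidence; it does not include additional China-market research.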

Pain point

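Discussion centers on the ratio of training tokens to model size. One commenter feels that 38T tokens seems like a lot for an 8B model ("overtraining"). Another contrasts the Chinchilla rule of thumb (about 20 training tokens per active parameter) with Mistral (recalled by the commenter as roughly 2x Chinchilla) and puts this model at about 1800x. This suggests that model developers face an open question of how to choose the ratio of training data to parameters: existing rules of thumb such as Chinchilla may no longer apply, which risks wasted resources or models that fall short of expectations. That uncertainty raises the trial-and-error cost and time of model tuning.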
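The ratio the commenters are debating can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the "A1B" suffix means roughly 1e9 active parameters and "8B" means roughly 8e9 total parameters, neither of which is stated exactly in this summary, and the helper name `chinchilla_multiple` is hypothetical.

```python
# Tokens-per-parameter arithmetic for the figures quoted in the discussion.
# ASSUMPTION: "A1B" is read as ~1e9 active parameters and "8B" as ~8e9 total;
# exact parameter counts are not given in this summary.

CHINCHILLA_TOKENS_PER_PARAM = 20  # rule of thumb quoted in the HN comment


def chinchilla_multiple(train_tokens: int, params: int) -> float:
    """Return how many times over the ~20 tokens/param rule a training run is."""
    tokens_per_param = train_tokens / params
    return tokens_per_param / CHINCHILLA_TOKENS_PER_PARAM


TRAIN_TOKENS = 38 * 10**12  # 38T tokens, as reported in the article
ACTIVE_PARAMS = 1 * 10**9   # assumed from the "A1B" suffix
TOTAL_PARAMS = 8 * 10**9    # assumed from the "8B" prefix

print(TRAIN_TOKENS / ACTIVE_PARAMS)                      # 38000.0
print(chinchilla_multiple(TRAIN_TOKENS, ACTIVE_PARAMS))  # 1900.0
print(chinchilla_multiple(TRAIN_TOKENS, TOTAL_PARAMS))   # 237.5
```

Under these round-number assumptions the multiple relative to active parameters comes out near 1,900, the same order of magnitude as the commenter's "1800 x". The difference presumably reflects rounding or a different parameter count.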

External Article

External article summary

Today, we’re releasing LFM2.5-8B-A1B, a high-throughput edge model optimized for fast, reliable tool calling and complex instruction following on consumer hardware, delivering compressed performance competitive with much larger models and day-one support across major inference frameworks.

External Article

External article source

Article title: LFM2.5-8B-A1B: an Even Better on-Device Mixture-of-Experts | Liquid AI
Host: www.liquid.ai
§ Dossier

Selected HN comments

The small models are getting really impressive. I recently realized that Qwen3.5:4B is way more capable than I thought a model that size could be. Combine that with the work Liquid puts into RL and fine tuning, and you get models that perform extremely well on minimal hardware. Combine that with your own fine tuning, and you get a specialized tool that is fast, private, and doesn’t require internet connection.

chabes

Hmm, I asked it who made it, and it says Google?

kilroy123

Woah, chinchilla scaling is 20 x active_params. I think mistral was 2 x Chinchilla. This is 1800 x

irthomasthomas

Anybody use their localcowork [1] before? That is where the demo lives. Or not? [1] https://github.com/Liquid4All/cookbook/tree/main/examples/lo...

SubiculumCode

Liquid does amazing work, but I kinda feel like they are overtraining their models. 38T tokens seems like a lot for an 8B model

Ifkaluva
Source data · Raw Archive
source: Hacker News
upstream_source: hacker_news
upstream_item_id: 48325306
daily_ranking_item_id: 93681b17-40f4-4ae6-97b7-5cc2ca2308e6
rank_date: 2026-05-30
rank: 10
name: Liquid AI reveals 8B-A1B MoE trained on 38T
tagline: www.liquid.ai
votes_count: 113
comments_count: 34
created_at_on_source: 2026-05-29T16:19:54.000Z
media / source-specific data
{
  "author": "simjnd",
  "hn_item_id": 48325306,
  "external_url": "https://www.liquid.ai/blog/lfm2-5-8b-a1b"
}
raw_payload
{
  "by": "simjnd",
  "id": 48325306,
  "url": "https://www.liquid.ai/blog/lfm2-5-8b-a1b",
  "kids": [
    48328322,
    48329846,
    48329703,
    48328511,
    48329676,
    48328307,
    48327949,
    48329706,
    48328208,
    48328281,
    48328261,
    48328092,
    48328194
  ],
  "time": 1780071594,
  "type": "story",
  "score": 113,
  "title": "Liquid AI reveals 8B-A1B MoE trained on 38T",
  "descendants": 34
}
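The `time` field in the payload above is a Unix epoch in seconds. As a minimal sketch using only the Python standard library, it can be converted to a UTC timestamp and compared with the `created_at_on_source` value listed earlier. The variable names are illustrative.

```python
from datetime import datetime, timezone

# `time` value copied from the raw payload (Unix epoch seconds).
story_time = 1780071594

# Interpret the epoch in UTC so the result does not depend on the local timezone.
created = datetime.fromtimestamp(story_time, tz=timezone.utc)

print(created.isoformat())  # 2026-05-29T16:19:54+00:00
```

This agrees with the recorded `created_at_on_source` of 2026-05-29T16:19:54.000Z, differing only in how the UTC offset and milliseconds are written.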
source_raw_snapshot
{
  "id": "c209503f-a46e-470d-b29a-1857c47e35d3",
  "daily_ranking_item_id": "93681b17-40f4-4ae6-97b7-5cc2ca2308e6",
  "source": "hacker_news",
  "external_id": "48325306",
  "fetched_at": "2026-05-29T22:01:20.908Z",
  "story_raw": {
    "by": "simjnd",
    "id": 48325306,
    "url": "https://www.liquid.ai/blog/lfm2-5-8b-a1b",
    "kids": [
      48328322,
      48329846,
      48329703,
      48328511,
      48329676,
      48328307,
      48327949,
      48329706,
      48328208,
      48328281,
      48328261,
      48328092,
      48328194
    ],
    "time": 1780071594,
    "type": "story",
    "score": 113,
    "title": "Liquid AI reveals 8B-A1B MoE trained on 38T",
    "descendants": 34
  },
  "stats_raw": {
    "time": 1780071594,
    "score": 113,
    "descendants": 34
  },
  "aux_raw": {
    "external_url": "https://www.liquid.ai/blog/lfm2-5-8b-a1b",
    "hn_comment_url": "https://news.ycombinator.com/item?id=48325306",
    "normalized_text": null,
    "external_article": {
      "title": "LFM2.5-8B-A1B: an Even Better on-Device Mixture-of-Experts | Liquid AI",
      "excerpt": "Today, we're releasing LFM2.5-8B-A1B , an edge model built for fast, reliable tool calling on consumer hardware.\n\nIt builds on our LFM2-8B-A1B release from October 2025, with an expanded 128K context window, scaled-up pretraining (from 12T to 38T tokens), and large-scale reinforcement learning. We also doubled its vocabulary to improve tokenization efficiency for non-Latin languages. The result is a model that chains tool calls, achieves tasks, and fits comfortably even on an entry-level laptop.\n\nThe base (LFM2.5-8B-A1B-Base) and post-trained (LFM2.5-8B-A1B) models are available today on Hugging Face and our Playground . Check out our docs on how to run and fine-tune them locally.\n\nCompared to LFM2-8B-A1B, this new version expands the context window from 32,768 to 128,000 tokens . This allows the model to process longer documents and reason for longer. Its vocabulary size was also scaled up from 65,536 to 128,000 to tokenize non-Latin scripts more efficiently . We see particularly strong compression gains in Hindi, Thai, Vietnamese, Indonesian, and Arabic. The rest of the architecture follows the same combination of MoE, GQA, and gated short convolution blocks as LFM2-8B-A1B, as sh",
      "final_url": "https://www.liquid.ai/blog/lfm2-5-8b-a1b",
      "fetched_at": "2026-05-29T22:01:18.268Z",
      "description": "Today, we’re releasing LFM2.5-8B-A1B, a high-throughput edge model optimized for fast, reliable tool calling and complex instruction following on consumer hardware, delivering compressed performance competitive with much larger models and day-one support across major inference frameworks."
    },
    "selected_comments": [
      {
        "id": 48328322,
        "raw": {
          "by": "chabes",
          "id": 48328322,
          "kids": [
            48328383
          ],
          "text": "The small models are getting really impressive.<p>I recently realized that Qwen3.5:4B is way more capable than I thought a model that size could be.<p>Combine that with the work Liquid puts into RL and fine tuning, and you get models that perform extremely well on minimal hardware.<p>Combine that with your own fine tuning, and you get a specialized tool that is fast, private, and doesn’t require internet connection.",
          "time": 1780084352,
          "type": "comment",
          "parent": 48325306
        },
        "body": "The small models are getting really impressive. I recently realized that Qwen3.5:4B is way more capable than I thought a model that size could be. Combine that with the work Liquid puts into RL and fine tuning, and you get models that perform extremely well on minimal hardware. Combine that with your own fine tuning, and you get a specialized tool that is fast, private, and doesn’t require internet connection.",
        "is_op": false,
        "author": "chabes",
        "raw_body": "The small models are getting really impressive.<p>I recently realized that Qwen3.5:4B is way more capable than I thought a model that size could be.<p>Combine that with the work Liquid puts into RL and fine tuning, and you get models that perform extremely well on minimal hardware.<p>Combine that with your own fine tuning, and you get a specialized tool that is fast, private, and doesn’t require internet connection.",
        "created_at": 1780084352,
        "reply_count": 1
      },
      {
        "id": 48329846,
        "raw": {
          "by": "kilroy123",
          "id": 48329846,
          "text": "Hmm, I asked it who made it, and it says Google?",
          "time": 1780091823,
          "type": "comment",
          "parent": 48325306
        },
        "body": "Hmm, I asked it who made it, and it says Google?",
        "is_op": false,
        "author": "kilroy123",
        "raw_body": "Hmm, I asked it who made it, and it says Google?",
        "created_at": 1780091823,
        "reply_count": 0
      },
      {
        "id": 48329703,
        "raw": {
          "by": "irthomasthomas",
          "id": 48329703,
          "text": "Woah, chinchilla scaling is 20 x active_params. I think mistral was 2 x Chinchilla. This is 1800 x",
          "time": 1780090938,
          "type": "comment",
          "parent": 48325306
        },
        "body": "Woah, chinchilla scaling is 20 x active_params. I think mistral was 2 x Chinchilla. This is 1800 x",
        "is_op": false,
        "author": "irthomasthomas",
        "raw_body": "Woah, chinchilla scaling is 20 x active_params. I think mistral was 2 x Chinchilla. This is 1800 x",
        "created_at": 1780090938,
        "reply_count": 0
      },
      {
        "id": 48328511,
        "raw": {
          "by": "SubiculumCode",
          "id": 48328511,
          "text": "Anybody use their localcowork [1] before? \nThat is where the demo lives. Or not?<p>[1] <a href=\"https:&#x2F;&#x2F;github.com&#x2F;Liquid4All&#x2F;cookbook&#x2F;tree&#x2F;main&#x2F;examples&#x2F;localcowork\" rel=\"nofollow\">https:&#x2F;&#x2F;github.com&#x2F;Liquid4All&#x2F;cookbook&#x2F;tree&#x2F;main&#x2F;examples&#x2F;lo...</a>",
          "time": 1780085260,
          "type": "comment",
          "parent": 48325306
        },
        "body": "Anybody use their localcowork [1] before? That is where the demo lives. Or not? [1] https://github.com/Liquid4All/cookbook/tree/main/examples/lo...",
        "is_op": false,
        "author": "SubiculumCode",
        "raw_body": "Anybody use their localcowork [1] before? \nThat is where the demo lives. Or not?<p>[1] <a href=\"https:&#x2F;&#x2F;github.com&#x2F;Liquid4All&#x2F;cookbook&#x2F;tree&#x2F;main&#x2F;examples&#x2F;localcowork\" rel=\"nofollow\">https:&#x2F;&#x2F;github.com&#x2F;Liquid4All&#x2F;cookbook&#x2F;tree&#x2F;main&#x2F;examples&#x2F;lo...</a>",
        "created_at": 1780085260,
        "reply_count": 0
      },
      {
        "id": 48329676,
        "raw": {
          "by": "Ifkaluva",
          "id": 48329676,
          "kids": [
            48329781
          ],
          "text": "Liquid does amazing work, but I kinda feel like they are overtraining their models. 38T tokens seems like a lot for an 8B model",
          "time": 1780090811,
          "type": "comment",
          "parent": 48325306
        },
        "body": "Liquid does amazing work, but I kinda feel like they are overtraining their models. 38T tokens seems like a lot for an 8B model",
        "is_op": false,
        "author": "Ifkaluva",
        "raw_body": "Liquid does amazing work, but I kinda feel like they are overtraining their models. 38T tokens seems like a lot for an 8B model",
        "created_at": 1780090811,
        "reply_count": 1
      }
    ],
    "presentation_fields": {
      "title": "Liquid AI reveals 8B-A1B MoE trained on 38T",
      "tagline": "www.liquid.ai",
      "website_url": "https://www.liquid.ai/blog/lfm2-5-8b-a1b",
      "canonical_url": "https://news.ycombinator.com/item?id=48325306"
    },
    "external_url_hostname": "www.liquid.ai",
    "selected_comments_raw": [
      {
        "by": "chabes",
        "id": 48328322,
        "kids": [
          48328383
        ],
        "text": "The small models are getting really impressive.<p>I recently realized that Qwen3.5:4B is way more capable than I thought a model that size could be.<p>Combine that with the work Liquid puts into RL and fine tuning, and you get models that perform extremely well on minimal hardware.<p>Combine that with your own fine tuning, and you get a specialized tool that is fast, private, and doesn’t require internet connection.",
        "time": 1780084352,
        "type": "comment",
        "parent": 48325306
      },
      {
        "by": "kilroy123",
        "id": 48329846,
        "text": "Hmm, I asked it who made it, and it says Google?",
        "time": 1780091823,
        "type": "comment",
        "parent": 48325306
      },
      {
        "by": "irthomasthomas",
        "id": 48329703,
        "text": "Woah, chinchilla scaling is 20 x active_params. I think mistral was 2 x Chinchilla. This is 1800 x",
        "time": 1780090938,
        "type": "comment",
        "parent": 48325306
      },
      {
        "by": "SubiculumCode",
        "id": 48328511,
        "text": "Anybody use their localcowork [1] before? \nThat is where the demo lives. Or not?<p>[1] <a href=\"https:&#x2F;&#x2F;github.com&#x2F;Liquid4All&#x2F;cookbook&#x2F;tree&#x2F;main&#x2F;examples&#x2F;localcowork\" rel=\"nofollow\">https:&#x2F;&#x2F;github.com&#x2F;Liquid4All&#x2F;cookbook&#x2F;tree&#x2F;main&#x2F;examples&#x2F;lo...</a>",
        "time": 1780085260,
        "type": "comment",
        "parent": 48325306
      },
      {
        "by": "Ifkaluva",
        "id": 48329676,
        "kids": [
          48329781
        ],
        "text": "Liquid does amazing work, but I kinda feel like they are overtraining their models. 38T tokens seems like a lot for an 8B model",
        "time": 1780090811,
        "type": "comment",
        "parent": 48325306
      }
    ]
  },
  "selection_meta": {
    "discussion_depth": "top_comments_v1",
    "external_article": {
      "status": "ok",
      "final_url": "https://www.liquid.ai/blog/lfm2-5-8b-a1b",
      "status_code": 200,
      "content_type": "text/html; charset=utf-8",
      "failure_reason": null
    },
    "snapshot_version": "hn_story_v3",
    "selected_comments_count": 5,
    "external_article_resolved": true,
    "text_normalization_applied": false
  },
  "created_at": "2026-05-29T22:01:21.179Z",
  "updated_at": "2026-05-29T22:01:21.179Z"
}
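Each selected comment in the snapshot above carries both a `raw_body` (HTML with escaped entities such as `&#x2F;`) and a plain-text `body`. The sketch below shows one way such a `body` could be derived from `raw_body`; it is an assumption about the normalization, not the aggregator's actual code, and the function name `normalize_comment` is hypothetical.

```python
import html
import re


def normalize_comment(raw_body: str) -> str:
    """Turn an HN comment's HTML into single-line plain text (illustrative)."""
    text = raw_body.replace("<p>", " ")  # paragraph breaks become spaces
    text = re.sub(r"<[^>]+>", "", text)  # drop remaining tags, keep link text
    text = html.unescape(text)           # e.g. "&#x2F;" becomes "/"
    return " ".join(text.split())        # collapse newlines and repeated spaces


# `raw_body` of comment 48328511, copied from the snapshot.
raw_body = (
    'Anybody use their localcowork [1] before? \nThat is where the demo lives. '
    'Or not?<p>[1] <a href="https:&#x2F;&#x2F;github.com&#x2F;Liquid4All&#x2F;'
    'cookbook&#x2F;tree&#x2F;main&#x2F;examples&#x2F;localcowork" rel="nofollow">'
    'https:&#x2F;&#x2F;github.com&#x2F;Liquid4All&#x2F;cookbook&#x2F;tree&#x2F;'
    'main&#x2F;examples&#x2F;lo...</a>'
)

print(normalize_comment(raw_body))
```

Entities are unescaped only after the tags are stripped, so an escaped `&lt;` in a comment is not mistaken for markup. Because only the anchor text is kept, the link stays truncated exactly as it appears in the stored `body`.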