星火 SparkCN

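Pain point analysis · Published 2026/05/31

The pain point is a preliminary AI distillation of the upstream raw evidence; it does not include any additional China-market research.

Pain point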

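When using the Hugging Face Transformers library, the user wants to feed a model logits (a vector that yields per-token probabilities once softmax is applied) instead of the usual token IDs. The current approach computes the embeddings by hand, but the user worries that doing so discards the positional encoding the model needs (for example, RoPE). The core pain point: the user needs to know whether the inputs_embeds parameter bypasses the positional encoding step, and how to preserve positional information correctly. Because the Hugging Face documentation does not state this explicitly, the user had to read the source (for example, LlamaModel in modeling_llama.py) to verify the behavior, which adds learning cost and trial-and-error time. For developers unfamiliar with model internals, this uncertainty can lead to misuse, such as anomalous model output or degraded performance, and in turn to wasted debugging time.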

§ Dossier

Stack Overflow question

Using the Hugging Face transformer library, I want to feed logits (i.e., a vector such that if the softmax is taken would have the probability of each token) as the input to a model.

Currently, I convert the logits to an embedding with the following code:

    token_probabilities = F.softmax(logits,dim=-1)
    embeddings = token_probabilities @ model.embed_tokens.weight
    out = model(inputs_embed = embeddings, attention_mask=attention_mask, labels=labels)

However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embed bypass the part of the model where positional data is attached? (for example, when the RoPE is added).

Ideally, this would be answered with the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.
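The conversion in the question amounts to a probability-weighted average of embedding rows. Below is a minimal, dependency-free sketch of that arithmetic, with plain Python lists standing in for tensors. The names `softmax`, `soft_embedding`, and `EMBEDDING_TABLE` are hypothetical, chosen only for illustration, and are not part of Transformers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_embedding(logits, embedding_table):
    """Probability-weighted average of the rows of embedding_table.

    For a single position this is the same arithmetic as
    softmax(logits) @ embedding_table.
    """
    probs = softmax(logits)
    hidden_size = len(embedding_table[0])
    return [
        sum(p * row[d] for p, row in zip(probs, embedding_table))
        for d in range(hidden_size)
    ]


# Hypothetical 3-token vocabulary with 2-dimensional embeddings.
EMBEDDING_TABLE = [
    [1.0, 0.0],
    [0.0, 1.0],
    [2.0, 2.0],
]

# Uniform logits give the mean of all embedding rows.
uniform = soft_embedding([0.0, 0.0, 0.0], EMBEDDING_TABLE)

# A sharply peaked distribution approaches an ordinary lookup of token 1.
peaked = soft_embedding([0.0, 50.0, 0.0], EMBEDDING_TABLE)
```

Two details in the question's snippet are worth checking before running it. The Transformers keyword argument is `inputs_embeds` (as in the question title), not `inputs_embed`. Also, on a `LlamaForCausalLM` the embedding layer is normally reached through `model.get_input_embeddings()` rather than `model.embed_tokens`.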

Question details

View count: 58
Answer count: 1
Last activity: 2026/05/26

Answers

By reading the source code for LlamaModel, it appears that passing inputs_embeds does not suppress the positional embedding step, since inputs_embeds is assigned to hidden_states which is then passed to self.rotary_emb(). The only difference from passing input_ids is that self.embed_tokens() isn't called.

Presumably, other models are implemented similarly.

Comment author information unavailable · 2 votes
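The control flow the answer describes can be mimicked with a toy model. This is a dependency-free sketch under stated assumptions, not the Transformers implementation. `apply_rope`, `toy_forward`, and `TABLE` are hypothetical names. The sketch uses the interleaved-pair form of RoPE and rotates the vectors directly, whereas Llama applies the rotation to the query and key projections inside each attention layer.

```python
import math


def rotate_pair(x0, x1, angle):
    """Rotate the 2-D vector (x0, x1) by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c


def apply_rope(vector, position, base=10000.0):
    """Apply a rotary position embedding to one even-length vector."""
    dim = len(vector)
    out = []
    for i in range(0, dim, 2):
        freq = base ** (-i / dim)  # lower frequency for later pairs
        out.extend(rotate_pair(vector[i], vector[i + 1], position * freq))
    return out


def toy_forward(embedding_table, input_ids=None, inputs_embeds=None):
    """Mimic the branch described in the answer.

    Exactly one of input_ids / inputs_embeds must be given. The embedding
    lookup is skipped when inputs_embeds is passed, but the positional
    rotation runs in both cases.
    """
    if (input_ids is None) == (inputs_embeds is None):
        raise ValueError("Pass exactly one of input_ids or inputs_embeds")
    if inputs_embeds is None:
        inputs_embeds = [embedding_table[i] for i in input_ids]
    # Positions come from the index in the sequence, not from token IDs.
    return [apply_rope(vec, pos) for pos, vec in enumerate(inputs_embeds)]


# Hypothetical 3-token vocabulary with 4-dimensional embeddings.
TABLE = [
    [0.5, -0.5, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
]
ids = [2, 0, 2]

from_ids = toy_forward(TABLE, input_ids=ids)
from_embeds = toy_forward(TABLE, inputs_embeds=[TABLE[i] for i in ids])
```

In this toy, both call styles produce identical outputs, and the same token at positions 0 and 2 ends up with different vectors, which is the behavior the answer infers for LlamaModel. The same kind of comparison (one call with `input_ids`, one with the looked-up embeddings passed as `inputs_embeds`) could be run against a real model to confirm the answer's reading for a specific Transformers version.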

Source data · Raw Archive

source: Stack Overflow
upstream_source: stackoverflow
upstream_item_id: 79946518
daily_ranking_item_id: 741266d1-5c39-4614-9a21-1b3bb591d41a
rank_date: 2026-06-01
rank: 6
name: Does inputs_embeds bypass the positional encoding step of the model?
tagline: huggingface-transformers, large-language-model
description: Using the Hugging Face transformer library, I want to feed logits (i.e., a vector such that if the softmax is taken would have the probability of each token) as the input to a model. Currently, I convert the logits to an embedding with the following code: token_probabilities = F.softmax(logits,dim=-1) embeddings = token_probabilities @ model.embed_tokens.weight out = model(inputs_embed = embeddings, attention_mask=attention_mask, labels=labels) However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embed bypass the part of the model where positional data is attached? (for example, when the RoPE is added). Ideally, this would be answered with the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.
votes_count: 0
comments_count: 1
created_at_on_source: 2026-05-25T23:08:03.000Z
source_url: https://stackoverflow.com/questions/79946518/does-inputs-embeds-bypass-the-positional-encoding-step-of-the-model

topics

huggingface-transformers, large-language-model

media / source-specific data

{
  "stackoverflow": {
    "score": 0,
    "view_count": 58,
    "is_answered": true,
    "top_answers": [
      {
        "body": "By reading the source code for LlamaModel , it appears that passing inputs_embeds does not suppress the positional embedding step, since inputs_embeds is assigned to hidden_states which is then passed to self.rotary_emb() . The only difference from passing input_ids is that self.embed_tokens() isn't called. Presumably, other models are implemented similarly.",
        "score": 2,
        "answer_id": 79946519,
        "is_accepted": false
      }
    ],
    "answer_count": 1,
    "accepted_answer_id": null,
    "last_activity_date": 1779774273
  }
}

raw_payload

{
  "stats": {
    "score": 0,
    "view_count": 58,
    "is_answered": true,
    "answer_count": 1,
    "creation_date": 1779750483,
    "last_edit_date": 1779774273,
    "accepted_answer_id": null,
    "last_activity_date": 1779774273
  },
  "api_wrapper": {
    "backoff": null,
    "has_more": true,
    "page_size": 8,
    "quota_max": 300,
    "quota_remaining": 248
  },
  "question_id": 79946518,
  "answer_fetch": {
    "has_more": false,
    "answers_fetched": 1,
    "answer_page_size": 3
  },
  "snapshot_version": "stackoverflow_question_v1"
}

source_raw_snapshot

{
  "id": "2375c83b-7a42-4cae-9ec3-55ee8d9ddcfd",
  "daily_ranking_item_id": "741266d1-5c39-4614-9a21-1b3bb591d41a",
  "source": "stackoverflow",
  "external_id": "79946518",
  "fetched_at": "2026-05-31T22:01:57.241Z",
  "question_raw": {
    "body": "<p>Using the Hugging Face transformer library, I want to feed logits (i.e., a vector such that if the softmax is taken would have the probability of each token) as the input to a model.</p>\n<p>Currently, I convert the logits to an embedding with the following code:</p>\n<pre class=\"lang-py prettyprint-override\"><code>token_probabilities = F.softmax(logits,dim=-1)\nembeddings = token_probabilities @ model.embed_tokens.weight\nout = model(inputs_embed = embeddings, attention_mask=attention_mask, labels=labels)\n</code></pre>\n<p>However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embed bypass the part of the model where positional data is attached? (for example, when the RoPE is added).</p>\n<p>Ideally, this would be answered with the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.</p>\n",
    "link": "https://stackoverflow.com/questions/79946518/does-inputs-embeds-bypass-the-positional-encoding-step-of-the-model",
    "tags": [
      "huggingface-transformers",
      "large-language-model"
    ],
    "owner": {
      "link": "https://stackoverflow.com/users/3750874/algorithmic-canary",
      "user_id": 3750874,
      "user_type": "registered",
      "account_id": 4627519,
      "reputation": 819,
      "display_name": "Algorithmic Canary",
      "profile_image": "https://www.gravatar.com/avatar/faf681a6d1d92e9f4fca8eabb2c5d03f?s=256&d=identicon&r=PG&f=y&so-version=2"
    },
    "score": 0,
    "title": "Does inputs_embeds bypass the positional encoding step of the model?",
    "view_count": 58,
    "is_answered": true,
    "question_id": 79946518,
    "answer_count": 1,
    "creation_date": 1779750483,
    "last_edit_date": 1779774273,
    "content_license": "CC BY-SA 4.0",
    "last_activity_date": 1779774273
  },
  "answers_raw": [
    {
      "body": "<p>By reading the source code for <a href=\"https://github.com/huggingface/transformers/blob/ece1ea0635367989ad4dfab0c084bcc57e5d897b/src/transformers/models/llama/modeling_llama.py#L355\" rel=\"nofollow noreferrer\">LlamaModel</a>, it appears that passing <code>inputs_embeds</code> does not suppress the positional embedding step, since <code>inputs_embeds</code> is assigned to <code>hidden_states</code> which is then passed to <code>self.rotary_emb()</code>. The only difference from passing <code>input_ids</code> is that <code>self.embed_tokens()</code> isn't called.</p>\n<p>Presumably, other models are implemented similarly.</p>\n",
      "owner": {
        "link": "https://stackoverflow.com/users/3750874/algorithmic-canary",
        "user_id": 3750874,
        "user_type": "registered",
        "account_id": 4627519,
        "reputation": 819,
        "display_name": "Algorithmic Canary",
        "profile_image": "https://www.gravatar.com/avatar/faf681a6d1d92e9f4fca8eabb2c5d03f?s=256&d=identicon&r=PG&f=y&so-version=2"
      },
      "score": 2,
      "answer_id": 79946519,
      "is_accepted": false,
      "question_id": 79946518,
      "creation_date": 1779751197,
      "content_license": "CC BY-SA 4.0",
      "last_activity_date": 1779751197
    }
  ],
  "tags_raw": [
    "huggingface-transformers",
    "large-language-model"
  ],
  "stats_raw": {
    "score": 0,
    "view_count": 58,
    "is_answered": true,
    "answer_count": 1,
    "creation_date": 1779750483,
    "last_edit_date": 1779774273,
    "accepted_answer_id": null,
    "last_activity_date": 1779774273
  },
  "selection_meta": {
    "site": "stackoverflow",
    "api_wrapper": {
      "backoff": null,
      "has_more": true,
      "page_size": 8,
      "quota_max": 300,
      "quota_remaining": 248
    },
    "answer_fetch": {
      "backoff": null,
      "has_more": false,
      "answers_fetched": 1,
      "quota_remaining": 219,
      "answer_page_size": 3
    },
    "snapshot_version": "stackoverflow_question_v1",
    "selection_strategy": "tag_whitelist_unanswered_high_score_recent_active"
  },
  "created_at": "2026-05-31T22:01:57.429Z",
  "updated_at": "2026-05-31T22:01:57.429Z"
}