星火 SparkCN

Pain-point analysis, published 2026/05/28

The pain point is an initial AI distillation of the upstream raw evidence; no additional China-market research is included.

Pain point

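When using Hugging Face's LlamaForCausalLM, the user wants to feed in a custom logits vector (converted to embeddings by taking a softmax and multiplying by the embedding matrix), but worries that doing so bypasses the model's positional encoding step (such as RoPE). In the current workflow the user converts logits to embeddings by hand, yet is unsure whether positional information is still correctly attached to the resulting embeddings, which leaves the model's behavior uncertain. That uncertainty can force developers to run extra verification experiments or read the source code, adding development time and cognitive load; in scenarios that require precise control over the input representation, it can stall decisions or produce model output that does not match expectations.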

§ Dossier

Stack Overflow question

Using the Hugging Face transformer library, I want to feed logits (i.e., a vector such that if the softmax is taken would have the probability of each token) as the input to a model.

Currently, I convert the logits to an embedding with the following code:

    token_probabilities = F.softmax(logits,dim=-1)
    embeddings = token_probabilities @ model.embed_tokens.weight
    out = model(inputs_embed = embeddings, attention_mask=attention_mask, labels=labels)

However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embed bypass the part of the model where positional data is attached? (for example, when the RoPE is added).

Ideally, this would be answered with the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.
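A minimal sketch of how the snippet in the question could be made runnable, assuming PyTorch and the transformers library are installed. Two details differ from the question as written: the keyword argument is spelled `inputs_embeds`, and if `model` is a `LlamaForCausalLM`, the embedding matrix is reached through `get_input_embeddings()` rather than `model.embed_tokens`. The tiny configuration, the random weights, and the random logits are assumptions chosen so the example runs without downloading a checkpoint; they are not part of the original question.

```python
import torch
import torch.nn.functional as F
from transformers import LlamaConfig, LlamaForCausalLM

torch.manual_seed(0)

# Assumption: a tiny, randomly initialised model stands in for a real checkpoint.
config = LlamaConfig(
    vocab_size=64,
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    max_position_embeddings=32,
)
model = LlamaForCausalLM(config).eval()

batch, seq_len = 2, 5
logits = torch.randn(batch, seq_len, config.vocab_size)  # stand-in for custom logits
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)

# get_input_embeddings() returns the token-embedding module of the wrapped model.
embedding_matrix = model.get_input_embeddings().weight  # (vocab_size, hidden_size)

token_probabilities = F.softmax(logits, dim=-1)           # (batch, seq_len, vocab_size)
soft_embeddings = token_probabilities @ embedding_matrix  # (batch, seq_len, hidden_size)

with torch.no_grad():
    soft_out = model(inputs_embeds=soft_embeddings, attention_mask=attention_mask)

    # Sanity check: one-hot "probabilities" select single embedding rows, so the
    # inputs_embeds path should give the same logits as the input_ids path.
    input_ids = torch.randint(0, config.vocab_size, (batch, seq_len))
    one_hot = F.one_hot(input_ids, config.vocab_size).to(embedding_matrix.dtype)
    from_ids = model(input_ids=input_ids, attention_mask=attention_mask).logits
    from_embeds = model(
        inputs_embeds=one_hot @ embedding_matrix, attention_mask=attention_mask
    ).logits

paths_agree = torch.allclose(from_ids, from_embeds, atol=1e-5)
```

If `paths_agree` is true, the one-hot case shows the `inputs_embeds` path reproducing the `input_ids` path, which is consistent with positional information still being applied when embeddings are passed directly.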

§ Dossier

Question details

View count: 51
Answer count: 1
Last activity: 2026/05/26

§ Dossier

Answers

By reading the source code for LlamaModel, it appears that passing inputs_embeds does not suppress the positional embedding step, since inputs_embeds is assigned to hidden_states which is then passed to self.rotary_emb(). The only difference from passing input_ids is that self.embed_tokens() isn't called. Presumably, other models are implemented similarly.
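The control flow the answer describes can be illustrated with a dependency-free toy. This is a sketch under stated assumptions, not the transformers implementation: `EMBED_TABLE`, `rope_rotate`, and `toy_forward` are hypothetical names, the rotation is applied directly to the embedding vectors (a real Llama layer rotates the query and key projections inside attention), and adjacent dimensions are paired for simplicity, which may differ from the library's layout.

```python
import math

# Hypothetical 3-token vocabulary with 4-dimensional embeddings.
EMBED_TABLE = [
    [1.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.5, -0.5],
    [1.0, 1.0, 0.0, 1.0],
]


def rope_rotate(vector, position, base=10000.0):
    """Rotate each adjacent pair of dimensions by a position-dependent angle."""
    dim = len(vector)
    rotated = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vector[i], vector[i + 1]
        rotated += [x * cos_t - y * sin_t, x * sin_t + y * cos_t]
    return rotated


def toy_forward(input_ids=None, inputs_embeds=None):
    """Skip the embedding lookup when embeddings are supplied; always rotate."""
    if (input_ids is None) == (inputs_embeds is None):
        raise ValueError("pass exactly one of input_ids or inputs_embeds")
    if inputs_embeds is None:
        # The only step that supplying inputs_embeds skips.
        inputs_embeds = [EMBED_TABLE[token] for token in input_ids]
    # Positions come from the sequence index, not from how the vectors were made.
    return [rope_rotate(vec, pos) for pos, vec in enumerate(inputs_embeds)]


ids = [2, 0, 2]
from_ids = toy_forward(input_ids=ids)
from_embeds = toy_forward(inputs_embeds=[EMBED_TABLE[token] for token in ids])

same_result = from_ids == from_embeds          # both entry points agree
position_matters = from_ids[0] != from_ids[2]  # same token, positions 0 and 2
```

In this toy, both entry points produce identical results, and the same token yields different vectors at positions 0 and 2, mirroring the answer's reading that only the embedding lookup is skipped.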

Comment author information unavailable · 1 vote

Source data · Raw Archive

source: Stack Overflow
upstream_source: stackoverflow
upstream_item_id: 79946518
daily_ranking_item_id: 2bd1eaba-cf44-4384-a76f-40bf4c0e5147
rank_date: 2026-05-29
rank: 6
name: Does inputs_embeds bypass the positional encoding step of the model?
tagline: huggingface-transformers, large-language-model
description: Using the Hugging Face transformer library, I want to feed logits (i.e., a vector such that if the softmax is taken would have the probability of each token) as the input to a model. Currently, I convert the logits to an embedding with the following code: token_probabilities = F.softmax(logits,dim=-1) embeddings = token_probabilities @ model.embed_tokens.weight out = model(inputs_embed = embeddings, attention_mask=attention_mask, labels=labels) However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embed bypass the part of the model where positional data is attached? (for example, when the RoPE is added). Ideally, this would be answered with the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.
votes_count: -1
comments_count: 1
created_at_on_source: 2026-05-25T23:08:03.000Z
source_url: https://stackoverflow.com/questions/79946518/does-inputs-embeds-bypass-the-positional-encoding-step-of-the-model

topics

huggingface-transformers, large-language-model

media / source-specific data

{
  "stackoverflow": {
    "score": -1,
    "view_count": 51,
    "is_answered": true,
    "top_answers": [
      {
        "body": "By reading the source code for LlamaModel , it appears that passing inputs_embeds does not suppress the positional embedding step, since inputs_embeds is assigned to hidden_states which is then passed to self.rotary_emb() . The only difference from passing input_ids is that self.embed_tokens() isn't called. Presumably, other models are implemented similarly.",
        "score": 1,
        "answer_id": 79946519,
        "is_accepted": false
      }
    ],
    "answer_count": 1,
    "accepted_answer_id": null,
    "last_activity_date": 1779774273
  }
}

raw_payload

{
  "stats": {
    "score": -1,
    "view_count": 51,
    "is_answered": true,
    "answer_count": 1,
    "creation_date": 1779750483,
    "last_edit_date": 1779774273,
    "accepted_answer_id": null,
    "last_activity_date": 1779774273
  },
  "api_wrapper": {
    "backoff": null,
    "has_more": true,
    "page_size": 8,
    "quota_max": 300,
    "quota_remaining": 294
  },
  "question_id": 79946518,
  "answer_fetch": {
    "has_more": false,
    "answers_fetched": 1,
    "answer_page_size": 3
  },
  "snapshot_version": "stackoverflow_question_v1"
}

source_raw_snapshot

{
  "id": "838eaef1-9685-485f-9bdf-27ce8b4ef304",
  "daily_ranking_item_id": "2bd1eaba-cf44-4384-a76f-40bf4c0e5147",
  "source": "stackoverflow",
  "external_id": "79946518",
  "fetched_at": "2026-05-28T22:02:15.509Z",
  "question_raw": {
    "body": "<p>Using the Hugging Face transformer library, I want to feed logits (i.e., a vector such that if the softmax is taken would have the probability of each token) as the input to a model.</p>\n<p>Currently, I convert the logits to an embedding with the following code:</p>\n<pre class=\"lang-py prettyprint-override\"><code>token_probabilities = F.softmax(logits,dim=-1)\nembeddings = token_probabilities @ model.embed_tokens.weight\nout = model(inputs_embed = embeddings, attention_mask=attention_mask, labels=labels)\n</code></pre>\n<p>However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embed bypass the part of the model where positional data is attached? (for example, when the RoPE is added).</p>\n<p>Ideally, this would be answered with the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.</p>\n",
    "link": "https://stackoverflow.com/questions/79946518/does-inputs-embeds-bypass-the-positional-encoding-step-of-the-model",
    "tags": [
      "huggingface-transformers",
      "large-language-model"
    ],
    "owner": {
      "link": "https://stackoverflow.com/users/3750874/algorithmic-canary",
      "user_id": 3750874,
      "user_type": "registered",
      "account_id": 4627519,
      "reputation": 799,
      "display_name": "Algorithmic Canary",
      "profile_image": "https://www.gravatar.com/avatar/faf681a6d1d92e9f4fca8eabb2c5d03f?s=256&d=identicon&r=PG&f=y&so-version=2"
    },
    "score": -1,
    "title": "Does inputs_embeds bypass the positional encoding step of the model?",
    "view_count": 51,
    "is_answered": true,
    "question_id": 79946518,
    "answer_count": 1,
    "creation_date": 1779750483,
    "last_edit_date": 1779774273,
    "content_license": "CC BY-SA 4.0",
    "last_activity_date": 1779774273
  },
  "answers_raw": [
    {
      "body": "<p>By reading the source code for <a href=\"https://github.com/huggingface/transformers/blob/ece1ea0635367989ad4dfab0c084bcc57e5d897b/src/transformers/models/llama/modeling_llama.py#L355\" rel=\"nofollow noreferrer\">LlamaModel</a>, it appears that passing <code>inputs_embeds</code> does not suppress the positional embedding step, since <code>inputs_embeds</code> is assigned to <code>hidden_states</code> which is then passed to <code>self.rotary_emb()</code>. The only difference from passing <code>input_ids</code> is that <code>self.embed_tokens()</code> isn't called.</p>\n<p>Presumably, other models are implemented similarly.</p>\n",
      "owner": {
        "link": "https://stackoverflow.com/users/3750874/algorithmic-canary",
        "user_id": 3750874,
        "user_type": "registered",
        "account_id": 4627519,
        "reputation": 799,
        "display_name": "Algorithmic Canary",
        "profile_image": "https://www.gravatar.com/avatar/faf681a6d1d92e9f4fca8eabb2c5d03f?s=256&d=identicon&r=PG&f=y&so-version=2"
      },
      "score": 1,
      "answer_id": 79946519,
      "is_accepted": false,
      "question_id": 79946518,
      "creation_date": 1779751197,
      "content_license": "CC BY-SA 4.0",
      "last_activity_date": 1779751197
    }
  ],
  "tags_raw": [
    "huggingface-transformers",
    "large-language-model"
  ],
  "stats_raw": {
    "score": -1,
    "view_count": 51,
    "is_answered": true,
    "answer_count": 1,
    "creation_date": 1779750483,
    "last_edit_date": 1779774273,
    "accepted_answer_id": null,
    "last_activity_date": 1779774273
  },
  "selection_meta": {
    "site": "stackoverflow",
    "api_wrapper": {
      "backoff": null,
      "has_more": true,
      "page_size": 8,
      "quota_max": 300,
      "quota_remaining": 294
    },
    "answer_fetch": {
      "backoff": null,
      "has_more": false,
      "answers_fetched": 1,
      "quota_remaining": 267,
      "answer_page_size": 3
    },
    "snapshot_version": "stackoverflow_question_v1",
    "selection_strategy": "tag_whitelist_unanswered_high_score_recent_active"
  },
  "created_at": "2026-05-28T22:02:15.701Z",
  "updated_at": "2026-05-28T22:02:15.701Z"
}