Field Dispatch · Stack Overflow · Rank 6 · 2026-05-27

Does inputs_embeds bypass the positional encoding step of the model?

Tags: huggingface-transformers, large-language-model
Score: -4
Answers: 1
Views: 39
Answered: Yes
Pain point analysis published 2026/05/26

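This pain point is a preliminary AI summary based on the raw upstream evidence; it does not include additional research on the Chinese market.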

Pain point

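A user of Hugging Face's LlamaForCausalLM tried to pass custom embeddings, derived from logits, through inputs_embeds, but worried that doing so would bypass the model's positional encoding step (such as RoPE). Judging from the question, the user's core task is precise control over the model input (for example, injecting external logits), but the existing documentation and interface descriptions are unclear, so the user cannot tell whether inputs_embeds preserves positional encoding. That uncertainty forces developers to read the source code to verify the behavior, which adds debugging time and cognitive load. The concrete consequences: developers implementing custom inputs may get wrong results if positional information is lost, may have to write extra test code to confirm the model's behavior, or may give up the more flexible input path and fall back to standard input_ids. The question's Stack Overflow score of -4 suggests the community may consider it too basic or already covered by the documentation, yet the user was still confused, which points to a gap in the official documentation on how inputs_embeds interacts with positional encoding.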
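The confirmation test mentioned above could look roughly like the following. This is a minimal sketch, not an authoritative check: it builds a tiny, randomly initialised LlamaForCausalLM (the config sizes are arbitrary assumptions, chosen only so it runs on CPU without downloading weights) and compares the logits obtained from input_ids with the logits obtained from inputs_embeds carrying the same embeddings. If the inputs_embeds path skipped positional handling, the two outputs would be expected to differ.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

torch.manual_seed(0)

# Tiny random model; every size here is an arbitrary assumption for the test.
config = LlamaConfig(
    vocab_size=128,
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    max_position_embeddings=64,
)
model = LlamaForCausalLM(config).eval()

input_ids = torch.randint(0, config.vocab_size, (1, 8))

# Do the embedding lookup by hand, which is the step inputs_embeds replaces.
embeds = model.get_input_embeddings()(input_ids)

with torch.no_grad():
    logits_from_ids = model(input_ids=input_ids).logits
    logits_from_embeds = model(inputs_embeds=embeds).logits

# True when both input paths receive the same downstream treatment.
outputs_match = torch.allclose(logits_from_ids, logits_from_embeds, atol=1e-5)
print("input_ids and inputs_embeds agree:", outputs_match)
```

Agreement here only says that the two paths behave the same for this Llama implementation in the installed transformers version; other architectures would need their own check.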

§ Dossier

Stack Overflow question

Using the Hugging Face transformers library, I want to feed logits (i.e., a vector whose softmax gives the probability of each token) as the input to a model. Currently, I convert the logits to an embedding with the following code:

    import torch.nn.functional as F

    token_probabilities = F.softmax(logits, dim=-1)
    # get_input_embeddings() returns the token embedding module of the model
    embeddings = token_probabilities @ model.get_input_embeddings().weight
    out = model(inputs_embeds=embeddings, attention_mask=attention_mask, labels=labels)

However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embeds bypass the part of the model where positional data is attached (for example, where RoPE is applied)? Ideally, this would be answered in terms of the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.
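To make the conversion in the question concrete, here is a small self-contained sketch using plain PyTorch. The `embedding` module is a stand-in for a model's input embedding table, and the sizes are made up for illustration. It shows that the softmax-weighted product yields one vector of size hidden_size per position, and that a one-hot distribution reduces to the ordinary embedding lookup.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size, hidden_size = 10, 4  # illustrative sizes, not from any real model
embedding = torch.nn.Embedding(vocab_size, hidden_size)  # stand-in embedding table

# Logits of shape (batch, seq_len, vocab_size).
logits = torch.randn(2, 3, vocab_size)
token_probabilities = F.softmax(logits, dim=-1)

# Each output row is a probability-weighted average of embedding rows.
soft_embeddings = token_probabilities @ embedding.weight  # (2, 3, hidden_size)

# Sanity check: one-hot probabilities reproduce the usual lookup.
token_ids = torch.tensor([[1, 5, 7], [0, 2, 9]])
one_hot = F.one_hot(token_ids, num_classes=vocab_size).to(embedding.weight.dtype)
lookup_matches = torch.allclose(one_hot @ embedding.weight, embedding(token_ids))

print(soft_embeddings.shape, lookup_matches)
```

Nothing in this step carries position information either way; the product only replaces the token lookup, which is why the question about where positions are applied arises.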

Question details

View count: 39
Answer count: 1
Last activity: 2026/05/26

Answers

By reading the source code for LlamaModel, it appears that passing inputs_embeds does not suppress the positional embedding step, since inputs_embeds is assigned to hidden_states, which is then passed to self.rotary_emb(). The only difference from passing input_ids is that self.embed_tokens() isn't called. Presumably, other models are implemented similarly.
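The control flow described in this answer can be illustrated with a toy model. This is a simplified sketch, not the library's code: `ToyModel`, `rotary_cos_sin`, and `rotate_half` are written here for illustration, the whole hidden vector is treated as a single attention head, and only the query projection is rotated. The point it tries to capture is that the embedding lookup is the only branch that depends on which input was given, while the position ids come from the sequence length.

```python
import torch


def rotary_cos_sin(position_ids, head_dim, base=10000.0):
    """Build cos/sin tables from position ids alone (no token content)."""
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    inv_freq = 1.0 / (base ** exponents)
    freqs = position_ids[..., None].float() * inv_freq  # (batch, seq, head_dim / 2)
    angles = torch.cat((freqs, freqs), dim=-1)
    return angles.cos(), angles.sin()


def rotate_half(x):
    """Swap and negate the two halves of the last dimension."""
    half = x.shape[-1] // 2
    return torch.cat((-x[..., half:], x[..., :half]), dim=-1)


class ToyModel(torch.nn.Module):
    def __init__(self, vocab_size=16, hidden_size=8):
        super().__init__()
        self.embed_tokens = torch.nn.Embedding(vocab_size, hidden_size)
        self.q_proj = torch.nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, input_ids=None, inputs_embeds=None):
        if (input_ids is None) == (inputs_embeds is None):
            raise ValueError("Specify exactly one of input_ids or inputs_embeds")
        if inputs_embeds is None:
            # The only step skipped when inputs_embeds is supplied.
            inputs_embeds = self.embed_tokens(input_ids)
        # Positions are derived from the sequence length, not from token ids.
        position_ids = torch.arange(inputs_embeds.shape[1]).unsqueeze(0)
        hidden_states = inputs_embeds
        cos, sin = rotary_cos_sin(position_ids, hidden_states.shape[-1])
        q = self.q_proj(hidden_states)
        return q * cos + rotate_half(q) * sin  # rotated queries


torch.manual_seed(0)
toy = ToyModel()
ids = torch.tensor([[3, 1, 4, 1]])

out_ids = toy(input_ids=ids)
out_embeds = toy(inputs_embeds=toy.embed_tokens(ids))
paths_match = torch.allclose(out_ids, out_embeds)

# Token id 1 appears at positions 1 and 3; the rotation makes the results differ.
rotated_differs = not torch.allclose(out_embeds[0, 1], out_embeds[0, 3])
print(paths_match, rotated_differs)
```

In this sketch, both input paths give the same rotated queries, and the same token at two positions gives different ones, which is the behavior the answer infers for LlamaModel. Whether other architectures follow the same pattern has to be checked against their own source.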

Comments: author information unavailable · 1 vote
Source data · Raw Archive
source: Stack Overflow
upstream_source: stackoverflow
upstream_item_id: 79946518
daily_ranking_item_id: 57a9e710-d586-4c5f-8ecd-e1c8bfe730e2
rank_date: 2026-05-27
rank: 6
name: Does inputs_embeds bypass the positional encoding step of the model?
tagline: huggingface-transformers, large-language-model
description: Using the Hugging Face transformer library, I want to feed logits (i.e., a vector such that if the softmax is taken would have the probability of each token) as the input to a model. Currently, I convert the logits to an embedding with the following code: token_probabilities = F.softmax(logits,dim=-1) embeddings = token_probabilities @ model.embed_tokens.weight out = model(inputs_embed = embeddings, attention_mask=attention_mask, labels=labels) However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embed bypass the part of the model where positional data is attached? (for example, when the RoPE is added). Ideally, this would be answered with the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.
votes_count: -4
comments_count: 1
created_at_on_source: 2026-05-25T23:08:03.000Z
topics: huggingface-transformers, large-language-model
media / source-specific data
{
  "stackoverflow": {
    "score": -4,
    "view_count": 39,
    "is_answered": true,
    "top_answers": [
      {
        "body": "By reading the source code for LlamaModel , it appears that passing inputs_embeds does not suppress the positional embedding step, since inputs_embeds is assigned to hidden_states which is then passed to self.rotary_emb() . The only difference from passing input_ids is that self.embed_tokens() isn't called. Presumably, other models are implemented similarly.",
        "score": 1,
        "answer_id": 79946519,
        "is_accepted": false
      }
    ],
    "answer_count": 1,
    "accepted_answer_id": null,
    "last_activity_date": 1779774273
  }
}
raw_payload
{
  "stats": {
    "score": -4,
    "view_count": 39,
    "is_answered": true,
    "answer_count": 1,
    "creation_date": 1779750483,
    "last_edit_date": 1779774273,
    "accepted_answer_id": null,
    "last_activity_date": 1779774273
  },
  "api_wrapper": {
    "backoff": null,
    "has_more": true,
    "page_size": 8,
    "quota_max": 300,
    "quota_remaining": 296
  },
  "question_id": 79946518,
  "answer_fetch": {
    "has_more": false,
    "answers_fetched": 1,
    "answer_page_size": 3
  },
  "snapshot_version": "stackoverflow_question_v1"
}
source_raw_snapshot
{
  "id": "66e0f34f-4a19-4d67-8d68-605ec62ba4e8",
  "daily_ranking_item_id": "57a9e710-d586-4c5f-8ecd-e1c8bfe730e2",
  "source": "stackoverflow",
  "external_id": "79946518",
  "fetched_at": "2026-05-26T22:02:05.726Z",
  "question_raw": {
    "body": "<p>Using the Hugging Face transformer library, I want to feed logits (i.e., a vector such that if the softmax is taken would have the probability of each token) as the input to a model.</p>\n<p>Currently, I convert the logits to an embedding with the following code:</p>\n<pre class=\"lang-py prettyprint-override\"><code>token_probabilities = F.softmax(logits,dim=-1)\nembeddings = token_probabilities @ model.embed_tokens.weight\nout = model(inputs_embed = embeddings, attention_mask=attention_mask, labels=labels)\n</code></pre>\n<p>However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embed bypass the part of the model where positional data is attached? (for example, when the RoPE is added).</p>\n<p>Ideally, this would be answered with the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.</p>\n",
    "link": "https://stackoverflow.com/questions/79946518/does-inputs-embeds-bypass-the-positional-encoding-step-of-the-model",
    "tags": [
      "huggingface-transformers",
      "large-language-model"
    ],
    "owner": {
      "link": "https://stackoverflow.com/users/3750874/algorithmic-canary",
      "user_id": 3750874,
      "user_type": "registered",
      "account_id": 4627519,
      "reputation": 799,
      "display_name": "Algorithmic Canary",
      "profile_image": "https://www.gravatar.com/avatar/faf681a6d1d92e9f4fca8eabb2c5d03f?s=256&d=identicon&r=PG&f=y&so-version=2"
    },
    "score": -4,
    "title": "Does inputs_embeds bypass the positional encoding step of the model?",
    "view_count": 39,
    "is_answered": true,
    "question_id": 79946518,
    "answer_count": 1,
    "creation_date": 1779750483,
    "last_edit_date": 1779774273,
    "content_license": "CC BY-SA 4.0",
    "last_activity_date": 1779774273
  },
  "answers_raw": [
    {
      "body": "<p>By reading the source code for <a href=\"https://github.com/huggingface/transformers/blob/ece1ea0635367989ad4dfab0c084bcc57e5d897b/src/transformers/models/llama/modeling_llama.py#L355\" rel=\"nofollow noreferrer\">LlamaModel</a>, it appears that passing <code>inputs_embeds</code> does not suppress the positional embedding step, since <code>inputs_embeds</code> is assigned to <code>hidden_states</code> which is then passed to <code>self.rotary_emb()</code>. The only difference from passing <code>input_ids</code> is that <code>self.embed_tokens()</code> isn't called.</p>\n<p>Presumably, other models are implemented similarly.</p>\n",
      "owner": {
        "link": "https://stackoverflow.com/users/3750874/algorithmic-canary",
        "user_id": 3750874,
        "user_type": "registered",
        "account_id": 4627519,
        "reputation": 799,
        "display_name": "Algorithmic Canary",
        "profile_image": "https://www.gravatar.com/avatar/faf681a6d1d92e9f4fca8eabb2c5d03f?s=256&d=identicon&r=PG&f=y&so-version=2"
      },
      "score": 1,
      "answer_id": 79946519,
      "is_accepted": false,
      "question_id": 79946518,
      "creation_date": 1779751197,
      "content_license": "CC BY-SA 4.0",
      "last_activity_date": 1779751197
    }
  ],
  "tags_raw": [
    "huggingface-transformers",
    "large-language-model"
  ],
  "stats_raw": {
    "score": -4,
    "view_count": 39,
    "is_answered": true,
    "answer_count": 1,
    "creation_date": 1779750483,
    "last_edit_date": 1779774273,
    "accepted_answer_id": null,
    "last_activity_date": 1779774273
  },
  "selection_meta": {
    "site": "stackoverflow",
    "api_wrapper": {
      "backoff": null,
      "has_more": true,
      "page_size": 8,
      "quota_max": 300,
      "quota_remaining": 296
    },
    "answer_fetch": {
      "backoff": null,
      "has_more": false,
      "answers_fetched": 1,
      "quota_remaining": 267,
      "answer_page_size": 3
    },
    "snapshot_version": "stackoverflow_question_v1",
    "selection_strategy": "tag_whitelist_unanswered_high_score_recent_active"
  },
  "created_at": "2026-05-26T22:02:06.067Z",
  "updated_at": "2026-05-26T22:02:06.067Z"
}