Field Dispatch · Stack Overflow · #6 · 2026-05-30

Does inputs_embeds bypass the positional encoding step of the model?

Tags: huggingface-transformers, large-language-model
Score: 0
Answers: 1
Views: 53
Answered: Yes
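Pain point analysis, published 2026/05/29

The pain point is a preliminary AI distillation of the upstream raw evidence; it does not include additional China-market research.

Pain point

When using Hugging Face's LlamaForCausalLM, the user wants to feed custom logits vectors into the model (softmaxed, then multiplied by the embedding weight matrix to obtain embeddings), but worries that doing so discards the positional encoding the model needs (such as RoPE). The user's core task is to implement a non-standard input path (bypassing the token embedding layer and passing embedding vectors directly) while retaining positional information. In the current workflow the user converts logits to embeddings by hand, but is unsure whether this skips the model's internal positional encoding step, which could leave the model unable to interpret the relative positions of tokens in the sequence. This uncertainty creates clear friction: the user cannot confirm that the implementation is correct and may have to run many experiments or read the model source code to find out, which adds development time and cognitive load. If positional encoding were in fact skipped, the model's output would lose positional information and generation quality would suffer badly, yet the user has no clear documentation or interface description to guide this operation.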

§ Dossier

Stack Overflow question

Using the Hugging Face transformer library, I want to feed logits (i.e., a vector such that if the softmax is taken would have the probability of each token) as the input to a model.

Currently, I convert the logits to an embedding with the following code:

    token_probabilities = F.softmax(logits,dim=-1)
    embeddings = token_probabilities @ model.embed_tokens.weight
    out = model(inputs_embed = embeddings, attention_mask=attention_mask, labels=labels)

However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embed bypass the part of the model where positional data is attached? (for example, when the RoPE is added).

Ideally, this would be answered with the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.
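A minimal runnable sketch of the approach described in the question, under a few stated assumptions. A tiny, randomly initialised `LlamaConfig` stands in for a real checkpoint, and every size in it is made up for illustration. The embedding matrix is reached through `get_input_embeddings()` instead of `model.embed_tokens`, because on `LlamaForCausalLM` the embedding layer sits on the inner model. The keyword is spelled `inputs_embeds`, as in the question title, where the question's snippet writes `inputs_embed`. `labels` is left out to keep the sketch short.

```python
import torch
import torch.nn.functional as F
from transformers import LlamaConfig, LlamaForCausalLM

torch.manual_seed(0)

# Assumption: a tiny random model so nothing has to be downloaded.
# All sizes below are illustrative, not taken from any real checkpoint.
config = LlamaConfig(
    vocab_size=64,
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    max_position_embeddings=64,
)
model = LlamaForCausalLM(config).eval()

batch, seq_len = 2, 5
# Stand-in for the user's logits: one score per vocabulary entry per position.
logits = torch.randn(batch, seq_len, config.vocab_size)
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)

# Soft embedding: probability-weighted average of the token embedding rows.
token_probabilities = F.softmax(logits, dim=-1)             # (batch, seq, vocab)
embedding_matrix = model.get_input_embeddings().weight      # (vocab, hidden)
embeddings = token_probabilities @ embedding_matrix         # (batch, seq, hidden)

with torch.no_grad():
    out = model(inputs_embeds=embeddings, attention_mask=attention_mask)

print(out.logits.shape)  # (batch, seq_len, vocab_size)
```

With a real checkpoint the same three lines (softmax, matrix product, `inputs_embeds=` call) would apply; only the model construction changes.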

Question details

View count: 53
Answer count: 1
Last activity: 2026/05/26

Answers

By reading the source code for LlamaModel, it appears that passing inputs_embeds does not suppress the positional embedding step, since inputs_embeds is assigned to hidden_states, which is then passed to self.rotary_emb(). The only difference from passing input_ids is that self.embed_tokens() isn't called. Presumably, other models are implemented similarly.

Comment author information unavailable · 2 votes
Source data · Raw Archive
source: Stack Overflow
upstream_source: stackoverflow
upstream_item_id: 79946518
daily_ranking_item_id: 462f2ae6-7274-4d27-ba2f-fe9fd855ecc2
rank_date: 2026-05-30
rank: 6
name: Does inputs_embeds bypass the positional encoding step of the model?
tagline: huggingface-transformers, large-language-model
votes_count: 0
comments_count: 1
created_at_on_source: 2026-05-25T23:08:03.000Z
topics: huggingface-transformers, large-language-model
media / source-specific data
{
  "stackoverflow": {
    "score": 0,
    "view_count": 53,
    "is_answered": true,
    "top_answers": [
      {
        "body": "By reading the source code for LlamaModel , it appears that passing inputs_embeds does not suppress the positional embedding step, since inputs_embeds is assigned to hidden_states which is then passed to self.rotary_emb() . The only difference from passing input_ids is that self.embed_tokens() isn't called. Presumably, other models are implemented similarly.",
        "score": 2,
        "answer_id": 79946519,
        "is_accepted": false
      }
    ],
    "answer_count": 1,
    "accepted_answer_id": null,
    "last_activity_date": 1779774273
  }
}
raw_payload
{
  "stats": {
    "score": 0,
    "view_count": 53,
    "is_answered": true,
    "answer_count": 1,
    "creation_date": 1779750483,
    "last_edit_date": 1779774273,
    "accepted_answer_id": null,
    "last_activity_date": 1779774273
  },
  "api_wrapper": {
    "backoff": null,
    "has_more": true,
    "page_size": 8,
    "quota_max": 300,
    "quota_remaining": 204
  },
  "question_id": 79946518,
  "answer_fetch": {
    "has_more": false,
    "answers_fetched": 1,
    "answer_page_size": 3
  },
  "snapshot_version": "stackoverflow_question_v1"
}
source_raw_snapshot
{
  "id": "302f861a-8d59-4ddf-b38a-60f5cd1e1ff0",
  "daily_ranking_item_id": "462f2ae6-7274-4d27-ba2f-fe9fd855ecc2",
  "source": "stackoverflow",
  "external_id": "79946518",
  "fetched_at": "2026-05-29T22:02:13.965Z",
  "question_raw": {
    "body": "<p>Using the Hugging Face transformer library, I want to feed logits (i.e., a vector such that if the softmax is taken would have the probability of each token) as the input to a model.</p>\n<p>Currently, I convert the logits to an embedding with the following code:</p>\n<pre class=\"lang-py prettyprint-override\"><code>token_probabilities = F.softmax(logits,dim=-1)\nembeddings = token_probabilities @ model.embed_tokens.weight\nout = model(inputs_embed = embeddings, attention_mask=attention_mask, labels=labels)\n</code></pre>\n<p>However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embed bypass the part of the model where positional data is attached? (for example, when the RoPE is added).</p>\n<p>Ideally, this would be answered with the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.</p>\n",
    "link": "https://stackoverflow.com/questions/79946518/does-inputs-embeds-bypass-the-positional-encoding-step-of-the-model",
    "tags": [
      "huggingface-transformers",
      "large-language-model"
    ],
    "owner": {
      "link": "https://stackoverflow.com/users/3750874/algorithmic-canary",
      "user_id": 3750874,
      "user_type": "registered",
      "account_id": 4627519,
      "reputation": 819,
      "display_name": "Algorithmic Canary",
      "profile_image": "https://www.gravatar.com/avatar/faf681a6d1d92e9f4fca8eabb2c5d03f?s=256&d=identicon&r=PG&f=y&so-version=2"
    },
    "score": 0,
    "title": "Does inputs_embeds bypass the positional encoding step of the model?",
    "view_count": 53,
    "is_answered": true,
    "question_id": 79946518,
    "answer_count": 1,
    "creation_date": 1779750483,
    "last_edit_date": 1779774273,
    "content_license": "CC BY-SA 4.0",
    "last_activity_date": 1779774273
  },
  "answers_raw": [
    {
      "body": "<p>By reading the source code for <a href=\"https://github.com/huggingface/transformers/blob/ece1ea0635367989ad4dfab0c084bcc57e5d897b/src/transformers/models/llama/modeling_llama.py#L355\" rel=\"nofollow noreferrer\">LlamaModel</a>, it appears that passing <code>inputs_embeds</code> does not suppress the positional embedding step, since <code>inputs_embeds</code> is assigned to <code>hidden_states</code> which is then passed to <code>self.rotary_emb()</code>. The only difference from passing <code>input_ids</code> is that <code>self.embed_tokens()</code> isn't called.</p>\n<p>Presumably, other models are implemented similarly.</p>\n",
      "owner": {
        "link": "https://stackoverflow.com/users/3750874/algorithmic-canary",
        "user_id": 3750874,
        "user_type": "registered",
        "account_id": 4627519,
        "reputation": 819,
        "display_name": "Algorithmic Canary",
        "profile_image": "https://www.gravatar.com/avatar/faf681a6d1d92e9f4fca8eabb2c5d03f?s=256&d=identicon&r=PG&f=y&so-version=2"
      },
      "score": 2,
      "answer_id": 79946519,
      "is_accepted": false,
      "question_id": 79946518,
      "creation_date": 1779751197,
      "content_license": "CC BY-SA 4.0",
      "last_activity_date": 1779751197
    }
  ],
  "tags_raw": [
    "huggingface-transformers",
    "large-language-model"
  ],
  "stats_raw": {
    "score": 0,
    "view_count": 53,
    "is_answered": true,
    "answer_count": 1,
    "creation_date": 1779750483,
    "last_edit_date": 1779774273,
    "accepted_answer_id": null,
    "last_activity_date": 1779774273
  },
  "selection_meta": {
    "site": "stackoverflow",
    "api_wrapper": {
      "backoff": null,
      "has_more": true,
      "page_size": 8,
      "quota_max": 300,
      "quota_remaining": 204
    },
    "answer_fetch": {
      "backoff": null,
      "has_more": false,
      "answers_fetched": 1,
      "quota_remaining": 272,
      "answer_page_size": 3
    },
    "snapshot_version": "stackoverflow_question_v1",
    "selection_strategy": "tag_whitelist_unanswered_high_score_recent_active"
  },
  "created_at": "2026-05-29T22:02:14.131Z",
  "updated_at": "2026-05-29T22:02:14.131Z"
}