From the source code for LlamaModel (https://github.com/huggingface/transformers/blob/ece1ea0635367989ad4dfab0c084bcc57e5d897b/src/transformers/models/llama/modeling_llama.py#L355), it appears that passing inputs_embeds does not suppress the positional embedding step: inputs_embeds is assigned to hidden_states, which is then passed to self.rotary_emb(). The only difference from passing input_ids is that self.embed_tokens() isn't called. Presumably, other models are implemented similarly.
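A minimal, dependency-free sketch of the control flow the answer describes, under stated assumptions: `toy_forward`, `EMBED_TABLE` and `THETA` are hypothetical stand-ins, not Hugging Face APIs. For brevity the toy rotates the hidden vectors directly; as far as we can tell from the linked source, Llama instead computes cos/sin values from `position_ids` (using `hidden_states` only for dtype and device) and applies the rotation to the query and key projections inside each attention layer. The point illustrated is that positions depend only on the sequence index, so passing `inputs_embeds` equal to the looked-up embeddings gives the same result as passing `input_ids`.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]

# Hypothetical toy embedding table: vocabulary of 4 tokens, hidden size 2.
EMBED_TABLE: List[Vector] = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
    [-1.0, 0.25],
]
THETA = 0.1  # hypothetical rotation frequency for the single dimension pair


def rotate(vec: Vector, position: int) -> Vector:
    """Rotate a 2-d vector by position * THETA (RoPE with a single frequency)."""
    angle = position * THETA
    c, s = math.cos(angle), math.sin(angle)
    x0, x1 = vec
    return [x0 * c - x1 * s, x0 * s + x1 * c]


def toy_forward(
    input_ids: Optional[Sequence[int]] = None,
    inputs_embeds: Optional[Sequence[Vector]] = None,
) -> List[Vector]:
    # Exactly one input kind must be given.
    if (input_ids is None) == (inputs_embeds is None):
        raise ValueError("specify exactly one of input_ids or inputs_embeds")
    # The only branch that depends on the input kind: the embedding lookup.
    if inputs_embeds is None:
        inputs_embeds = [EMBED_TABLE[i] for i in input_ids]
    hidden_states = [list(v) for v in inputs_embeds]
    # Positions come from the sequence index, never from how the vectors were made.
    position_ids = range(len(hidden_states))
    return [rotate(h, p) for h, p in zip(hidden_states, position_ids)]


ids = [2, 0, 3]
via_ids = toy_forward(input_ids=ids)
via_embeds = toy_forward(inputs_embeds=[EMBED_TABLE[i] for i in ids])
print(via_ids == via_embeds)  # True: both paths receive the same rotation
```

For models with learned absolute position embeddings, GPT-2's GPT2Model likewise resolves inputs_embeds first and only then adds its position embeddings, which is consistent with the answer's guess that other models behave similarly; this is worth confirming against the installed transformers version.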
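The pain point below is a preliminary AI distillation based on the upstream raw evidence; no additional China-market research was included.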
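When using Hugging Face's LlamaForCausalLM, the user wants to feed a custom logits vector as model input (converted to an embedding by applying softmax and multiplying by the embedding weight matrix), but worries that doing so loses the positional encoding the model needs (such as RoPE). The user's core task is a non-standard input path (bypassing the token embedding layer and passing embedding vectors directly) while preserving positional information. In the current workflow, the user converts logits to embeddings manually but is unsure whether this skips the model's internal positional-encoding step, which could leave the model unable to interpret the relative positions of tokens in the sequence. This uncertainty creates clear friction: the user cannot confirm that the implementation is correct and may need extensive experiments, or must read the model source code to investigate, which adds development time and cognitive load. If positional encoding were in fact skipped, the model's output would lose positional information and generation quality would suffer badly, yet there is no clear documentation or interface guidance for this operation.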
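The worry about losing the relative positions of tokens can be made concrete with RoPE's defining property: after rotating a query at position m and a key at position n, their dot product depends only on the offset m - n. A small stdlib sketch, assuming the common base of 10000; the helpers `rope` and `dot` and the vectors are hypothetical, and pairing adjacent dimensions is a readability choice (Hugging Face's Llama code pairs dimension i with i + d/2, which has the same relative-position property). The property only holds if the rotation is actually applied, which, per the answer, happens whether the inputs came from input_ids or inputs_embeds.

```python
import math
from typing import List

Vector = List[float]


def rope(vec: Vector, position: int, base: float = 10000.0) -> Vector:
    """Apply rotary position embedding to an even-length vector.

    Each adjacent pair (vec[i], vec[i+1]) is rotated by position * base**(-i/d).
    """
    d = len(vec)
    out: Vector = []
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.1, 0.4, -0.7, 0.9]

# Same offset (m - n = 3) at different absolute positions gives the same score.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 40), rope(k, 37))
# A different offset (m - n = 1) gives a different score.
score_c = dot(rope(q, 5), rope(k, 4))
print(math.isclose(score_a, score_b, rel_tol=1e-9, abs_tol=1e-12))  # True
```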
Stack Overflow question
Using the Hugging Face transformers library, I want to feed logits (i.e., a vector whose softmax gives the probability of each token) as the input to a model.

Currently, I convert the logits to an embedding with the following code:

    import torch.nn.functional as F

    token_probabilities = F.softmax(logits, dim=-1)  # distribution over the vocabulary
    embeddings = token_probabilities @ model.get_input_embeddings().weight  # soft embedding lookup
    out = model(inputs_embeds=embeddings, attention_mask=attention_mask, labels=labels)

However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embeds bypass the part of the model where positional information is attached (for example, where RoPE is applied)?

Ideally, this would be answered for the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.
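The conversion in the question is a probability-weighted average of embedding rows. A stdlib-only sketch of the same arithmetic, where `softmax`, `soft_embedding` and the 3-token table `W` are hypothetical stand-ins for `F.softmax` and the model's input embedding weight: a one-hot distribution reproduces an ordinary embedding lookup, so the soft input is just another vector in the space `embed_tokens` would produce, and nothing in the conversion itself involves positions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtracting the max is the usual numerically stable form; the result is unchanged.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_embedding(probs: Sequence[float], embed_weight: Matrix) -> List[float]:
    # (vocab,) @ (vocab, hidden) -> (hidden,): a probability-weighted sum of rows.
    hidden = len(embed_weight[0])
    return [sum(p * row[j] for p, row in zip(probs, embed_weight)) for j in range(hidden)]


# Hypothetical embedding weight: 3-token vocabulary, hidden size 2.
W: Matrix = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]

# A one-hot distribution reproduces a plain lookup of row 1.
print(soft_embedding([0.0, 1.0, 0.0], W))  # [0.0, 2.0]

# Very peaked logits approach the lookup of the argmax token (row 0).
print(soft_embedding(softmax([50.0, 0.0, 0.0]), W))

# Ordinary logits give a convex mixture of rows.
print(soft_embedding(softmax([2.0, 0.5, -1.0]), W))
```

With a real model loaded in half precision, the float32 probabilities may need casting to the embedding weight's dtype before the matmul, since PyTorch matrix multiplication generally expects matching dtypes.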
Question details
- View count: 53
- Answer count: 1
- Last activity: 2026/05/26
Answers
Source data · Raw Archive
- source: Stack Overflow
- upstream_source: stackoverflow
- upstream_item_id: 79946518
- daily_ranking_item_id: 462f2ae6-7274-4d27-ba2f-fe9fd855ecc2
- rank_date: 2026-05-30
- rank: 6
- name: Does inputs_embeds bypass the positional encoding step of the model?
- tagline: huggingface-transformers, large-language-model
- votes_count: 0
- comments_count: 1
- created_at_on_source: 2026-05-25T23:08:03.000Z
{
"stackoverflow": {
"score": 0,
"view_count": 53,
"is_answered": true,
"top_answers": [
{
"body": "By reading the source code for LlamaModel , it appears that passing inputs_embeds does not suppress the positional embedding step, since inputs_embeds is assigned to hidden_states which is then passed to self.rotary_emb() . The only difference from passing input_ids is that self.embed_tokens() isn't called. Presumably, other models are implemented similarly.",
"score": 2,
"answer_id": 79946519,
"is_accepted": false
}
],
"answer_count": 1,
"accepted_answer_id": null,
"last_activity_date": 1779774273
}
}
{
"stats": {
"score": 0,
"view_count": 53,
"is_answered": true,
"answer_count": 1,
"creation_date": 1779750483,
"last_edit_date": 1779774273,
"accepted_answer_id": null,
"last_activity_date": 1779774273
},
"api_wrapper": {
"backoff": null,
"has_more": true,
"page_size": 8,
"quota_max": 300,
"quota_remaining": 204
},
"question_id": 79946518,
"answer_fetch": {
"has_more": false,
"answers_fetched": 1,
"answer_page_size": 3
},
"snapshot_version": "stackoverflow_question_v1"
}
{
"id": "302f861a-8d59-4ddf-b38a-60f5cd1e1ff0",
"daily_ranking_item_id": "462f2ae6-7274-4d27-ba2f-fe9fd855ecc2",
"source": "stackoverflow",
"external_id": "79946518",
"fetched_at": "2026-05-29T22:02:13.965Z",
"question_raw": {
"body": "<p>Using the Hugging Face transformer library, I want to feed logits (i.e., a vector such that if the softmax is taken would have the probability of each token) as the input to a model.</p>\n<p>Currently, I convert the logits to an embedding with the following code:</p>\n<pre class=\"lang-py prettyprint-override\"><code>token_probabilities = F.softmax(logits,dim=-1)\nembeddings = token_probabilities @ model.embed_tokens.weight\nout = model(inputs_embed = embeddings, attention_mask=attention_mask, labels=labels)\n</code></pre>\n<p>However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embed bypass the part of the model where positional data is attached? (for example, when the RoPE is added).</p>\n<p>Ideally, this would be answered with the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.</p>\n",
"link": "https://stackoverflow.com/questions/79946518/does-inputs-embeds-bypass-the-positional-encoding-step-of-the-model",
"tags": [
"huggingface-transformers",
"large-language-model"
],
"owner": {
"link": "https://stackoverflow.com/users/3750874/algorithmic-canary",
"user_id": 3750874,
"user_type": "registered",
"account_id": 4627519,
"reputation": 819,
"display_name": "Algorithmic Canary",
"profile_image": "https://www.gravatar.com/avatar/faf681a6d1d92e9f4fca8eabb2c5d03f?s=256&d=identicon&r=PG&f=y&so-version=2"
},
"score": 0,
"title": "Does inputs_embeds bypass the positional encoding step of the model?",
"view_count": 53,
"is_answered": true,
"question_id": 79946518,
"answer_count": 1,
"creation_date": 1779750483,
"last_edit_date": 1779774273,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779774273
},
"answers_raw": [
{
"body": "<p>By reading the source code for <a href=\"https://github.com/huggingface/transformers/blob/ece1ea0635367989ad4dfab0c084bcc57e5d897b/src/transformers/models/llama/modeling_llama.py#L355\" rel=\"nofollow noreferrer\">LlamaModel</a>, it appears that passing <code>inputs_embeds</code> does not suppress the positional embedding step, since <code>inputs_embeds</code> is assigned to <code>hidden_states</code> which is then passed to <code>self.rotary_emb()</code>. The only difference from passing <code>input_ids</code> is that <code>self.embed_tokens()</code> isn't called.</p>\n<p>Presumably, other models are implemented similarly.</p>\n",
"owner": {
"link": "https://stackoverflow.com/users/3750874/algorithmic-canary",
"user_id": 3750874,
"user_type": "registered",
"account_id": 4627519,
"reputation": 819,
"display_name": "Algorithmic Canary",
"profile_image": "https://www.gravatar.com/avatar/faf681a6d1d92e9f4fca8eabb2c5d03f?s=256&d=identicon&r=PG&f=y&so-version=2"
},
"score": 2,
"answer_id": 79946519,
"is_accepted": false,
"question_id": 79946518,
"creation_date": 1779751197,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779751197
}
],
"tags_raw": [
"huggingface-transformers",
"large-language-model"
],
"stats_raw": {
"score": 0,
"view_count": 53,
"is_answered": true,
"answer_count": 1,
"creation_date": 1779750483,
"last_edit_date": 1779774273,
"accepted_answer_id": null,
"last_activity_date": 1779774273
},
"selection_meta": {
"site": "stackoverflow",
"api_wrapper": {
"backoff": null,
"has_more": true,
"page_size": 8,
"quota_max": 300,
"quota_remaining": 204
},
"answer_fetch": {
"backoff": null,
"has_more": false,
"answers_fetched": 1,
"quota_remaining": 272,
"answer_page_size": 3
},
"snapshot_version": "stackoverflow_question_v1",
"selection_strategy": "tag_whitelist_unanswered_high_score_recent_active"
},
"created_at": "2026-05-29T22:02:14.131Z",
"updated_at": "2026-05-29T22:02:14.131Z"
}