By reading the source code for LlamaModel, it appears that passing inputs_embeds does not skip the positional encoding step: inputs_embeds is assigned to hidden_states, which is then passed to self.rotary_emb() to compute the rotary position embeddings. The only difference from passing input_ids is that self.embed_tokens() isn't called. Presumably, other models are implemented similarly.
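One way to check this empirically, rather than by reading the source, is to compare the two call paths on a tiny, randomly initialised model. The sketch below assumes torch and transformers are installed; the config sizes are arbitrary illustration values and are not part of the original answer. If inputs_embeds skipped any step that input_ids goes through after the embedding lookup, the two sets of logits would differ.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

torch.manual_seed(0)

# Tiny randomly initialised model, so nothing is downloaded.
# All sizes below are arbitrary values chosen for illustration.
config = LlamaConfig(
    vocab_size=100,
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    max_position_embeddings=64,
)
model = LlamaForCausalLM(config).eval()

input_ids = torch.randint(0, config.vocab_size, (1, 10))

with torch.no_grad():
    # Path 1: the model performs the embedding lookup itself.
    logits_from_ids = model(input_ids=input_ids).logits

    # Path 2: do the lookup by hand and pass the result as inputs_embeds.
    embeds = model.get_input_embeddings()(input_ids)
    logits_from_embeds = model(inputs_embeds=embeds).logits

# Both paths should agree if everything after the lookup is shared.
same = torch.allclose(logits_from_ids, logits_from_embeds, atol=1e-5)
print("paths agree:", same)
```

Agreement here only shows that both paths share the same downstream computation for this model class; it says nothing about other architectures.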
Pain point: a preliminary AI distillation of the upstream raw evidence; no additional China-market research is included.
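When using Hugging Face's LlamaForCausalLM, the user tries to convert custom logits into embedding vectors to use as model input, but worries that doing so discards the positional encoding the model needs (such as RoPE). The user's core task is to replace the standard token embeddings with custom embeddings while preserving positional information. In the current workflow, the user manually converts the logits into embeddings with a softmax followed by a matrix multiplication, but is unsure whether this bypasses the model's internal positional encoding step. Because of this uncertainty, the user cannot confirm that the input is correct, which may lead to wrong model outputs or failed training, and must spend extra time reading the source code or waiting for a community answer, adding to the development and debugging burden.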
Stack Overflow question
Using the Hugging Face transformers library, I want to feed logits (i.e., a vector whose softmax gives the probability of each token) as the input to a model. Currently, I convert the logits to an embedding with the following code:

    token_probabilities = F.softmax(logits, dim=-1)
    embeddings = token_probabilities @ model.embed_tokens.weight
    out = model(inputs_embeds=embeddings, attention_mask=attention_mask, labels=labels)

However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embeds bypass the part of the model where positional data is attached (for example, where RoPE is applied)? Ideally, this would be answered in terms of the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.
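A minimal sketch of the conversion described in the question, under a few assumptions: the keyword argument is spelled inputs_embeds, the embedding matrix is fetched with get_input_embeddings() (on LlamaForCausalLM the embed_tokens module sits under model.model rather than directly on the model), and a tiny randomly initialised config stands in for a real checkpoint. The logits, labels, and sizes are made up for illustration.

```python
import torch
import torch.nn.functional as F
from transformers import LlamaConfig, LlamaForCausalLM

torch.manual_seed(0)

# Tiny random model standing in for a real checkpoint (sizes are arbitrary).
config = LlamaConfig(
    vocab_size=100,
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    max_position_embeddings=64,
)
model = LlamaForCausalLM(config).eval()

batch, seq_len = 1, 6
token_ids = torch.randint(0, config.vocab_size, (batch, seq_len))

# Hypothetical input: logits sharply peaked on token_ids, so the softmax
# is effectively one-hot and the result can be checked against a lookup.
logits = 50.0 * F.one_hot(token_ids, config.vocab_size).float()
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)
labels = token_ids.clone()

# (batch, seq, vocab) @ (vocab, hidden) -> (batch, seq, hidden)
embedding_matrix = model.get_input_embeddings().weight
token_probabilities = F.softmax(logits, dim=-1)
soft_embeds = token_probabilities @ embedding_matrix

with torch.no_grad():
    out = model(
        inputs_embeds=soft_embeds,
        attention_mask=attention_mask,
        labels=labels,
    )
    hard_embeds = model.get_input_embeddings()(token_ids)

# With near one-hot probabilities, the soft embedding reduces to a lookup.
peaked_matches_lookup = torch.allclose(soft_embeds, hard_embeds, atol=1e-5)
print(out.logits.shape, peaked_matches_lookup)
```

For softer distributions the result is a probability-weighted average of token embeddings, which the model was never trained on, so output quality is a separate question from whether positions are applied.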
Question details
- View count: 47
- Answer count: 1
- Last activity: 2026/05/26
Answers
Source data · Raw Archive
- source: Stack Overflow
- upstream_source: stackoverflow
- upstream_item_id: 79946518
- daily_ranking_item_id: 4bf91f36-2c7f-426c-a7a2-42cc123fcd18
- rank_date: 2026-05-28
- rank: 6
- name: Does inputs_embeds bypass the positional encoding step of the model?
- tagline: huggingface-transformers, large-language-model
- description: Using the Hugging Face transformer library, I want to feed logits (i.e., a vector such that if the softmax is taken would have the probability of each token) as the input to a model. Currently, I convert the logits to an embedding with the following code: token_probabilities = F.softmax(logits,dim=-1) embeddings = token_probabilities @ model.embed_tokens.weight out = model(inputs_embed = embeddings, attention_mask=attention_mask, labels=labels) However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embed bypass the part of the model where positional data is attached? (for example, when the RoPE is added). Ideally, this would be answered with the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.
- votes_count: -1
- comments_count: 1
- created_at_on_source: 2026-05-25T23:08:03.000Z
{
"stackoverflow": {
"score": -1,
"view_count": 47,
"is_answered": true,
"top_answers": [
{
"body": "By reading the source code for LlamaModel , it appears that passing inputs_embeds does not suppress the positional embedding step, since inputs_embeds is assigned to hidden_states which is then passed to self.rotary_emb() . The only difference from passing input_ids is that self.embed_tokens() isn't called. Presumably, other models are implemented similarly.",
"score": 1,
"answer_id": 79946519,
"is_accepted": false
}
],
"answer_count": 1,
"accepted_answer_id": null,
"last_activity_date": 1779774273
}
}
{
"stats": {
"score": -1,
"view_count": 47,
"is_answered": true,
"answer_count": 1,
"creation_date": 1779750483,
"last_edit_date": 1779774273,
"accepted_answer_id": null,
"last_activity_date": 1779774273
},
"api_wrapper": {
"backoff": null,
"has_more": true,
"page_size": 8,
"quota_max": 300,
"quota_remaining": 205
},
"question_id": 79946518,
"answer_fetch": {
"has_more": false,
"answers_fetched": 1,
"answer_page_size": 3
},
"snapshot_version": "stackoverflow_question_v1"
}
{
"id": "b35dd526-66ce-42cf-9f62-affccc46ef3d",
"daily_ranking_item_id": "4bf91f36-2c7f-426c-a7a2-42cc123fcd18",
"source": "stackoverflow",
"external_id": "79946518",
"fetched_at": "2026-05-27T22:01:45.075Z",
"question_raw": {
"body": "<p>Using the Hugging Face transformer library, I want to feed logits (i.e., a vector such that if the softmax is taken would have the probability of each token) as the input to a model.</p>\n<p>Currently, I convert the logits to an embedding with the following code:</p>\n<pre class=\"lang-py prettyprint-override\"><code>token_probabilities = F.softmax(logits,dim=-1)\nembeddings = token_probabilities @ model.embed_tokens.weight\nout = model(inputs_embed = embeddings, attention_mask=attention_mask, labels=labels)\n</code></pre>\n<p>However, I am concerned that this throws away the positional encoding that the model needs. Does inputs_embed bypass the part of the model where positional data is attached? (for example, when the RoPE is added).</p>\n<p>Ideally, this would be answered with the general interface that Hugging Face models use, but I specifically care about LlamaForCausalLM if there is no general answer.</p>\n",
"link": "https://stackoverflow.com/questions/79946518/does-inputs-embeds-bypass-the-positional-encoding-step-of-the-model",
"tags": [
"huggingface-transformers",
"large-language-model"
],
"owner": {
"link": "https://stackoverflow.com/users/3750874/algorithmic-canary",
"user_id": 3750874,
"user_type": "registered",
"account_id": 4627519,
"reputation": 799,
"display_name": "Algorithmic Canary",
"profile_image": "https://www.gravatar.com/avatar/faf681a6d1d92e9f4fca8eabb2c5d03f?s=256&d=identicon&r=PG&f=y&so-version=2"
},
"score": -1,
"title": "Does inputs_embeds bypass the positional encoding step of the model?",
"view_count": 47,
"is_answered": true,
"question_id": 79946518,
"answer_count": 1,
"creation_date": 1779750483,
"last_edit_date": 1779774273,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779774273
},
"answers_raw": [
{
"body": "<p>By reading the source code for <a href=\"https://github.com/huggingface/transformers/blob/ece1ea0635367989ad4dfab0c084bcc57e5d897b/src/transformers/models/llama/modeling_llama.py#L355\" rel=\"nofollow noreferrer\">LlamaModel</a>, it appears that passing <code>inputs_embeds</code> does not suppress the positional embedding step, since <code>inputs_embeds</code> is assigned to <code>hidden_states</code> which is then passed to <code>self.rotary_emb()</code>. The only difference from passing <code>input_ids</code> is that <code>self.embed_tokens()</code> isn't called.</p>\n<p>Presumably, other models are implemented similarly.</p>\n",
"owner": {
"link": "https://stackoverflow.com/users/3750874/algorithmic-canary",
"user_id": 3750874,
"user_type": "registered",
"account_id": 4627519,
"reputation": 799,
"display_name": "Algorithmic Canary",
"profile_image": "https://www.gravatar.com/avatar/faf681a6d1d92e9f4fca8eabb2c5d03f?s=256&d=identicon&r=PG&f=y&so-version=2"
},
"score": 1,
"answer_id": 79946519,
"is_accepted": false,
"question_id": 79946518,
"creation_date": 1779751197,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779751197
}
],
"tags_raw": [
"huggingface-transformers",
"large-language-model"
],
"stats_raw": {
"score": -1,
"view_count": 47,
"is_answered": true,
"answer_count": 1,
"creation_date": 1779750483,
"last_edit_date": 1779774273,
"accepted_answer_id": null,
"last_activity_date": 1779774273
},
"selection_meta": {
"site": "stackoverflow",
"api_wrapper": {
"backoff": null,
"has_more": true,
"page_size": 8,
"quota_max": 300,
"quota_remaining": 205
},
"answer_fetch": {
"backoff": null,
"has_more": false,
"answers_fetched": 1,
"quota_remaining": 177,
"answer_page_size": 3
},
"snapshot_version": "stackoverflow_question_v1",
"selection_strategy": "tag_whitelist_unanswered_high_score_recent_active"
},
"created_at": "2026-05-27T22:01:45.496Z",
"updated_at": "2026-05-27T22:01:45.496Z"
}