Do some profiling. At a minimum, add timestamped print statements at the start of each block so you know exactly how long each part takes, then decide what to optimize based on what you've learned.
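A minimal sketch of that per-block timing, assuming the pipeline splits into fetch, parse, and LLM stages. The functions `fetch_policy`, `parse_policy`, and `analyze_with_llm` are hypothetical stand-ins (they only sleep), not code from the repository, and the `timed` helper is likewise an illustration rather than anything the answer prescribes.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        results[label] = results.get(label, 0.0) + elapsed
        print(f"[{label}] {elapsed:.3f}s")


# Hypothetical stand-ins for the real pipeline stages (assumptions, not the
# author's code). Replace the bodies with the actual fetch/parse/LLM calls.
def fetch_policy(url):
    time.sleep(0.02)
    return "<html>policy text</html>"


def parse_policy(html):
    time.sleep(0.01)
    return html.replace("<html>", "").replace("</html>", "")


def analyze_with_llm(text):
    time.sleep(0.05)
    return {"score": 0}


def run_once(url):
    """Run one policy through the pipeline and return seconds per stage."""
    timings = {}
    with timed("fetch", timings):
        html = fetch_policy(url)
    with timed("parse", timings):
        text = parse_policy(html)
    with timed("llm", timings):
        analyze_with_llm(text)
    return timings


if __name__ == "__main__":
    stage_times = run_once("https://example.com/privacy")
    slowest = max(stage_times, key=stage_times.get)
    print(f"slowest stage: {slowest}")
```

`time.perf_counter()` is used instead of `time.time()` because it is a monotonic, high-resolution clock intended for measuring intervals. Once one stage clearly dominates, a finer-grained tool such as `cProfile` can be pointed at that stage alone.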
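The pain point below is a preliminary AI-generated summary based on the upstream raw evidence; it does not include any additional research on the Chinese market.
The user built an LLM-based privacy policy analysis tool, but each policy takes 25 to 37 seconds to process, which makes batch jobs very inefficient. The core pain point is that in long-form text analysis, LLM token generation latency, the policy fetching and parsing steps, and the code architecture could each be the bottleneck, and the user cannot quickly pin down the specific cause. This performance problem makes the tool hard to use in practice for scenarios that require evaluating many privacy policies quickly (such as compliance audits and competitive analysis), wasting time and delaying decisions. The user needs to optimize the pipeline or adopt best practices to speed up processing, but currently lacks a clear direction for optimization.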
Stack Overflow question
I created a tool called Privacy Policy Analyzer, which uses LLMs to evaluate real-world privacy policies. It scores policies from platforms like GitHub, TikTok, and Facebook based on criteria such as data retention, cross-border data transfer, transparency, and user rights. The analyzer provides a strengths and risks breakdown along with an overall score for each policy. You can check out the repository here: https://github.com/myz21/privacy-policy-analyzer And see an example of the analysis results here: https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md

The main issue: While the results are detailed, the analysis process is pretty slow. Each policy takes around 25 to 37 seconds to process. This feels way too long for batch jobs, and I am trying to figure out if the primary bottleneck is LLM token generation latency, the policy fetching and parsing step, or my own code architecture. Has anyone tackled similar performance issues in LLM-powered document analyzers? Any tips for speeding things up, optimizing the pipeline, or best practices for handling long-form text analysis efficiently?

All suggestions and critiques are appreciated!
Question details
- View count: 76
- Answer count: 4
- Last activity: 2026/05/26
Answers
Isn't RESULTS.md enough for the analysis?
No. We have no idea what those times include. Is that only LLM time or that plus 10 other things? Also, links to external sites are fine for extra information but all the relevant info needs to go in your question including code and results. In a year, those links may be dead and much of the value of your question will be lost. That said, don't just copy and paste the entirety... you need to figure out what's relevant and post that. You need to do a lot of the heavy lifting and then come back and give us a summary of what you've tried, what worked or didn't work, etc.
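One way to produce the kind of summary this answer asks for is to collect per-stage timings over several policies and reduce them to a small table that can be pasted into the question. The sketch below assumes each run is a dict of stage name to seconds; the stage names and every number in `runs` are invented purely to show the shape of the output and are not measurements from the tool.

```python
from statistics import mean


def summarize(runs):
    """Return (stage, mean_seconds, share_of_total) rows, slowest first."""
    stages = {}
    for run in runs:
        for stage, seconds in run.items():
            stages.setdefault(stage, []).append(seconds)
    means = {stage: mean(values) for stage, values in stages.items()}
    total = sum(means.values())
    rows = [
        (stage, avg, avg / total if total else 0.0)
        for stage, avg in means.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def format_table(rows):
    """Render the rows as fixed-width plain text."""
    lines = [f"{'stage':<10}{'mean (s)':>10}{'share':>8}"]
    for stage, avg, share in rows:
        lines.append(f"{stage:<10}{avg:>10.2f}{share:>8.0%}")
    return "\n".join(lines)


# Invented example data (NOT real measurements): one dict per analyzed policy.
runs = [
    {"fetch": 4.0, "parse": 1.0, "llm": 20.0},
    {"fetch": 6.0, "parse": 1.0, "llm": 28.0},
]

print(format_table(summarize(runs)))
```

A table like this answers "is that only LLM time or that plus 10 other things?" directly, and it keeps the relevant evidence in the question itself rather than behind an external link.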
Source Data · Raw Archive
- source: Stack Overflow
- upstream_source: stackoverflow
- upstream_item_id: 79946304
- daily_ranking_item_id: 2c286996-fc1d-4a21-8f56-e4253cc759f8
- rank_date: 2026-05-29
- rank: 4
- name: Built a Privacy Policy Analyzer with LLMs, but it's slow
- tagline: python, selenium-webdriver, langchain, large-language-model, privacy-policy
- description: I created a tool called Privacy Policy Analyzer, which uses LLMs to evaluate real-world privacy policies. It scores policies from platforms like GitHub, TikTok, and Facebook based on criteria such as data retention, cross-border data transfer, transparency, and user rights. The analyzer provides a strengths and risks breakdown along with an overall score for each policy. You can check out the repository here: https://github.com/myz21/privacy-policy-analyzer And see an example of the analysis results here: https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md The main issue: While the results are detailed, the analysis process is pretty slow. Each policy takes around 25 to 37 seconds to process. This feels way too long for batch jobs, and I am trying to figure out if the primary bottleneck is LLM token generation latency, the policy fetching and parsing step, or my own code architecture. Has anyone tackled similar performance issues in LLM-powered document analyzers? Any tips for speeding things up, optimizing the pipeline, or best practices for handling long-form text analysis efficiently? All suggestions and critiques are appreciated!
- votes_count: 1
- comments_count: 4
- created_at_on_source: 2026-05-25T13:02:59.000Z
{
"stackoverflow": {
"score": 1,
"view_count": 76,
"is_answered": false,
"top_answers": [
{
"body": "Do some profiling. At the minimum you can add print statements when each block starts so you will have an exact time of how long each part takes and then you can make a decision based on what you've learned.",
"score": 0,
"answer_id": 79946322,
"is_accepted": false
},
{
"body": "Isnt RESULTS.md enough for the analysis?",
"score": 0,
"answer_id": 79946398,
"is_accepted": false
},
{
"body": "No. We have no idea what those times include. Is that only LLM time or that plus 10 other things? Also, links to external sites are fine for extra information but all the relevant info needs to go in your question including code and results. In a year, those links may be dead and much of the value of your question will be lost. That said, don't just copy and paste the entirety... you need to figure out what's relevant and post that. You need to do a lot of the heavy lifting and then come back and give us a summary of what you've tried, what worked or didn't work, etc.",
"score": 0,
"answer_id": 79946483,
"is_accepted": false
}
],
"answer_count": 4,
"accepted_answer_id": null,
"last_activity_date": 1779810999
}
}
{
"stats": {
"score": 1,
"view_count": 76,
"is_answered": false,
"answer_count": 4,
"creation_date": 1779714179,
"last_edit_date": null,
"accepted_answer_id": null,
"last_activity_date": 1779810999
},
"api_wrapper": {
"backoff": null,
"has_more": true,
"page_size": 8,
"quota_max": 300,
"quota_remaining": 299
},
"question_id": 79946304,
"answer_fetch": {
"has_more": true,
"answers_fetched": 3,
"answer_page_size": 3
},
"snapshot_version": "stackoverflow_question_v1"
}
{
"id": "97ad5361-ae74-4135-8ce3-eee3b2d557fa",
"daily_ranking_item_id": "2c286996-fc1d-4a21-8f56-e4253cc759f8",
"source": "stackoverflow",
"external_id": "79946304",
"fetched_at": "2026-05-28T22:02:15.509Z",
"question_raw": {
"body": "<p>I created a tool called Privacy Policy Analyzer, which uses LLMs to evaluate real-world privacy policies. It scores policies from platforms like GitHub, TikTok, and Facebook based on criteria such as data retention, cross-border data transfer, transparency, and user rights. The analyzer provides a strengths and risks breakdown along with an overall score for each policy. You can check out the repository here: <a href=\"https://github.com/myz21/privacy-policy-analyzer\" rel=\"nofollow noreferrer\">https://github.com/myz21/privacy-policy-analyzer</a> And see an example of the analysis results here: <a href=\"https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md\" rel=\"nofollow noreferrer\">https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md</a></p>\n<p>The main issue: While the results are detailed, the analysis process is pretty slow. Each policy takes around 25 to 37 seconds to process. This feels way too long for batch jobs, and I am trying to figure out if the primary bottleneck is LLM token generation latency, the policy fetching and parsing step, or my own code architecture. Has anyone tackled similar performance issues in LLM-powered document analyzers? Any tips for speeding things up, optimizing the pipeline, or best practices for handling long-form text analysis efficiently?</p>\n<p>All suggestions and critiques are appreciated!</p>\n",
"link": "https://stackoverflow.com/questions/79946304/built-a-privacy-policy-analyzer-with-llms-but-its-slow",
"tags": [
"python",
"selenium-webdriver",
"langchain",
"large-language-model",
"privacy-policy"
],
"owner": {
"link": "https://stackoverflow.com/users/32765077/al-gebra",
"user_id": 32765077,
"user_type": "registered",
"account_id": 30685845,
"reputation": 1,
"display_name": "Al-Gebra",
"profile_image": "https://i.sstatic.net/juYO2.jpg?s=256"
},
"score": 1,
"title": "Built a Privacy Policy Analyzer with LLMs, but it's slow",
"view_count": 76,
"is_answered": false,
"question_id": 79946304,
"answer_count": 4,
"creation_date": 1779714179,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779810999
},
"answers_raw": [
{
"body": "<p>Do some profiling. At the minimum you can add print statements when each block starts so you will have an exact time of how long each part takes and then you can make a decision based on what you've learned.</p>\n",
"owner": {
"link": "https://stackoverflow.com/users/2386774/jeffc",
"user_id": 2386774,
"user_type": "registered",
"account_id": 2772450,
"reputation": 26577,
"display_name": "JeffC",
"profile_image": "https://www.gravatar.com/avatar/dea7da142cb7e85d5d5a8576e2625431?s=256&d=identicon&r=PG"
},
"score": 0,
"answer_id": 79946322,
"is_accepted": false,
"question_id": 79946304,
"creation_date": 1779718109,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779718109
},
{
"body": "<p>Isnt <a href=\"https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md\" rel=\"nofollow noreferrer\">RESULTS.md</a> enough for the analysis?</p>\n",
"owner": {
"link": "https://stackoverflow.com/users/32765077/al-gebra",
"user_id": 32765077,
"user_type": "registered",
"account_id": 30685845,
"reputation": 1,
"display_name": "Al-Gebra",
"profile_image": "https://i.sstatic.net/juYO2.jpg?s=256"
},
"score": 0,
"answer_id": 79946398,
"is_accepted": false,
"question_id": 79946304,
"creation_date": 1779729166,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779729166
},
{
"body": "<p>No. We have no idea what those times include. Is that only LLM time or that plus 10 other things? Also, links to external sites are fine for extra information but all the relevant info needs to go in your question including code and results. In a year, those links may be dead and much of the value of your question will be lost.</p>\n<p>That said, don't just copy and paste the entirety... you need to figure out what's relevant and post that. You need to do a lot of the heavy lifting and then come back and give us a summary of what you've tried, what worked or didn't work, etc.</p>\n",
"owner": {
"link": "https://stackoverflow.com/users/2386774/jeffc",
"user_id": 2386774,
"user_type": "registered",
"account_id": 2772450,
"reputation": 26577,
"display_name": "JeffC",
"profile_image": "https://www.gravatar.com/avatar/dea7da142cb7e85d5d5a8576e2625431?s=256&d=identicon&r=PG"
},
"score": 0,
"answer_id": 79946483,
"is_accepted": false,
"question_id": 79946304,
"creation_date": 1779744808,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779744808
}
],
"tags_raw": [
"python",
"selenium-webdriver",
"langchain",
"large-language-model",
"privacy-policy"
],
"stats_raw": {
"score": 1,
"view_count": 76,
"is_answered": false,
"answer_count": 4,
"creation_date": 1779714179,
"last_edit_date": null,
"accepted_answer_id": null,
"last_activity_date": 1779810999
},
"selection_meta": {
"site": "stackoverflow",
"api_wrapper": {
"backoff": null,
"has_more": true,
"page_size": 8,
"quota_max": 300,
"quota_remaining": 299
},
"answer_fetch": {
"backoff": null,
"has_more": true,
"answers_fetched": 3,
"quota_remaining": 267,
"answer_page_size": 3
},
"snapshot_version": "stackoverflow_question_v1",
"selection_strategy": "tag_whitelist_unanswered_high_score_recent_active"
},
"created_at": "2026-05-28T22:02:15.658Z",
"updated_at": "2026-05-28T22:02:15.658Z"
}