Do some profiling. At a minimum, add print statements with a timestamp at the start of each block so you know exactly how long each part takes, and then decide what to optimize based on what you've learned.
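A minimal sketch of that kind of stage timing, assuming the pipeline splits into fetch, parse, and LLM steps. The functions `fetch_policy`, `parse_policy`, and `analyze_with_llm` are hypothetical stand-ins that only sleep; they are not taken from the asker's repository, and the sleep durations are placeholders rather than measurements.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> accumulated seconds


@contextmanager
def timed(stage):
    """Print and record the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage] = timings.get(stage, 0.0) + elapsed
        print(f"[{stage}] {elapsed:.3f}s")


# Hypothetical stand-ins for the real stages (assumptions, not the real code).
def fetch_policy(url):
    time.sleep(0.02)  # pretend network / browser time
    return "<html><body>We retain data for 30 days.</body></html>"


def parse_policy(html):
    time.sleep(0.01)  # pretend parsing time
    return html.replace("<html><body>", "").replace("</body></html>", "")


def analyze_with_llm(text):
    time.sleep(0.05)  # pretend LLM latency
    return {"chars": len(text)}


def analyze(url):
    """Run the three stages, timing each one separately."""
    with timed("fetch"):
        html = fetch_policy(url)
    with timed("parse"):
        text = parse_policy(html)
    with timed("llm"):
        result = analyze_with_llm(text)
    return result


if __name__ == "__main__":
    analyze("https://example.com/privacy")
    print(timings)
```

Swapping the stand-ins for the real calls should show which stage accounts for most of the 25 to 37 seconds.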
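The pain point below is an AI-generated preliminary summary of the upstream raw evidence; it does not include any additional China-market research.
The user built an LLM-based privacy policy analysis tool, but each policy takes 25 to 37 seconds to process, which makes batch jobs inefficient. The user is unsure whether the bottleneck is LLM generation latency, policy fetching and parsing, or the code architecture, which suggests the current pipeline lacks a clear way to diagnose performance. This slow turnaround makes the tool hard to use for large-scale or real-time analysis, may force the developer to trade accuracy against speed, and increases the time cost of iterative optimization.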
Stack Overflow question
I created a tool called Privacy Policy Analyzer, which uses LLMs to evaluate real-world privacy policies. It scores policies from platforms like GitHub, TikTok, and Facebook based on criteria such as data retention, cross-border data transfer, transparency, and user rights. The analyzer provides a strengths and risks breakdown along with an overall score for each policy. You can check out the repository here: https://github.com/myz21/privacy-policy-analyzer And see an example of the analysis results here: https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md
The main issue: While the results are detailed, the analysis process is pretty slow. Each policy takes around 25 to 37 seconds to process. This feels way too long for batch jobs, and I am trying to figure out if the primary bottleneck is LLM token generation latency, the policy fetching and parsing step, or my own code architecture. Has anyone tackled similar performance issues in LLM-powered document analyzers? Any tips for speeding things up, optimizing the pipeline, or best practices for handling long-form text analysis efficiently?
All suggestions and critiques are appreciated!
Question details
- View count: 69
- Answer count: 4
- Last activity: 2026/05/26
Answers
Isn't RESULTS.md enough for the analysis?
No. We have no idea what those times include. Is that only LLM time or that plus 10 other things? Also, links to external sites are fine for extra information but all the relevant info needs to go in your question including code and results. In a year, those links may be dead and much of the value of your question will be lost. That said, don't just copy and paste the entirety... you need to figure out what's relevant and post that. You need to do a lot of the heavy lifting and then come back and give us a summary of what you've tried, what worked or didn't work, etc.
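One way to find out what those times include is a per-function breakdown from the standard library's `cProfile`, which measures wall-clock time by default, so waits on the network or on an LLM API show up in the totals. The sketch below profiles a toy pipeline; `fake_fetch`, `fake_parse`, and `fake_llm_call` are hypothetical placeholders that only sleep, not the asker's code.

```python
import cProfile
import io
import pstats
import time


# Hypothetical placeholders for the real pipeline steps (assumptions).
def fake_fetch():
    time.sleep(0.01)  # pretend page load
    return "raw policy html"


def fake_parse(html):
    return html.upper()  # pretend text extraction


def fake_llm_call(text):
    time.sleep(0.03)  # pretend model latency
    return {"summary": text[:8]}


def run_pipeline():
    return fake_llm_call(fake_parse(fake_fetch()))


def profile_report(func):
    """Run func under cProfile and return the stats table as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the slowest call chains come first.
    stats.sort_stats("cumulative").print_stats()
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(run_pipeline))
```

A trimmed version of a table like this, together with the relevant code, is the kind of summary that belongs in the question itself.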
Source Data · Raw Archive
- source: Stack Overflow
- upstream_source: stackoverflow
- upstream_item_id: 79946304
- daily_ranking_item_id: bbd19d40-b1ef-4918-b36a-5d6326d2a567
- rank_date: 2026-05-27
- rank: 4
- name: Built a Privacy Policy Analyzer with LLMs, but it's slow
- tagline: python, selenium-webdriver, langchain, large-language-model, privacy-policy
- description: I created a tool called Privacy Policy Analyzer, which uses LLMs to evaluate real-world privacy policies. It scores policies from platforms like GitHub, TikTok, and Facebook based on criteria such as data retention, cross-border data transfer, transparency, and user rights. The analyzer provides a strengths and risks breakdown along with an overall score for each policy. You can check out the repository here: https://github.com/myz21/privacy-policy-analyzer And see an example of the analysis results here: https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md The main issue: While the results are detailed, the analysis process is pretty slow. Each policy takes around 25 to 37 seconds to process. This feels way too long for batch jobs, and I am trying to figure out if the primary bottleneck is LLM token generation latency, the policy fetching and parsing step, or my own code architecture. Has anyone tackled similar performance issues in LLM-powered document analyzers? Any tips for speeding things up, optimizing the pipeline, or best practices for handling long-form text analysis efficiently? All suggestions and critiques are appreciated!
- votes_count: 1
- comments_count: 4
- created_at_on_source: 2026-05-25T13:02:59.000Z
{
"stackoverflow": {
"score": 1,
"view_count": 69,
"is_answered": false,
"top_answers": [
{
"body": "Do some profiling. At the minimum you can add print statements when each block starts so you will have an exact time of how long each part takes and then you can make a decision based on what you've learned.",
"score": 0,
"answer_id": 79946322,
"is_accepted": false
},
{
"body": "Isnt RESULTS.md enough for the analysis?",
"score": 0,
"answer_id": 79946398,
"is_accepted": false
},
{
"body": "No. We have no idea what those times include. Is that only LLM time or that plus 10 other things? Also, links to external sites are fine for extra information but all the relevant info needs to go in your question including code and results. In a year, those links may be dead and much of the value of your question will be lost. That said, don't just copy and paste the entirety... you need to figure out what's relevant and post that. You need to do a lot of the heavy lifting and then come back and give us a summary of what you've tried, what worked or didn't work, etc.",
"score": 0,
"answer_id": 79946483,
"is_accepted": false
}
],
"answer_count": 4,
"accepted_answer_id": null,
"last_activity_date": 1779810999
}
}
{
"stats": {
"score": 1,
"view_count": 69,
"is_answered": false,
"answer_count": 4,
"creation_date": 1779714179,
"last_edit_date": null,
"accepted_answer_id": null,
"last_activity_date": 1779810999
},
"api_wrapper": {
"backoff": null,
"has_more": true,
"page_size": 8,
"quota_max": 300,
"quota_remaining": 296
},
"question_id": 79946304,
"answer_fetch": {
"has_more": true,
"answers_fetched": 3,
"answer_page_size": 3
},
"snapshot_version": "stackoverflow_question_v1"
}
{
"id": "229c53b1-dd83-4e96-9783-e075c6748fe4",
"daily_ranking_item_id": "bbd19d40-b1ef-4918-b36a-5d6326d2a567",
"source": "stackoverflow",
"external_id": "79946304",
"fetched_at": "2026-05-26T22:02:05.726Z",
"question_raw": {
"body": "<p>I created a tool called Privacy Policy Analyzer, which uses LLMs to evaluate real-world privacy policies. It scores policies from platforms like GitHub, TikTok, and Facebook based on criteria such as data retention, cross-border data transfer, transparency, and user rights. The analyzer provides a strengths and risks breakdown along with an overall score for each policy. You can check out the repository here: <a href=\"https://github.com/myz21/privacy-policy-analyzer\" rel=\"nofollow noreferrer\">https://github.com/myz21/privacy-policy-analyzer</a> And see an example of the analysis results here: <a href=\"https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md\" rel=\"nofollow noreferrer\">https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md</a></p>\n<p>The main issue: While the results are detailed, the analysis process is pretty slow. Each policy takes around 25 to 37 seconds to process. This feels way too long for batch jobs, and I am trying to figure out if the primary bottleneck is LLM token generation latency, the policy fetching and parsing step, or my own code architecture. Has anyone tackled similar performance issues in LLM-powered document analyzers? Any tips for speeding things up, optimizing the pipeline, or best practices for handling long-form text analysis efficiently?</p>\n<p>All suggestions and critiques are appreciated!</p>\n",
"link": "https://stackoverflow.com/questions/79946304/built-a-privacy-policy-analyzer-with-llms-but-its-slow",
"tags": [
"python",
"selenium-webdriver",
"langchain",
"large-language-model",
"privacy-policy"
],
"owner": {
"link": "https://stackoverflow.com/users/32765077/al-gebra",
"user_id": 32765077,
"user_type": "registered",
"account_id": 30685845,
"reputation": 1,
"display_name": "Al-Gebra",
"profile_image": "https://i.sstatic.net/juYO2.jpg?s=256"
},
"score": 1,
"title": "Built a Privacy Policy Analyzer with LLMs, but it's slow",
"view_count": 69,
"is_answered": false,
"question_id": 79946304,
"answer_count": 4,
"creation_date": 1779714179,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779810999
},
"answers_raw": [
{
"body": "<p>Do some profiling. At the minimum you can add print statements when each block starts so you will have an exact time of how long each part takes and then you can make a decision based on what you've learned.</p>\n",
"owner": {
"link": "https://stackoverflow.com/users/2386774/jeffc",
"user_id": 2386774,
"user_type": "registered",
"account_id": 2772450,
"reputation": 26577,
"display_name": "JeffC",
"profile_image": "https://www.gravatar.com/avatar/dea7da142cb7e85d5d5a8576e2625431?s=256&d=identicon&r=PG"
},
"score": 0,
"answer_id": 79946322,
"is_accepted": false,
"question_id": 79946304,
"creation_date": 1779718109,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779718109
},
{
"body": "<p>Isnt <a href=\"https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md\" rel=\"nofollow noreferrer\">RESULTS.md</a> enough for the analysis?</p>\n",
"owner": {
"link": "https://stackoverflow.com/users/32765077/al-gebra",
"user_id": 32765077,
"user_type": "registered",
"account_id": 30685845,
"reputation": 1,
"display_name": "Al-Gebra",
"profile_image": "https://i.sstatic.net/juYO2.jpg?s=256"
},
"score": 0,
"answer_id": 79946398,
"is_accepted": false,
"question_id": 79946304,
"creation_date": 1779729166,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779729166
},
{
"body": "<p>No. We have no idea what those times include. Is that only LLM time or that plus 10 other things? Also, links to external sites are fine for extra information but all the relevant info needs to go in your question including code and results. In a year, those links may be dead and much of the value of your question will be lost.</p>\n<p>That said, don't just copy and paste the entirety... you need to figure out what's relevant and post that. You need to do a lot of the heavy lifting and then come back and give us a summary of what you've tried, what worked or didn't work, etc.</p>\n",
"owner": {
"link": "https://stackoverflow.com/users/2386774/jeffc",
"user_id": 2386774,
"user_type": "registered",
"account_id": 2772450,
"reputation": 26577,
"display_name": "JeffC",
"profile_image": "https://www.gravatar.com/avatar/dea7da142cb7e85d5d5a8576e2625431?s=256&d=identicon&r=PG"
},
"score": 0,
"answer_id": 79946483,
"is_accepted": false,
"question_id": 79946304,
"creation_date": 1779744808,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779744808
}
],
"tags_raw": [
"python",
"selenium-webdriver",
"langchain",
"large-language-model",
"privacy-policy"
],
"stats_raw": {
"score": 1,
"view_count": 69,
"is_answered": false,
"answer_count": 4,
"creation_date": 1779714179,
"last_edit_date": null,
"accepted_answer_id": null,
"last_activity_date": 1779810999
},
"selection_meta": {
"site": "stackoverflow",
"api_wrapper": {
"backoff": null,
"has_more": true,
"page_size": 8,
"quota_max": 300,
"quota_remaining": 296
},
"answer_fetch": {
"backoff": null,
"has_more": true,
"answers_fetched": 3,
"quota_remaining": 267,
"answer_page_size": 3
},
"snapshot_version": "stackoverflow_question_v1",
"selection_strategy": "tag_whitelist_unanswered_high_score_recent_active"
},
"created_at": "2026-05-26T22:02:06.040Z",
"updated_at": "2026-05-26T22:02:06.040Z"
}