星火 SparkCN

Pain point analysis · Published 2026/05/27

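The pain point is a preliminary AI-generated summary of the upstream source evidence; no additional China-market research is included.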

Pain point

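The user built an LLM-based privacy policy analysis tool, but each policy takes 25 to 37 seconds to process, which makes batch jobs very slow. The user cannot tell whether the bottleneck is LLM token generation latency, the policy fetching and parsing step, or the code architecture, which suggests the current pipeline has no clear way to diagnose performance. At this speed, batch analysis is hard to complete efficiently and may waste time and delay decisions; the friction grows markedly when many policies need to be analyzed.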

§ Dossier

Stack Overflow question

I created a tool called Privacy Policy Analyzer, which uses LLMs to evaluate real-world privacy policies. It scores policies from platforms like GitHub, TikTok, and Facebook based on criteria such as data retention, cross-border data transfer, transparency, and user rights. The analyzer provides a strengths and risks breakdown along with an overall score for each policy. You can check out the repository here: https://github.com/myz21/privacy-policy-analyzer
And see an example of the analysis results here: https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md

The main issue: While the results are detailed, the analysis process is pretty slow. Each policy takes around 25 to 37 seconds to process. This feels way too long for batch jobs, and I am trying to figure out if the primary bottleneck is LLM token generation latency, the policy fetching and parsing step, or my own code architecture. Has anyone tackled similar performance issues in LLM-powered document analyzers? Any tips for speeding things up, optimizing the pipeline, or best practices for handling long-form text analysis efficiently?

All suggestions and critiques are appreciated!

§ Dossier

Question details

View count: 75
Answer count: 4
Last activity: 2026/05/26

§ Dossier

Answers

Do some profiling. At the minimum you can add print statements when each block starts so you will have an exact time of how long each part takes and then you can make a decision based on what you've learned.

Comment author unavailable · 0 votes
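The print-statement approach suggested above could look roughly like the sketch below. It is a minimal illustration, not the repository's code: `fetch_policy`, `parse_policy`, `call_llm`, the stage names, and the example URL are all hypothetical stand-ins, and the `time.sleep` calls only simulate work so the sketch runs on its own.

```python
import time
from contextlib import contextmanager

# Accumulated seconds per stage; the stage names used below are hypothetical.
timings = {}

@contextmanager
def timed(stage):
    """Print when a stage starts and how long it took, and record the duration."""
    start = time.perf_counter()
    print(f"[start] {stage}")
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage] = timings.get(stage, 0.0) + elapsed
        print(f"[done]  {stage}: {elapsed:.3f}s")

# Placeholder stages standing in for the real pipeline (assumptions only).
def fetch_policy(url):
    time.sleep(0.02)  # stands in for the page fetch
    return "<html>policy text</html>"

def parse_policy(html):
    time.sleep(0.01)  # stands in for HTML cleaning
    return html.replace("<html>", "").replace("</html>", "")

def call_llm(text):
    time.sleep(0.05)  # stands in for the LLM request
    return {"score": None, "chars": len(text)}

def analyze(url):
    """Run one policy through the pipeline, timing each stage separately."""
    with timed("fetch"):
        html = fetch_policy(url)
    with timed("parse"):
        text = parse_policy(html)
    with timed("llm"):
        result = call_llm(text)
    return result

result = analyze("https://example.com/privacy")
```

Swapping the placeholders for the real fetch, parse, and LLM calls would show which stage accounts for most of the 25 to 37 seconds. `time.perf_counter` is used because it is a monotonic, high-resolution clock meant for measuring intervals.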

Isn't RESULTS.md enough for the analysis?

Comment author unavailable · 0 votes

No. We have no idea what those times include. Is that only LLM time or that plus 10 other things? Also, links to external sites are fine for extra information but all the relevant info needs to go in your question including code and results. In a year, those links may be dead and much of the value of your question will be lost. That said, don't just copy and paste the entirety... you need to figure out what's relevant and post that. You need to do a lot of the heavy lifting and then come back and give us a summary of what you've tried, what worked or didn't work, etc.

Comment author unavailable · 0 votes
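One way to answer "what do those times include" and post only the relevant part is to reduce recorded per-stage timings to a short table. The sketch below assumes timings were already collected per stage over several policies; the `summarize` helper, the stage names, and every number in `samples` are made up to show the output shape and are not measurements from the tool.

```python
from statistics import mean

def summarize(samples):
    """Return (stage, mean seconds, share of total mean time) rows, slowest first."""
    means = {stage: mean(values) for stage, values in samples.items()}
    total = sum(means.values())
    rows = [(stage, avg, avg / total) for stage, avg in means.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Made-up placeholder numbers, NOT real measurements from the analyzer.
samples = {
    "fetch": [4.0, 6.0],
    "parse": [1.0, 1.0],
    "llm": [20.0, 24.0],
}

rows = summarize(samples)
for stage, avg, share in rows:
    print(f"{stage:<6} {avg:6.2f}s  {share:5.1%}")
```

A table like this, built from real measurements, is compact enough to paste into the question itself rather than linking to an external file.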

Source data · Raw Archive

source: Stack Overflow
upstream_source: stackoverflow
upstream_item_id: 79946304
daily_ranking_item_id: 9fcb0f90-0d6c-4752-810a-3c511ce9b8de
rank_date: 2026-05-28
rank: 4
name: Built a Privacy Policy Analyzer with LLMs, but it's slow
tagline: python, selenium-webdriver, langchain, large-language-model, privacy-policy
description: I created a tool called Privacy Policy Analyzer, which uses LLMs to evaluate real-world privacy policies. It scores policies from platforms like GitHub, TikTok, and Facebook based on criteria such as data retention, cross-border data transfer, transparency, and user rights. The analyzer provides a strengths and risks breakdown along with an overall score for each policy. You can check out the repository here: https://github.com/myz21/privacy-policy-analyzer And see an example of the analysis results here: https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md The main issue: While the results are detailed, the analysis process is pretty slow. Each policy takes around 25 to 37 seconds to process. This feels way too long for batch jobs, and I am trying to figure out if the primary bottleneck is LLM token generation latency, the policy fetching and parsing step, or my own code architecture. Has anyone tackled similar performance issues in LLM-powered document analyzers? Any tips for speeding things up, optimizing the pipeline, or best practices for handling long-form text analysis efficiently? All suggestions and critiques are appreciated!
votes_count: 1
comments_count: 4
created_at_on_source: 2026-05-25T13:02:59.000Z
source_url: https://stackoverflow.com/questions/79946304/built-a-privacy-policy-analyzer-with-llms-but-its-slow

topics

python, selenium-webdriver, langchain, large-language-model, privacy-policy

media / source-specific data

{
  "stackoverflow": {
    "score": 1,
    "view_count": 75,
    "is_answered": false,
    "top_answers": [
      {
        "body": "Do some profiling. At the minimum you can add print statements when each block starts so you will have an exact time of how long each part takes and then you can make a decision based on what you've learned.",
        "score": 0,
        "answer_id": 79946322,
        "is_accepted": false
      },
      {
        "body": "Isnt RESULTS.md enough for the analysis?",
        "score": 0,
        "answer_id": 79946398,
        "is_accepted": false
      },
      {
        "body": "No. We have no idea what those times include. Is that only LLM time or that plus 10 other things? Also, links to external sites are fine for extra information but all the relevant info needs to go in your question including code and results. In a year, those links may be dead and much of the value of your question will be lost. That said, don't just copy and paste the entirety... you need to figure out what's relevant and post that. You need to do a lot of the heavy lifting and then come back and give us a summary of what you've tried, what worked or didn't work, etc.",
        "score": 0,
        "answer_id": 79946483,
        "is_accepted": false
      }
    ],
    "answer_count": 4,
    "accepted_answer_id": null,
    "last_activity_date": 1779810999
  }
}

raw_payload

{
  "stats": {
    "score": 1,
    "view_count": 75,
    "is_answered": false,
    "answer_count": 4,
    "creation_date": 1779714179,
    "last_edit_date": null,
    "accepted_answer_id": null,
    "last_activity_date": 1779810999
  },
  "api_wrapper": {
    "backoff": null,
    "has_more": true,
    "page_size": 8,
    "quota_max": 300,
    "quota_remaining": 209
  },
  "question_id": 79946304,
  "answer_fetch": {
    "has_more": true,
    "answers_fetched": 3,
    "answer_page_size": 3
  },
  "snapshot_version": "stackoverflow_question_v1"
}

source_raw_snapshot

{
  "id": "abb87f7a-c383-4afe-80c5-e4a7f5a4d050",
  "daily_ranking_item_id": "9fcb0f90-0d6c-4752-810a-3c511ce9b8de",
  "source": "stackoverflow",
  "external_id": "79946304",
  "fetched_at": "2026-05-27T22:01:45.075Z",
  "question_raw": {
    "body": "<p>I created a tool called Privacy Policy Analyzer, which uses LLMs to evaluate real-world privacy policies. It scores policies from platforms like GitHub, TikTok, and Facebook based on criteria such as data retention, cross-border data transfer, transparency, and user rights. The analyzer provides a strengths and risks breakdown along with an overall score for each policy. You can check out the repository here: <a href=\"https://github.com/myz21/privacy-policy-analyzer\" rel=\"nofollow noreferrer\">https://github.com/myz21/privacy-policy-analyzer</a> And see an example of the analysis results here: <a href=\"https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md\" rel=\"nofollow noreferrer\">https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md</a></p>\n<p>The main issue: While the results are detailed, the analysis process is pretty slow. Each policy takes around 25 to 37 seconds to process. This feels way too long for batch jobs, and I am trying to figure out if the primary bottleneck is LLM token generation latency, the policy fetching and parsing step, or my own code architecture. Has anyone tackled similar performance issues in LLM-powered document analyzers? Any tips for speeding things up, optimizing the pipeline, or best practices for handling long-form text analysis efficiently?</p>\n<p>All suggestions and critiques are appreciated!</p>\n",
    "link": "https://stackoverflow.com/questions/79946304/built-a-privacy-policy-analyzer-with-llms-but-its-slow",
    "tags": [
      "python",
      "selenium-webdriver",
      "langchain",
      "large-language-model",
      "privacy-policy"
    ],
    "owner": {
      "link": "https://stackoverflow.com/users/32765077/al-gebra",
      "user_id": 32765077,
      "user_type": "registered",
      "account_id": 30685845,
      "reputation": 1,
      "display_name": "Al-Gebra",
      "profile_image": "https://i.sstatic.net/juYO2.jpg?s=256"
    },
    "score": 1,
    "title": "Built a Privacy Policy Analyzer with LLMs, but it&#39;s slow",
    "view_count": 75,
    "is_answered": false,
    "question_id": 79946304,
    "answer_count": 4,
    "creation_date": 1779714179,
    "content_license": "CC BY-SA 4.0",
    "last_activity_date": 1779810999
  },
  "answers_raw": [
    {
      "body": "<p>Do some profiling. At the minimum you can add print statements when each block starts so you will have an exact time of how long each part takes and then you can make a decision based on what you've learned.</p>\n",
      "owner": {
        "link": "https://stackoverflow.com/users/2386774/jeffc",
        "user_id": 2386774,
        "user_type": "registered",
        "account_id": 2772450,
        "reputation": 26577,
        "display_name": "JeffC",
        "profile_image": "https://www.gravatar.com/avatar/dea7da142cb7e85d5d5a8576e2625431?s=256&d=identicon&r=PG"
      },
      "score": 0,
      "answer_id": 79946322,
      "is_accepted": false,
      "question_id": 79946304,
      "creation_date": 1779718109,
      "content_license": "CC BY-SA 4.0",
      "last_activity_date": 1779718109
    },
    {
      "body": "<p>Isnt <a href=\"https://github.com/myz21/privacy-policy-analyzer/blob/main/RESULTS.md\" rel=\"nofollow noreferrer\">RESULTS.md</a> enough for the analysis?</p>\n",
      "owner": {
        "link": "https://stackoverflow.com/users/32765077/al-gebra",
        "user_id": 32765077,
        "user_type": "registered",
        "account_id": 30685845,
        "reputation": 1,
        "display_name": "Al-Gebra",
        "profile_image": "https://i.sstatic.net/juYO2.jpg?s=256"
      },
      "score": 0,
      "answer_id": 79946398,
      "is_accepted": false,
      "question_id": 79946304,
      "creation_date": 1779729166,
      "content_license": "CC BY-SA 4.0",
      "last_activity_date": 1779729166
    },
    {
      "body": "<p>No. We have no idea what those times include. Is that only LLM time or that plus 10 other things? Also, links to external sites are fine for extra information but all the relevant info needs to go in your question including code and results. In a year, those links may be dead and much of the value of your question will be lost.</p>\n<p>That said, don't just copy and paste the entirety... you need to figure out what's relevant and post that. You need to do a lot of the heavy lifting and then come back and give us a summary of what you've tried, what worked or didn't work, etc.</p>\n",
      "owner": {
        "link": "https://stackoverflow.com/users/2386774/jeffc",
        "user_id": 2386774,
        "user_type": "registered",
        "account_id": 2772450,
        "reputation": 26577,
        "display_name": "JeffC",
        "profile_image": "https://www.gravatar.com/avatar/dea7da142cb7e85d5d5a8576e2625431?s=256&d=identicon&r=PG"
      },
      "score": 0,
      "answer_id": 79946483,
      "is_accepted": false,
      "question_id": 79946304,
      "creation_date": 1779744808,
      "content_license": "CC BY-SA 4.0",
      "last_activity_date": 1779744808
    }
  ],
  "tags_raw": [
    "python",
    "selenium-webdriver",
    "langchain",
    "large-language-model",
    "privacy-policy"
  ],
  "stats_raw": {
    "score": 1,
    "view_count": 75,
    "is_answered": false,
    "answer_count": 4,
    "creation_date": 1779714179,
    "last_edit_date": null,
    "accepted_answer_id": null,
    "last_activity_date": 1779810999
  },
  "selection_meta": {
    "site": "stackoverflow",
    "api_wrapper": {
      "backoff": null,
      "has_more": true,
      "page_size": 8,
      "quota_max": 300,
      "quota_remaining": 209
    },
    "answer_fetch": {
      "backoff": null,
      "has_more": true,
      "answers_fetched": 3,
      "quota_remaining": 177,
      "answer_page_size": 3
    },
    "snapshot_version": "stackoverflow_question_v1",
    "selection_strategy": "tag_whitelist_unanswered_high_score_recent_active"
  },
  "created_at": "2026-05-27T22:01:45.431Z",
  "updated_at": "2026-05-27T22:01:45.431Z"
}