Field Dispatch · Stack Overflow · #7 · 2026-05-31

Designing a Scalable Code Execution Service (LeetCode-like) for ~20k Users / 1000 Concurrent Users

Tags: postgresql, azure, codesandbox, judge-api
Score: 0
Answers: 2
Views: 61
Answered: No
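Pain Point Analysis · Published 2026/05/30

The pain point is a preliminary AI distillation of the upstream source evidence; it does not include additional research on the Chinese market.

Pain point

When building a LeetCode-like online code execution platform, growth in user scale (about 20,000 users, 1,000 concurrent) exposes serious problems in an architecture originally designed for small-scale use: every execution request starts a Docker container, but container cleanup is unreliable, so orphaned containers pile up; as concurrency grows, the execution failure rate rises and resource usage becomes unpredictable. The user needs both low latency for "Run Code" and asynchronous processing for "Submit Code", but the current design can neither maintain response times as it scales nor keep infrastructure costs under control. This architectural bottleneck directly reduces system stability, increases the operational burden, and blocks the platform from growing to a larger user base.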
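One common way to keep a synchronous "Run" path responsive without letting it overload the Judge VM is to cap the number of in-flight executions and reject quickly once the cap is reached. The sketch below is a minimal illustration under assumptions: `RunGate`, `max_in_flight`, and `max_wait_s` are hypothetical names that do not appear in the question, and `job` stands in for whatever actually executes the user's code.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class RunGate:
    """Caps concurrent "Run" executions (hypothetical helper, not from the question).

    Callers wait at most `max_wait_s` for a free slot; after that they get a
    "busy" result instead of piling more work onto the judge host.
    """

    def __init__(self, max_in_flight: int, max_wait_s: float) -> None:
        self._slots = asyncio.Semaphore(max_in_flight)
        self._max_wait_s = max_wait_s

    async def run(self, job: Callable[[], Awaitable[Any]]) -> Dict[str, Any]:
        try:
            # Bounded wait for a slot keeps worst-case latency predictable.
            await asyncio.wait_for(self._slots.acquire(), timeout=self._max_wait_s)
        except asyncio.TimeoutError:
            return {"status": "busy"}
        try:
            return {"status": "ok", "result": await job()}
        finally:
            # Always free the slot, even if the job raised.
            self._slots.release()
```

A caller would create one `RunGate` inside the running event loop and wrap each "Run" request in `gate.run(...)`; a "busy" result can be mapped to an HTTP 429 or a short client-side retry. The cap itself is something to measure per host rather than guess.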

§ Dossier

Stack Overflow question

I am building an online code execution platform similar to LeetCode.

Current Architecture

- Fastify Backend
  - Receives requests from users.
  - Sends code execution requests to a separate Judge service.
- Judge Service (Dedicated VM)
  - Receives code, language, and input.
  - Spins up a Docker container for each execution request.
  - Runs the code inside the container.
  - Returns the output.
  - Removes the container after execution.

Current Problem

This architecture works reasonably well for small-scale usage (~100 users), but under higher load I start seeing issues:

- Some Docker containers are created but never removed.
- Container cleanup becomes unreliable.
- Execution failures increase as concurrency grows.
- Resource usage on the Judge VM becomes unpredictable.

Submission Flow

I currently have two execution paths:

1. Run Code API
   - Used when users click "Run".
   - Direct HTTP request to the Judge service.
   - No queue is used because users expect an immediate response.
2. Submit Code API
   - Used when users submit a solution.
   - Request is pushed to a Redis queue.
   - BullMQ workers consume jobs and run test cases asynchronously.

Scaling Goal

I want to scale this system to approximately:

- 20,000 total users
- ~1,000 concurrent users
- Fast response times for the "Run Code" feature
- Cost-efficient infrastructure
- Reliable sandboxing and container cleanup

Questions

1. What architecture do large platforms (e.g., LeetCode, HackerRank, Codeforces) typically use for code execution at scale?
2. Is spinning up a Docker container per request still a good approach at this scale?
3. Should the "Run Code" API also use a queue, or is there a better pattern for low-latency execution?
4. Would Kubernetes be the recommended solution, or are there better alternatives?
5. How should sandbox lifecycle management and cleanup be handled to prevent orphaned containers?
6. What would be a cost-optimized architecture capable of handling ~1,000 concurrent executions?
7. Are there any open-source judge systems or execution architectures worth studying?

Any architecture diagrams, production experience, or recommendations would be greatly appreciated.
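One plausible source of the orphaned containers described above is a timeout path that kills the `docker` CLI client but never removes the container itself, since the container is owned by the Docker daemon rather than by the client process. The sketch below shows one way to make cleanup unconditional; it is an illustration under assumptions, not the asker's code. The function names, the `judge.managed` label, the resource limits, and the 10-second cleanup timeout are all hypothetical choices.

```python
import subprocess
import uuid
from typing import Callable, Dict, List, Sequence


def docker_run_argv(name: str, image: str, command: Sequence[str]) -> List[str]:
    """Build a `docker run` command line with a known name and resource caps."""
    return [
        "docker", "run", "--rm",
        "--name", name,                    # known name so cleanup can target it
        "--label", "judge.managed=true",   # hypothetical label for later sweeps
        "--network", "none",
        "--memory", "256m",
        "--cpus", "0.5",
        "--pids-limit", "64",
        image, *command,
    ]


def force_remove(name: str, run: Callable = subprocess.run) -> None:
    """Best-effort removal; failures are swallowed so they never mask the result."""
    try:
        run(["docker", "rm", "-f", name], capture_output=True, text=True, timeout=10)
    except (subprocess.SubprocessError, OSError):
        pass


def execute(image: str, command: Sequence[str], timeout_s: float,
            run: Callable = subprocess.run) -> Dict[str, object]:
    """Run one submission and remove its container on every exit path."""
    name = f"judge-{uuid.uuid4().hex[:12]}"
    try:
        proc = run(docker_run_argv(name, image, command),
                   capture_output=True, text=True, timeout=timeout_s)
        return {"status": "ok", "exit_code": proc.returncode, "stdout": proc.stdout}
    except subprocess.TimeoutExpired:
        # Only the CLI client was killed here; the container may still be running.
        return {"status": "timeout", "exit_code": None, "stdout": ""}
    finally:
        force_remove(name, run)
```

The `run` parameter exists only so the function can be exercised without Docker installed. Containers that still slip through (for example, after a crash of the judge process) could be found by a periodic sweep using `docker ps --filter label=judge.managed=true`.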

§ Dossier

Question details

View count: 61
Answer count: 2
Last activity: 2026/05/30
§ Dossier

Answers

On "Is spinning up a Docker container per request still a good approach at this scale?": definitely not. You could share execution within a single container per language, or even across multiple languages. Spawn a container on a cold start with a TTL; this container receives the requests to execute code, and each execution request runs in its own terminal inside the container. Once the TTL expires, you can kill the container. Alternatively, you can run Kubernetes and balance the load across pods if you are facing too many requests. You don't really need a single container per execution request. Over time, you balance your pods toward the most requested languages. Have something in your architecture that talks to Kubernetes to see which pods are available to be used. I hope you get the idea.
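The warm-container idea in this answer can be sketched as a small registry that reuses one container per language and retires it after a TTL. This is a minimal illustration under assumptions: `WarmPool`, `Worker`, and the injected `start`/`stop` callables are hypothetical, and the TTL is treated as an idle timeout that is refreshed on each use, which the answer does not specify. Sharing one container across untrusted submissions also weakens isolation between users, so the per-execution process inside the container would still need its own limits.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Worker:
    language: str
    container_id: str
    expires_at: float


class WarmPool:
    """Keeps at most one warm container per language (hypothetical sketch)."""

    def __init__(self, ttl_s: float,
                 start: Callable[[str], str],
                 stop: Callable[[str], None]) -> None:
        self._ttl_s = ttl_s
        self._start = start    # starts a container, returns its id
        self._stop = stop      # stops and removes a container by id
        self._workers: Dict[str, Worker] = {}

    def acquire(self, language: str, now: float) -> Worker:
        """Return a live worker for `language`, cold-starting one if needed."""
        worker = self._workers.get(language)
        if worker is None or worker.expires_at <= now:
            if worker is not None:
                self._stop(worker.container_id)
            worker = Worker(language, self._start(language), now + self._ttl_s)
            self._workers[language] = worker
        else:
            worker.expires_at = now + self._ttl_s  # assumption: idle TTL, refreshed on use
        return worker

    def reap(self, now: float) -> List[str]:
        """Stop every worker whose TTL has passed; return the stopped ids."""
        expired = [w for w in self._workers.values() if w.expires_at <= now]
        for w in expired:
            self._stop(w.container_id)
            del self._workers[w.language]
        return [w.container_id for w in expired]
```

Passing `now` explicitly keeps the logic deterministic; in a real service it would come from a monotonic clock, and `reap` would run on a timer.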

Comment author information unavailable · 0 votes

1. Use the user's desktop. Check out WebContainers: you can compile Docker so that it runs in WebAssembly inside a worker thread in the user's browser. StackBlitz has done it, and it isn't that hard to do.
2. You have to create your own job queue inside a Postgres database that keeps track of running VMs, allocates a maximum time to each, and periodically checks for orphaned containers and removes them.
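The bookkeeping described in the second point (track each running execution, give it a maximum time, and periodically remove what overstays) can be sketched as follows. To stay runnable without a database server, the example uses Python's built-in `sqlite3` as a stand-in for Postgres; the table layout, column names, and function names are hypothetical. With Postgres and several reaper processes, rows would typically be claimed with `SELECT ... FOR UPDATE SKIP LOCKED` so that two reapers never handle the same row.

```python
import sqlite3
from typing import Callable, List

# sqlite3 is only a stand-in here; the answer proposes Postgres.
SCHEMA = """
CREATE TABLE executions (
    id             INTEGER PRIMARY KEY,
    container_name TEXT NOT NULL,
    deadline_at    REAL NOT NULL,
    state          TEXT NOT NULL DEFAULT 'running'
)
"""


def register(conn: sqlite3.Connection, container_name: str,
             now: float, max_runtime_s: float) -> int:
    """Record a container before starting it, with its hard deadline."""
    cur = conn.execute(
        "INSERT INTO executions (container_name, deadline_at) VALUES (?, ?)",
        (container_name, now + max_runtime_s),
    )
    conn.commit()
    return cur.lastrowid


def mark_finished(conn: sqlite3.Connection, execution_id: int) -> None:
    """Called on the normal path, after the container has been removed."""
    conn.execute(
        "UPDATE executions SET state = 'finished' WHERE id = ? AND state = 'running'",
        (execution_id,),
    )
    conn.commit()


def reap_orphans(conn: sqlite3.Connection, now: float,
                 remove: Callable[[str], None]) -> List[str]:
    """Remove every container still marked running past its deadline."""
    rows = conn.execute(
        "SELECT id, container_name FROM executions "
        "WHERE state = 'running' AND deadline_at < ? ORDER BY id",
        (now,),
    ).fetchall()
    reaped = []
    for execution_id, name in rows:
        remove(name)  # e.g. a wrapper around `docker rm -f <name>`
        conn.execute("UPDATE executions SET state = 'reaped' WHERE id = ?",
                     (execution_id,))
        reaped.append(name)
    conn.commit()
    return reaped
```

Registering the row before the container starts means a crash between the two steps leaves a record the reaper can still act on, rather than an untracked container.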

Comment author information unavailable · 0 votes
Source data · Raw Archive
source: Stack Overflow
upstream_source: stackoverflow
upstream_item_id: 79948683
daily_ranking_item_id: f73f38c6-d5f0-4c03-9284-89138760a3cc
rank_date: 2026-05-31
rank: 7
name: Designing a Scalable Code Execution Service (LeetCode-like) for ~20k Users / 1000 Concurrent Users
tagline: postgresql, azure, codesandbox, judge-api
votes_count: 0
comments_count: 2
created_at_on_source: 2026-05-29T20:49:30.000Z
topics: postgresql, azure, codesandbox, judge-api
media / source-specific data
{
  "stackoverflow": {
    "score": 0,
    "view_count": 61,
    "is_answered": false,
    "top_answers": [
      {
        "body": "for this: \"Is spinning up a Docker container per request still a good approach at this scale?\" Definitely not . You could potentially share execution within a single container per language or even multiple languages, spawn a container in a cold start with a TTL, this container would receive the requests to execute the code, each execution request should run in an its own terminal inside the container. Once TTL expires you can kill the container. Or you can even implement a kubernetes and balance with pods if you are facing too many requests. You don't really need a single container per execution request. Overtime you balance your pods with the most requested languages. have something in your architecture to talk to kubernetes to see which pods are available to be used. I hope you get the idea",
        "score": 0,
        "answer_id": 79948731,
        "is_accepted": false
      },
      {
        "body": "Use user's desktop. Checkout Web Containers, you can compile docker that can run in a web assembly inside a worker thread in user's browser. Stackblitz has done it, it isn't that hard to do it. You have to create your own job queue inside postgres database that should keep track of running VM, allocate maximum time and periodically check orphaned containers and remove them.",
        "score": 0,
        "answer_id": 79948800,
        "is_accepted": false
      }
    ],
    "answer_count": 2,
    "accepted_answer_id": null,
    "last_activity_date": 1780124795
  }
}
raw_payload
{
  "stats": {
    "score": 0,
    "view_count": 61,
    "is_answered": false,
    "answer_count": 2,
    "creation_date": 1780087770,
    "last_edit_date": null,
    "accepted_answer_id": null,
    "last_activity_date": 1780124795
  },
  "api_wrapper": {
    "backoff": null,
    "has_more": true,
    "page_size": 8,
    "quota_max": 300,
    "quota_remaining": 210
  },
  "question_id": 79948683,
  "answer_fetch": {
    "has_more": false,
    "answers_fetched": 2,
    "answer_page_size": 3
  },
  "snapshot_version": "stackoverflow_question_v1"
}
source_raw_snapshot
{
  "id": "d8ce114f-4161-4b1d-94a2-f49d3da59f24",
  "daily_ranking_item_id": "f73f38c6-d5f0-4c03-9284-89138760a3cc",
  "source": "stackoverflow",
  "external_id": "79948683",
  "fetched_at": "2026-05-30T22:02:04.421Z",
  "question_raw": {
    "body": "<p>I am building an online code execution platform similar to LeetCode.</p>\n<h3>Current Architecture</h3>\n<ul>\n<li><p><strong>Fastify Backend</strong></p>\n<ul>\n<li><p>Receives requests from users.</p>\n</li>\n<li><p>Sends code execution requests to a separate Judge service.</p>\n</li>\n</ul>\n</li>\n<li><p><strong>Judge Service (Dedicated VM)</strong></p>\n<ul>\n<li><p>Receives code, language, and input.</p>\n</li>\n<li><p>Spins up a Docker container for each execution request.</p>\n</li>\n<li><p>Runs the code inside the container.</p>\n</li>\n<li><p>Returns the output.</p>\n</li>\n<li><p>Removes the container after execution.</p>\n</li>\n</ul>\n</li>\n</ul>\n<h3>Current Problem</h3>\n<p>This architecture works reasonably well for small-scale usage (~100 users), but under higher load I start seeing issues:</p>\n<ul>\n<li><p>Some Docker containers are created but never removed.</p>\n</li>\n<li><p>Container cleanup becomes unreliable.</p>\n</li>\n<li><p>Execution failures increase as concurrency grows.</p>\n</li>\n<li><p>Resource usage on the Judge VM becomes unpredictable.</p>\n</li>\n</ul>\n<h3>Submission Flow</h3>\n<p>I currently have two execution paths:</p>\n<ol>\n<li><p><strong>Run Code API</strong></p>\n<ul>\n<li><p>Used when users click &quot;Run&quot;.</p>\n</li>\n<li><p>Direct HTTP request to the Judge service.</p>\n</li>\n<li><p>No queue is used because users expect an immediate response.</p>\n</li>\n</ul>\n</li>\n<li><p><strong>Submit Code API</strong></p>\n<ul>\n<li><p>Used when users submit a solution.</p>\n</li>\n<li><p>Request is pushed to a Redis queue.</p>\n</li>\n<li><p>BullMQ workers consume jobs and run test cases asynchronously.</p>\n</li>\n</ul>\n</li>\n</ol>\n<h3>Scaling Goal</h3>\n<p>I want to scale this system to approximately:</p>\n<ul>\n<li><p>20,000 total users</p>\n</li>\n<li><p>~1,000 concurrent users</p>\n</li>\n<li><p>Fast response times for the &quot;Run Code&quot; feature</p>\n</li>\n<li><p>Cost-efficient infrastructure</p>\n</li>\n<li><p>Reliable sandboxing and container cleanup</p>\n</li>\n</ul>\n<h3>Questions</h3>\n<ol>\n<li><p>What architecture do large platforms (e.g., LeetCode, HackerRank, Codeforces) typically use for code execution at scale?</p>\n</li>\n<li><p>Is spinning up a Docker container per request still a good approach at this scale?</p>\n</li>\n<li><p>Should the &quot;Run Code&quot; API also use a queue, or is there a better pattern for low-latency execution?</p>\n</li>\n<li><p>Would Kubernetes be the recommended solution, or are there better alternatives?</p>\n</li>\n<li><p>How should sandbox lifecycle management and cleanup be handled to prevent orphaned containers?</p>\n</li>\n<li><p>What would be a cost-optimized architecture capable of handling ~1,000 concurrent executions?</p>\n</li>\n<li><p>Are there any open-source judge systems or execution architectures worth studying?</p>\n</li>\n</ol>\n<p>Any architecture diagrams, production experience, or recommendations would be greatly appreciated.</p>\n",
    "link": "https://stackoverflow.com/questions/79948683/designing-a-scalable-code-execution-service-leetcode-like-for-20k-users-100",
    "tags": [
      "postgresql",
      "azure",
      "codesandbox",
      "judge-api"
    ],
    "owner": {
      "link": "https://stackoverflow.com/users/27308944/jnanesh",
      "user_id": 27308944,
      "user_type": "registered",
      "account_id": 35617201,
      "reputation": 1,
      "display_name": "Jnanesh",
      "profile_image": "https://www.gravatar.com/avatar/bc5ab54c3e7477be3f5e77a05d0f1b86?s=256&d=identicon&r=PG&f=y&so-version=2"
    },
    "score": 0,
    "title": "Designing a Scalable Code Execution Service (LeetCode-like) for ~20k Users / 1000 Concurrent Users",
    "view_count": 61,
    "is_answered": false,
    "question_id": 79948683,
    "answer_count": 2,
    "creation_date": 1780087770,
    "content_license": "CC BY-SA 4.0",
    "last_activity_date": 1780124795
  },
  "answers_raw": [
    {
      "body": "<p>for this: &quot;Is spinning up a Docker container per request still a good approach at this scale?&quot; <strong>Definitely not</strong>. You could potentially share execution within a single container per language or even multiple languages, spawn a container in a cold start with a TTL, this container would receive the requests to execute the code, each execution request should run in an its own terminal inside the container. Once TTL expires you can kill the container. Or you can even implement a kubernetes and balance with pods if you are facing too many requests. You don't really need a single container per execution request. Overtime you balance your pods with the most requested languages. have something in your architecture to talk to kubernetes to see which pods are available to be used. I hope you get the idea</p>\n",
      "owner": {
        "link": "https://stackoverflow.com/users/460557/jorge-campos",
        "user_id": 460557,
        "user_type": "registered",
        "account_id": 209503,
        "reputation": 23887,
        "accept_rate": 100,
        "display_name": "Jorge Campos",
        "profile_image": "https://www.gravatar.com/avatar/b4565f97815833390c9c880e9e8522e4?s=256&d=identicon&r=PG"
      },
      "score": 0,
      "answer_id": 79948731,
      "is_accepted": false,
      "question_id": 79948683,
      "creation_date": 1780099694,
      "content_license": "CC BY-SA 4.0",
      "last_activity_date": 1780099694
    },
    {
      "body": "<ol>\n<li><p>Use user's desktop. Checkout Web Containers, you can compile docker that can run in a web assembly inside a worker thread in user's browser. Stackblitz has done it, it isn't that hard to do it.</p>\n</li>\n<li><p>You have to create your own job queue inside postgres database that should keep track of running VM, allocate maximum time and periodically check orphaned containers and remove them.</p>\n</li>\n</ol>\n",
      "owner": {
        "link": "https://stackoverflow.com/users/85597/akash-kava",
        "user_id": 85597,
        "user_type": "registered",
        "account_id": 31223,
        "reputation": 40142,
        "accept_rate": 82,
        "display_name": "Akash Kava",
        "profile_image": "https://i.sstatic.net/pc3cE.jpg?s=256"
      },
      "score": 0,
      "answer_id": 79948800,
      "is_accepted": false,
      "question_id": 79948683,
      "creation_date": 1780124795,
      "content_license": "CC BY-SA 4.0",
      "last_activity_date": 1780124795
    }
  ],
  "tags_raw": [
    "postgresql",
    "azure",
    "codesandbox",
    "judge-api"
  ],
  "stats_raw": {
    "score": 0,
    "view_count": 61,
    "is_answered": false,
    "answer_count": 2,
    "creation_date": 1780087770,
    "last_edit_date": null,
    "accepted_answer_id": null,
    "last_activity_date": 1780124795
  },
  "selection_meta": {
    "site": "stackoverflow",
    "api_wrapper": {
      "backoff": null,
      "has_more": true,
      "page_size": 8,
      "quota_max": 300,
      "quota_remaining": 210
    },
    "answer_fetch": {
      "backoff": null,
      "has_more": false,
      "answers_fetched": 2,
      "quota_remaining": 182,
      "answer_page_size": 3
    },
    "snapshot_version": "stackoverflow_question_v1",
    "selection_strategy": "tag_whitelist_unanswered_high_score_recent_active"
  },
  "created_at": "2026-05-30T22:02:04.650Z",
  "updated_at": "2026-05-30T22:02:04.650Z"
}