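Pain point analysis, published 2026/05/28
The pain points below are a preliminary AI distillation of the upstream raw evidence; no additional China-market research is included.
Pain points
When converting files, users need to bring many formats (such as PDF and Office documents) into a single Markdown form so the content can be processed downstream by AI tooling (such as LangChain or OpenAI). In existing workflows, manual conversion is slow and error-prone, and compatibility problems between formats lead to lost information or broken formatting. The result is repeated work and higher collaboration cost: team members spend extra time cleaning up and adjusting conversion output, which degrades the quality of the input fed to AI applications.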
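As a rough illustration of the workflow described above, the sketch below batch-converts a folder of documents into Markdown files. It assumes the markitdown package is installed and uses MarkItDown().convert(path).text_content, as documented in the project's README. The helper names convert_folder and markitdown_converter, the suffix allowlist, and the folder layout are hypothetical and not part of the library.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Hypothetical allowlist for this sketch; MarkItDown supports more formats.
SUFFIXES = (".pdf", ".docx", ".pptx", ".xlsx", ".html")


def markitdown_converter() -> Callable[[Path], str]:
    """Build a converter backed by MarkItDown (third-party, imported lazily)."""
    from markitdown import MarkItDown

    md = MarkItDown()
    return lambda path: md.convert(str(path)).text_content


def convert_folder(
    src: Path,
    dst: Path,
    convert: Callable[[Path], str],
    suffixes: Iterable[str] = SUFFIXES,
) -> List[Path]:
    """Convert every matching file under src and write one .md file per input."""
    wanted = {s.lower() for s in suffixes}
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.rglob("*")):
        if path.is_file() and path.suffix.lower() in wanted:
            # Keep the original suffix in the name so a.pdf and a.docx
            # do not overwrite each other.
            out = dst / (path.name + ".md")
            out.write_text(convert(path), encoding="utf-8")
            written.append(out)
    return written
```

A call such as convert_folder(Path("docs"), Path("out"), markitdown_converter()) would run the real converter. Passing the converter as a parameter keeps the folder walk testable without the dependency.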
README
README summary
MarkItDown
[!IMPORTANT] MarkItDown performs I/O with the privileges of the current process. Like open() or requests.get(), it will access resources that the process itself can access. Sanitize your inputs in untrusted environments, and call the narrowest convert_* function needed for your use case (e.g., convert_stream(), or convert_local()). See the Security Considerations section of the documentation for more information.
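One way to act on the advice above is to confine user-supplied paths to an allowed directory before handing them to the converter. The sketch below uses only the standard library. The function name safe_local_path and the idea of a single allowed root are assumptions for illustration, not part of MarkItDown, and the check does not replace the project's own Security Considerations.

```python
from pathlib import Path


def safe_local_path(user_path: str, allowed_root: Path) -> Path:
    """Resolve user_path and reject anything that is not inside allowed_root."""
    root = allowed_root.resolve()
    candidate = (root / user_path).resolve()
    # resolve() collapses ".." segments and follows symlinks, so a path that
    # escapes the root no longer has the root among its parents.
    if root not in candidate.parents:
        raise ValueError(f"path escapes allowed root: {user_path!r}")
    return candidate
```

The returned path could then be passed to convert_local(), which the README names as one of the narrower entry points, instead of the general-purpose convert().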
Repository
Repository metadata
- Owner: microsoft
- License: MIT
- Default branch: main
- Days since created: 561
- Days since last push: 1
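The two day counts above appear to be whole days elapsed between the repository timestamps and the snapshot's fetch time. The source does not state the formula, so the sketch below is an assumption that happens to reproduce both values from the created_at, pushed_at and fetched_at timestamps in the raw archive. The helper names are hypothetical.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp that ends in 'Z' (UTC)."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def whole_days_between(earlier: str, later: str) -> int:
    """Number of complete 24-hour periods from earlier to later."""
    return (parse_utc(later) - parse_utc(earlier)).days


fetched_at = "2026-05-28T22:00:38.177Z"
print(whole_days_between("2024-11-13T19:56:40Z", fetched_at))  # 561 (created)
print(whole_days_between("2026-05-26T22:41:34Z", fetched_at))  # 1 (last push)
```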
Signals
Repository signals
- Watchers: 127,613
- Open issues: 671
Topics
autogen, autogen-extension, langchain, markdown, microsoft-office, openai, pdf
Contributors
Contributor snapshot
- Contributor count: 10
- Top contributor share: 0.457
- Top contributors: afourney, gagb, sugatoray, PetrAPConsulting, l-lumin
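The top contributor share is consistent with dividing the largest contribution count by the total across the ten listed contributors. That formula is an inference from the numbers, not something the source states, and the raw archive marks the contributor listing as truncated, so the share covers only the listed accounts. A minimal sketch with a hypothetical helper name:

```python
from typing import Dict


def top_contributor_share(contributions: Dict[str, int]) -> float:
    """Fraction of listed contributions made by the single largest contributor."""
    total = sum(contributions.values())
    if total == 0:
        raise ValueError("no contributions recorded")
    return max(contributions.values()) / total


# Contribution counts as listed in the raw archive's contributor block.
listed = {
    "afourney": 102, "gagb": 70, "sugatoray": 9, "PetrAPConsulting": 8,
    "l-lumin": 7, "Josh-XT": 7, "Soulter": 6, "microsoftopensource": 5,
    "lesyk": 5, "dependabot[bot]": 4,
}
print(round(top_contributor_share(listed), 3))  # 0.457 (102 / 223)
```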
Source data · Raw Archive
- source: GitHub Trending
- upstream_source: github_trending
- upstream_item_id: microsoft--markitdown
- daily_ranking_item_id: f1dcb0e6-6a0c-4aad-8d4e-bee87a37cbd6
- rank_date: 2026-05-29
- rank: 8
- name: markitdown
- tagline: Python tool for converting files and office documents to Markdown.
- description: Python tool for converting files and office documents to Markdown.
- votes_count: 127,613
- source_url: https://github.com/microsoft/markitdown
- thumbnail_url: https://github.com/microsoft.png
- og_image_url: https://github.com/microsoft.png
topics
autogen, autogen-extension, langchain, markdown, microsoft-office, openai, pdf
media / source-specific data
{
"topics": [
"autogen",
"autogen-extension",
"langchain",
"markdown",
"microsoft-office",
"openai",
"pdf"
],
"is_fork": false,
"license": "MIT",
"language": "Python",
"owner_type": "Organization",
"forks_total": 8733,
"has_funding": false,
"is_archived": false,
"owner_login": "microsoft",
"stars_today": 1263,
"stars_total": 127613,
"homepage_url": null,
"default_branch": "main",
"last_pushed_at": "2026-05-26T22:41:34Z",
"readme_summary": "MarkItDown [!IMPORTANT] MarkItDown performs I/O with the privileges of the current process. Like open() or requests.get(), it will access resources that the process itself can access. Sanitize your inputs in untrusted environments, and call the narrowest convert_ function needed for your use case (e.g., convert_stream() , or convert_local() ). See the Security Considerations section of the documentation for more information.",
"repo_full_name": "microsoft/markitdown",
"watchers_count": 127613,
"last_updated_at": "2026-05-28T21:58:23Z",
"top_contributors": [
{
"login": "afourney",
"contributions": 102
},
{
"login": "gagb",
"contributions": 70
},
{
"login": "sugatoray",
"contributions": 9
},
{
"login": "PetrAPConsulting",
"contributions": 8
},
{
"login": "l-lumin",
"contributions": 7
}
],
"contributor_count": 10,
"funding_platforms": [],
"open_issues_count": 671,
"days_since_created": 561,
"created_at_on_source": "2024-11-13T19:56:40Z",
"days_since_last_push": 1,
"top_contributor_share": 0.457
}
raw_payload
{
"fetched_at": "2026-05-28T22:00:38.177Z",
"trending_repo": {
"url": "https://github.com/microsoft/markitdown",
"name": "markitdown",
"rank": 8,
"forks": 8733,
"owner": "microsoft",
"stars": 127613,
"fullName": "microsoft/markitdown",
"language": "Python",
"avatarUrl": "https://github.com/microsoft.png",
"rawSummary": "<div class=\"float-right d-flex\">\n\n <div data-view-component=\"true\" class=\"BtnGroup d-flex\">\n <a href=\"/login?return_to=%2Fmicrosoft%2Fmarkitdown\" rel=\"nofollow\" data-hydro-click=\"{"event_type":"authentication.click","payload":{"location_in_page":"star button","repository_id":888092115,"auth_type":"LOG_IN","originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"84542ae632d33c4b7f6eaedda7f4dc37ae6e1dd6e19f8c0a560212cc2398eb35\" aria-label=\"You must be signed in to star a repository\" data-view-component=\"true\" class=\"tooltipped tooltipped-sw btn-sm btn\"> <svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star v-align-text-bottom d-none d-md-inline-block mr-2 tmp-mr-2\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 .416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 .698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg><svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star mr-0 tmp-mr-0 v-align-text-bottom d-inline-block d-md-none\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 .416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 
.698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg>\n <span data-view-component=\"true\" class=\"d-none d-md-inline\">\n Star\n</span>\n</a></div>\n </div>\n\n <h2 class=\"h3 lh-condensed\">\n <a data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"REPOSITORY","click_visual_representation":"REPOSITORY_NAME_HEADING","actor_id":null,"record_id":888092115,"originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"e56d7df19306f538e8b48eb101bc899cc5a0b7f2f15a3c2ec0ace84fd5732a36\" href=\"/microsoft/markitdown\" data-view-component=\"true\" class=\"Link\"><svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-repo mr-1 tmp-mr-1 color-fg-muted\">\n <path d=\"M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8ZM5 12.25a.25.25 0 0 1 .25-.25h3.5a.25.25 0 0 1 .25.25v3.25a.25.25 0 0 1-.4.2l-1.45-1.087a.249.249 0 0 0-.3 0L5.4 15.7a.25.25 0 0 1-.4-.2Z\"></path>\n</svg>\n\n <span data-view-component=\"true\" class=\"text-normal\">\n microsoft /\n</span>\n markitdown</a> </h2>\n\n <p class=\"col-9 color-fg-muted my-1 tmp-pr-4\">\n Python tool for converting files and office documents to Markdown.\n </p>\n\n <div class=\"f6 color-fg-muted mt-2\">\n <span class=\"tmp-mr-3 d-inline-block ml-0 tmp-ml-0\">\n <span class=\"repo-language-color\" style=\"background-color: #3572A5\"></span>\n <span itemprop=\"programmingLanguage\">Python</span>\n</span>\n\n\n <a href=\"/microsoft/markitdown/stargazers\" data-view-component=\"true\" class=\"tmp-mr-3 Link Link--muted d-inline-block\"><svg aria-label=\"star\" 
role=\"img\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 .416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 .698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg>\n 127,613</a>\n <a href=\"/microsoft/markitdown/forks\" data-view-component=\"true\" class=\"tmp-mr-3 Link Link--muted d-inline-block\"><svg aria-label=\"fork\" role=\"img\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-repo-forked\">\n <path d=\"M5 5.372v.878c0 .414.336.75.75.75h4.5a.75.75 0 0 0 .75-.75v-.878a2.25 2.25 0 1 1 1.5 0v.878a2.25 2.25 0 0 1-2.25 2.25h-1.5v2.128a2.251 2.251 0 1 1-1.5 0V8.5h-1.5A2.25 2.25 0 0 1 3.5 6.25v-.878a2.25 2.25 0 1 1 1.5 0ZM5 3.25a.75.75 0 1 0-1.5 0 .75.75 0 0 0 1.5 0Zm6.75.75a.75.75 0 1 0 0-1.5.75.75 0 0 0 0 1.5Zm-3 8.75a.75.75 0 1 0-1.5 0 .75.75 0 0 0 1.5 0Z\"></path>\n</svg>\n 8,733</a>\n <span data-view-component=\"true\" class=\"tmp-mr-3 d-inline-block\">\n Built by\n\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"1624b778276e789a4b952f4f024b178ca5826544f935e57298c1d5b6ad8c3665\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/afourney/hovercard\" 
data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/afourney\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/4017093?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@afourney\" /></a>\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"1624b778276e789a4b952f4f024b178ca5826544f935e57298c1d5b6ad8c3665\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/gagb/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/gagb\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/13227607?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@gagb\" /></a>\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"1624b778276e789a4b952f4f024b178ca5826544f935e57298c1d5b6ad8c3665\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/sugatoray/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/sugatoray\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/10201242?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@sugatoray\" /></a>\n <a class=\"d-inline-block\" 
data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"1624b778276e789a4b952f4f024b178ca5826544f935e57298c1d5b6ad8c3665\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/PetrAPConsulting/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/PetrAPConsulting\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/173082609?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@PetrAPConsulting\" /></a>\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"1624b778276e789a4b952f4f024b178ca5826544f935e57298c1d5b6ad8c3665\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/l-lumin/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/l-lumin\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/71011125?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@l-lumin\" /></a>\n</span>\n <span data-view-component=\"true\" class=\"d-inline-block float-sm-right\">\n <svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 
.416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 .698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg>\n 1,263 stars today\n</span> </div>",
"starsToday": 1263,
"description": "Python tool for converting files and office documents to Markdown."
},
"snapshot_version": "github_trending_v2"
}
source_raw_snapshot
{
"id": "2cd2a39b-e76b-4e8b-9b62-53e521ed81bf",
"daily_ranking_item_id": "f1dcb0e6-6a0c-4aad-8d4e-bee87a37cbd6",
"source": "github_trending",
"external_id": "microsoft--markitdown",
"fetched_at": "2026-05-28T22:00:38.177Z",
"trending_page_raw": {
"url": "https://github.com/microsoft/markitdown",
"name": "markitdown",
"rank": 8,
"forks": 8733,
"owner": "microsoft",
"stars": 127613,
"language": "Python",
"full_name": "microsoft/markitdown",
"avatar_url": "https://github.com/microsoft.png",
"description": "Python tool for converting files and office documents to Markdown.",
"raw_summary": "<div class=\"float-right d-flex\">\n\n <div data-view-component=\"true\" class=\"BtnGroup d-flex\">\n <a href=\"/login?return_to=%2Fmicrosoft%2Fmarkitdown\" rel=\"nofollow\" data-hydro-click=\"{"event_type":"authentication.click","payload":{"location_in_page":"star button","repository_id":888092115,"auth_type":"LOG_IN","originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"84542ae632d33c4b7f6eaedda7f4dc37ae6e1dd6e19f8c0a560212cc2398eb35\" aria-label=\"You must be signed in to star a repository\" data-view-component=\"true\" class=\"tooltipped tooltipped-sw btn-sm btn\"> <svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star v-align-text-bottom d-none d-md-inline-block mr-2 tmp-mr-2\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 .416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 .698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg><svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star mr-0 tmp-mr-0 v-align-text-bottom d-inline-block d-md-none\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 .416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 
.698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg>\n <span data-view-component=\"true\" class=\"d-none d-md-inline\">\n Star\n</span>\n</a></div>\n </div>\n\n <h2 class=\"h3 lh-condensed\">\n <a data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"REPOSITORY","click_visual_representation":"REPOSITORY_NAME_HEADING","actor_id":null,"record_id":888092115,"originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"e56d7df19306f538e8b48eb101bc899cc5a0b7f2f15a3c2ec0ace84fd5732a36\" href=\"/microsoft/markitdown\" data-view-component=\"true\" class=\"Link\"><svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-repo mr-1 tmp-mr-1 color-fg-muted\">\n <path d=\"M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8ZM5 12.25a.25.25 0 0 1 .25-.25h3.5a.25.25 0 0 1 .25.25v3.25a.25.25 0 0 1-.4.2l-1.45-1.087a.249.249 0 0 0-.3 0L5.4 15.7a.25.25 0 0 1-.4-.2Z\"></path>\n</svg>\n\n <span data-view-component=\"true\" class=\"text-normal\">\n microsoft /\n</span>\n markitdown</a> </h2>\n\n <p class=\"col-9 color-fg-muted my-1 tmp-pr-4\">\n Python tool for converting files and office documents to Markdown.\n </p>\n\n <div class=\"f6 color-fg-muted mt-2\">\n <span class=\"tmp-mr-3 d-inline-block ml-0 tmp-ml-0\">\n <span class=\"repo-language-color\" style=\"background-color: #3572A5\"></span>\n <span itemprop=\"programmingLanguage\">Python</span>\n</span>\n\n\n <a href=\"/microsoft/markitdown/stargazers\" data-view-component=\"true\" class=\"tmp-mr-3 Link Link--muted d-inline-block\"><svg aria-label=\"star\" 
role=\"img\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 .416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 .698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg>\n 127,613</a>\n <a href=\"/microsoft/markitdown/forks\" data-view-component=\"true\" class=\"tmp-mr-3 Link Link--muted d-inline-block\"><svg aria-label=\"fork\" role=\"img\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-repo-forked\">\n <path d=\"M5 5.372v.878c0 .414.336.75.75.75h4.5a.75.75 0 0 0 .75-.75v-.878a2.25 2.25 0 1 1 1.5 0v.878a2.25 2.25 0 0 1-2.25 2.25h-1.5v2.128a2.251 2.251 0 1 1-1.5 0V8.5h-1.5A2.25 2.25 0 0 1 3.5 6.25v-.878a2.25 2.25 0 1 1 1.5 0ZM5 3.25a.75.75 0 1 0-1.5 0 .75.75 0 0 0 1.5 0Zm6.75.75a.75.75 0 1 0 0-1.5.75.75 0 0 0 0 1.5Zm-3 8.75a.75.75 0 1 0-1.5 0 .75.75 0 0 0 1.5 0Z\"></path>\n</svg>\n 8,733</a>\n <span data-view-component=\"true\" class=\"tmp-mr-3 d-inline-block\">\n Built by\n\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"1624b778276e789a4b952f4f024b178ca5826544f935e57298c1d5b6ad8c3665\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/afourney/hovercard\" 
data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/afourney\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/4017093?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@afourney\" /></a>\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"1624b778276e789a4b952f4f024b178ca5826544f935e57298c1d5b6ad8c3665\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/gagb/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/gagb\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/13227607?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@gagb\" /></a>\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"1624b778276e789a4b952f4f024b178ca5826544f935e57298c1d5b6ad8c3665\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/sugatoray/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/sugatoray\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/10201242?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@sugatoray\" /></a>\n <a class=\"d-inline-block\" 
data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"1624b778276e789a4b952f4f024b178ca5826544f935e57298c1d5b6ad8c3665\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/PetrAPConsulting/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/PetrAPConsulting\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/173082609?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@PetrAPConsulting\" /></a>\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"1624b778276e789a4b952f4f024b178ca5826544f935e57298c1d5b6ad8c3665\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/l-lumin/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/l-lumin\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/71011125?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@l-lumin\" /></a>\n</span>\n <span data-view-component=\"true\" class=\"d-inline-block float-sm-right\">\n <svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 
.416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 .698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg>\n 1,263 stars today\n</span> </div>",
"stars_today": 1263
},
"repo_detail_raw": {
"id": 888092115,
"url": "https://api.github.com/repos/microsoft/markitdown",
"fork": false,
"name": "markitdown",
"size": 4285,
"forks": 8733,
"owner": {
"id": 6154722,
"url": "https://api.github.com/users/microsoft",
"type": "Organization",
"login": "microsoft",
"node_id": "MDEyOk9yZ2FuaXphdGlvbjYxNTQ3MjI=",
"html_url": "https://github.com/microsoft",
"gists_url": "https://api.github.com/users/microsoft/gists{/gist_id}",
"repos_url": "https://api.github.com/users/microsoft/repos",
"avatar_url": "https://avatars.githubusercontent.com/u/6154722?v=4",
"events_url": "https://api.github.com/users/microsoft/events{/privacy}",
"site_admin": false,
"gravatar_id": "",
"starred_url": "https://api.github.com/users/microsoft/starred{/owner}{/repo}",
"followers_url": "https://api.github.com/users/microsoft/followers",
"following_url": "https://api.github.com/users/microsoft/following{/other_user}",
"user_view_type": "public",
"organizations_url": "https://api.github.com/users/microsoft/orgs",
"subscriptions_url": "https://api.github.com/users/microsoft/subscriptions",
"received_events_url": "https://api.github.com/users/microsoft/received_events"
},
"topics": [
"autogen",
"autogen-extension",
"langchain",
"markdown",
"microsoft-office",
"openai",
"pdf"
],
"git_url": "git://github.com/microsoft/markitdown.git",
"license": {
"key": "mit",
"url": "https://api.github.com/licenses/mit",
"name": "MIT License",
"node_id": "MDc6TGljZW5zZTEz",
"spdx_id": "MIT"
},
"node_id": "R_kgDONO810w",
"private": false,
"ssh_url": "git@github.com:microsoft/markitdown.git",
"svn_url": "https://github.com/microsoft/markitdown",
"archived": false,
"disabled": false,
"has_wiki": true,
"homepage": "",
"html_url": "https://github.com/microsoft/markitdown",
"keys_url": "https://api.github.com/repos/microsoft/markitdown/keys{/key_id}",
"language": "Python",
"tags_url": "https://api.github.com/repos/microsoft/markitdown/tags",
"watchers": 127613,
"blobs_url": "https://api.github.com/repos/microsoft/markitdown/git/blobs{/sha}",
"clone_url": "https://github.com/microsoft/markitdown.git",
"forks_url": "https://api.github.com/repos/microsoft/markitdown/forks",
"full_name": "microsoft/markitdown",
"has_pages": false,
"hooks_url": "https://api.github.com/repos/microsoft/markitdown/hooks",
"pulls_url": "https://api.github.com/repos/microsoft/markitdown/pulls{/number}",
"pushed_at": "2026-05-26T22:41:34Z",
"teams_url": "https://api.github.com/repos/microsoft/markitdown/teams",
"trees_url": "https://api.github.com/repos/microsoft/markitdown/git/trees{/sha}",
"created_at": "2024-11-13T19:56:40Z",
"events_url": "https://api.github.com/repos/microsoft/markitdown/events",
"has_issues": true,
"issues_url": "https://api.github.com/repos/microsoft/markitdown/issues{/number}",
"labels_url": "https://api.github.com/repos/microsoft/markitdown/labels{/name}",
"merges_url": "https://api.github.com/repos/microsoft/markitdown/merges",
"mirror_url": null,
"updated_at": "2026-05-28T21:58:23Z",
"visibility": "public",
"archive_url": "https://api.github.com/repos/microsoft/markitdown/{archive_format}{/ref}",
"commits_url": "https://api.github.com/repos/microsoft/markitdown/commits{/sha}",
"compare_url": "https://api.github.com/repos/microsoft/markitdown/compare/{base}...{head}",
"description": "Python tool for converting files and office documents to Markdown.",
"forks_count": 8733,
"is_template": false,
"open_issues": 671,
"branches_url": "https://api.github.com/repos/microsoft/markitdown/branches{/branch}",
"comments_url": "https://api.github.com/repos/microsoft/markitdown/comments{/number}",
"contents_url": "https://api.github.com/repos/microsoft/markitdown/contents/{+path}",
"git_refs_url": "https://api.github.com/repos/microsoft/markitdown/git/refs{/sha}",
"git_tags_url": "https://api.github.com/repos/microsoft/markitdown/git/tags{/sha}",
"has_projects": true,
"organization": {
"id": 6154722,
"url": "https://api.github.com/users/microsoft",
"type": "Organization",
"login": "microsoft",
"node_id": "MDEyOk9yZ2FuaXphdGlvbjYxNTQ3MjI=",
"html_url": "https://github.com/microsoft",
"gists_url": "https://api.github.com/users/microsoft/gists{/gist_id}",
"repos_url": "https://api.github.com/users/microsoft/repos",
"avatar_url": "https://avatars.githubusercontent.com/u/6154722?v=4",
"events_url": "https://api.github.com/users/microsoft/events{/privacy}",
"site_admin": false,
"gravatar_id": "",
"starred_url": "https://api.github.com/users/microsoft/starred{/owner}{/repo}",
"followers_url": "https://api.github.com/users/microsoft/followers",
"following_url": "https://api.github.com/users/microsoft/following{/other_user}",
"user_view_type": "public",
"organizations_url": "https://api.github.com/users/microsoft/orgs",
"subscriptions_url": "https://api.github.com/users/microsoft/subscriptions",
"received_events_url": "https://api.github.com/users/microsoft/received_events"
},
"releases_url": "https://api.github.com/repos/microsoft/markitdown/releases{/id}",
"statuses_url": "https://api.github.com/repos/microsoft/markitdown/statuses/{sha}",
"allow_forking": true,
"assignees_url": "https://api.github.com/repos/microsoft/markitdown/assignees{/user}",
"downloads_url": "https://api.github.com/repos/microsoft/markitdown/downloads",
"has_downloads": true,
"languages_url": "https://api.github.com/repos/microsoft/markitdown/languages",
"network_count": 8733,
"default_branch": "main",
"milestones_url": "https://api.github.com/repos/microsoft/markitdown/milestones{/number}",
"stargazers_url": "https://api.github.com/repos/microsoft/markitdown/stargazers",
"watchers_count": 127613,
"deployments_url": "https://api.github.com/repos/microsoft/markitdown/deployments",
"git_commits_url": "https://api.github.com/repos/microsoft/markitdown/git/commits{/sha}",
"has_discussions": true,
"subscribers_url": "https://api.github.com/repos/microsoft/markitdown/subscribers",
"contributors_url": "https://api.github.com/repos/microsoft/markitdown/contributors",
"issue_events_url": "https://api.github.com/repos/microsoft/markitdown/issues/events{/number}",
"stargazers_count": 127613,
"subscription_url": "https://api.github.com/repos/microsoft/markitdown/subscription",
"temp_clone_token": null,
"collaborators_url": "https://api.github.com/repos/microsoft/markitdown/collaborators{/collaborator}",
"custom_properties": {
"activeRepoStatus": "false",
"global-rulesets-opt-out": "false"
},
"has_pull_requests": true,
"issue_comment_url": "https://api.github.com/repos/microsoft/markitdown/issues/comments{/number}",
"notifications_url": "https://api.github.com/repos/microsoft/markitdown/notifications{?since,all,participating}",
"open_issues_count": 671,
"subscribers_count": 436,
"web_commit_signoff_required": false,
"pull_request_creation_policy": "all"
},
"readme_raw": {
"summary": "MarkItDown [!IMPORTANT] MarkItDown performs I/O with the privileges of the current process. Like open() or requests.get(), it will access resources that the process itself can access. Sanitize your inputs in untrusted environments, and call the narrowest convert_ function needed for your use case (e.g., convert_stream() , or convert_local() ). See the Security Considerations section of the documentation for more information.",
"raw_text": "# MarkItDown\n\n[](https://pypi.org/project/markitdown/)\n\n[](https://github.com/microsoft/autogen)\n\n> [!IMPORTANT]\n> MarkItDown performs I/O with the privileges of the current process. Like open() or requests.get(), it will access resources that the process itself can access. Sanitize your inputs in untrusted environments, and call the narrowest `convert_*` function needed for your use case (e.g., `convert_stream()`, or `convert_local()`). See the [Security Considerations](#security-considerations) section of the documentation for more information.\n\nMarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines. To this end, it is most comparable to [textract](https://github.com/deanmalmgren/textract), but with a focus on preserving important document structure and content as Markdown (including: headings, lists, tables, links, etc.) While the output is often reasonably presentable and human-friendly, it is meant to be consumed by text analysis tools -- and may not be the best option for high-fidelity document conversions for human consumption.\n\nMarkItDown currently supports the conversion from:\n\n- PDF\n- PowerPoint\n- Word\n- Excel\n- Images (EXIF metadata and OCR)\n- Audio (EXIF metadata and speech transcription)\n- HTML\n- Text-based formats (CSV, JSON, XML)\n- ZIP files (iterates over contents)\n- Youtube URLs\n- EPubs\n- ... and more!\n\n## Why Markdown?\n\nMarkdown is extremely close to plain text, with minimal markup or formatting, but still\nprovides a way to represent important document structure. Mainstream LLMs, such as\nOpenAI's GPT-4o, natively \"_speak_\" Markdown, and often incorporate Markdown into their\nresponses unprompted. This suggests that they have been trained on vast amounts of\nMarkdown-format",
"raw_text_truncated": true
},
"contributors_raw": {
"truncated": true,
"contributors": [
{
"type": "User",
"login": "afourney",
"html_url": "https://github.com/afourney",
"contributions": 102
},
{
"type": "User",
"login": "gagb",
"html_url": "https://github.com/gagb",
"contributions": 70
},
{
"type": "User",
"login": "sugatoray",
"html_url": "https://github.com/sugatoray",
"contributions": 9
},
{
"type": "User",
"login": "PetrAPConsulting",
"html_url": "https://github.com/PetrAPConsulting",
"contributions": 8
},
{
"type": "User",
"login": "l-lumin",
"html_url": "https://github.com/l-lumin",
"contributions": 7
},
{
"type": "User",
"login": "Josh-XT",
"html_url": "https://github.com/Josh-XT",
"contributions": 7
},
{
"type": "User",
"login": "Soulter",
"html_url": "https://github.com/Soulter",
"contributions": 6
},
{
"type": "User",
"login": "microsoftopensource",
"html_url": "https://github.com/microsoftopensource",
"contributions": 5
},
{
"type": "User",
"login": "lesyk",
"html_url": "https://github.com/lesyk",
"contributions": 5
},
{
"type": "Bot",
"login": "dependabot[bot]",
"html_url": "https://github.com/apps/dependabot",
"contributions": 4
}
]
},
"funding_raw": null,
"stats_raw": {
"forks_total": 8733,
"stars_today": 1263,
"stars_total": 127613,
"watchers_count": 127613,
"open_issues_count": 671
},
"aux_raw": {
"selected_fields": {
"topics": [
"autogen",
"autogen-extension",
"langchain",
"markdown",
"microsoft-office",
"openai",
"pdf"
],
"is_fork": false,
"license": "MIT",
"language": "Python",
"owner_type": "Organization",
"forks_total": 8733,
"has_funding": false,
"is_archived": false,
"owner_login": "microsoft",
"stars_today": 1263,
"stars_total": 127613,
"homepage_url": null,
"default_branch": "main",
"last_pushed_at": "2026-05-26T22:41:34Z",
"readme_summary": "MarkItDown [!IMPORTANT] MarkItDown performs I/O with the privileges of the current process. Like open() or requests.get(), it will access resources that the process itself can access. Sanitize your inputs in untrusted environments, and call the narrowest convert_ function needed for your use case (e.g., convert_stream() , or convert_local() ). See the Security Considerations section of the documentation for more information.",
"repo_full_name": "microsoft/markitdown",
"watchers_count": 127613,
"last_updated_at": "2026-05-28T21:58:23Z",
"top_contributors": [
{
"login": "afourney",
"contributions": 102
},
{
"login": "gagb",
"contributions": 70
},
{
"login": "sugatoray",
"contributions": 9
},
{
"login": "PetrAPConsulting",
"contributions": 8
},
{
"login": "l-lumin",
"contributions": 7
}
],
"contributor_count": 10,
"funding_platforms": [],
"open_issues_count": 671,
"days_since_created": 561,
"created_at_on_source": "2024-11-13T19:56:40Z",
"days_since_last_push": 1,
"top_contributor_share": 0.457
}
},
"selection_meta": {
"readme_status": "ok",
"funding_status": "missing",
"missing_enrichment": [
"funding"
],
"repo_detail_status": "ok",
"contributors_status": "ok"
},
"created_at": "2026-05-28T22:00:39.968Z",
"updated_at": "2026-05-28T22:00:39.968Z"
}