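The pain point below is an AI-generated preliminary distillation of the upstream raw evidence; it does not include any additional China-market research.
Users who work with office documents (such as PDFs and Office files) need to convert them to Markdown for use in AI toolchains (such as LangChain and AutoGen) or for document management. In current workflows, manual conversion is slow and error-prone. Complex formatting (tables, images, formulas) is especially hard to preserve, so converted content is lost or garbled and needs repeated manual proofreading and repair. This wastes time and adds collaboration cost, because team members need extra communication to confirm that the converted output is accurate, and consistent quality cannot be guaranteed. The project's high star and fork counts on GitHub (130k+ stars, 9k+ forks) suggest that many developers face this pain point, but the specific industries affected and their willingness to pay still need further validation.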
README summary
MarkItDown. Important: MarkItDown performs I/O with the privileges of the current process. Like open() or requests.get(), it will access resources that the process itself can access. Sanitize your inputs in untrusted environments, and call the narrowest convert_* function needed for your use case (e.g., convert_stream() or convert_local()). See the Security Considerations section of the documentation for more information.
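The advice to sanitize inputs can be illustrated with a minimal sketch that uses only the Python standard library. It assumes untrusted callers supply a relative file name and that only files under one allowed directory should ever be opened. The helper name `resolve_inside` and its parameters are hypothetical and are not part of MarkItDown.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Return user_path resolved under base_dir, or raise ValueError.

    Hypothetical helper (not part of MarkItDown): it rejects absolute
    paths and ".." traversal so only files under base_dir are accepted.
    """
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so the check
    # below also catches inputs such as "/etc/passwd".
    candidate = (base / user_path).resolve()
    # After resolving symlinks and "..", the file must still sit under base.
    if base not in candidate.parents:
        raise ValueError(f"path escapes the allowed directory: {user_path!r}")
    if not candidate.is_file():
        raise ValueError(f"not a regular file: {user_path!r}")
    return candidate
```

A path validated this way could then be handed to a local-only entry point such as the convert_local() function named in the README. That call is left out here because the summary does not give its signature.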
Repository metadata
- Owner
- microsoft
- License
- MIT
- Default branch
- main
- Days since created
- 563
- Days since last push
- 3
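The two day counts above can be reproduced from timestamps in the raw archive. This is a sketch that assumes the reference time is the record's fetched_at value and that partial days are truncated. Neither assumption is stated in the source, though both match the reported numbers. The function names are illustrative.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def whole_days_between(earlier: str, later: str) -> int:
    # timedelta.days drops the leftover hours, minutes, and seconds.
    return (parse_utc(later) - parse_utc(earlier)).days


fetched_at = "2026-05-30T22:00:32.717Z"    # fetched_at in the raw archive
created_at = "2024-11-13T19:56:40Z"        # created_at_on_source
last_pushed_at = "2026-05-26T22:41:34Z"    # last_pushed_at

print(whole_days_between(created_at, fetched_at))      # 563
print(whole_days_between(last_pushed_at, fetched_at))  # 3
```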
Repository signals
- Watchers (GitHub API watchers_count, which mirrors the star count; subscribers_count is 446)
- 132,154
- Open issues
- 763
Contributor snapshot
- Contributor count
- 10
- Top contributor share
- 0.457
- Top contributors
- afourney, gagb, sugatoray, PetrAPConsulting, l-lumin
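The top contributor share can be checked against the contribution counts in the raw archive. The sketch below assumes the share is the largest single count divided by the sum over the ten listed accounts. The source does not state this formula, but it reproduces the reported value. The archive marks the contributor list as truncated, so the figure describes only the listed accounts rather than every contributor.

```python
# Contribution counts copied from contributors_raw in the raw archive.
contributions = {
    "afourney": 102,
    "gagb": 70,
    "sugatoray": 9,
    "PetrAPConsulting": 8,
    "l-lumin": 7,
    "Josh-XT": 7,
    "Soulter": 6,
    "microsoftopensource": 5,
    "lesyk": 5,
    "dependabot[bot]": 4,
}


def top_contributor_share(counts: dict) -> float:
    # Assumed formula: largest single count over the total of listed counts.
    total = sum(counts.values())
    return round(max(counts.values()) / total, 3)


print(sum(contributions.values()))           # 223
print(top_contributor_share(contributions))  # 0.457
```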
Source data · Raw Archive
- source
- GitHub Trending
- upstream_source
- github_trending
- upstream_item_id
- microsoft--markitdown
- daily_ranking_item_id
- 796269b6-b669-471f-9611-ebcab5563c7f
- rank_date
- 2026-05-31
- rank
- 1
- name
- markitdown
- tagline
- Python tool for converting files and office documents to Markdown.
- description
- Python tool for converting files and office documents to Markdown.
- votes_count
- 132,154
- source_url
- https://github.com/microsoft/markitdown
- thumbnail_url
- https://github.com/microsoft.png
- og_image_url
- https://github.com/microsoft.png
{
"topics": [
"autogen",
"autogen-extension",
"langchain",
"markdown",
"microsoft-office",
"openai",
"pdf"
],
"is_fork": false,
"license": "MIT",
"language": "Python",
"owner_type": "Organization",
"forks_total": 9049,
"has_funding": false,
"is_archived": false,
"owner_login": "microsoft",
"stars_today": 2473,
"stars_total": 132154,
"homepage_url": null,
"default_branch": "main",
"last_pushed_at": "2026-05-26T22:41:34Z",
"readme_summary": "MarkItDown [!IMPORTANT] MarkItDown performs I/O with the privileges of the current process. Like open() or requests.get(), it will access resources that the process itself can access. Sanitize your inputs in untrusted environments, and call the narrowest convert_ function needed for your use case (e.g., convert_stream() , or convert_local() ). See the Security Considerations section of the documentation for more information.",
"repo_full_name": "microsoft/markitdown",
"watchers_count": 132154,
"last_updated_at": "2026-05-30T21:57:36Z",
"top_contributors": [
{
"login": "afourney",
"contributions": 102
},
{
"login": "gagb",
"contributions": 70
},
{
"login": "sugatoray",
"contributions": 9
},
{
"login": "PetrAPConsulting",
"contributions": 8
},
{
"login": "l-lumin",
"contributions": 7
}
],
"contributor_count": 10,
"funding_platforms": [],
"open_issues_count": 763,
"days_since_created": 563,
"created_at_on_source": "2024-11-13T19:56:40Z",
"days_since_last_push": 3,
"top_contributor_share": 0.457
}
{
"fetched_at": "2026-05-30T22:00:32.717Z",
"trending_repo": {
"url": "https://github.com/microsoft/markitdown",
"name": "markitdown",
"rank": 1,
"forks": 9049,
"owner": "microsoft",
"stars": 132154,
"fullName": "microsoft/markitdown",
"language": "Python",
"avatarUrl": "https://github.com/microsoft.png",
"rawSummary": "<div class=\"float-right d-flex\">\n\n <div data-view-component=\"true\" class=\"BtnGroup d-flex\">\n <a href=\"/login?return_to=%2Fmicrosoft%2Fmarkitdown\" rel=\"nofollow\" data-hydro-click=\"{"event_type":"authentication.click","payload":{"location_in_page":"star button","repository_id":888092115,"auth_type":"LOG_IN","originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"84542ae632d33c4b7f6eaedda7f4dc37ae6e1dd6e19f8c0a560212cc2398eb35\" aria-label=\"You must be signed in to star a repository\" data-view-component=\"true\" class=\"tooltipped tooltipped-sw btn-sm btn\"> <svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star v-align-text-bottom d-none d-md-inline-block mr-2 tmp-mr-2\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 .416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 .698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg><svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star mr-0 tmp-mr-0 v-align-text-bottom d-inline-block d-md-none\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 .416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 
.698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg>\n <span data-view-component=\"true\" class=\"d-none d-md-inline\">\n Star\n</span>\n</a></div>\n </div>\n\n <h2 class=\"h3 lh-condensed\">\n <a data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"REPOSITORY","click_visual_representation":"REPOSITORY_NAME_HEADING","actor_id":null,"record_id":888092115,"originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"e56d7df19306f538e8b48eb101bc899cc5a0b7f2f15a3c2ec0ace84fd5732a36\" href=\"/microsoft/markitdown\" data-view-component=\"true\" class=\"Link\"><svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-repo mr-1 tmp-mr-1 color-fg-muted\">\n <path d=\"M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8ZM5 12.25a.25.25 0 0 1 .25-.25h3.5a.25.25 0 0 1 .25.25v3.25a.25.25 0 0 1-.4.2l-1.45-1.087a.249.249 0 0 0-.3 0L5.4 15.7a.25.25 0 0 1-.4-.2Z\"></path>\n</svg>\n\n <span data-view-component=\"true\" class=\"text-normal\">\n microsoft /\n</span>\n markitdown</a> </h2>\n\n <p class=\"col-9 color-fg-muted my-1 tmp-pr-4\">\n Python tool for converting files and office documents to Markdown.\n </p>\n\n <div class=\"f6 color-fg-muted mt-2\">\n <span class=\"tmp-mr-3 d-inline-block ml-0 tmp-ml-0\">\n <span class=\"repo-language-color\" style=\"background-color: #3572A5\"></span>\n <span itemprop=\"programmingLanguage\">Python</span>\n</span>\n\n\n <a href=\"/microsoft/markitdown/stargazers\" data-view-component=\"true\" class=\"tmp-mr-3 Link Link--muted d-inline-block\"><svg aria-label=\"star\" 
role=\"img\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 .416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 .698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg>\n 132,154</a>\n <a href=\"/microsoft/markitdown/forks\" data-view-component=\"true\" class=\"tmp-mr-3 Link Link--muted d-inline-block\"><svg aria-label=\"fork\" role=\"img\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-repo-forked\">\n <path d=\"M5 5.372v.878c0 .414.336.75.75.75h4.5a.75.75 0 0 0 .75-.75v-.878a2.25 2.25 0 1 1 1.5 0v.878a2.25 2.25 0 0 1-2.25 2.25h-1.5v2.128a2.251 2.251 0 1 1-1.5 0V8.5h-1.5A2.25 2.25 0 0 1 3.5 6.25v-.878a2.25 2.25 0 1 1 1.5 0ZM5 3.25a.75.75 0 1 0-1.5 0 .75.75 0 0 0 1.5 0Zm6.75.75a.75.75 0 1 0 0-1.5.75.75 0 0 0 0 1.5Zm-3 8.75a.75.75 0 1 0-1.5 0 .75.75 0 0 0 1.5 0Z\"></path>\n</svg>\n 9,049</a>\n <span data-view-component=\"true\" class=\"tmp-mr-3 d-inline-block\">\n Built by\n\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending","user_id":244300126}}\" data-hydro-click-hmac=\"d22c3c67743bbc19e136eee29245a45b41787c9f51edcba55b12fa2b8c939c77\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/afourney/hovercard\" 
data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/afourney\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/4017093?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@afourney\" /></a>\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending","user_id":244300126}}\" data-hydro-click-hmac=\"d22c3c67743bbc19e136eee29245a45b41787c9f51edcba55b12fa2b8c939c77\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/gagb/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/gagb\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/13227607?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@gagb\" /></a>\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending","user_id":244300126}}\" data-hydro-click-hmac=\"d22c3c67743bbc19e136eee29245a45b41787c9f51edcba55b12fa2b8c939c77\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/sugatoray/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/sugatoray\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/10201242?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@sugatoray\" /></a>\n <a class=\"d-inline-block\" 
data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending","user_id":244300126}}\" data-hydro-click-hmac=\"d22c3c67743bbc19e136eee29245a45b41787c9f51edcba55b12fa2b8c939c77\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/PetrAPConsulting/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/PetrAPConsulting\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/173082609?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@PetrAPConsulting\" /></a>\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending","user_id":244300126}}\" data-hydro-click-hmac=\"d22c3c67743bbc19e136eee29245a45b41787c9f51edcba55b12fa2b8c939c77\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/l-lumin/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/l-lumin\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/71011125?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@l-lumin\" /></a>\n</span>\n <span data-view-component=\"true\" class=\"d-inline-block float-sm-right\">\n <svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 
.416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 .698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg>\n 2,473 stars today\n</span> </div>",
"starsToday": 2473,
"description": "Python tool for converting files and office documents to Markdown."
},
"snapshot_version": "github_trending_v2"
}
{
"id": "05752eb2-7440-45f0-9503-341905502f02",
"daily_ranking_item_id": "796269b6-b669-471f-9611-ebcab5563c7f",
"source": "github_trending",
"external_id": "microsoft--markitdown",
"fetched_at": "2026-05-30T22:00:32.717Z",
"trending_page_raw": {
"url": "https://github.com/microsoft/markitdown",
"name": "markitdown",
"rank": 1,
"forks": 9049,
"owner": "microsoft",
"stars": 132154,
"language": "Python",
"full_name": "microsoft/markitdown",
"avatar_url": "https://github.com/microsoft.png",
"description": "Python tool for converting files and office documents to Markdown.",
"raw_summary": "<div class=\"float-right d-flex\">\n\n <div data-view-component=\"true\" class=\"BtnGroup d-flex\">\n <a href=\"/login?return_to=%2Fmicrosoft%2Fmarkitdown\" rel=\"nofollow\" data-hydro-click=\"{"event_type":"authentication.click","payload":{"location_in_page":"star button","repository_id":888092115,"auth_type":"LOG_IN","originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"84542ae632d33c4b7f6eaedda7f4dc37ae6e1dd6e19f8c0a560212cc2398eb35\" aria-label=\"You must be signed in to star a repository\" data-view-component=\"true\" class=\"tooltipped tooltipped-sw btn-sm btn\"> <svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star v-align-text-bottom d-none d-md-inline-block mr-2 tmp-mr-2\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 .416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 .698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg><svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star mr-0 tmp-mr-0 v-align-text-bottom d-inline-block d-md-none\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 .416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 
.698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg>\n <span data-view-component=\"true\" class=\"d-none d-md-inline\">\n Star\n</span>\n</a></div>\n </div>\n\n <h2 class=\"h3 lh-condensed\">\n <a data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"REPOSITORY","click_visual_representation":"REPOSITORY_NAME_HEADING","actor_id":null,"record_id":888092115,"originating_url":"https://github.com/trending?since=daily","user_id":null}}\" data-hydro-click-hmac=\"e56d7df19306f538e8b48eb101bc899cc5a0b7f2f15a3c2ec0ace84fd5732a36\" href=\"/microsoft/markitdown\" data-view-component=\"true\" class=\"Link\"><svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-repo mr-1 tmp-mr-1 color-fg-muted\">\n <path d=\"M2 2.5A2.5 2.5 0 0 1 4.5 0h8.75a.75.75 0 0 1 .75.75v12.5a.75.75 0 0 1-.75.75h-2.5a.75.75 0 0 1 0-1.5h1.75v-2h-8a1 1 0 0 0-.714 1.7.75.75 0 1 1-1.072 1.05A2.495 2.495 0 0 1 2 11.5Zm10.5-1h-8a1 1 0 0 0-1 1v6.708A2.486 2.486 0 0 1 4.5 9h8ZM5 12.25a.25.25 0 0 1 .25-.25h3.5a.25.25 0 0 1 .25.25v3.25a.25.25 0 0 1-.4.2l-1.45-1.087a.249.249 0 0 0-.3 0L5.4 15.7a.25.25 0 0 1-.4-.2Z\"></path>\n</svg>\n\n <span data-view-component=\"true\" class=\"text-normal\">\n microsoft /\n</span>\n markitdown</a> </h2>\n\n <p class=\"col-9 color-fg-muted my-1 tmp-pr-4\">\n Python tool for converting files and office documents to Markdown.\n </p>\n\n <div class=\"f6 color-fg-muted mt-2\">\n <span class=\"tmp-mr-3 d-inline-block ml-0 tmp-ml-0\">\n <span class=\"repo-language-color\" style=\"background-color: #3572A5\"></span>\n <span itemprop=\"programmingLanguage\">Python</span>\n</span>\n\n\n <a href=\"/microsoft/markitdown/stargazers\" data-view-component=\"true\" class=\"tmp-mr-3 Link Link--muted d-inline-block\"><svg aria-label=\"star\" 
role=\"img\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 .416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 .698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg>\n 132,154</a>\n <a href=\"/microsoft/markitdown/forks\" data-view-component=\"true\" class=\"tmp-mr-3 Link Link--muted d-inline-block\"><svg aria-label=\"fork\" role=\"img\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-repo-forked\">\n <path d=\"M5 5.372v.878c0 .414.336.75.75.75h4.5a.75.75 0 0 0 .75-.75v-.878a2.25 2.25 0 1 1 1.5 0v.878a2.25 2.25 0 0 1-2.25 2.25h-1.5v2.128a2.251 2.251 0 1 1-1.5 0V8.5h-1.5A2.25 2.25 0 0 1 3.5 6.25v-.878a2.25 2.25 0 1 1 1.5 0ZM5 3.25a.75.75 0 1 0-1.5 0 .75.75 0 0 0 1.5 0Zm6.75.75a.75.75 0 1 0 0-1.5.75.75 0 0 0 0 1.5Zm-3 8.75a.75.75 0 1 0-1.5 0 .75.75 0 0 0 1.5 0Z\"></path>\n</svg>\n 9,049</a>\n <span data-view-component=\"true\" class=\"tmp-mr-3 d-inline-block\">\n Built by\n\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending","user_id":244300126}}\" data-hydro-click-hmac=\"d22c3c67743bbc19e136eee29245a45b41787c9f51edcba55b12fa2b8c939c77\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/afourney/hovercard\" 
data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/afourney\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/4017093?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@afourney\" /></a>\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending","user_id":244300126}}\" data-hydro-click-hmac=\"d22c3c67743bbc19e136eee29245a45b41787c9f51edcba55b12fa2b8c939c77\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/gagb/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/gagb\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/13227607?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@gagb\" /></a>\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending","user_id":244300126}}\" data-hydro-click-hmac=\"d22c3c67743bbc19e136eee29245a45b41787c9f51edcba55b12fa2b8c939c77\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/sugatoray/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/sugatoray\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/10201242?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@sugatoray\" /></a>\n <a class=\"d-inline-block\" 
data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending","user_id":244300126}}\" data-hydro-click-hmac=\"d22c3c67743bbc19e136eee29245a45b41787c9f51edcba55b12fa2b8c939c77\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/PetrAPConsulting/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/PetrAPConsulting\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/173082609?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@PetrAPConsulting\" /></a>\n <a class=\"d-inline-block\" data-hydro-click=\"{"event_type":"explore.click","payload":{"click_context":"TRENDING_REPOSITORIES_PAGE","click_target":"CONTRIBUTING_DEVELOPER","click_visual_representation":"DEVELOPER_AVATAR","actor_id":null,"record_id":null,"originating_url":"https://github.com/trending","user_id":244300126}}\" data-hydro-click-hmac=\"d22c3c67743bbc19e136eee29245a45b41787c9f51edcba55b12fa2b8c939c77\" data-hovercard-type=\"user\" data-hovercard-url=\"/users/l-lumin/hovercard\" data-octo-click=\"hovercard-link-click\" data-octo-dimensions=\"link_type:self\" href=\"/l-lumin\"><img class=\"avatar mb-1 avatar-user\" src=\"https://avatars.githubusercontent.com/u/71011125?s=40&v=4\" width=\"20\" height=\"20\" alt=\"@l-lumin\" /></a>\n</span>\n <span data-view-component=\"true\" class=\"d-inline-block float-sm-right\">\n <svg aria-hidden=\"true\" data-component=\"Octicon\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-star\">\n <path d=\"M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.088-.79l.72-4.194L.818 6.374a.75.75 0 0 1 
.416-1.28l4.21-.611L7.327.668A.75.75 0 0 1 8 .25Zm0 2.445L6.615 5.5a.75.75 0 0 1-.564.41l-3.097.45 2.24 2.184a.75.75 0 0 1 .216.664l-.528 3.084 2.769-1.456a.75.75 0 0 1 .698 0l2.77 1.456-.53-3.084a.75.75 0 0 1 .216-.664l2.24-2.183-3.096-.45a.75.75 0 0 1-.564-.41L8 2.694Z\"></path>\n</svg>\n 2,473 stars today\n</span> </div>",
"stars_today": 2473
},
"repo_detail_raw": {
"id": 888092115,
"url": "https://api.github.com/repos/microsoft/markitdown",
"fork": false,
"name": "markitdown",
"size": 4285,
"forks": 9049,
"owner": {
"id": 6154722,
"url": "https://api.github.com/users/microsoft",
"type": "Organization",
"login": "microsoft",
"node_id": "MDEyOk9yZ2FuaXphdGlvbjYxNTQ3MjI=",
"html_url": "https://github.com/microsoft",
"gists_url": "https://api.github.com/users/microsoft/gists{/gist_id}",
"repos_url": "https://api.github.com/users/microsoft/repos",
"avatar_url": "https://avatars.githubusercontent.com/u/6154722?v=4",
"events_url": "https://api.github.com/users/microsoft/events{/privacy}",
"site_admin": false,
"gravatar_id": "",
"starred_url": "https://api.github.com/users/microsoft/starred{/owner}{/repo}",
"followers_url": "https://api.github.com/users/microsoft/followers",
"following_url": "https://api.github.com/users/microsoft/following{/other_user}",
"user_view_type": "public",
"organizations_url": "https://api.github.com/users/microsoft/orgs",
"subscriptions_url": "https://api.github.com/users/microsoft/subscriptions",
"received_events_url": "https://api.github.com/users/microsoft/received_events"
},
"topics": [
"autogen",
"autogen-extension",
"langchain",
"markdown",
"microsoft-office",
"openai",
"pdf"
],
"git_url": "git://github.com/microsoft/markitdown.git",
"license": {
"key": "mit",
"url": "https://api.github.com/licenses/mit",
"name": "MIT License",
"node_id": "MDc6TGljZW5zZTEz",
"spdx_id": "MIT"
},
"node_id": "R_kgDONO810w",
"private": false,
"ssh_url": "git@github.com:microsoft/markitdown.git",
"svn_url": "https://github.com/microsoft/markitdown",
"archived": false,
"disabled": false,
"has_wiki": true,
"homepage": "",
"html_url": "https://github.com/microsoft/markitdown",
"keys_url": "https://api.github.com/repos/microsoft/markitdown/keys{/key_id}",
"language": "Python",
"tags_url": "https://api.github.com/repos/microsoft/markitdown/tags",
"watchers": 132154,
"blobs_url": "https://api.github.com/repos/microsoft/markitdown/git/blobs{/sha}",
"clone_url": "https://github.com/microsoft/markitdown.git",
"forks_url": "https://api.github.com/repos/microsoft/markitdown/forks",
"full_name": "microsoft/markitdown",
"has_pages": false,
"hooks_url": "https://api.github.com/repos/microsoft/markitdown/hooks",
"pulls_url": "https://api.github.com/repos/microsoft/markitdown/pulls{/number}",
"pushed_at": "2026-05-26T22:41:34Z",
"teams_url": "https://api.github.com/repos/microsoft/markitdown/teams",
"trees_url": "https://api.github.com/repos/microsoft/markitdown/git/trees{/sha}",
"created_at": "2024-11-13T19:56:40Z",
"events_url": "https://api.github.com/repos/microsoft/markitdown/events",
"has_issues": true,
"issues_url": "https://api.github.com/repos/microsoft/markitdown/issues{/number}",
"labels_url": "https://api.github.com/repos/microsoft/markitdown/labels{/name}",
"merges_url": "https://api.github.com/repos/microsoft/markitdown/merges",
"mirror_url": null,
"updated_at": "2026-05-30T21:57:36Z",
"visibility": "public",
"archive_url": "https://api.github.com/repos/microsoft/markitdown/{archive_format}{/ref}",
"commits_url": "https://api.github.com/repos/microsoft/markitdown/commits{/sha}",
"compare_url": "https://api.github.com/repos/microsoft/markitdown/compare/{base}...{head}",
"description": "Python tool for converting files and office documents to Markdown.",
"forks_count": 9049,
"is_template": false,
"open_issues": 763,
"branches_url": "https://api.github.com/repos/microsoft/markitdown/branches{/branch}",
"comments_url": "https://api.github.com/repos/microsoft/markitdown/comments{/number}",
"contents_url": "https://api.github.com/repos/microsoft/markitdown/contents/{+path}",
"git_refs_url": "https://api.github.com/repos/microsoft/markitdown/git/refs{/sha}",
"git_tags_url": "https://api.github.com/repos/microsoft/markitdown/git/tags{/sha}",
"has_projects": true,
"organization": {
"id": 6154722,
"url": "https://api.github.com/users/microsoft",
"type": "Organization",
"login": "microsoft",
"node_id": "MDEyOk9yZ2FuaXphdGlvbjYxNTQ3MjI=",
"html_url": "https://github.com/microsoft",
"gists_url": "https://api.github.com/users/microsoft/gists{/gist_id}",
"repos_url": "https://api.github.com/users/microsoft/repos",
"avatar_url": "https://avatars.githubusercontent.com/u/6154722?v=4",
"events_url": "https://api.github.com/users/microsoft/events{/privacy}",
"site_admin": false,
"gravatar_id": "",
"starred_url": "https://api.github.com/users/microsoft/starred{/owner}{/repo}",
"followers_url": "https://api.github.com/users/microsoft/followers",
"following_url": "https://api.github.com/users/microsoft/following{/other_user}",
"user_view_type": "public",
"organizations_url": "https://api.github.com/users/microsoft/orgs",
"subscriptions_url": "https://api.github.com/users/microsoft/subscriptions",
"received_events_url": "https://api.github.com/users/microsoft/received_events"
},
"releases_url": "https://api.github.com/repos/microsoft/markitdown/releases{/id}",
"statuses_url": "https://api.github.com/repos/microsoft/markitdown/statuses/{sha}",
"allow_forking": true,
"assignees_url": "https://api.github.com/repos/microsoft/markitdown/assignees{/user}",
"downloads_url": "https://api.github.com/repos/microsoft/markitdown/downloads",
"has_downloads": true,
"languages_url": "https://api.github.com/repos/microsoft/markitdown/languages",
"network_count": 9049,
"default_branch": "main",
"milestones_url": "https://api.github.com/repos/microsoft/markitdown/milestones{/number}",
"stargazers_url": "https://api.github.com/repos/microsoft/markitdown/stargazers",
"watchers_count": 132154,
"deployments_url": "https://api.github.com/repos/microsoft/markitdown/deployments",
"git_commits_url": "https://api.github.com/repos/microsoft/markitdown/git/commits{/sha}",
"has_discussions": true,
"subscribers_url": "https://api.github.com/repos/microsoft/markitdown/subscribers",
"contributors_url": "https://api.github.com/repos/microsoft/markitdown/contributors",
"issue_events_url": "https://api.github.com/repos/microsoft/markitdown/issues/events{/number}",
"stargazers_count": 132154,
"subscription_url": "https://api.github.com/repos/microsoft/markitdown/subscription",
"temp_clone_token": null,
"collaborators_url": "https://api.github.com/repos/microsoft/markitdown/collaborators{/collaborator}",
"custom_properties": {
"activeRepoStatus": "false",
"global-rulesets-opt-out": "false"
},
"has_pull_requests": true,
"issue_comment_url": "https://api.github.com/repos/microsoft/markitdown/issues/comments{/number}",
"notifications_url": "https://api.github.com/repos/microsoft/markitdown/notifications{?since,all,participating}",
"open_issues_count": 763,
"subscribers_count": 446,
"web_commit_signoff_required": false,
"pull_request_creation_policy": "all"
},
"readme_raw": {
"summary": "MarkItDown [!IMPORTANT] MarkItDown performs I/O with the privileges of the current process. Like open() or requests.get(), it will access resources that the process itself can access. Sanitize your inputs in untrusted environments, and call the narrowest convert_ function needed for your use case (e.g., convert_stream() , or convert_local() ). See the Security Considerations section of the documentation for more information.",
"raw_text": "# MarkItDown\n\n[](https://pypi.org/project/markitdown/)\n\n[](https://github.com/microsoft/autogen)\n\n> [!IMPORTANT]\n> MarkItDown performs I/O with the privileges of the current process. Like open() or requests.get(), it will access resources that the process itself can access. Sanitize your inputs in untrusted environments, and call the narrowest `convert_*` function needed for your use case (e.g., `convert_stream()`, or `convert_local()`). See the [Security Considerations](#security-considerations) section of the documentation for more information.\n\nMarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines. To this end, it is most comparable to [textract](https://github.com/deanmalmgren/textract), but with a focus on preserving important document structure and content as Markdown (including: headings, lists, tables, links, etc.) While the output is often reasonably presentable and human-friendly, it is meant to be consumed by text analysis tools -- and may not be the best option for high-fidelity document conversions for human consumption.\n\nMarkItDown currently supports the conversion from:\n\n- PDF\n- PowerPoint\n- Word\n- Excel\n- Images (EXIF metadata and OCR)\n- Audio (EXIF metadata and speech transcription)\n- HTML\n- Text-based formats (CSV, JSON, XML)\n- ZIP files (iterates over contents)\n- Youtube URLs\n- EPubs\n- ... and more!\n\n## Why Markdown?\n\nMarkdown is extremely close to plain text, with minimal markup or formatting, but still\nprovides a way to represent important document structure. Mainstream LLMs, such as\nOpenAI's GPT-4o, natively \"_speak_\" Markdown, and often incorporate Markdown into their\nresponses unprompted. This suggests that they have been trained on vast amounts of\nMarkdown-format",
"raw_text_truncated": true
},
"contributors_raw": {
"truncated": true,
"contributors": [
{
"type": "User",
"login": "afourney",
"html_url": "https://github.com/afourney",
"contributions": 102
},
{
"type": "User",
"login": "gagb",
"html_url": "https://github.com/gagb",
"contributions": 70
},
{
"type": "User",
"login": "sugatoray",
"html_url": "https://github.com/sugatoray",
"contributions": 9
},
{
"type": "User",
"login": "PetrAPConsulting",
"html_url": "https://github.com/PetrAPConsulting",
"contributions": 8
},
{
"type": "User",
"login": "l-lumin",
"html_url": "https://github.com/l-lumin",
"contributions": 7
},
{
"type": "User",
"login": "Josh-XT",
"html_url": "https://github.com/Josh-XT",
"contributions": 7
},
{
"type": "User",
"login": "Soulter",
"html_url": "https://github.com/Soulter",
"contributions": 6
},
{
"type": "User",
"login": "microsoftopensource",
"html_url": "https://github.com/microsoftopensource",
"contributions": 5
},
{
"type": "User",
"login": "lesyk",
"html_url": "https://github.com/lesyk",
"contributions": 5
},
{
"type": "Bot",
"login": "dependabot[bot]",
"html_url": "https://github.com/apps/dependabot",
"contributions": 4
}
]
},
"funding_raw": {
"path": null,
"exists": false,
"content": null
},
"stats_raw": {
"forks_total": 9049,
"stars_today": 2473,
"stars_total": 132154,
"watchers_count": 132154,
"open_issues_count": 763
},
"aux_raw": {
"selected_fields": {
"topics": [
"autogen",
"autogen-extension",
"langchain",
"markdown",
"microsoft-office",
"openai",
"pdf"
],
"is_fork": false,
"license": "MIT",
"language": "Python",
"owner_type": "Organization",
"forks_total": 9049,
"has_funding": false,
"is_archived": false,
"owner_login": "microsoft",
"stars_today": 2473,
"stars_total": 132154,
"homepage_url": null,
"default_branch": "main",
"last_pushed_at": "2026-05-26T22:41:34Z",
"readme_summary": "MarkItDown [!IMPORTANT] MarkItDown performs I/O with the privileges of the current process. Like open() or requests.get(), it will access resources that the process itself can access. Sanitize your inputs in untrusted environments, and call the narrowest convert_* function needed for your use case (e.g., convert_stream(), or convert_local()). See the Security Considerations section of the documentation for more information.",
"repo_full_name": "microsoft/markitdown",
"watchers_count": 132154,
"last_updated_at": "2026-05-30T21:57:36Z",
"top_contributors": [
{
"login": "afourney",
"contributions": 102
},
{
"login": "gagb",
"contributions": 70
},
{
"login": "sugatoray",
"contributions": 9
},
{
"login": "PetrAPConsulting",
"contributions": 8
},
{
"login": "l-lumin",
"contributions": 7
}
],
"contributor_count": 10,
"funding_platforms": [],
"open_issues_count": 763,
"days_since_created": 563,
"created_at_on_source": "2024-11-13T19:56:40Z",
"days_since_last_push": 3,
"top_contributor_share": 0.457
}
},
"selection_meta": {
"readme_status": "ok",
"funding_status": "ok",
"missing_enrichment": [],
"repo_detail_status": "ok",
"contributors_status": "ok"
},
"created_at": "2026-05-30T22:00:35.673Z",
"updated_at": "2026-05-30T22:00:35.673Z"
}