Future Technology
AI

I fed a PDF, a spreadsheet and a Word doc to this free Microsoft tool. Here's what came out

2 min read · By Nath Connell

Key takeaways

  • MarkItDown converts PDFs, Word, Excel, PowerPoint, HTML and images into clean Markdown for AI tools
  • It is free, open source and made by Microsoft, with over 150,000 stars on GitHub
  • Setup difficulty: easy. One command and you are running
  • Worth it if you paste documents into ChatGPT or Claude on a regular basis
A laptop displaying code — MarkItDown turns messy documents into clean text for AI tools

If you have ever copied text out of a PDF and pasted it into ChatGPT, you will know the pain. Broken line breaks. Tables turned to mush. Page numbers floating in the middle of sentences. The chatbot then reads that mess and gives you a worse answer.

MarkItDown is Microsoft's free fix for exactly this. It takes your files and spits out clean Markdown, the simple text format that AI models read best. We did not just read about it. We installed it and ran real documents through it.

What it actually does

The job is simple. You point it at a file, it gives you back tidy text. It handles PDFs, Word documents, Excel sheets, PowerPoint decks, web pages, images and even audio. The whole point is to prep your stuff before it goes into an AI tool, or into a system that searches your documents for you.
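
In Python, the "point it at a file" step can look roughly like the sketch below. It assumes the package is already installed (the project's README gives `pip install 'markitdown[all]'`) and relies on the `MarkItDown().convert()` call and `text_content` attribute shown in that README. The `to_markdown` helper and the `report.docx` file name are ours, purely for illustration.

```python
def to_markdown(path: str) -> str:
    """Convert one document to Markdown text (illustrative helper)."""
    # Imported inside the function so the file still loads on a machine
    # where the markitdown package has not been installed yet.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(path)
    # text_content holds the converted Markdown as a plain string.
    return result.text_content


# Example call (hypothetical file name):
#   print(to_markdown("report.docx"))
```

The returned string can be pasted straight into a chatbot or written to a `.md` file.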

It is properly popular too. The project has passed 150,000 stars on GitHub, which makes it one of the most watched tools Microsoft has put out in the open.

We tried it

Setup took one command. We installed it, then fed it four files back to back.

A Word report came out with its headings intact and the table converted to a proper Markdown grid. A spreadsheet of prices turned into a clean table in under a second. A PDF invoice kept its numbers in the right order. A messy HTML page was stripped down to just the headings, the bold text and the links, with all the website clutter gone.

Every file converted in roughly a second. Nothing crashed.

The catch

It is not magic. One Word table came out with a blank top row where the styled header used to be, so you may need to tidy the odd table by hand. Scanned PDFs, the kind that are really just photos of text, will need the image-reading features switched on to get anything useful. And this is a command-line tool, not an app with buttons, so it suits people who are comfortable typing a command or who want to wire it into something bigger.
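
If the blank top row turns up often, the hand tidying can be scripted. The sketch below is our own clean-up, not part of MarkItDown. It assumes the broken table looks like an all-empty header row, then the usual `| --- |` separator, then the real header as the first data row, and it simply promotes that row. If your output is shaped differently, the function leaves it alone.

```python
def _cells(line: str) -> list[str]:
    """Split a Markdown table row into trimmed cell strings."""
    return [c.strip() for c in line.strip().strip("|").split("|")]


def _is_separator(line: str) -> bool:
    """True for rows such as | --- | :---: | that divide header from body."""
    cells = _cells(line)
    return bool(cells) and all(
        c and "-" in c and set(c) <= set("-:") for c in cells
    )


def promote_first_row(markdown: str) -> str:
    """Replace an all-blank table header with the first data row."""
    lines = markdown.splitlines()
    out = []
    i = 0
    while i < len(lines):
        blank_header = (
            i + 2 < len(lines)
            and lines[i].lstrip().startswith("|")
            and all(c == "" for c in _cells(lines[i]))
            and _is_separator(lines[i + 1])
            and lines[i + 2].lstrip().startswith("|")
        )
        if blank_header:
            out.append(lines[i + 2])  # first data row becomes the header
            out.append(lines[i + 1])  # separator stays directly beneath it
            i += 3
        else:
            out.append(lines[i])
            i += 1
    # Keep the trailing newline if the input had one.
    return "\n".join(out) + ("\n" if markdown.endswith("\n") else "")
```

Run it on the converted text before saving or pasting. Tables that already have a real header pass through untouched.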

Who it is for

If you regularly drop documents into ChatGPT, Claude or Gemini, this saves you the copy-and-paste dance and gives the model cleaner input to work with. It is also a quiet workhorse for anyone building a tool that needs to read a pile of mixed files. Content creators sitting on folders of research will find it handy.
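
For the pile-of-mixed-files case, a small loop does the job. This is a sketch under assumptions, not a recommended pipeline: the `convert_folder` name, the folder names and the list of extensions are ours, and the extension list is an illustrative subset, not everything MarkItDown accepts. The real converter is only loaded if you do not pass your own `convert` function, which also makes the loop easy to try out without the package installed.

```python
from pathlib import Path
from typing import Callable, Optional

# Illustrative subset of file types; adjust to suit your own folders.
SUFFIXES = {".pdf", ".docx", ".xlsx", ".pptx", ".html"}


def convert_folder(
    src: Path,
    dst: Path,
    convert: Optional[Callable[[Path], str]] = None,
) -> list[Path]:
    """Write one .md file into dst for each supported file found in src."""
    if convert is None:
        # Lazy import: only needed when no converter is supplied.
        from markitdown import MarkItDown

        md = MarkItDown()

        def convert(path: Path) -> str:
            return md.convert(str(path)).text_content

    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUFFIXES:
            continue
        try:
            text = convert(path)
        except Exception as exc:
            # One bad file should not stop the rest of the batch.
            print(f"skipped {path.name}: {exc}")
            continue
        # Keep the original extension in the name so report.pdf and
        # report.docx do not overwrite each other.
        target = dst / (path.name + ".md")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written


# Example call (hypothetical folder names):
#   convert_folder(Path("research"), Path("research_md"))
```

Anything the converter chokes on is reported and skipped, so you can see which files need a second look.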

If you only ever open the odd PDF to read it yourself, you do not need this. Your normal apps are fine.

The verdict

Setup difficulty: easy. Worth it if you feed documents to AI tools more than once in a blue moon. It is free, it is fast, and it does one annoying job well. For something that costs nothing, there is very little to grumble about.