convert-to-markdown
About
This skill converts various file formats (PDF, Office docs, images, audio, etc.) to Markdown using the markitdown utility. It handles Windows/WSL path conversions and can extract text from images/audio via OCR and transcription. Use it for batch document conversion, processing Confluence exports, or integrating markitdown into your workflows.
Quick Install
Claude Code
Recommended:
/plugin add https://github.com/majiayu000/claude-skill-registry
Alternative:
git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/convert-to-markdown
Copy and paste the recommended command into Claude Code to install this skill.
Documentation
Markdown Tools
Convert documents to markdown using markitdown with support for multiple formats, image extraction, and Windows/WSL path handling.
Quick Start
Installation Options
Option 1: uvx (no installation required)
# Run directly without installing
uvx markitdown input.pdf -o output.md
Option 2: uv tool install (recommended for PDF support)
# Install with PDF support
uv tool install "markitdown[pdf]"
# Or via pip
pip install "markitdown[pdf]"
# Then use directly
markitdown "document.pdf" -o output.md
Supported Formats
- Documents: PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls)
- Web/Data: HTML, CSV, JSON, XML
- Media: Images (EXIF metadata + OCR), Audio (EXIF metadata + speech transcription)
- Other: ZIP (iterates contents), YouTube URLs, EPub
Basic Usage
Using uvx (no install)
# Convert to stdout
uvx markitdown input.pdf
# Save to file
uvx markitdown input.pdf -o output.md
uvx markitdown input.docx > output.md
# From stdin
cat input.pdf | uvx markitdown
Using installed markitdown
# Basic conversion
markitdown "document.pdf" -o output.md
# Redirect output
markitdown "document.pdf" > output.md
Command Options
-o OUTPUT # Output file
-x EXTENSION # Hint file extension (for stdin)
-m MIME_TYPE # Hint MIME type
-c CHARSET # Hint charset (e.g., UTF-8)
-d # Use Azure Document Intelligence
-e ENDPOINT # Document Intelligence endpoint
--use-plugins # Enable 3rd-party plugins
--list-plugins # Show installed plugins
PDF Conversion with Images
markitdown extracts text only. For PDFs with images, use this workflow:
Step 1: Convert Text
markitdown "document.pdf" -o output.md
Step 2: Extract Images
# Create assets directory alongside the markdown
mkdir -p assets
# Extract images using PyMuPDF
uv run --with pymupdf python scripts/extract_pdf_images.py "document.pdf" ./assets
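The contents of scripts/extract_pdf_images.py are not shown here, so the following is only a minimal sketch of how such a script might work with PyMuPDF. The function names and the page001_img01 naming scheme are assumptions made for illustration.

```python
from pathlib import Path


def image_filename(page_number: int, image_number: int, ext: str) -> str:
    """Build a name such as page001_img02.png (the naming scheme is an assumption)."""
    return f"page{page_number:03d}_img{image_number:02d}.{ext}"


def extract_images(pdf_path: str, out_dir: str) -> list[Path]:
    """Write every embedded image in the PDF to out_dir and return the written paths."""
    import fitz  # PyMuPDF; imported lazily so image_filename works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with fitz.open(pdf_path) as doc:
        for page_index, page in enumerate(doc, start=1):
            for image_index, info in enumerate(page.get_images(full=True), start=1):
                xref = info[0]  # first tuple item is the image's cross-reference number
                image = doc.extract_image(xref)  # dict holding raw bytes and extension
                target = out / image_filename(page_index, image_index, image["ext"])
                target.write_bytes(image["image"])
                written.append(target)
    return written
```

Calling extract_images("document.pdf", "./assets") would mirror the command above. An image reused on several pages is written once per page in this sketch.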
Step 3: Add Image References
Insert image references in the markdown where needed:
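As a rough sketch of what such references could look like, the hypothetical helper below emits one standard Markdown image line per file in the assets directory. The function name, the suffix list, and the use of the file stem as alt text are assumptions, not part of this skill.

```python
from pathlib import Path

# Suffixes treated as images (an assumed list; extend as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def image_references(assets_dir: str) -> list[str]:
    """Return one Markdown image line per image file in assets_dir, sorted by name.

    Links are written relative to a Markdown file that sits next to the
    assets directory, e.g. ![page001_img01](assets/page001_img01.png).
    """
    assets = Path(assets_dir)
    return [
        f"![{image.stem}]({assets.name}/{image.name})"
        for image in sorted(assets.iterdir())
        if image.suffix.lower() in IMAGE_SUFFIXES
    ]
```

The generated lines still have to be moved by hand to the right place in output.md, and file names containing spaces would need escaping before they work as link targets.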

Step 4: Format Cleanup
markitdown output often needs manual fixes:
- Add proper heading levels (#, ##, ###)
- Reconstruct tables in markdown format
- Fix broken line breaks
- Restore indentation structure
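Part of this cleanup can be scripted. The sketch below handles only the broken line breaks item: it merges hard-wrapped prose lines into paragraphs while leaving headings, list items, table rows, block quotes, and fenced code untouched. The function name and the regular expression are illustrative assumptions, and the result should still be reviewed by hand.

```python
import re

# Lines that start a heading, list item, table row, block quote, or code fence.
STRUCTURAL = re.compile(r"^\s*(#{1,6}\s|[-*+]\s|\d+\.\s|\||>|```)")


def join_wrapped_lines(text: str) -> str:
    """Merge hard-wrapped prose lines into paragraphs; keep structure and blank lines."""
    out: list[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on every fence line
            out.append(line)
            continue
        if in_fence:
            out.append(line)  # never touch code inside a fence
            continue
        can_merge = bool(
            out
            and out[-1].strip()                # previous line is not blank
            and not STRUCTURAL.match(out[-1])  # previous line is plain prose
            and line.strip()                   # current line is not blank
            and not STRUCTURAL.match(line)     # current line is plain prose
        )
        if can_merge:
            out[-1] = out[-1].rstrip() + " " + line.strip()
        else:
            out.append(line)
    return "\n".join(out)
```

The sketch is deliberately conservative: continuation lines of a wrapped list item are not merged, and words hyphenated across a line break are not rejoined.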
Path Conversion (Windows/WSL)
# Windows → WSL conversion
C:\Users\name\file.pdf → /mnt/c/Users/name/file.pdf
# Use helper script
python scripts/convert_path.py "C:\Users\name\Documents\file.pdf"
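The mapping shown above can be sketched in a few lines of Python. This is not the contents of scripts/convert_path.py, which are not shown here; the function name windows_to_wsl is an assumption, and the sketch assumes drives are mounted under /mnt/<letter>, as in the example.

```python
import re
from pathlib import PureWindowsPath


def windows_to_wsl(path: str) -> str:
    """Map C:\\Users\\name\\file.pdf to /mnt/c/Users/name/file.pdf."""
    win = PureWindowsPath(path)
    # Only absolute drive-letter paths (e.g. "C:\...") have a /mnt/<letter> form;
    # UNC shares and relative paths are rejected.
    if not win.is_absolute() or not re.fullmatch(r"[A-Za-z]:", win.drive):
        raise ValueError(f"not an absolute drive-letter Windows path: {path!r}")
    drive_letter = win.drive[0].lower()
    # win.parts[0] is the anchor ("C:\\"); the remaining parts are path components.
    return "/".join(["/mnt", drive_letter, *win.parts[1:]])
```

Inside WSL itself, the built-in wslpath utility (wslpath -u) performs the same conversion and also respects non-default mount points.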
Advanced Examples
Convert Word document
uvx markitdown report.docx -o report.md
Convert Excel spreadsheet
uvx markitdown data.xlsx > data.md
Convert PowerPoint presentation
uvx markitdown slides.pptx -o slides.md
Convert with file type hint (for stdin)
cat document | uvx markitdown -x .pdf > output.md
Use Azure Document Intelligence for better PDF extraction
uvx markitdown scan.pdf -d -e "https://your-resource.cognitiveservices.azure.com/"
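The single-file commands above extend to a whole folder with a short loop. The sketch below is one possible batch wrapper, assuming markitdown is installed and on PATH; the function names and the suffix list are hypothetical, and only the INPUT -o OUTPUT invocation already shown in this document is used.

```python
import subprocess
from pathlib import Path

# File types to convert (an assumed subset of the supported formats).
SUFFIXES = {".pdf", ".docx", ".pptx", ".xlsx"}


def plan_conversions(src_dir, out_dir):
    """Pair each supported file under src_dir with its .md target under out_dir."""
    src, out = Path(src_dir), Path(out_dir)
    return [
        (path, out / path.relative_to(src).with_suffix(".md"))
        for path in sorted(src.rglob("*"))
        if path.is_file() and path.suffix.lower() in SUFFIXES
    ]


def convert_all(src_dir, out_dir):
    """Run markitdown once per planned file, mirroring the source folder layout."""
    for source, target in plan_conversions(src_dir, out_dir):
        target.parent.mkdir(parents=True, exist_ok=True)
        # Same form as the CLI examples: markitdown INPUT -o OUTPUT
        subprocess.run(["markitdown", str(source), "-o", str(target)], check=True)
```

Splitting planning from execution makes it easy to preview the mapping before anything runs. Two sources that differ only by extension (report.pdf and report.docx) map to the same target in this sketch, so one would overwrite the other.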
Common Issues
"dependencies needed to read .pdf files"
# Install with PDF support
uv tool install "markitdown[pdf]" --force
FontBBox warnings during PDF conversion
- These are harmless font-parsing warnings; the output is still correct
Images missing from output
- Use scripts/extract_pdf_images.py to extract images separately
Notes
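- Output preserves document structure (headings, tables, lists, links) where the source format exposes it; PDF output often still needs the manual cleanup described in Step 4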
- With uvx, the first run downloads and caches dependencies; subsequent runs are faster
- For complex PDFs with poor extraction, use -d with Azure Document Intelligence
- Works on Windows, WSL, macOS, and Linux
Resources
- scripts/extract_pdf_images.py - Extract images from PDF using PyMuPDF
- scripts/convert_path.py - Windows to WSL path converter
- references/conversion-examples.md - Detailed examples for batch operations
