markdown-converter
💡 Summary
Convert various document formats to Markdown for LLM processing or text analysis.
Risk: Medium. Review: shell/CLI command execution; outbound network access (SSRF, data egress). Run with least privilege and audit before enabling in production.
name: markdown-converter
description: Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, YouTube URLs, or EPubs to Markdown format for LLM processing or text analysis.
Markdown Converter
Convert files to Markdown using uvx markitdown. No separate installation of markitdown is required: uvx downloads and runs it on demand.
Basic Usage
    # Convert to stdout
    uvx markitdown input.pdf

    # Save to file
    uvx markitdown input.pdf -o output.md
    uvx markitdown input.docx > output.md

    # From stdin
    cat input.pdf | uvx markitdown
Supported Formats
- Documents: PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls)
- Web/Data: HTML, CSV, JSON, XML
- Media: Images (EXIF + OCR), Audio (EXIF + transcription)
- Other: ZIP (iterates contents), YouTube URLs, EPub
Options
    -o OUTPUT        # Output file
    -x EXTENSION     # Hint file extension (for stdin)
    -m MIME_TYPE     # Hint MIME type
    -c CHARSET       # Hint charset (e.g., UTF-8)
    -d               # Use Azure Document Intelligence
    -e ENDPOINT      # Document Intelligence endpoint
    --use-plugins    # Enable 3rd-party plugins
    --list-plugins   # Show installed plugins
Examples
    # Convert Word document
    uvx markitdown report.docx -o report.md

    # Convert Excel spreadsheet
    uvx markitdown data.xlsx > data.md

    # Convert PowerPoint presentation
    uvx markitdown slides.pptx -o slides.md

    # Convert with file type hint (for stdin)
    cat document | uvx markitdown -x .pdf > output.md

    # Use Azure Document Intelligence for better PDF extraction
    uvx markitdown scan.pdf -d -e "https://your-resource.cognitiveservices.azure.com/"
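The single-file examples above can be extended to a batch run. The following is a minimal sketch, not part of the skill itself: `md_name` and `convert_all` are hypothetical helper names, and the `MARKITDOWN` variable is an assumed override hook (handy for dry runs); only `uvx markitdown INPUT -o OUTPUT` comes from the usage shown above. It assumes a POSIX-style shell such as sh or bash.

```shell
# Hypothetical helper: map an input path to a sibling .md path,
# e.g. docs/report.docx -> docs/report.md
md_name() {
  dir=$(dirname -- "$1")
  base=$(basename -- "$1")
  printf '%s/%s.md\n' "$dir" "${base%.*}"
}

# Hypothetical helper: convert each argument, keep going on failure,
# and return non-zero if any file failed.
# MARKITDOWN is an assumed override; it defaults to "uvx markitdown".
convert_all() {
  failed=0
  for src in "$@"; do
    out=$(md_name "$src")
    # Left unquoted on purpose so "uvx markitdown" splits into two words.
    if ${MARKITDOWN:-uvx markitdown} "$src" -o "$out"; then
      printf 'ok   %s -> %s\n' "$src" "$out"
    else
      printf 'FAIL %s\n' "$src" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

Usage might look like `convert_all report.docx data.xlsx slides.pptx`. Failures are reported on stderr so the list of successful conversions on stdout stays clean.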
Notes
- Output preserves document structure: headings, tables, lists, links
- First run caches dependencies; subsequent runs are faster
- For complex PDFs with poor extraction, use -d with Azure Document Intelligence
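One way to apply the last note is to try the default extraction first and retry with -d only when the result looks empty. This is a rough sketch under stated assumptions: the `MIN_BYTES` threshold, the `DOCINTEL_ENDPOINT` variable, and the helper names `is_sparse` and `convert_pdf` are all hypothetical, and combining `-d -e` with `-o` is assumed to work as the options list suggests. Output size is only a crude proxy for extraction quality.

```shell
# Assumed threshold: output smaller than this many bytes counts as "poor".
MIN_BYTES=${MIN_BYTES:-200}

# Hypothetical helper: succeed when the file is missing or under MIN_BYTES.
is_sparse() {
  [ -f "$1" ] || return 0
  size=$(wc -c < "$1" | tr -d ' ')   # strip padding some wc builds add
  [ "$size" -lt "$MIN_BYTES" ]
}

# Hypothetical helper: convert SRC to OUT, retrying with Azure Document
# Intelligence (-d -e) when the first pass produced little or no text.
# DOCINTEL_ENDPOINT is an assumed variable holding your endpoint URL.
convert_pdf() {
  src=$1
  out=$2
  uvx markitdown "$src" -o "$out" || return 1
  if is_sparse "$out"; then
    uvx markitdown "$src" -d \
      -e "${DOCINTEL_ENDPOINT:?set DOCINTEL_ENDPOINT first}" -o "$out"
  fi
}
```

A call such as `convert_pdf scan.pdf scan.md` would then reach the Azure service only for documents where the local pass came back nearly empty.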
Pros
- Supports a wide range of formats
- No installation required
- Preserves document structure
- Fast subsequent runs due to caching
Cons
- Complex PDFs may require Azure integration
- First run is slower while dependencies are downloaded and cached
- Error handling is not documented
- No GUI for non-technical users
Disclaimer: This content is sourced from GitHub open source projects for display and rating purposes only.
Copyright belongs to the original author intellectronica.
