💡 Summary
Firecrawl CLI is a command-line tool for scraping, crawling, and extracting data from websites.
🎯 Who it's for
🤖 AI quip: "Looks capable, but don't let the configuration scare people away."
Risk: Medium. Suggested checks: whether it executes shell commands; whether it makes outbound network requests (SSRF / data exfiltration); how API keys/tokens are obtained, stored, and might leak; file read/write scope and path-traversal exposure; dependency pinning and supply-chain risk. Run it with least privilege, and audit the code and its dependencies before enabling it in production.
🔥 Firecrawl CLI
Command-line interface for Firecrawl. Scrape, crawl, and extract data from any website directly from your terminal.
Installation
```shell
npm install -g firecrawl-cli
```
If you are using it inside an AI agent such as Claude Code, you can install the skill with:
```shell
npx skills add firecrawl/cli
```
Quick Start
Just run a command - the CLI will prompt you to authenticate if needed:
```shell
firecrawl https://example.com
```
Authentication
On first run, you'll be prompted to authenticate:
```
🔥 firecrawl cli
Turn websites into LLM-ready data

Welcome! To get started, authenticate with your Firecrawl account.

  1. Login with browser (recommended)
  2. Enter API key manually

Tip: You can also set FIRECRAWL_API_KEY environment variable

Enter choice [1/2]:
```
Authentication Methods
```shell
# Interactive (prompts automatically when needed)
firecrawl

# Browser login
firecrawl login

# Direct API key
firecrawl login --api-key fc-your-api-key

# Environment variable
export FIRECRAWL_API_KEY=fc-your-api-key

# Per-command API key
firecrawl scrape https://example.com --api-key fc-your-api-key
```
Commands
scrape - Scrape a single URL
Extract content from any webpage in various formats.
```shell
# Basic usage (outputs markdown)
firecrawl https://example.com
firecrawl scrape https://example.com

# Get HTML
firecrawl https://example.com --html
firecrawl https://example.com -H

# Multiple formats (outputs JSON)
firecrawl https://example.com --format markdown,links,images

# Save to file
firecrawl https://example.com -o output.md
firecrawl https://example.com --format json -o data.json --pretty
```
Scrape Options
| Option | Description |
| ------------------------ | ------------------------------------------------------- |
| -f, --format <formats> | Output format(s), comma-separated |
| -H, --html | Shortcut for --format html |
| --only-main-content | Extract only main content (removes navs, footers, etc.) |
| --wait-for <ms> | Wait time before scraping (for JS-rendered content) |
| --screenshot | Take a screenshot |
| --include-tags <tags> | Only include specific HTML tags |
| --exclude-tags <tags> | Exclude specific HTML tags |
| -o, --output <path> | Save output to file |
| --pretty | Pretty print JSON output |
| --timing | Show request timing info |
Available Formats
| Format | Description |
| ------------ | -------------------------- |
| markdown | Clean markdown (default) |
| html | Cleaned HTML |
| rawHtml | Original HTML |
| links | All links on the page |
| screenshot | Screenshot as base64 |
| json | Structured JSON extraction |
Examples
```shell
# Extract only main content as markdown
firecrawl https://blog.example.com --only-main-content

# Wait for JS to render, then scrape
firecrawl https://spa-app.com --wait-for 3000

# Get all links from a page
firecrawl https://example.com --format links

# Screenshot + markdown
firecrawl https://example.com --format markdown --screenshot

# Extract specific elements only
firecrawl https://example.com --include-tags article,main

# Exclude navigation and ads
firecrawl https://example.com --exclude-tags nav,aside,.ad
```
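For bulk jobs, the single-URL scrape shown above can be wrapped in a small loop. A minimal sketch, assuming the `-o` flag and markdown default described above; the `scrape_list` helper and its file-naming scheme are ours, not part of the CLI:

```shell
# Scrape every URL listed in a file (one per line) into per-page markdown files.
# scrape_list is a hypothetical helper built on the scrape command shown above.
scrape_list() {
  list="$1"; outdir="$2"
  mkdir -p "$outdir"
  while IFS= read -r url; do
    [ -z "$url" ] && continue
    # Derive a filesystem-safe filename from the URL
    name=$(printf '%s' "$url" | sed -E 's|^https?://||; s|[/?&=]|_|g')
    firecrawl scrape "$url" -o "$outdir/$name.md"
  done < "$list"
}

# Usage: scrape_list urls.txt out/
```

Add a `sleep` inside the loop if you need to stay under rate limits.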
crawl - Crawl an entire website
Crawl multiple pages from a website.
```shell
# Start a crawl (returns job ID)
firecrawl crawl https://example.com

# Wait for crawl to complete
firecrawl crawl https://example.com --wait

# With progress indicator
firecrawl crawl https://example.com --wait --progress

# Check crawl status
firecrawl crawl <job-id>

# Limit pages
firecrawl crawl https://example.com --limit 100 --max-depth 3
```
Crawl Options
| Option | Description |
| --------------------------- | ---------------------------------------- |
| --wait | Wait for crawl to complete |
| --progress | Show progress while waiting |
| --limit <n> | Maximum pages to crawl |
| --max-depth <n> | Maximum crawl depth |
| --include-paths <paths> | Only crawl matching paths |
| --exclude-paths <paths> | Skip matching paths |
| --sitemap <mode> | include, skip, or only |
| --allow-subdomains | Include subdomains |
| --allow-external-links | Follow external links |
| --crawl-entire-domain | Crawl entire domain |
| --ignore-query-parameters | Treat URLs with different params as same |
| --delay <ms> | Delay between requests |
| --max-concurrency <n> | Max concurrent requests |
| --timeout <seconds> | Timeout when waiting |
| --poll-interval <seconds> | Status check interval |
Examples
```shell
# Crawl blog section only
firecrawl crawl https://example.com --include-paths /blog,/posts

# Exclude admin pages
firecrawl crawl https://example.com --exclude-paths /admin,/login

# Crawl with rate limiting
firecrawl crawl https://example.com --delay 1000 --max-concurrency 2

# Deep crawl with high limit
firecrawl crawl https://example.com --limit 1000 --max-depth 10 --wait --progress

# Save results
firecrawl crawl https://example.com --wait -o crawl-results.json --pretty
```
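The built-in `--wait` flag covers most cases. If you need custom polling from another script, the status check (`firecrawl crawl <job-id>`) can be looped. This is a sketch under an assumption of ours: that the status output contains the word `completed` when the job is done — verify against your CLI version before relying on it:

```shell
# Poll a crawl job until its status output mentions "completed".
# The "completed" marker is an assumption; prefer --wait when it suffices.
wait_for_crawl() {
  job_id="$1"; interval="${2:-5}"
  while :; do
    status=$(firecrawl crawl "$job_id")
    case "$status" in
      *completed*) break ;;
    esac
    sleep "$interval"
  done
  printf '%s\n' "$status"
}

# Usage: wait_for_crawl <job-id> 10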
map - Discover all URLs on a website
Quickly discover all URLs on a website without scraping content.
```shell
# List all URLs (one per line)
firecrawl map https://example.com

# Output as JSON
firecrawl map https://example.com --json

# Search for specific URLs
firecrawl map https://example.com --search "blog"

# Limit results
firecrawl map https://example.com --limit 500
```
Map Options
| Option | Description |
| --------------------------- | --------------------------------- |
| --limit <n> | Maximum URLs to discover |
| --search <query> | Filter URLs by search query |
| --sitemap <mode> | include, skip, or only |
| --include-subdomains | Include subdomains |
| --ignore-query-parameters | Dedupe URLs with different params |
| --timeout <seconds> | Request timeout |
| --json | Output as JSON |
| -o, --output <path> | Save to file |
Examples
```shell
# Find all product pages
firecrawl map https://shop.example.com --search "product"

# Get sitemap URLs only
firecrawl map https://example.com --sitemap only

# Save URL list to file
firecrawl map https://example.com -o urls.txt

# Include subdomains
firecrawl map https://example.com --include-subdomains --limit 1000
```
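Because map prints one URL per line by default, it composes cleanly with standard Unix tools. A sketch — the `map_summary` helper is ours, and it assumes the plain one-per-line output shown above:

```shell
# Count discovered URLs per first path segment (e.g. blog, docs).
# map_summary is a hypothetical helper over the map command shown above.
map_summary() {
  firecrawl map "$1" \
    | awk -F/ 'NF >= 4 { print $4 }' \
    | sort | uniq -c | sort -rn
}

# Usage: map_summary https://example.com
```

Handy for a quick feel of a site's structure before deciding on `--include-paths` for a crawl.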
search - Search the web
Search the web and optionally scrape content from search results.
```shell
# Basic search
firecrawl search "firecrawl web scraping"

# Limit results
firecrawl search "AI news" --limit 10

# Search news sources
firecrawl search "tech startups" --sources news

# Search images
firecrawl search "landscape photography" --sources images

# Multiple sources
firecrawl search "machine learning" --sources web,news,images

# Filter by category (GitHub, research papers, PDFs)
firecrawl search "web scraping python" --categories github
firecrawl search "transformer architecture" --categories research
firecrawl search "machine learning" --categories github,research

# Time-based search
firecrawl search "AI announcements" --tbs qdr:d   # Past day
firecrawl search "tech news" --tbs qdr:w          # Past week

# Location-based search
firecrawl search "restaurants" --location "San Francisco,California,United States"
firecrawl search "local news" --country DE

# Search and scrape results
firecrawl search "firecrawl tutorials" --scrape
firecrawl search "API documentation" --scrape --scrape-formats markdown,links

# Output as pretty JSON
firecrawl search "web scraping"
```
Search Options
| Option | Description |
| ---------------------------- | ------------------------------------------------------------------------------------------- |
| --limit <n> | Maximum results (default: 5, max: 100) |
| --sources <sources> | Comma-separated: web, images, news (default: web) |
| --categories <categories> | Comma-separated: github, research, pdf |
| --tbs <value> | Time filter: qdr:h (hour), qdr:d (day), qdr:w (week), qdr:m (month), qdr:y (year) |
| --location <location> | Geo-targeting (e.g., "Germany", "San Francisco,California,United States") |
| --country <code> | ISO country code (default: US) |
| --timeout <ms> | Timeout in milliseconds (default: 60000) |
| --ignore-invalid-urls | Exclude URLs invalid for other Firecrawl endpoints |
| --scrape | Enable scraping of search results |
| --scrape-formats <formats> | Comma-separated formats for scraped results (used with --scrape) |
Pros
- Rich command options cover a wide range of scraping needs.
- Multiple output formats make it flexible.
- User-friendly authentication flow.
Cons
- Requires authentication to unlock its full functionality.
- Complex sites may need extra tuning.
- There is a learning curve for new users.
Disclaimer: this content is sourced from an open-source GitHub project and is provided for demonstration and rating purposes only.
Copyright belongs to the original author, firecrawl.
