firecrawl/cli (Co-Pilot / Assistive)
Updated a month ago
Agent score: 82
💡 Summary

Firecrawl CLI is a command-line tool for scraping, crawling, and extracting data from websites.

🎯 Who It's For

  • Web developers who want to automate data extraction.
  • Data scientists who need to gather web data for analysis.
  • SEO professionals who want to analyze website structure.
  • Content creators who want to aggregate information from multiple sources.
  • Researchers who need to collect data from a variety of online platforms.

🤖 AI Take: It looks like a heavy hitter, but don't let the configuration scare people away.

Security Analysis: Medium Risk

Risk: Medium. Recommended checks: whether it executes shell/command-line instructions; whether it makes outbound network requests (SSRF / data exfiltration); how API keys and tokens are obtained and stored, and whether they can leak; the scope of file reads and writes and any path-traversal risk; dependency pinning and supply-chain risk. Run it with least privilege, and audit the code and its dependencies before enabling it in production.

🔥 Firecrawl CLI

Command-line interface for Firecrawl. Scrape, crawl, and extract data from any website directly from your terminal.

Installation

npm install -g firecrawl-cli

If you are using it with an AI agent such as Claude Code, you can install the skill with:

npx skills add firecrawl/cli

Quick Start

Just run a command - the CLI will prompt you to authenticate if needed:

firecrawl https://example.com

Authentication

On first run, you'll be prompted to authenticate:

  🔥 firecrawl cli
  Turn websites into LLM-ready data

Welcome! To get started, authenticate with your Firecrawl account.

  1. Login with browser (recommended)
  2. Enter API key manually

Tip: You can also set FIRECRAWL_API_KEY environment variable

Enter choice [1/2]:

Authentication Methods

# Interactive (prompts automatically when needed)
firecrawl

# Browser login
firecrawl login

# Direct API key
firecrawl login --api-key fc-your-api-key

# Environment variable
export FIRECRAWL_API_KEY=fc-your-api-key

# Per-command API key
firecrawl scrape https://example.com --api-key fc-your-api-key
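For CI and other non-interactive environments, the FIRECRAWL_API_KEY variable is the simplest route. A minimal sketch of a fail-fast guard (the require_key helper is ours, not part of the CLI; the firecrawl invocation uses only the documented flags above):

```shell
# Guard for non-interactive use: refuse to continue without an API key,
# so the CLI never falls back to its interactive auth prompt mid-pipeline.
require_key() {
  if [ -z "${FIRECRAWL_API_KEY:-}" ]; then
    echo "FIRECRAWL_API_KEY is not set" >&2
    return 1
  fi
}

# Example CI step (runs only when the key is present):
# require_key && firecrawl scrape https://example.com -o page.md
```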

Commands

scrape - Scrape a single URL

Extract content from any webpage in various formats.

# Basic usage (outputs markdown)
firecrawl https://example.com
firecrawl scrape https://example.com

# Get raw HTML
firecrawl https://example.com --html
firecrawl https://example.com -H

# Multiple formats (outputs JSON)
firecrawl https://example.com --format markdown,links,images

# Save to file
firecrawl https://example.com -o output.md
firecrawl https://example.com --format json -o data.json --pretty

Scrape Options

| Option | Description |
| ------------------------ | ------------------------------------------------------- |
| -f, --format <formats> | Output format(s), comma-separated |
| -H, --html | Shortcut for --format html |
| --only-main-content | Extract only main content (removes navs, footers, etc.) |
| --wait-for <ms> | Wait time before scraping (for JS-rendered content) |
| --screenshot | Take a screenshot |
| --include-tags <tags> | Only include specific HTML tags |
| --exclude-tags <tags> | Exclude specific HTML tags |
| -o, --output <path> | Save output to file |
| --pretty | Pretty print JSON output |
| --timing | Show request timing info |

Available Formats

| Format | Description |
| ------------ | -------------------------- |
| markdown | Clean markdown (default) |
| html | Cleaned HTML |
| rawHtml | Original HTML |
| links | All links on the page |
| screenshot | Screenshot as base64 |
| json | Structured JSON extraction |

Examples

# Extract only main content as markdown
firecrawl https://blog.example.com --only-main-content

# Wait for JS to render, then scrape
firecrawl https://spa-app.com --wait-for 3000

# Get all links from a page
firecrawl https://example.com --format links

# Screenshot + markdown
firecrawl https://example.com --format markdown --screenshot

# Extract specific elements only
firecrawl https://example.com --include-tags article,main

# Exclude navigation and ads
firecrawl https://example.com --exclude-tags nav,aside,.ad

crawl - Crawl an entire website

Crawl multiple pages from a website.

# Start a crawl (returns job ID)
firecrawl crawl https://example.com

# Wait for crawl to complete
firecrawl crawl https://example.com --wait

# With progress indicator
firecrawl crawl https://example.com --wait --progress

# Check crawl status
firecrawl crawl <job-id>

# Limit pages
firecrawl crawl https://example.com --limit 100 --max-depth 3

Crawl Options

| Option | Description |
| --------------------------- | ---------------------------------------- |
| --wait | Wait for crawl to complete |
| --progress | Show progress while waiting |
| --limit <n> | Maximum pages to crawl |
| --max-depth <n> | Maximum crawl depth |
| --include-paths <paths> | Only crawl matching paths |
| --exclude-paths <paths> | Skip matching paths |
| --sitemap <mode> | include, skip, or only |
| --allow-subdomains | Include subdomains |
| --allow-external-links | Follow external links |
| --crawl-entire-domain | Crawl entire domain |
| --ignore-query-parameters | Treat URLs with different params as same |
| --delay <ms> | Delay between requests |
| --max-concurrency <n> | Max concurrent requests |
| --timeout <seconds> | Timeout when waiting |
| --poll-interval <seconds> | Status check interval |

Examples

# Crawl blog section only
firecrawl crawl https://example.com --include-paths /blog,/posts

# Exclude admin pages
firecrawl crawl https://example.com --exclude-paths /admin,/login

# Crawl with rate limiting
firecrawl crawl https://example.com --delay 1000 --max-concurrency 2

# Deep crawl with high limit
firecrawl crawl https://example.com --limit 1000 --max-depth 10 --wait --progress

# Save results
firecrawl crawl https://example.com --wait -o crawl-results.json --pretty
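For scheduled crawls, it helps to date-stamp the output file so successive runs don't overwrite each other. A sketch under that assumption (the stamped_name helper is ours; the crawl flags in the usage comment are the documented ones above):

```shell
# Build a date-stamped output filename, e.g. crawl-20250101.json,
# so a recurring job keeps one result file per day.
stamped_name() {
  printf 'crawl-%s.json' "$(date +%Y%m%d)"
}

# Nightly job sketch:
# firecrawl crawl https://example.com --wait --limit 100 -o "$(stamped_name)" --pretty
```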

map - Discover all URLs on a website

Quickly discover all URLs on a website without scraping content.

# List all URLs (one per line)
firecrawl map https://example.com

# Output as JSON
firecrawl map https://example.com --json

# Search for specific URLs
firecrawl map https://example.com --search "blog"

# Limit results
firecrawl map https://example.com --limit 500

Map Options

| Option | Description |
| --------------------------- | --------------------------------- |
| --limit <n> | Maximum URLs to discover |
| --search <query> | Filter URLs by search query |
| --sitemap <mode> | include, skip, or only |
| --include-subdomains | Include subdomains |
| --ignore-query-parameters | Dedupe URLs with different params |
| --timeout <seconds> | Request timeout |
| --json | Output as JSON |
| -o, --output <path> | Save to file |

Examples

# Find all product pages
firecrawl map https://shop.example.com --search "product"

# Get sitemap URLs only
firecrawl map https://example.com --sitemap only

# Save URL list to file
firecrawl map https://example.com -o urls.txt

# Include subdomains
firecrawl map https://example.com --include-subdomains --limit 1000
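Because map emits one URL per line, its output composes well with standard pipes. A sketch that narrows a saved URL list before scraping (the filter_blog_urls helper is ours, and it assumes blog posts live under a /blog path; the firecrawl commands in the usage comment use only documented flags):

```shell
# Keep only /blog URLs from a map listing read on stdin (one URL per line).
filter_blog_urls() {
  grep -E '^https?://[^/]+/blog(/|$)'
}

# Typical pipeline after `firecrawl map https://example.com -o urls.txt`:
# filter_blog_urls < urls.txt | xargs -n 1 firecrawl scrape
```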

search - Search the web

Search the web and optionally scrape content from search results.

# Basic search
firecrawl search "firecrawl web scraping"

# Limit results
firecrawl search "AI news" --limit 10

# Search news sources
firecrawl search "tech startups" --sources news

# Search images
firecrawl search "landscape photography" --sources images

# Multiple sources
firecrawl search "machine learning" --sources web,news,images

# Filter by category (GitHub, research papers, PDFs)
firecrawl search "web scraping python" --categories github
firecrawl search "transformer architecture" --categories research
firecrawl search "machine learning" --categories github,research

# Time-based search
firecrawl search "AI announcements" --tbs qdr:d  # Past day
firecrawl search "tech news" --tbs qdr:w         # Past week

# Location-based search
firecrawl search "restaurants" --location "San Francisco,California,United States"
firecrawl search "local news" --country DE

# Search and scrape results
firecrawl search "firecrawl tutorials" --scrape
firecrawl search "API documentation" --scrape --scrape-formats markdown,links

# Output as pretty JSON
firecrawl search "web scraping"

Search Options

| Option | Description |
| ---------------------------- | ------------------------------------------------------------------------------------------- |
| --limit <n> | Maximum results (default: 5, max: 100) |
| --sources <sources> | Comma-separated: web, images, news (default: web) |
| --categories <categories> | Comma-separated: github, research, pdf |
| --tbs <value> | Time filter: qdr:h (hour), qdr:d (day), qdr:w (week), qdr:m (month), qdr:y (year) |
| --location <location> | Geo-targeting (e.g., "Germany", "San Francisco,California,United States") |
| --country <code> | ISO country code (default: US) |
| --timeout <ms> | Timeout in milliseconds (default: 60000) |
| --ignore-invalid-urls | Exclude URLs invalid for other Firecrawl endpoints |
| --scrape | Enable scraping of search results |
| --s
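The --tbs codes are terse, so a small wrapper can translate friendly window names into them. A sketch (the tbs_for helper is ours; the qdr:* codes come from the table above):

```shell
# Map a friendly time window name to the --tbs code the CLI expects.
tbs_for() {
  case "$1" in
    hour)  echo 'qdr:h' ;;
    day)   echo 'qdr:d' ;;
    week)  echo 'qdr:w' ;;
    month) echo 'qdr:m' ;;
    year)  echo 'qdr:y' ;;
    *)     echo "unknown window: $1" >&2; return 1 ;;
  esac
}

# Usage sketch:
# firecrawl search "AI announcements" --tbs "$(tbs_for day)"
```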

Five-Dimension Analysis

  • Clarity: 8/10
  • Innovation: 7/10
  • Practicality: 9/10
  • Completeness: 9/10
  • Maintainability: 8/10

Pros and Cons

Pros

  • A rich set of command options covers a wide range of scraping needs.
  • Multiple output formats offer strong flexibility.
  • User-friendly authentication methods.

Cons

  • Authentication is required to unlock its full functionality.
  • Complex websites may require extra tuning.
  • New users face a learning curve.

Related Skills

pytorch (Code Lib, 92/100)

"It's the Swiss Army knife of deep learning, but good luck finding, among the 47 installation methods, the one that won't wreck your system."

agno (Code Lib, 90/100)

"It promises to be the Kubernetes of the agent world, but that depends on whether developers have the patience to learn yet another orchestration layer."

nuxt-skills (Co-Pilot, 90/100)

"It is essentially a well-organized cheat sheet that turns your AI assistant into a parrot for the Nuxt framework."

Disclaimer: this content comes from an open-source GitHub project and is shown here only for display and scoring analysis.

Copyright belongs to the original author, firecrawl.