💡 Summary
Firecrawl CLI is a command-line tool for scraping, crawling, and extracting data from websites.
🎯 Who it's for
🤖 AI quip: "Looks capable, but don't let the configuration scare people away."
Risk: Medium. Suggested checks: whether it executes shell commands; whether it makes outbound network requests (SSRF / data exfiltration); how API keys/tokens are obtained, stored, and might leak; file read/write scope and path-traversal exposure; dependency pinning and supply-chain risk. Run it with least privilege, and audit the code and its dependencies before enabling it in production.
🔥 Firecrawl CLI
Command-line interface for Firecrawl. Scrape, crawl, and extract data from any website directly from your terminal.
Installation
```shell
npm install -g firecrawl-cli
```
If you are using it inside an AI agent such as Claude Code, you can install the skill with:
```shell
npx skills add firecrawl/cli
```
Quick Start
Just run a command - the CLI will prompt you to authenticate if needed:
```shell
firecrawl https://example.com
```
Authentication
On first run, you'll be prompted to authenticate:
```
🔥 firecrawl cli
Turn websites into LLM-ready data

Welcome! To get started, authenticate with your Firecrawl account.

  1. Login with browser (recommended)
  2. Enter API key manually

Tip: You can also set FIRECRAWL_API_KEY environment variable

Enter choice [1/2]:
```
Authentication Methods
```shell
# Interactive (prompts automatically when needed)
firecrawl

# Browser login
firecrawl login

# Direct API key
firecrawl login --api-key fc-your-api-key

# Environment variable
export FIRECRAWL_API_KEY=fc-your-api-key

# Per-command API key
firecrawl scrape https://example.com --api-key fc-your-api-key
```
Commands
scrape - Scrape a single URL
Extract content from any webpage in various formats.
```shell
# Basic usage (outputs markdown)
firecrawl https://example.com
firecrawl scrape https://example.com

# Get HTML
firecrawl https://example.com --html
firecrawl https://example.com -H

# Multiple formats (outputs JSON)
firecrawl https://example.com --format markdown,links,images

# Save to file
firecrawl https://example.com -o output.md
firecrawl https://example.com --format json -o data.json --pretty
```
Scrape Options
| Option | Description |
| ------------------------ | ------------------------------------------------------- |
| -f, --format <formats> | Output format(s), comma-separated |
| -H, --html | Shortcut for --format html |
| --only-main-content | Extract only main content (removes navs, footers, etc.) |
| --wait-for <ms> | Wait time before scraping (for JS-rendered content) |
| --screenshot | Take a screenshot |
| --include-tags <tags> | Only include specific HTML tags |
| --exclude-tags <tags> | Exclude specific HTML tags |
| -o, --output <path> | Save output to file |
| --pretty | Pretty print JSON output |
| --timing | Show request timing info |
Available Formats
| Format | Description |
| ------------ | -------------------------- |
| markdown | Clean markdown (default) |
| html | Cleaned HTML |
| rawHtml | Original HTML |
| links | All links on the page |
| screenshot | Screenshot as base64 |
| json | Structured JSON extraction |
Examples
```shell
# Extract only main content as markdown
firecrawl https://blog.example.com --only-main-content

# Wait for JS to render, then scrape
firecrawl https://spa-app.com --wait-for 3000

# Get all links from a page
firecrawl https://example.com --format links

# Screenshot + markdown
firecrawl https://example.com --format markdown --screenshot

# Extract specific elements only
firecrawl https://example.com --include-tags article,main

# Exclude navigation and ads
firecrawl https://example.com --exclude-tags nav,aside,.ad
```
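For bulk jobs, the single-URL scrape shown above can be wrapped in a small loop. A minimal sketch, assuming the `-o` flag and markdown default described above; the `scrape_list` helper and its file-naming scheme are ours, not part of the CLI:

```shell
# Scrape every URL listed in a file (one per line) into per-page markdown files.
# scrape_list is a hypothetical helper built on the scrape command shown above.
scrape_list() {
  list="$1"; outdir="$2"
  mkdir -p "$outdir"
  while IFS= read -r url; do
    [ -z "$url" ] && continue
    # Derive a filesystem-safe filename from the URL
    name=$(printf '%s' "$url" | sed -E 's|^https?://||; s|[/?&=]|_|g')
    firecrawl scrape "$url" -o "$outdir/$name.md"
  done < "$list"
}

# Usage: scrape_list urls.txt out/
```

Add a `sleep` inside the loop if you need to stay under rate limits.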
crawl - Crawl an entire website
Crawl multiple pages from a website.
```shell
# Start a crawl (returns job ID)
firecrawl crawl https://example.com

# Wait for crawl to complete
firecrawl crawl https://example.com --wait

# With progress indicator
firecrawl crawl https://example.com --wait --progress

# Check crawl status
firecrawl crawl <job-id>

# Limit pages
firecrawl crawl https://example.com --limit 100 --max-depth 3
```
Crawl Options
| Option | Description |
| --------------------------- | ---------------------------------------- |
| --wait | Wait for crawl to complete |
| --progress | Show progress while waiting |
| --limit <n> | Maximum pages to crawl |
| --max-depth <n> | Maximum crawl depth |
| --include-paths <paths> | Only crawl matching paths |
| --exclude-paths <paths> | Skip matching paths |
| --sitemap <mode> | include, skip, or only |
| --allow-subdomains | Include subdomains |
| --allow-external-links | Follow external links |
| --crawl-entire-domain | Crawl entire domain |
| --ignore-query-parameters | Treat URLs with different params as same |
| --delay <ms> | Delay between requests |
| --max-concurrency <n> | Max concurrent requests |
| --timeout <seconds> | Timeout when waiting |
| --poll-interval <seconds> | Status check interval |
Examples
```shell
# Crawl blog section only
firecrawl crawl https://example.com --include-paths /blog,/posts

# Exclude admin pages
firecrawl crawl https://example.com --exclude-paths /admin,/login

# Crawl with rate limiting
firecrawl crawl https://example.com --delay 1000 --max-concurrency 2

# Deep crawl with high limit
firecrawl crawl https://example.com --limit 1000 --max-depth 10 --wait --progress

# Save results
firecrawl crawl https://example.com --wait -o crawl-results.json --pretty
```
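The built-in `--wait` flag covers most cases. If you need custom polling from another script, the status check (`firecrawl crawl <job-id>`) can be looped. This is a sketch under an assumption of ours: that the status output contains the word `completed` when the job is done — verify against your CLI version before relying on it:

```shell
# Poll a crawl job until its status output mentions "completed".
# The "completed" marker is an assumption; prefer --wait when it suffices.
wait_for_crawl() {
  job_id="$1"; interval="${2:-5}"
  while :; do
    status=$(firecrawl crawl "$job_id")
    case "$status" in
      *completed*) break ;;
    esac
    sleep "$interval"
  done
  printf '%s\n' "$status"
}

# Usage: wait_for_crawl <job-id> 10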
map - Discover all URLs on a website
Quickly discover all URLs on a website without scraping content.
```shell
# List all URLs (one per line)
firecrawl map https://example.com

# Output as JSON
firecrawl map https://example.com --json

# Search for specific URLs
firecrawl map https://example.com --search "blog"

# Limit results
firecrawl map https://example.com --limit 500
```
Map Options
| Option | Description |
| --------------------------- | --------------------------------- |
| --limit <n> | Maximum URLs to discover |
| --search <query> | Filter URLs by search query |
| --sitemap <mode> | include, skip, or only |
| --include-subdomains | Include subdomains |
| --ignore-query-parameters | Dedupe URLs with different params |
| --timeout <seconds> | Request timeout |
| --json | Output as JSON |
| -o, --output <path> | Save to file |
Examples
```shell
# Find all product pages
firecrawl map https://shop.example.com --search "product"

# Get sitemap URLs only
firecrawl map https://example.com --sitemap only

# Save URL list to file
firecrawl map https://example.com -o urls.txt

# Include subdomains
firecrawl map https://example.com --include-subdomains --limit 1000
```
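Because map prints one URL per line by default, it composes cleanly with standard Unix tools. A sketch — the `map_summary` helper is ours, and it assumes the plain one-per-line output shown above:

```shell
# Count discovered URLs per first path segment (e.g. blog, docs).
# map_summary is a hypothetical helper over the map command shown above.
map_summary() {
  firecrawl map "$1" \
    | awk -F/ 'NF >= 4 { print $4 }' \
    | sort | uniq -c | sort -rn
}

# Usage: map_summary https://example.com
```

Handy for a quick feel of a site's structure before deciding on `--include-paths` for a crawl.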
search - Search the web
Search the web and optionally scrape content from search results.
```shell
# Basic search
firecrawl search "firecrawl web scraping"

# Limit results
firecrawl search "AI news" --limit 10

# Search news sources
firecrawl search "tech startups" --sources news

# Search images
firecrawl search "landscape photography" --sources images

# Multiple sources
firecrawl search "machine learning" --sources web,news,images

# Filter by category (GitHub, research papers, PDFs)
firecrawl search "web scraping python" --categories github
firecrawl search "transformer architecture" --categories research
firecrawl search "machine learning" --categories github,research

# Time-based search
firecrawl search "AI announcements" --tbs qdr:d   # Past day
firecrawl search "tech news" --tbs qdr:w          # Past week

# Location-based search
firecrawl search "restaurants" --location "San Francisco,California,United States"
firecrawl search "local news" --country DE

# Search and scrape results
firecrawl search "firecrawl tutorials" --scrape
firecrawl search "API documentation" --scrape --scrape-formats markdown,links

# Output as pretty JSON
firecrawl search "web scraping"
```
Search Options
| Option | Description |
| ---------------------------- | ------------------------------------------------------------------------------------------- |
| --limit <n> | Maximum results (default: 5, max: 100) |
| --sources <sources> | Comma-separated: web, images, news (default: web) |
| --categories <categories> | Comma-separated: github, research, pdf |
| --tbs <value> | Time filter: qdr:h (hour), qdr:d (day), qdr:w (week), qdr:m (month), qdr:y (year) |
| --location <location> | Geo-targeting (e.g., "Germany", "San Francisco,California,United States") |
| --country <code> | ISO country code (default: US) |
| --timeout <ms> | Timeout in milliseconds (default: 60000) |
| --ignore-invalid-urls | Exclude URLs invalid for other Firecrawl endpoints |
| --scrape | Enable scraping of search results |
| --scrape-formats <formats> | Comma-separated formats for scraped results (used with --scrape) |
Pros
- Rich command options cover a wide range of scraping needs.
- Multiple output formats make it flexible.
- User-friendly authentication flow.
Cons
- Requires authentication to unlock its full functionality.
- Complex sites may need extra tuning.
- There is a learning curve for new users.
Disclaimer: this content is sourced from an open-source GitHub project and is provided for demonstration and rating purposes only.
Copyright belongs to the original author, firecrawl.
