Co-Pilot / 辅助式

更新于 4 months ago

agent-browser

Name: agent-browser
Rating: 4.1 (9848 reviews)
Author: vercel-labs

Vvercel-labs

9.8k

vercel-labs/agent-browser

Agent 评分

💡 摘要

一个用于AI代理的无头浏览器自动化CLI，支持快速灵活的网页交互。

🎯 适合人群

网页开发者质量保证工程师自动化测试人员数据抓取者AI研究人员

🤖 AI 吐槽: “看起来很能打，但别让配置把人劝退。”

安全分析中风险

风险：Medium。建议检查：是否执行 shell/命令行指令；是否发起外网请求（SSRF/数据外发）；API Key/Token 的获取、存储与泄露风险；文件读写范围与路径穿越风险；依赖锁定与供应链风险。以最小权限运行，并在生产环境启用前审计代码与依赖。

agent-browser

Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.

Installation

npm (recommended)

npm install -g agent-browser
agent-browser install  # Download Chromium

From Source

git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
pnpm build:native   # Requires Rust (https://rustup.rs)
pnpm link --global  # Makes agent-browser available globally
agent-browser install

Linux Dependencies

On Linux, install system dependencies:

agent-browser install --with-deps
# or manually: npx playwright install-deps chromium

Quick Start

agent-browser open example.com
agent-browser snapshot                    # Get accessibility tree with refs
agent-browser click @e2                   # Click by ref from snapshot
agent-browser fill @e3 "test@example.com" # Fill by ref
agent-browser get text @e1                # Get text by ref
agent-browser screenshot page.png
agent-browser close

Traditional Selectors (also supported)

agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"

Commands

Core Commands

agent-browser open <url>              # Navigate to URL (aliases: goto, navigate)
agent-browser click <sel>             # Click element
agent-browser dblclick <sel>          # Double-click element
agent-browser focus <sel>             # Focus element
agent-browser type <sel> <text>       # Type into element
agent-browser fill <sel> <text>       # Clear and fill
agent-browser press <key>             # Press key (Enter, Tab, Control+a) (alias: key)
agent-browser keydown <key>           # Hold key down
agent-browser keyup <key>             # Release key
agent-browser hover <sel>             # Hover element
agent-browser select <sel> <val>      # Select dropdown option
agent-browser check <sel>             # Check checkbox
agent-browser uncheck <sel>           # Uncheck checkbox
agent-browser scroll <dir> [px]       # Scroll (up/down/left/right)
agent-browser scrollintoview <sel>    # Scroll element into view (alias: scrollinto)
agent-browser drag <src> <tgt>        # Drag and drop
agent-browser upload <sel> <files>    # Upload files
agent-browser screenshot [path]       # Take screenshot (--full for full page, base64 png to stdout if no path)
agent-browser pdf <path>              # Save as PDF
agent-browser snapshot                # Accessibility tree with refs (best for AI)
agent-browser eval <js>               # Run JavaScript
agent-browser connect <port>          # Connect to browser via CDP
agent-browser close                   # Close browser (aliases: quit, exit)

Get Info

agent-browser get text <sel>          # Get text content
agent-browser get html <sel>          # Get innerHTML
agent-browser get value <sel>         # Get input value
agent-browser get attr <sel> <attr>   # Get attribute
agent-browser get title               # Get page title
agent-browser get url                 # Get current URL
agent-browser get count <sel>         # Count matching elements
agent-browser get box <sel>           # Get bounding box

Check State

agent-browser is visible <sel>        # Check if visible
agent-browser is enabled <sel>        # Check if enabled
agent-browser is checked <sel>        # Check if checked

Find Elements (Semantic Locators)

agent-browser find role <role> <action> [value]       # By ARIA role
agent-browser find text <text> <action>               # By text content
agent-browser find label <label> <action> [value]     # By label
agent-browser find placeholder <ph> <action> [value]  # By placeholder
agent-browser find alt <text> <action>                # By alt text
agent-browser find title <text> <action>              # By title attr
agent-browser find testid <id> <action> [value]       # By data-testid
agent-browser find first <sel> <action> [value]       # First match
agent-browser find last <sel> <action> [value]        # Last match
agent-browser find nth <n> <sel> <action> [value]     # Nth match

Actions: click, fill, check, hover, text

Examples:

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text

Wait

agent-browser wait <selector>         # Wait for element to be visible
agent-browser wait <ms>               # Wait for time (milliseconds)
agent-browser wait --text "Welcome"   # Wait for text to appear
agent-browser wait --url "**/dash"    # Wait for URL pattern
agent-browser wait --load networkidle # Wait for load state
agent-browser wait --fn "window.ready === true"  # Wait for JS condition

Load states: load, domcontentloaded, networkidle

Mouse Control

agent-browser mouse move <x> <y>      # Move mouse
agent-browser mouse down [button]     # Press button (left/right/middle)
agent-browser mouse up [button]       # Release button
agent-browser mouse wheel <dy> [dx]   # Scroll wheel

Browser Settings

agent-browser set viewport <w> <h>    # Set viewport size
agent-browser set device <name>       # Emulate device ("iPhone 14")
agent-browser set geo <lat> <lng>     # Set geolocation
agent-browser set offline [on|off]    # Toggle offline mode
agent-browser set headers <json>      # Extra HTTP headers
agent-browser set credentials <u> <p> # HTTP basic auth
agent-browser set media [dark|light]  # Emulate color scheme

Cookies & Storage

agent-browser cookies                 # Get all cookies
agent-browser cookies set <name> <val> # Set cookie
agent-browser cookies clear           # Clear cookies

agent-browser storage local           # Get all localStorage
agent-browser storage local <key>     # Get specific key
agent-browser storage local set <k> <v>  # Set value
agent-browser storage local clear     # Clear all

agent-browser storage session         # Same for sessionStorage

Network

agent-browser network route <url>              # Intercept requests
agent-browser network route <url> --abort      # Block requests
agent-browser network route <url> --body <json>  # Mock response
agent-browser network unroute [url]            # Remove routes
agent-browser network requests                 # View tracked requests
agent-browser network requests --filter api    # Filter requests

Tabs & Windows

agent-browser tab                     # List tabs
agent-browser tab new [url]           # New tab (optionally with URL)
agent-browser tab <n>                 # Switch to tab n
agent-browser tab close [n]           # Close tab
agent-browser window new              # New window

Frames

agent-browser frame <sel>             # Switch to iframe
agent-browser frame main              # Back to main frame

Dialogs

agent-browser dialog accept [text]    # Accept (with optional prompt text)
agent-browser dialog dismiss          # Dismiss

Debug

agent-browser trace start [path]      # Start recording trace
agent-browser trace stop [path]       # Stop and save trace
agent-browser console                 # View console messages (log, error, warn, info)
agent-browser console --clear         # Clear console
agent-browser errors                  # View page errors (uncaught JavaScript exceptions)
agent-browser errors --clear          # Clear errors
agent-browser highlight <sel>         # Highlight element
agent-browser state save <path>       # Save auth state
agent-browser state load <path>       # Load auth state

Navigation

agent-browser back                    # Go back
agent-browser forward                 # Go forward
agent-browser reload                  # Reload page

Setup

agent-browser install                 # Download Chromium browser
agent-browser install --with-deps     # Also install system deps (Linux)

Sessions

Run multiple isolated browser instances:

# Different sessions
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com

# Or via environment variable
AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"

# List active sessions
agent-browser session list
# Output:
# Active sessions:
# -> default
#    agent1

# Show current session
agent-browser session

Each session has its own:

Browser instance
Cookies and storage
Navigation history
Authentication state

Persistent Profiles

By default, browser state (cookies, localStorage, login sessions) is ephemeral and lost when the browser closes. Use --profile to persist state across browser restarts:

# Use a persistent profile directory
agent-browser --profile ~/.myapp-profile open myapp.com

# Login once, then reuse the authenticated session
agent-browser --profile ~/.myapp-profile open myapp.com/dashboard

# Or via environment variable
AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com

The profile directory stores:

Cookies and localStorage
IndexedDB data
Service workers
Browser cache
Login sessions

Tip: Use different profile paths for different projects to keep their browser state isolated.

Snapshot Options

The snapshot command supports filtering to reduce output size:

agent-browser snapshot                    # Full accessibility tree
agent-browser snapshot -i                 # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -c                 # Compact (remove empty structural elements)
agent-browser snapshot -d 3               # Limit depth to 3 levels
agent-browser snapshot -s "#main"         # Scope to CSS selector
agent-browser snapshot -i -c -d 5         # Combine options

| Option | Description | |--------|-------------| |

五维分析

清晰度9/10

创新性7/10

实用性8/10

完整性9/10

可维护性8/10

优缺点分析

优点

快速高效的无头浏览器自动化
支持多种网页交互命令
灵活的会话和配置文件管理

缺点

需要安装依赖项
新用户学习曲线较陡
高级功能文档有限

agent-browser

💡 摘要

🎯 适合人群

agent-browser

Installation

npm (recommended)

From Source

Linux Dependencies

Quick Start

Traditional Selectors (also supported)

Commands

Core Commands

Get Info

Check State

Find Elements (Semantic Locators)

Wait

Mouse Control

Browser Settings

Cookies & Storage

Network

Tabs & Windows

Frames

Dialogs

Debug

Navigation

Setup

Sessions

Persistent Profiles

Snapshot Options

优点

缺点

相关技能

tonejs-skill

payload

audit-website