Updated a month ago

agent-browser

vercel-labs
9.8k
vercel-labs/agent-browser
82
Agent Score

💡 Summary

A headless browser automation CLI for AI agents, enabling fast and flexible web interactions.

🎯 Target Audience

Web developers, QA engineers, automation testers, data scrapers, AI researchers

🤖 AI Roast: Powerful, but the setup might scare off the impatient.

Security Analysis: Medium Risk

Risk: Medium. Review: shell/CLI command execution; outbound network access (SSRF, data egress); API keys/tokens handling and storage; filesystem read/write scope and path traversal; dependency pinning and supply-chain risk. Run with least privilege and audit before enabling in production.

agent-browser

Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.

Installation

npm (recommended)

npm install -g agent-browser
agent-browser install  # Download Chromium

From Source

git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
pnpm build:native   # Requires Rust (https://rustup.rs)
pnpm link --global  # Makes agent-browser available globally
agent-browser install

Linux Dependencies

On Linux, install system dependencies:

agent-browser install --with-deps
# or manually: npx playwright install-deps chromium

Quick Start

agent-browser open example.com
agent-browser snapshot                    # Get accessibility tree with refs
agent-browser click @e2                   # Click by ref from snapshot
agent-browser fill @e3 "test@example.com" # Fill by ref
agent-browser get text @e1                # Get text by ref
agent-browser screenshot page.png
agent-browser close

Traditional Selectors (also supported)

agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"
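The Quick Start steps can be wrapped in a small shell function so a page is always closed after it is inspected. This is a minimal sketch, not part of agent-browser: `capture_page` and the `AGENT_BROWSER_BIN` variable are hypothetical names, and the error handling assumes the CLI exits with a nonzero status on failure, which this README does not state.

```shell
#!/bin/sh
# Sketch only: AGENT_BROWSER_BIN and capture_page are hypothetical names.
# Every subcommand used here comes from the Quick Start section.
AB="${AGENT_BROWSER_BIN:-agent-browser}"

# capture_page URL OUT_PNG
# Opens URL, prints the accessibility snapshot, saves a screenshot,
# then closes the browser even if the snapshot or screenshot step failed.
capture_page() {
  url="$1"
  out="$2"
  status=0
  "$AB" open "$url" || return 1
  "$AB" snapshot || status=1
  "$AB" screenshot "$out" || status=1
  "$AB" close
  return "$status"
}

# Example: capture_page example.com page.png
```

Setting `AGENT_BROWSER_BIN` to another executable (for instance `echo`) turns the function into a dry run that only prints the commands it would issue.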

Commands

Core Commands

agent-browser open <url>             # Navigate to URL (aliases: goto, navigate)
agent-browser click <sel>            # Click element
agent-browser dblclick <sel>         # Double-click element
agent-browser focus <sel>            # Focus element
agent-browser type <sel> <text>      # Type into element
agent-browser fill <sel> <text>      # Clear and fill
agent-browser press <key>            # Press key (Enter, Tab, Control+a) (alias: key)
agent-browser keydown <key>          # Hold key down
agent-browser keyup <key>            # Release key
agent-browser hover <sel>            # Hover element
agent-browser select <sel> <val>     # Select dropdown option
agent-browser check <sel>            # Check checkbox
agent-browser uncheck <sel>          # Uncheck checkbox
agent-browser scroll <dir> [px]      # Scroll (up/down/left/right)
agent-browser scrollintoview <sel>   # Scroll element into view (alias: scrollinto)
agent-browser drag <src> <tgt>       # Drag and drop
agent-browser upload <sel> <files>   # Upload files
agent-browser screenshot [path]      # Take screenshot (--full for full page, base64 png to stdout if no path)
agent-browser pdf <path>             # Save as PDF
agent-browser snapshot               # Accessibility tree with refs (best for AI)
agent-browser eval <js>              # Run JavaScript
agent-browser connect <port>         # Connect to browser via CDP
agent-browser close                  # Close browser (aliases: quit, exit)

Get Info

agent-browser get text <sel>         # Get text content
agent-browser get html <sel>         # Get innerHTML
agent-browser get value <sel>        # Get input value
agent-browser get attr <sel> <attr>  # Get attribute
agent-browser get title              # Get page title
agent-browser get url                # Get current URL
agent-browser get count <sel>        # Count matching elements
agent-browser get box <sel>          # Get bounding box

Check State

agent-browser is visible <sel>  # Check if visible
agent-browser is enabled <sel>  # Check if enabled
agent-browser is checked <sel>  # Check if checked

Find Elements (Semantic Locators)

agent-browser find role <role> <action> [value]       # By ARIA role
agent-browser find text <text> <action>               # By text content
agent-browser find label <label> <action> [value]     # By label
agent-browser find placeholder <ph> <action> [value]  # By placeholder
agent-browser find alt <text> <action>                # By alt text
agent-browser find title <text> <action>              # By title attr
agent-browser find testid <id> <action> [value]       # By data-testid
agent-browser find first <sel> <action> [value]       # First match
agent-browser find last <sel> <action> [value]        # Last match
agent-browser find nth <n> <sel> <action> [value]     # Nth match

Actions: click, fill, check, hover, text

Examples:

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text

Wait

agent-browser wait <selector>                    # Wait for element to be visible
agent-browser wait <ms>                          # Wait for time (milliseconds)
agent-browser wait --text "Welcome"              # Wait for text to appear
agent-browser wait --url "**/dash"               # Wait for URL pattern
agent-browser wait --load networkidle            # Wait for load state
agent-browser wait --fn "window.ready === true"  # Wait for JS condition

Load states: load, domcontentloaded, networkidle
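Waits are most useful when chained, so that later commands only run once the page has settled. The sketch below shows one possible chaining; `open_when_ready` and `AGENT_BROWSER_BIN` are hypothetical names, and it assumes each `wait` exits with a nonzero status if the condition is not met, which this README does not confirm.

```shell
#!/bin/sh
# Sketch only: AGENT_BROWSER_BIN and open_when_ready are hypothetical names.
AB="${AGENT_BROWSER_BIN:-agent-browser}"

# open_when_ready URL EXPECTED_TEXT
# Opens URL, waits for the networkidle load state, waits for EXPECTED_TEXT
# to appear, then prints the page title. Stops at the first failing step.
open_when_ready() {
  "$AB" open "$1" &&
  "$AB" wait --load networkidle &&
  "$AB" wait --text "$2" &&
  "$AB" get title
}

# Example: open_when_ready example.com "Welcome"
```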

Mouse Control

agent-browser mouse move <x> <y>     # Move mouse
agent-browser mouse down [button]    # Press button (left/right/middle)
agent-browser mouse up [button]      # Release button
agent-browser mouse wheel <dy> [dx]  # Scroll wheel

Browser Settings

agent-browser set viewport <w> <h>     # Set viewport size
agent-browser set device <name>        # Emulate device ("iPhone 14")
agent-browser set geo <lat> <lng>      # Set geolocation
agent-browser set offline [on|off]     # Toggle offline mode
agent-browser set headers <json>       # Extra HTTP headers
agent-browser set credentials <u> <p>  # HTTP basic auth
agent-browser set media [dark|light]   # Emulate color scheme

Cookies & Storage

agent-browser cookies                    # Get all cookies
agent-browser cookies set <name> <val>   # Set cookie
agent-browser cookies clear              # Clear cookies
agent-browser storage local              # Get all localStorage
agent-browser storage local <key>        # Get specific key
agent-browser storage local set <k> <v>  # Set value
agent-browser storage local clear        # Clear all
agent-browser storage session            # Same for sessionStorage

Network

agent-browser network route <url>                # Intercept requests
agent-browser network route <url> --abort        # Block requests
agent-browser network route <url> --body <json>  # Mock response
agent-browser network unroute [url]              # Remove routes
agent-browser network requests                   # View tracked requests
agent-browser network requests --filter api      # Filter requests
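A mocked response is usually registered, used for one page load, and then removed. The sketch below illustrates that sequence; `with_mock` and `AGENT_BROWSER_BIN` are hypothetical names. It assumes a route can be registered before the page is opened and that the `<url>` argument accepts a glob pattern like the one shown for `wait --url`; neither is stated in this README.

```shell
#!/bin/sh
# Sketch only: AGENT_BROWSER_BIN and with_mock are hypothetical names.
AB="${AGENT_BROWSER_BIN:-agent-browser}"

# with_mock URL_PATTERN JSON_BODY PAGE_URL
# Registers a mocked response for URL_PATTERN, opens PAGE_URL, then removes
# the route again. Returns the exit status of the open step.
with_mock() {
  "$AB" network route "$1" --body "$2" || return 1
  "$AB" open "$3"
  status=$?
  "$AB" network unroute "$1"
  return "$status"
}

# Example (the pattern and body are made-up values):
# with_mock '**/api/user' '{"name":"test"}' example.com
```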

Tabs & Windows

agent-browser tab            # List tabs
agent-browser tab new [url]  # New tab (optionally with URL)
agent-browser tab <n>        # Switch to tab n
agent-browser tab close [n]  # Close tab
agent-browser window new     # New window

Frames

agent-browser frame <sel>  # Switch to iframe
agent-browser frame main   # Back to main frame

Dialogs

agent-browser dialog accept [text]  # Accept (with optional prompt text)
agent-browser dialog dismiss        # Dismiss

Debug

agent-browser trace start [path]  # Start recording trace
agent-browser trace stop [path]   # Stop and save trace
agent-browser console             # View console messages (log, error, warn, info)
agent-browser console --clear     # Clear console
agent-browser errors              # View page errors (uncaught JavaScript exceptions)
agent-browser errors --clear      # Clear errors
agent-browser highlight <sel>     # Highlight element
agent-browser state save <path>   # Save auth state
agent-browser state load <path>   # Load auth state

Navigation

agent-browser back     # Go back
agent-browser forward  # Go forward
agent-browser reload   # Reload page

Setup

agent-browser install              # Download Chromium browser
agent-browser install --with-deps  # Also install system deps (Linux)

Sessions

Run multiple isolated browser instances:

# Different sessions
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com

# Or via environment variable
AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"

# List active sessions
agent-browser session list
# Output:
# Active sessions:
# -> default
#    agent1

# Show current session
agent-browser session

Each session has its own:

  • Browser instance
  • Cookies and storage
  • Navigation history
  • Authentication state
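Because each session is isolated, a script can open a different site in each one and then list what is running. A minimal sketch, with `open_in_sessions` and `AGENT_BROWSER_BIN` as hypothetical names; only the `--session` flag and `session list` command come from the text above.

```shell
#!/bin/sh
# Sketch only: AGENT_BROWSER_BIN and open_in_sessions are hypothetical names.
AB="${AGENT_BROWSER_BIN:-agent-browser}"

# open_in_sessions NAME=URL [NAME=URL ...]
# Opens each URL in the session named before the first '=' and then
# lists the active sessions. Stops at the first open that fails.
open_in_sessions() {
  for pair in "$@"; do
    name="${pair%%=*}"  # text before the first '='
    url="${pair#*=}"    # text after the first '='
    "$AB" --session "$name" open "$url" || return 1
  done
  "$AB" session list
}

# Example: open_in_sessions agent1=site-a.com agent2=site-b.com
```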

Persistent Profiles

By default, browser state (cookies, localStorage, login sessions) is ephemeral and lost when the browser closes. Use --profile to persist state across browser restarts:

# Use a persistent profile directory
agent-browser --profile ~/.myapp-profile open myapp.com

# Login once, then reuse the authenticated session
agent-browser --profile ~/.myapp-profile open myapp.com/dashboard

# Or via environment variable
AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com

The profile directory stores:

  • Cookies and localStorage
  • IndexedDB data
  • Service workers
  • Browser cache
  • Login sessions

Tip: Use different profile paths for different projects to keep their browser state isolated.
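One way to follow the tip is a wrapper that derives the profile path from a project name. This is a sketch under assumptions: `ab_project`, `PROFILE_ROOT`, `AGENT_BROWSER_BIN`, and the `~/.agent-browser-profiles` layout are all hypothetical, and creating the directory up front is a precaution rather than a documented requirement.

```shell
#!/bin/sh
# Sketch only: AGENT_BROWSER_BIN, PROFILE_ROOT, and ab_project are
# hypothetical names; the directory layout is an illustrative choice.
AB="${AGENT_BROWSER_BIN:-agent-browser}"
PROFILE_ROOT="${PROFILE_ROOT:-$HOME/.agent-browser-profiles}"

# ab_project PROJECT ARGS...
# Runs agent-browser with a profile directory dedicated to PROJECT,
# passing the remaining arguments through unchanged.
ab_project() {
  project="$1"
  shift
  mkdir -p "$PROFILE_ROOT/$project" || return 1
  "$AB" --profile "$PROFILE_ROOT/$project" "$@"
}

# Example: ab_project myapp open myapp.com/dashboard
```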

Snapshot Options

The snapshot command supports filtering to reduce output size:

agent-browser snapshot             # Full accessibility tree
agent-browser snapshot -i          # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -c          # Compact (remove empty structural elements)
agent-browser snapshot -d 3        # Limit depth to 3 levels
agent-browser snapshot -s "#main"  # Scope to CSS selector
agent-browser snapshot -i -c -d 5  # Combine options

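| Option | Description |
|--------|-------------|
| -i | Interactive elements only (buttons, inputs, links) |
| -c | Compact (remove empty structural elements) |
| -d <n> | Limit depth to n levels |
| -s <sel> | Scope to CSS selector |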
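The filters are handy when a snapshot is saved for later processing. The sketch below writes a compact, interactive-only snapshot to a file, optionally scoped to a selector. `save_snapshot` and `AGENT_BROWSER_BIN` are hypothetical names, and the sketch assumes `snapshot` writes the tree to standard output.

```shell
#!/bin/sh
# Sketch only: AGENT_BROWSER_BIN and save_snapshot are hypothetical names.
AB="${AGENT_BROWSER_BIN:-agent-browser}"

# save_snapshot OUT_FILE [SCOPE_SELECTOR]
# Saves an interactive-only, compact snapshot to OUT_FILE. When a selector
# is given, the snapshot is scoped to it with -s.
save_snapshot() {
  out="$1"
  scope="${2:-}"
  if [ -n "$scope" ]; then
    "$AB" snapshot -i -c -s "$scope" > "$out"
  else
    "$AB" snapshot -i -c > "$out"
  fi
}

# Example: save_snapshot main.txt "#main"
```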

5-Dim Analysis
Clarity: 9/10
Novelty: 7/10
Utility: 8/10
Completeness: 9/10
Maintainability: 8/10
Pros & Cons

Pros

  • Fast and efficient headless browser automation
  • Supports various commands for web interactions
  • Flexible session and profile management

Cons

  • Requires installation of dependencies
  • Learning curve for new users
  • Limited documentation on advanced features

Disclaimer: This content is sourced from GitHub open source projects for display and rating purposes only.

Copyright belongs to the original author vercel-labs.