💡 Summary
This project lets AI assistants automate web browsing tasks through an MCP server.
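🤖 AI Roast: "Looks capable, but don't let the configuration scare people off."
Risk: Medium. Recommended checks: whether it executes shell or command-line instructions; whether it makes external network requests (SSRF or data exfiltration); how API keys and tokens are obtained, stored, and potentially leaked; the scope of file reads and writes, and path traversal risk. Run it with least privilege, and audit the code and dependencies before enabling it in production.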
mcp-server-browser-use
MCP server that gives AI assistants the power to control a web browser.
Table of Contents
- What is this?
- Installation
- Web UI
- Web Dashboard
- Configuration
- CLI Reference
- MCP Tools
- Deep Research
- Observability
- Skills System
- REST API Reference
- Architecture
- License
What is this?
This wraps browser-use as an MCP server, letting Claude (or any MCP client) automate a real browser: navigate pages, fill forms, click buttons, extract data, and more.
Why HTTP instead of stdio?
Browser automation tasks take 30-120+ seconds. The standard MCP stdio transport has timeout issues with long-running operations: connections drop mid-task. The HTTP transport avoids this because the server runs as a persistent daemon that handles requests reliably regardless of duration.
Installation
Claude Code Plugin (Recommended)
Install as a Claude Code plugin for automatic setup:
    # Install the plugin
    /plugin install browser-use/mcp-browser-use
The plugin automatically:
- Installs Playwright browsers on first run
- Starts the HTTP daemon when Claude Code starts
- Registers the MCP server with Claude
Set your API key (the browser agent needs an LLM to decide actions):
    # Set API key (environment variable - recommended)
    export GEMINI_API_KEY=your-key-here

    # Or use config file
    mcp-server-browser-use config set -k llm.api_key -v your-key-here
That's it! Claude can now use browser automation tools.
Manual Installation
For other MCP clients or standalone use:
    # Clone and install (git clone creates a directory named after the repo)
    git clone https://github.com/Saik0s/mcp-browser-use.git
    cd mcp-browser-use
    uv sync

    # Install browser
    uv run playwright install chromium

    # Start the server
    uv run mcp-server-browser-use server
Add to Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
    {
      "mcpServers": {
        "browser-use": {
          "type": "streamable-http",
          "url": "http://localhost:8383/mcp"
        }
      }
    }
For MCP clients that don't support HTTP transport, use mcp-remote as a proxy:
    {
      "mcpServers": {
        "browser-use": {
          "command": "npx",
          "args": ["mcp-remote", "http://localhost:8383/mcp"]
        }
      }
    }
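The two client entries above can also be generated and merged into an existing client config programmatically. The following is a minimal sketch, not part of this project: the helper names (`http_entry`, `proxy_entry`, `add_server`) are hypothetical, and only the URL, server name, and JSON shapes are taken from the examples above.

```python
import json

# URL taken from the examples above; everything else here is illustrative.
MCP_URL = "http://localhost:8383/mcp"


def http_entry(url: str = MCP_URL) -> dict:
    """Entry for clients that support the streamable HTTP transport."""
    return {"type": "streamable-http", "url": url}


def proxy_entry(url: str = MCP_URL) -> dict:
    """Entry for stdio-only clients, proxied through mcp-remote."""
    return {"command": "npx", "args": ["mcp-remote", url]}


def add_server(config: dict, entry: dict, name: str = "browser-use") -> dict:
    """Return a copy of config with the entry added, keeping other servers."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = entry
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    existing = {"mcpServers": {"other": {"command": "other-server"}}}
    print(json.dumps(add_server(existing, http_entry()), indent=2))
```

Merging into a copy rather than overwriting the file content keeps any servers that are already registered in the client config.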
Web UI
Access the task viewer at http://localhost:8383 when the daemon is running.
Features:
- Real-time task list with status and progress
- Task details with execution logs
- Server health status and uptime
- Running tasks monitoring
The web UI provides visibility into browser automation tasks without requiring CLI commands.
Web Dashboard
Access the full-featured dashboard at http://localhost:8383/dashboard when the daemon is running.
Features:
- Tasks Tab: Complete task history with filtering, real-time status updates, and detailed execution logs
- Skills Tab: Browse, inspect, and manage learned skills with usage statistics
- History Tab: Historical view of all completed tasks with filtering by status and time
Key Capabilities:
- Run existing skills directly from the dashboard with custom parameters
- Start learning sessions to capture new skills
- Delete outdated or invalid skills
- Monitor running tasks with live progress updates
- View full task results and error details
The dashboard provides a comprehensive web interface for managing all aspects of browser automation without CLI commands.
Configuration
Settings are stored in ~/.config/mcp-server-browser-use/config.json.
View current config:
mcp-server-browser-use config view
Change settings:
    mcp-server-browser-use config set -k llm.provider -v openai
    mcp-server-browser-use config set -k llm.model_name -v gpt-4o
    # Note: Set API keys via environment variables (e.g., ANTHROPIC_API_KEY) for better security
    # mcp-server-browser-use config set -k llm.api_key -v sk-...
    mcp-server-browser-use config set -k browser.headless -v false
    mcp-server-browser-use config set -k agent.max_steps -v 30
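To illustrate how a dotted key such as `agent.max_steps` might map onto a JSON config, here is a minimal sketch. It assumes the config file is nested by section (`{"agent": {"max_steps": 30}}`), which the README does not confirm; `set_dotted` and `parse_value` are hypothetical helpers, not the tool's actual implementation.

```python
import json


def parse_value(raw: str):
    """Interpret a CLI string as JSON when possible (false, 30), else keep it as text."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def set_dotted(config: dict, key: str, value) -> dict:
    """Set config["section"]["name"] = value for a key like "section.name"."""
    node = config
    *sections, leaf = key.split(".")
    for section in sections:
        # Create the section on first use (assumed nested layout).
        node = node.setdefault(section, {})
    node[leaf] = value
    return config


if __name__ == "__main__":
    cfg = {}
    set_dotted(cfg, "llm.provider", parse_value("openai"))
    set_dotted(cfg, "browser.headless", parse_value("false"))
    set_dotted(cfg, "agent.max_steps", parse_value("30"))
    print(json.dumps(cfg, indent=2))
```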
Settings Reference
| Key | Default | Description |
|-----|---------|-------------|
| llm.provider | google | LLM provider (anthropic, openai, google, azure_openai, groq, deepseek, cerebras, ollama, bedrock, browser_use, openrouter, vercel) |
| llm.model_name | gemini-3-flash-preview | Model for the browser agent |
| llm.api_key | - | API key for the provider (prefer env vars: GEMINI_API_KEY, ANTHROPIC_API_KEY, etc.) |
| browser.headless | true | Run browser without GUI |
| browser.cdp_url | - | Connect to existing Chrome (e.g., http://localhost:9222) |
| browser.user_data_dir | - | Chrome profile directory for persistent logins/cookies |
| browser.chromium_sandbox | true | Enable Chromium sandboxing for security |
| agent.max_steps | 20 | Max steps per browser task |
| agent.use_vision | true | Enable vision capabilities for the agent |
| research.max_searches | 5 | Max searches per research task |
| research.search_timeout | - | Timeout for individual searches |
| server.host | 127.0.0.1 | Server bind address |
| server.port | 8383 | Server port |
| server.results_dir | - | Directory to save results |
| server.auth_token | - | Auth token for non-localhost connections |
| skills.enabled | false | Enable skills system (beta - disabled by default) |
| skills.directory | ~/.config/browser-skills | Skills storage location |
| skills.validate_results | true | Validate skill execution results |
Config Priority
Environment Variables > Config File > Defaults
Environment variables use prefix MCP_ + section + _ + key (e.g., MCP_LLM_PROVIDER).
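The naming rule and the priority order can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: `env_name` and `resolve` are hypothetical names, and the file config and defaults are modeled as flat mappings of dotted keys for simplicity.

```python
import os


def env_name(key: str) -> str:
    """Map a dotted key to MCP_ + section + _ + key, e.g. llm.provider -> MCP_LLM_PROVIDER."""
    section, _, name = key.partition(".")
    return f"MCP_{section}_{name}".upper()


def resolve(key: str, file_config: dict, defaults: dict, environ=None):
    """Look up a setting: environment variable first, then config file, then default."""
    environ = os.environ if environ is None else environ
    var = env_name(key)
    if var in environ:
        return environ[var]  # note: environment values are always strings
    if key in file_config:
        return file_config[key]
    return defaults.get(key)


if __name__ == "__main__":
    defaults = {"llm.provider": "google", "server.port": 8383}
    print(env_name("agent.max_steps"))
    print(resolve("llm.provider", {"llm.provider": "openai"}, defaults, {}))
```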
Using Your Own Browser
Option 1: Persistent Profile (Recommended)
Use a dedicated Chrome profile to preserve logins and cookies:
    # Set user data directory
    mcp-server-browser-use config set -k browser.user_data_dir -v ~/.chrome-browser-use
Option 2: Connect to Existing Chrome
Connect to an existing Chrome instance (useful for advanced debugging):
    # Launch Chrome with debugging enabled
    google-chrome --remote-debugging-port=9222

    # Configure CDP connection (localhost only for security)
    mcp-server-browser-use config set -k browser.cdp_url -v http://localhost:9222
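Since the CDP connection is meant to be localhost only, it can be useful to check a URL before storing it. The sketch below is one possible way to do that with the Python standard library; `is_loopback_url` is a hypothetical helper, and whether the server performs a similar check internally is not stated in this README.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True only if the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False  # no scheme or no host, e.g. "localhost:9222"
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a hostname other than localhost


if __name__ == "__main__":
    for candidate in ("http://localhost:9222", "http://192.168.1.5:9222"):
        print(candidate, is_loopback_url(candidate))
```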
CLI Reference
Server Management
    mcp-server-browser-use server      # Start as background daemon
    mcp-server-browser-use server -f   # Start in foreground (for debugging)
    mcp-server-browser-use status      # Check if running
    mcp-server-browser-use stop        # Stop the daemon
    mcp-server-browser-use logs -f     # Tail server logs
Calling Tools
    mcp-server-browser-use tools       # List all available MCP tools
    mcp-server-browser-use call run_browser_agent task="Go to google.com"
    mcp-server-browser-use call run_deep_research topic="quantum computing"
Configuration
    mcp-server-browser-use config view                   # Show all settings
    mcp-server-browser-use config set -k <key> -v <value>
    mcp-server-browser-use config path                   # Show config file location
Observability
    mcp-server-browser-use tasks                   # List recent tasks
    mcp-server-browser-use tasks --status running
    mcp-server-browser-use task <id>               # Get task details
    mcp-server-browser-use task cancel <id>        # Cancel a running task
    mcp-server-browser-use health                  # Server health + stats
Skills Management
    mcp-server-browser-use call skill_list
    mcp-server-browser-use call skill_get name="my-skill"
    mcp-server-browser-use call skill_delete name="my-skill"
Tip: Skills can also be managed through the web dashboard at http://localhost:8383/dashboard for a visual interface with one-click execution and learning sessions.
MCP Tools
These tools are exposed via MCP for AI clients:
| Tool | Description | Typical Duration |
|------|-------------|------------------|
| run_browser_agent | Execute browser automation tasks | 60-120s |
| run_deep_research | Multi-search research with synthesis | 2-5 min |
| skill_list | List learned skills | <1s |
| skill_get | Get skill definition | <1s |
| skill_delete | Delete a skill | <1s |
| health_check | Server status and running tasks | <1s |
| task_list | Query task history | <1s |
| task_get | Get full task details | <1s |
run_browser_agent
The main tool. Tell it what you want in plain English:
    mcp-server-browser-use call run_browser_agent \
      task="Find the price of iPhone 16 Pro on Apple's website"
The agent launches a browser, navigates to apple.com, finds the product, and returns the price.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| task | string | What to do (required) |
| max_steps | int | Override default max steps |
| skill_name | string | Use a learned skill |
| skill_params | JSON | Parameters for the skill |
| learn | bool | Enable learning mode |
| save_skill_as | string | Name for the learned skill |
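For MCP clients, these parameters travel as the `arguments` of a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that payload; `tool_call` is a hypothetical helper, and a real client would normally use an MCP client library, which also performs the required initialization handshake before calling tools.

```python
import json
from itertools import count

_ids = count(1)  # simple request-id generator for this sketch


def tool_call(name: str, **arguments) -> dict:
    """Build an MCP tools/call request (JSON-RPC 2.0) for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    request = tool_call("run_browser_agent", task="Go to google.com", max_steps=10)
    print(json.dumps(request, indent=2))
```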
run_deep_research
Multi-step web research with automatic synthesis:
    mcp-server-browser-use call run_deep_research \
      topic="Latest developments in quantum computing" \
      max_searches=5
The agent searches multiple sources, extracts key findings, and compiles a markdown report.
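The search-then-compile flow described here can be pictured with a small sketch. It is purely illustrative: `deep_research`, `plan`, and `search` are hypothetical names, the LLM and browser steps are replaced by plain callables, and the report layout is an assumption rather than the tool's real output format.

```python
from typing import Callable, List


def deep_research(
    topic: str,
    plan: Callable[[str], List[str]],
    search: Callable[[str], str],
    max_searches: int = 5,
) -> str:
    """Plan queries, run at most max_searches of them, and compile a markdown report."""
    queries = plan(topic)[:max_searches]  # cap the number of searches
    sections = [f"## {query}\n\n{search(query)}" for query in queries]
    return f"# {topic}\n\n" + "\n\n".join(sections)


if __name__ == "__main__":
    # Stub callables standing in for the LLM planner and the browser agent.
    stub_plan = lambda topic: [f"{topic} overview", f"{topic} recent news"]
    stub_search = lambda query: f"(findings for: {query})"
    print(deep_research("quantum computing", stub_plan, stub_search))
```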
Deep Research
Deep research executes a 3-phase workflow:
┌─────────────────────────────────────────────────────────┐
│ Phase 1: PLANNING │
│ LLM generates 3-5 focused search queries from topic │
└─────────────────────────────┬───────────────────────────┘
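Pros
- Supports multiple AI clients over HTTP
- User-friendly web UI for task management
- Flexible configuration options
- Real-time task monitoring
Cons
- Requires setting up additional dependencies
- API keys pose a potential security risk
- Relatively complex for non-technical users
- Limited documentation for advanced features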
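Disclaimer: This content is sourced from an open-source GitHub project and is provided for display and rating analysis only.
Copyright belongs to the original author, Saik0s.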
