Co-Pilot / Assistive
Updated a month ago

agent-rdp

thisnick
thisnick/agent-rdp
Agent score: 86

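💡 Summary

A CLI tool that lets AI agents control Windows Remote Desktop sessions, with a range of automation features.

🎯 Who it's for

System administrators, DevOps engineers, software testers, remote support technicians, AI developers

🤖 AI roast: Looks capable, but don't let the configuration scare people off.

Security analysis: Medium risk

Risk: Medium. Recommended checks: whether it executes shell/command-line instructions; whether it makes outbound network requests (SSRF/data exfiltration); how API keys/tokens are obtained, stored, and potentially leaked; the scope of file reads/writes and path traversal risk; dependency pinning and supply-chain risk. Run with least privilege, and audit the code and dependencies before enabling it in production.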

agent-rdp

A CLI tool for AI agents to control Windows Remote Desktop sessions, built on IronRDP.

Demo

Claude Code automating SQLite database and table creation via RDP:

https://github.com/user-attachments/assets/91892b39-4edb-412b-b265-55ccd75d7421

Features

  • Connect to RDP servers - Full RDP protocol support with TLS and CredSSP authentication
  • Take screenshots - Capture the remote desktop as PNG or JPEG
  • Mouse control - Click, double-click, right-click, drag, scroll
  • Keyboard input - Type text, press key combinations (Ctrl+C, Alt+Tab, etc.)
  • Clipboard sync - Copy/paste text between local machine and remote Windows
  • Drive mapping - Map local directories as network drives on the remote machine
  • UI Automation - Interact with Windows applications via accessibility API (click, select, toggle, expand)
  • OCR text location - Find text on screen using OCR when UI Automation isn't available
  • JSON output - Structured output for AI agent consumption
  • Session management - Multiple named sessions with automatic daemon lifecycle

Installation

From npm

npm install -g agent-rdp

As a Claude Code skill

npx add-skill https://github.com/thisnick/agent-rdp

From source

git clone https://github.com/thisnick/agent-rdp
cd agent-rdp
pnpm install
pnpm build      # Build native binary
pnpm build:ts   # Build TypeScript

Usage

Connect to an RDP Server

# Using command line (password visible in process list - not recommended)
agent-rdp connect --host 192.168.1.100 --username Administrator --password 'secret'

# Using environment variables (recommended)
export AGENT_RDP_USERNAME=Administrator
export AGENT_RDP_PASSWORD=secret
agent-rdp connect --host 192.168.1.100

# Using stdin (most secure)
echo 'secret' | agent-rdp connect --host 192.168.1.100 --username Administrator --password-stdin
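When the CLI is launched from a Node.js script rather than a shell, the stdin variant above can be prepared as data first. The following is a minimal sketch, not part of agent-rdp: `buildConnectInvocation`, `ConnectOptions`, and `Invocation` are hypothetical names, and only the flags shown in the stdin example are used.

```typescript
// Hypothetical helper (not part of agent-rdp): describes how to launch
// `agent-rdp connect` so that the password never appears in argv.
interface ConnectOptions {
  host: string;
  username: string;
  password: string;
}

interface Invocation {
  command: string;
  args: string[];
  stdin: string;
}

function buildConnectInvocation(opts: ConnectOptions): Invocation {
  return {
    command: "agent-rdp",
    // Only flags that appear in the README's stdin example are used here.
    args: ["connect", "--host", opts.host, "--username", opts.username, "--password-stdin"],
    // The password goes to the child's stdin, followed by a newline,
    // mirroring what `echo 'secret' |` sends.
    stdin: opts.password + "\n",
  };
}

const invocation = buildConnectInvocation({
  host: "192.168.1.100",
  username: "Administrator",
  password: "secret",
});
console.log(invocation.command, invocation.args.join(" "));
```

The result can be handed to Node's `child_process.spawn(invocation.command, invocation.args)`, writing `invocation.stdin` to the child's stdin and then closing it. Keeping the password out of `args` is the point: argv is what shows up in the process list.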

Take a Screenshot

# Save to file
agent-rdp screenshot --output desktop.png

# Output as base64 (for AI agents)
agent-rdp screenshot --base64

# With JSON output
agent-rdp --json screenshot --base64

Mouse Operations

# Click at position
agent-rdp mouse click 500 300

# Right-click
agent-rdp mouse right-click 500 300

# Double-click
agent-rdp mouse double-click 500 300

# Move cursor
agent-rdp mouse move 100 200

# Drag from (100,100) to (500,500)
agent-rdp mouse drag 100 100 500 500

Keyboard Operations

# Type text (supports Unicode)
agent-rdp keyboard type "Hello, World!"

# Press key combinations
agent-rdp keyboard press "ctrl+c"
agent-rdp keyboard press "alt+tab"
agent-rdp keyboard press "ctrl+shift+esc"

# Press single keys (use press command)
agent-rdp keyboard press enter
agent-rdp keyboard press escape
agent-rdp keyboard press f5

Scroll

agent-rdp scroll up --amount 3
agent-rdp scroll down --amount 5
agent-rdp scroll left
agent-rdp scroll right

Locate (OCR)

Find text on screen using OCR (powered by ocrs). Useful when UI Automation can't access certain elements (WebView content, some dialogs).

# Find lines containing text
agent-rdp locate "Cancel"

# Pattern matching (glob-style)
agent-rdp locate "Save*" --pattern

# Get all text on screen
agent-rdp locate --all

# JSON output
agent-rdp locate "OK" --json

Returns text lines with coordinates for clicking:

Found 1 line(s) containing 'Cancel':
  'Cancel Button' at (650, 420) size 80x14 - center: (690, 427)

To click the first match: agent-rdp mouse click 690 427
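In the sample output, the center is the top-left corner plus half the size: 650 + 80/2 = 690 and 420 + 14/2 = 427. If only the human-readable output is available, a line can be parsed as sketched below. `parseLocateLine` and `LocateMatch` are hypothetical names, and the regular expression assumes the exact line layout shown above; the JSON output is likely the more robust choice for agents.

```typescript
interface LocateMatch {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
  centerX: number;
  centerY: number;
}

// Hypothetical parser for one result line of the text output, e.g.
//   'Cancel Button' at (650, 420) size 80x14 - center: (690, 427)
// Returns null for lines that do not have this shape (such as the header).
function parseLocateLine(line: string): LocateMatch | null {
  const m = /^\s*'(.*)' at \((\d+), (\d+)\) size (\d+)x(\d+) - center: \((\d+), (\d+)\)\s*$/.exec(line);
  if (m === null) return null;
  return {
    text: m[1],
    x: Number(m[2]),
    y: Number(m[3]),
    width: Number(m[4]),
    height: Number(m[5]),
    centerX: Number(m[6]),
    centerY: Number(m[7]),
  };
}

const sample = "  'Cancel Button' at (650, 420) size 80x14 - center: (690, 427)";
const match = parseLocateLine(sample);
if (match !== null) {
  // Arguments for `agent-rdp mouse click <x> <y>`
  console.log(`agent-rdp mouse click ${match.centerX} ${match.centerY}`);
}
```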

Clipboard

# Set clipboard text (available when you paste on Windows)
agent-rdp clipboard set "Hello from CLI"

# Get clipboard text (after copying on Windows)
agent-rdp clipboard get

# With JSON output
agent-rdp --json clipboard get

Drive Mapping

Map local directories as network drives on the remote Windows machine. Drives must be mapped at connect time. Multiple drives can be specified.

# Map local directories during connection
agent-rdp connect --host 192.168.1.100 -u Administrator -p secret \
  --drive /home/user/documents:Documents \
  --drive /tmp/shared:Shared

# List mapped drives
agent-rdp drive list

On the remote Windows machine, mapped drives appear in File Explorer as network locations.
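Each `--drive` value pairs a local path with a share name, separated by a colon. A small sketch of producing those flags from a list of mappings follows; `driveFlags` and `DriveMapping` are hypothetical names, and the `{ path, name }` shape is borrowed from the `drives` option in the Node.js API example.

```typescript
interface DriveMapping {
  path: string; // local directory to share
  name: string; // name the drive gets on the remote machine
}

// Hypothetical helper: turns mappings into repeated `--drive path:name` flags.
function driveFlags(drives: DriveMapping[]): string[] {
  const flags: string[] = [];
  for (const d of drives) {
    flags.push("--drive", `${d.path}:${d.name}`);
  }
  return flags;
}

const flags = driveFlags([
  { path: "/home/user/documents", name: "Documents" },
  { path: "/tmp/shared", name: "Shared" },
]);
console.log(flags.join(" "));
```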

UI Automation

Interact with Windows applications programmatically via the Windows UI Automation API using native patterns (InvokePattern, SelectionItemPattern, TogglePattern, etc.). When enabled, a PowerShell agent is injected into the remote session that captures the accessibility tree and performs actions. Communication between the CLI and the agent uses a Dynamic Virtual Channel (DVC) for fast bidirectional IPC.

For detailed documentation, see docs/AUTOMATION.md.

# Connect with automation enabled
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --enable-win-automation

# Take an accessibility tree snapshot (refs are always included)
agent-rdp automate snapshot

# Snapshot filtering options (like agent-browser)
agent-rdp automate snapshot -i               # Interactive elements only
agent-rdp automate snapshot -c               # Compact (remove empty structural elements)
agent-rdp automate snapshot -d 3             # Limit depth to 3 levels
agent-rdp automate snapshot -s "~*Notepad*"  # Scope to a window/element
agent-rdp automate snapshot -i -c -d 5       # Combine options

# Pattern-based element operations (refs use @eN format)
agent-rdp automate click "#SaveButton"       # Click button
agent-rdp automate click "@e5"               # Click by ref number from snapshot
agent-rdp automate click "@e5" -d            # Double-click (for file list items)
agent-rdp automate select "@e10"             # Select item (SelectionItemPattern)
agent-rdp automate toggle "@e7"              # Toggle checkbox (TogglePattern)
agent-rdp automate expand "@e3"              # Expand menu (ExpandCollapsePattern)
agent-rdp automate context-menu "@e5"        # Open context menu (Shift+F10)

# Fill text fields
agent-rdp automate fill ".Edit" "Hello World"

# Window operations
agent-rdp automate window list
agent-rdp automate window focus "~*Notepad*"

# Run PowerShell commands
agent-rdp automate run "Get-Process" --wait
agent-rdp automate run "Get-Process" --wait --process-timeout 5000  # With 5s timeout

Selector Types:

  • @e5 or @5 - Reference number from snapshot (e prefix recommended)
  • #SaveButton - Automation ID
  • .Edit - Win32 class name
  • ~*pattern* - Wildcard name match
  • File - Element name (exact match)
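The prefix rules in this list can be expressed as a small classifier. This is an illustrative sketch only: `classifySelector` and the `Selector` type are hypothetical, and how agent-rdp itself resolves edge cases (for example, an element name that begins with `#`) is not stated here.

```typescript
type Selector =
  | { kind: "ref"; ref: number }
  | { kind: "automationId"; id: string }
  | { kind: "className"; className: string }
  | { kind: "wildcardName"; pattern: string }
  | { kind: "name"; name: string };

// Hypothetical classifier following the documented prefixes:
// @eN / @N -> snapshot ref, # -> Automation ID, . -> Win32 class name,
// ~ -> wildcard name match, anything else -> exact element name.
function classifySelector(selector: string): Selector {
  const ref = /^@e?(\d+)$/.exec(selector);
  if (ref !== null) return { kind: "ref", ref: Number(ref[1]) };
  if (selector.startsWith("#")) return { kind: "automationId", id: selector.slice(1) };
  if (selector.startsWith(".")) return { kind: "className", className: selector.slice(1) };
  if (selector.startsWith("~")) return { kind: "wildcardName", pattern: selector.slice(1) };
  return { kind: "name", name: selector };
}

for (const s of ["@e5", "@5", "#SaveButton", ".Edit", "~*Notepad*", "File"]) {
  console.log(s, "->", classifySelector(s).kind);
}
```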

Snapshot Output Format:

- Window "Notepad" [ref=e1, id=Notepad]
  - MenuBar "Application" [ref=e2]
    - MenuItem "File" [ref=e3]
  - Edit "Text Editor" [ref=e5, value="Hello"]
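An agent that receives this text form may want it as structured data, for instance to look up a ref by element name. The sketch below parses the four sample lines above. `parseSnapshot` and `SnapshotNode` are hypothetical names, and the parser assumes two spaces of indentation per level and no escaped quotes inside names or values, neither of which is confirmed by the format description.

```typescript
interface SnapshotNode {
  depth: number;                 // nesting level, 0 for top-level windows
  role: string;                  // e.g. "Window", "MenuItem"
  name: string;                  // quoted element name
  attrs: Record<string, string>; // bracketed attributes such as ref, id, value
}

// Hypothetical parser for the text snapshot format shown above.
function parseSnapshot(text: string): SnapshotNode[] {
  const nodes: SnapshotNode[] = [];
  for (const line of text.split("\n")) {
    const m = /^(\s*)- (\w+) "([^"]*)" \[(.*)\]\s*$/.exec(line);
    if (m === null) continue; // skip lines that do not look like nodes
    const attrs: Record<string, string> = {};
    // Matches key=value, where value is either "quoted" or a bare token.
    const attrPattern = /(\w+)=(?:"([^"]*)"|([^,\]]+))/g;
    let a: RegExpExecArray | null;
    while ((a = attrPattern.exec(m[4])) !== null) {
      attrs[a[1]] = a[2] !== undefined ? a[2] : a[3];
    }
    // Assumption: two spaces of indentation per tree level.
    nodes.push({ depth: m[1].length / 2, role: m[2], name: m[3], attrs });
  }
  return nodes;
}

const snapshot = [
  '- Window "Notepad" [ref=e1, id=Notepad]',
  '  - MenuBar "Application" [ref=e2]',
  '    - MenuItem "File" [ref=e3]',
  '  - Edit "Text Editor" [ref=e5, value="Hello"]',
].join("\n");

const nodes = parseSnapshot(snapshot);
for (const n of nodes) {
  console.log(n.depth, n.role, n.name, n.attrs.ref);
}
```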

Session Management

# List active sessions
agent-rdp session list

# Get current session info
agent-rdp session info

# Close a session
agent-rdp session close

# Use a named session
agent-rdp --session work connect --host work-pc.local ...
agent-rdp --session work screenshot

Disconnect

agent-rdp disconnect

Web Viewer

Open the web-based viewer to see the remote desktop in your browser:

# Open viewer (connects to default streaming port 9224)
agent-rdp view

# Specify a different port
agent-rdp view --port 9224

The viewer requires WebSocket streaming to be enabled. Start a session with streaming:

agent-rdp --stream-port 9224 connect --host 192.168.1.100 -u Admin -p secret
agent-rdp view

JSON Output

All commands support --json for structured output:

agent-rdp --json screenshot --base64

Success response:

{
  "success": true,
  "data": {
    "type": "screenshot",
    "width": 1920,
    "height": 1080,
    "format": "png",
    "base64": "iVBORw0KGgo..."
  }
}

Error response:

{
  "success": false,
  "error": {
    "code": "not_connected",
    "message": "Not connected to an RDP server"
  }
}
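The two response shapes share a `success` flag, so a caller can model them as a discriminated union and branch on it. A minimal sketch follows; the type names and `unwrap` are hypothetical rather than exports of agent-rdp, and the `ScreenshotData` fields are taken from the success example above.

```typescript
interface RdpSuccess<T> {
  success: true;
  data: T;
}

interface RdpFailure {
  success: false;
  error: { code: string; message: string };
}

type RdpResponse<T> = RdpSuccess<T> | RdpFailure;

// Fields as they appear in the screenshot success example.
interface ScreenshotData {
  type: string;
  width: number;
  height: number;
  format: string;
  base64: string;
}

// Hypothetical helper: parses `--json` output and either returns the data
// payload or throws an Error carrying the reported code and message.
function unwrap<T>(raw: string): T {
  const parsed = JSON.parse(raw) as RdpResponse<T>;
  if (parsed.success) return parsed.data;
  throw new Error(`${parsed.error.code}: ${parsed.error.message}`);
}

const shot = unwrap<ScreenshotData>(
  '{"success":true,"data":{"type":"screenshot","width":1920,"height":1080,"format":"png","base64":"iVBORw0KGgo..."}}'
);
console.log(`${shot.width}x${shot.height} ${shot.format}`);
```

The cast in `unwrap` trusts the CLI's output; a stricter caller would validate the parsed object before using it.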

Environment Variables

| Variable | Description |
|----------|-------------|
| AGENT_RDP_HOST | RDP server hostname or IP |
| AGENT_RDP_PORT | RDP server port (default: 3389) |
| AGENT_RDP_USERNAME | RDP username |
| AGENT_RDP_PASSWORD | RDP password |
| AGENT_RDP_SESSION | Session name (default: "default") |
| AGENT_RDP_STREAM_PORT | WebSocket streaming port (0 = disabled) |
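A wrapper script that reads the same variables might resolve them with the documented defaults, roughly as follows. `readRdpEnv` and `RdpEnvConfig` are hypothetical names. Treating an unset `AGENT_RDP_STREAM_PORT` as 0 is an assumption: the table says 0 disables streaming but does not state what the default is.

```typescript
interface RdpEnvConfig {
  host: string | undefined;
  port: number;
  username: string | undefined;
  password: string | undefined;
  session: string;
  streamPort: number; // 0 means streaming is disabled
}

// Hypothetical reader: takes an environment-like object so it can be used
// with process.env in Node.js or with a plain object elsewhere.
function readRdpEnv(env: Record<string, string | undefined>): RdpEnvConfig {
  return {
    host: env.AGENT_RDP_HOST,
    port: Number(env.AGENT_RDP_PORT ?? "3389"),             // documented default
    username: env.AGENT_RDP_USERNAME,
    password: env.AGENT_RDP_PASSWORD,
    session: env.AGENT_RDP_SESSION ?? "default",            // documented default
    streamPort: Number(env.AGENT_RDP_STREAM_PORT ?? "0"),   // assumed default
  };
}

const config = readRdpEnv({
  AGENT_RDP_HOST: "192.168.1.100",
  AGENT_RDP_USERNAME: "Administrator",
});
console.log(config.host, config.port, config.session, config.streamPort);
```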

Node.js API

Use agent-rdp programmatically from Node.js/TypeScript:

import { RdpSession } from 'agent-rdp';

const rdp = new RdpSession({ session: 'default' });

await rdp.connect({
  host: '192.168.1.100',
  username: 'Administrator',
  password: 'secret',
  width: 1280,
  height: 800,
  drives: [{ path: '/tmp/share', name: 'Share' }],
  enableWinAutomation: true, // Enable UI Automation
});

// Screenshot
const { base64, width, height } = await rdp.screenshot({ format: 'png' });

// Mouse
await rdp.mouse.click({ x: 100, y: 200 });
await rdp.mouse.rightClick({ x: 100, y: 200 });
await rdp.mouse.doubleClick({ x: 100, y: 200 });
await rdp.mouse.move({ x: 150, y: 250 });
await rdp.mouse.drag({ from: { x: 100, y: 100 }, to: { x: 500, y: 500 } });

// Keyboard
await rdp.keyboard.type({ text: 'Hello World' });
await rdp.keyboard.press({ keys: 'ctrl+c' });
await rdp.keyboard.press({ keys: 'enter' }); // Single keys use press()

// Scroll
await rdp.scroll.up();                    // Default amount: 3
await rdp.scroll.down({ amount: 5 });     // Custom amount
await rdp.scroll.up({ x: 500, y: 300 });  // Scroll at position

// Clipboard
await rdp.clipboard.set({ text: 'text to copy' });
const text = await rdp.clipboard.get();

// Locate text using OCR
const matches = await rdp.locate({ text: 'Cancel' });
if (matches.length > 0) {
  await rdp.mouse.click({ x: matches[0].center_x, y: matches[0].center_y });
}

// Get all text on screen
const allText = await rdp.locate({ all: true });
//
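
Five-dimension analysis
Clarity: 9/10
Innovation: 8/10
Practicality: 9/10
Completeness: 9/10
Maintainability: 8/10

Pros and cons

Pros

  • Comprehensive RDP control features
  • Supports automation and scripting
  • JSON output for AI integration
  • Secure connection options

Cons

  • Requires familiarity with the CLI
  • Password handling carries potential security risks
  • Limited to Windows environments
  • Depends on external libraries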


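Disclaimer: This content comes from an open-source GitHub project and is provided for display and scoring analysis only.

Copyright belongs to the original author, thisnick.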