Co-Pilot
Updated a month ago

agent-rdp

Tthisnick
0.0k
thisnick/agent-rdp
86
Agent Score

💡 Summary

A CLI tool for AI agents to control Windows Remote Desktop sessions with various automation features.

🎯 Target Audience

System AdministratorsDevOps EngineersSoftware TestersRemote Support TechniciansAI Developers

🤖 AI Roast:Powerful, but the setup might scare off the impatient.

Security AnalysisMedium Risk

Risk: Medium. Review: shell/CLI command execution; outbound network access (SSRF, data egress); API keys/tokens handling and storage; filesystem read/write scope and path traversal; dependency pinning and supply-chain risk. Run with least privilege and audit before enabling in production.

agent-rdp

A CLI tool for AI agents to control Windows Remote Desktop sessions, built on IronRDP.

Demo

Claude Code automating SQLite database and table creation via RDP:

https://github.com/user-attachments/assets/91892b39-4edb-412b-b265-55ccd75d7421

Features

  • Connect to RDP servers - Full RDP protocol support with TLS and CredSSP authentication
  • Take screenshots - Capture the remote desktop as PNG or JPEG
  • Mouse control - Click, double-click, right-click, drag, scroll
  • Keyboard input - Type text, press key combinations (Ctrl+C, Alt+Tab, etc.)
  • Clipboard sync - Copy/paste text between local machine and remote Windows
  • Drive mapping - Map local directories as network drives on the remote machine
  • UI Automation - Interact with Windows applications via accessibility API (click, select, toggle, expand)
  • OCR text location - Find text on screen using OCR when UI Automation isn't available
  • JSON output - Structured output for AI agent consumption
  • Session management - Multiple named sessions with automatic daemon lifecycle

Installation

From npm

npm install -g agent-rdp

As a Claude Code skill

npx add-skill https://github.com/thisnick/agent-rdp

From source

git clone https://github.com/thisnick/agent-rdp cd agent-rdp pnpm install pnpm build # Build native binary pnpm build:ts # Build TypeScript

Usage

Connect to an RDP Server

# Using command line (password visible in process list - not recommended) agent-rdp connect --host 192.168.1.100 --username Administrator --password 'secret' # Using environment variables (recommended) export AGENT_RDP_USERNAME=Administrator export AGENT_RDP_PASSWORD=secret agent-rdp connect --host 192.168.1.100 # Using stdin (most secure) echo 'secret' | agent-rdp connect --host 192.168.1.100 --username Administrator --password-stdin

Take a Screenshot

# Save to file agent-rdp screenshot --output desktop.png # Output as base64 (for AI agents) agent-rdp screenshot --base64 # With JSON output agent-rdp --json screenshot --base64

Mouse Operations

# Click at position agent-rdp mouse click 500 300 # Right-click agent-rdp mouse right-click 500 300 # Double-click agent-rdp mouse double-click 500 300 # Move cursor agent-rdp mouse move 100 200 # Drag from (100,100) to (500,500) agent-rdp mouse drag 100 100 500 500

Keyboard Operations

# Type text (supports Unicode) agent-rdp keyboard type "Hello, World!" # Press key combinations agent-rdp keyboard press "ctrl+c" agent-rdp keyboard press "alt+tab" agent-rdp keyboard press "ctrl+shift+esc" # Press single keys (use press command) agent-rdp keyboard press enter agent-rdp keyboard press escape agent-rdp keyboard press f5

Scroll

agent-rdp scroll up --amount 3 agent-rdp scroll down --amount 5 agent-rdp scroll left agent-rdp scroll right

Locate (OCR)

Find text on screen using OCR (powered by ocrs). Useful when UI Automation can't access certain elements (WebView content, some dialogs).

# Find lines containing text agent-rdp locate "Cancel" # Pattern matching (glob-style) agent-rdp locate "Save*" --pattern # Get all text on screen agent-rdp locate --all # JSON output agent-rdp locate "OK" --json

Returns text lines with coordinates for clicking:

Found 1 line(s) containing 'Cancel':
  'Cancel Button' at (650, 420) size 80x14 - center: (690, 427)

To click the first match: agent-rdp mouse click 690 427

Clipboard

# Set clipboard text (available when you paste on Windows) agent-rdp clipboard set "Hello from CLI" # Get clipboard text (after copying on Windows) agent-rdp clipboard get # With JSON output agent-rdp --json clipboard get

Drive Mapping

Map local directories as network drives on the remote Windows machine. Drives must be mapped at connect time. Multiple drives can be specified.

# Map local directories during connection agent-rdp connect --host 192.168.1.100 -u Administrator -p secret \ --drive /home/user/documents:Documents \ --drive /tmp/shared:Shared # List mapped drives agent-rdp drive list

On the remote Windows machine, mapped drives appear in File Explorer as network locations.

UI Automation

Interact with Windows applications programmatically via the Windows UI Automation API using native patterns (InvokePattern, SelectionItemPattern, TogglePattern, etc.). When enabled, a PowerShell agent is injected into the remote session that captures the accessibility tree and performs actions. Communication between the CLI and the agent uses a Dynamic Virtual Channel (DVC) for fast bidirectional IPC.

For detailed documentation, see docs/AUTOMATION.md.

# Connect with automation enabled agent-rdp connect --host 192.168.1.100 -u Admin -p secret --enable-win-automation # Take an accessibility tree snapshot (refs are always included) agent-rdp automate snapshot # Snapshot filtering options (like agent-browser) agent-rdp automate snapshot -i # Interactive elements only agent-rdp automate snapshot -c # Compact (remove empty structural elements) agent-rdp automate snapshot -d 3 # Limit depth to 3 levels agent-rdp automate snapshot -s "~*Notepad*" # Scope to a window/element agent-rdp automate snapshot -i -c -d 5 # Combine options # Pattern-based element operations (refs use @eN format) agent-rdp automate click "#SaveButton" # Click button agent-rdp automate click "@e5" # Click by ref number from snapshot agent-rdp automate click "@e5" -d # Double-click (for file list items) agent-rdp automate select "@e10" # Select item (SelectionItemPattern) agent-rdp automate toggle "@e7" # Toggle checkbox (TogglePattern) agent-rdp automate expand "@e3" # Expand menu (ExpandCollapsePattern) agent-rdp automate context-menu "@e5" # Open context menu (Shift+F10) # Fill text fields agent-rdp automate fill ".Edit" "Hello World" # Window operations agent-rdp automate window list agent-rdp automate window focus "~*Notepad*" # Run PowerShell commands agent-rdp automate run "Get-Process" --wait agent-rdp automate run "Get-Process" --wait --process-timeout 5000 # With 5s timeout

Selector Types:

  • @e5 or @5 - Reference number from snapshot (e prefix recommended)
  • #SaveButton - Automation ID
  • .Edit - Win32 class name
  • ~*pattern* - Wildcard name match
  • File - Element name (exact match)

Snapshot Output Format:

- Window "Notepad" [ref=e1, id=Notepad]
  - MenuBar "Application" [ref=e2]
    - MenuItem "File" [ref=e3]
  - Edit "Text Editor" [ref=e5, value="Hello"]

Session Management

# List active sessions agent-rdp session list # Get current session info agent-rdp session info # Close a session agent-rdp session close # Use a named session agent-rdp --session work connect --host work-pc.local ... agent-rdp --session work screenshot

Disconnect

agent-rdp disconnect

Web Viewer

Open the web-based viewer to see the remote desktop in your browser:

# Open viewer (connects to default streaming port 9224) agent-rdp view # Specify a different port agent-rdp view --port 9224

The viewer requires WebSocket streaming to be enabled. Start a session with streaming:

agent-rdp --stream-port 9224 connect --host 192.168.1.100 -u Admin -p secret agent-rdp view

JSON Output

All commands support --json for structured output:

agent-rdp --json screenshot --base64

Success response:

{ "success": true, "data": { "type": "screenshot", "width": 1920, "height": 1080, "format": "png", "base64": "iVBORw0KGgo..." } }

Error response:

{ "success": false, "error": { "code": "not_connected", "message": "Not connected to an RDP server" } }

Environment Variables

| Variable | Description | |----------|-------------| | AGENT_RDP_HOST | RDP server hostname or IP | | AGENT_RDP_PORT | RDP server port (default: 3389) | | AGENT_RDP_USERNAME | RDP username | | AGENT_RDP_PASSWORD | RDP password | | AGENT_RDP_SESSION | Session name (default: "default") | | AGENT_RDP_STREAM_PORT | WebSocket streaming port (0 = disabled) |

Node.js API

Use agent-rdp programmatically from Node.js/TypeScript:

import { RdpSession } from 'agent-rdp'; const rdp = new RdpSession({ session: 'default' }); await rdp.connect({ host: '192.168.1.100', username: 'Administrator', password: 'secret', width: 1280, height: 800, drives: [{ path: '/tmp/share', name: 'Share' }], enableWinAutomation: true, // Enable UI Automation }); // Screenshot const { base64, width, height } = await rdp.screenshot({ format: 'png' }); // Mouse await rdp.mouse.click({ x: 100, y: 200 }); await rdp.mouse.rightClick({ x: 100, y: 200 }); await rdp.mouse.doubleClick({ x: 100, y: 200 }); await rdp.mouse.move({ x: 150, y: 250 }); await rdp.mouse.drag({ from: { x: 100, y: 100 }, to: { x: 500, y: 500 } }); // Keyboard await rdp.keyboard.type({ text: 'Hello World' }); await rdp.keyboard.press({ keys: 'ctrl+c' }); await rdp.keyboard.press({ keys: 'enter' }); // Single keys use press() // Scroll await rdp.scroll.up(); // Default amount: 3 await rdp.scroll.down({ amount: 5 }); // Custom amount await rdp.scroll.up({ x: 500, y: 300 }); // Scroll at position // Clipboard await rdp.clipboard.set({ text: 'text to copy' }); const text = await rdp.clipboard.get(); // Locate text using OCR const matches = await rdp.locate({ text: 'Cancel' }); if (matches.length > 0) { await rdp.mouse.click({ x: matches[0].center_x, y: matches[0].center_y }); } // Get all text on screen const allText = await rdp.locate({ all: true }); //
5-Dim Analysis
Clarity9/10
Novelty8/10
Utility9/10
Completeness9/10
Maintainability8/10
Pros & Cons

Pros

  • Comprehensive RDP control features
  • Supports automation and scripting
  • JSON output for AI integration
  • Secure connection options

Cons

  • Requires familiarity with CLI
  • Potential security risks with password handling
  • Limited to Windows environments
  • Dependency on external libraries

Related Skills

odoo-upgrade-skill

A
toolCo-Pilot
86/ 100

“Powerful, but the setup might scare off the impatient.”

disk-cleaner

A
toolCo-Pilot
86/ 100

“Powerful, but the setup might scare off the impatient.”

gemini-cli

A
toolCo-Pilot
84/ 100

“It's a terminal-based AI assistant that's so eager to help, it might just rewrite your entire codebase while you're asking about the weather.”

Disclaimer: This content is sourced from GitHub open source projects for display and rating purposes only.

Copyright belongs to the original author thisnick.