
Updated 4 months ago


Name: hosted-agents
Rating: 3.8 (7409 reviews)
Author: muratcankoylan


muratcankoylan/Agent-Skills-for-Context-Engineering/skills/hosted-agents


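🎯 Who it is for

AI infrastructure engineers
DevOps/SRE engineers building agent platforms
Product managers of AI coding tools
Engineering teams scaling collaborative AI workflows

🤖 AI quip: "Looks capable, but don't let the configuration scare people off."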

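Security analysis: Medium risk

Risk: Medium. Recommended checks: whether the skill makes outbound network requests (SSRF or data exfiltration); how API keys and tokens are obtained, stored, and could leak; the scope of file reads and writes, and path traversal risk. Run with least privilege, and audit the code and its dependencies before enabling it in production.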

name: hosted-agents
description: This skill should be used when the user asks to "build background agent", "create hosted coding agent", "set up sandboxed execution", "implement multiplayer agent", or mentions background agents, sandboxed VMs, agent infrastructure, Modal sandboxes, self-spawning agents, or remote coding environments.

Hosted Agent Infrastructure

Hosted agents run in remote sandboxed environments rather than on local machines. When designed well, they provide unlimited concurrency, consistent execution environments, and multiplayer collaboration. The critical insight is that session speed should be limited only by model provider time-to-first-token, with all infrastructure setup completed before the user starts their session.

When to Activate

Activate this skill when:

Building background coding agents that run independently of user devices
Designing sandboxed execution environments for agent workloads
Implementing multiplayer agent sessions with shared state
Creating multi-client agent interfaces (Slack, Web, Chrome extensions)
Scaling agent infrastructure beyond local machine constraints
Building systems where agents spawn sub-agents for parallel work

Core Concepts

Hosted agents address the fundamental limitations of local agent execution: resource contention, environment inconsistency, and single-user constraints. By moving agent execution to remote sandboxed environments, teams gain unlimited concurrency, reproducible environments, and collaborative workflows.

The architecture consists of three layers: sandbox infrastructure for isolated execution, API layer for state management and client coordination, and client interfaces for user interaction across platforms. Each layer has specific design requirements that enable the system to scale.

Detailed Topics

Sandbox Infrastructure

The Core Challenge

Spinning up full development environments quickly is the primary technical challenge. Users expect near-instant session starts, but development environments require cloning repositories, installing dependencies, and running build steps.

Image Registry Pattern

Pre-build environment images on a regular cadence (every 30 minutes works well). Each image contains:

Cloned repository at a known commit
All runtime dependencies installed
Initial setup and build commands completed
Cached files from running app and test suite once

When starting a session, spin up a sandbox from the most recent image. The repository is at most 30 minutes out of date, making synchronization with the latest code much faster.
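As a rough illustration of the selection step, the sketch below picks the newest image from a list and refuses images older than a staleness limit. The `EnvImage` record, the `pick_latest_image` helper, and the one-hour default limit are assumptions made for this example, not part of any particular registry API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class EnvImage:
    image_id: str       # hypothetical registry identifier
    commit_sha: str     # commit the repository was cloned at
    built_at: datetime  # when the image build finished


def pick_latest_image(
    images: list[EnvImage],
    now: datetime,
    max_age: timedelta = timedelta(hours=1),  # assumed staleness limit
) -> Optional[EnvImage]:
    """Return the newest image, or None if there is none or it is too old."""
    if not images:
        return None
    newest = max(images, key=lambda img: img.built_at)
    if now - newest.built_at > max_age:
        return None  # caller should trigger a rebuild or use a slower path
    return newest


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    images = [
        EnvImage("img-old", "aaa111", now - timedelta(minutes=50)),
        EnvImage("img-new", "bbb222", now - timedelta(minutes=10)),
    ]
    print(pick_latest_image(images, now))
```

The session then only needs to fetch the commits made since `commit_sha`, which is what keeps synchronization short.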

Snapshot and Restore

Take filesystem snapshots at key points:

After initial image build (base snapshot)
When agent finishes making changes (session snapshot)
Before sandbox exit for potential follow-up

This enables instant restoration for follow-up prompts without re-running setup.

Git Configuration for Background Agents

Since git operations are not tied to a specific user during image builds:

Generate GitHub app installation tokens for repository access during clone
Update git config's user.name and user.email when committing and pushing changes
Use the prompting user's identity for commits, not the app identity
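One way to attribute a commit to the prompting user is git's per-invocation `-c` override, which sets `user.name` and `user.email` for a single command without touching the sandbox's stored config. The sketch below only builds the argument list; `commit_command` is a hypothetical helper, and writing the values with `git config` before committing would work equally well.

```python
import subprocess


def commit_command(message: str, user_name: str, user_email: str) -> list[str]:
    """Build a git commit invocation attributed to the prompting user."""
    return [
        "git",
        "-c", f"user.name={user_name}",
        "-c", f"user.email={user_email}",
        "commit",
        "-m", message,
    ]


def commit_as_user(repo_dir: str, message: str, user_name: str, user_email: str) -> None:
    """Run the commit inside repo_dir; raises CalledProcessError on failure."""
    subprocess.run(
        commit_command(message, user_name, user_email),
        cwd=repo_dir,
        check=True,
    )
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems when a user's display name contains spaces or quotes.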

Warm Pool Strategy

Maintain a pool of pre-warmed sandboxes for high-volume repositories:

Sandboxes are ready before users start sessions
Expire and recreate pool entries as new image builds complete
Start warming sandbox as soon as user begins typing (predictive warm-up)
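The pool behaviour described above can be sketched in a few lines. Everything here is illustrative: `WarmPool`, `WarmSandbox`, and the `_boot` stand-in are hypothetical names, and a real implementation would call the sandbox provider instead of fabricating IDs.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WarmSandbox:
    sandbox_id: str
    image_id: str  # image the sandbox was booted from


class WarmPool:
    def __init__(self, target_size: int, current_image_id: str) -> None:
        self.target_size = target_size
        self.current_image_id = current_image_id
        self._ready: deque = deque()
        self._counter = 0

    def _boot(self) -> WarmSandbox:
        # Stand-in for a real sandbox provider call (assumption).
        self._counter += 1
        return WarmSandbox(f"sbx-{self._counter}", self.current_image_id)

    def refill(self) -> None:
        """Boot sandboxes until the pool is back at its target size."""
        while len(self._ready) < self.target_size:
            self._ready.append(self._boot())

    def on_new_image(self, image_id: str) -> None:
        """Expire entries built from older images, then recreate them."""
        self.current_image_id = image_id
        self._ready = deque(s for s in self._ready if s.image_id == image_id)
        self.refill()

    def acquire(self) -> WarmSandbox:
        """Hand out a ready sandbox, falling back to a cold boot if empty."""
        sandbox = self._ready.popleft() if self._ready else self._boot()
        self.refill()
        return sandbox
```

In practice `refill` would run in the background so that `acquire` never waits on a boot.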

Agent Framework Selection

Server-First Architecture

Choose an agent framework structured as a server first, with TUI and desktop apps as clients. This enables:

Multiple custom clients without duplicating agent logic
Consistent behavior across all interaction surfaces
Plugin systems for extending functionality
Event-driven architectures for real-time updates

Code as Source of Truth

Select frameworks where the agent can read its own source code to understand behavior. This is underrated in AI development: having the code as source of truth prevents hallucination about the agent's own capabilities.

Plugin System Requirements

The framework should support plugins that:

Listen to tool execution events (e.g., tool.execute.before)
Block or modify tool calls conditionally
Inject context or state at runtime
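To make the three requirements concrete, here is a minimal hook dispatcher. It is a sketch and not the API of any specific framework: `PluginHost`, the dictionary shape of a tool call, and the convention that a hook returns `None` to block are all assumptions; only the event name `tool.execute.before` comes from the text above.

```python
from typing import Any, Callable, Optional

ToolCall = dict[str, Any]
# A hook returns a (possibly modified) call, or None to block it.
BeforeHook = Callable[[ToolCall], Optional[ToolCall]]


class PluginHost:
    def __init__(self) -> None:
        self._hooks: dict[str, list[BeforeHook]] = {}

    def on(self, event: str, hook: BeforeHook) -> None:
        """Register a hook for an event name such as 'tool.execute.before'."""
        self._hooks.setdefault(event, []).append(hook)

    def run_tool(self, call: ToolCall, execute: Callable[[ToolCall], Any]) -> Any:
        """Pass the call through every 'before' hook, then execute it."""
        for hook in self._hooks.get("tool.execute.before", []):
            result = hook(call)
            if result is None:
                return {"blocked": True, "tool": call["name"]}
            call = result  # hooks may modify the call or inject context
        return execute(call)


def inject_cwd(call: ToolCall) -> Optional[ToolCall]:
    """Example hook: block 'rm', inject a working directory otherwise."""
    if call["name"] == "rm":
        return None
    return {**call, "args": {**call["args"], "cwd": "/workspace"}}


if __name__ == "__main__":
    host = PluginHost()
    host.on("tool.execute.before", inject_cwd)
    print(host.run_tool({"name": "read", "args": {}}, lambda c: c["args"]))
    print(host.run_tool({"name": "rm", "args": {}}, lambda c: "ran"))
```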

Speed Optimizations

Predictive Warm-Up

Start warming the sandbox as soon as a user begins typing their prompt:

Clone latest changes in parallel with user typing
Run initial setup before user hits enter
For fast spin-up, sandbox can be ready before user finishes typing

Parallel File Reading

Allow the agent to start reading files immediately, even if sync from latest base branch is not complete:

In large repositories, incoming prompts rarely modify recently-changed files
Agent can research immediately without waiting for git sync
Block file edits (not reads) until synchronization completes
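The read-now, edit-later rule can be expressed as a small gate around the file tools. This is a sketch under assumptions: `SyncGate` is a hypothetical class, and whichever task performs the git sync is expected to call `mark_synced()` when it finishes.

```python
import threading
from typing import Optional


class SyncGate:
    """Reads are always allowed; writes wait for the git sync to finish."""

    def __init__(self) -> None:
        self._synced = threading.Event()

    def mark_synced(self) -> None:
        """Called by the sync task once the base branch is up to date."""
        self._synced.set()

    def read_file(self, path: str) -> str:
        # No gate: research can start against the slightly stale checkout.
        with open(path, encoding="utf-8") as f:
            return f.read()

    def write_file(self, path: str, content: str, timeout: Optional[float] = None) -> None:
        # Block until sync completes; give up after `timeout` seconds if set.
        if not self._synced.wait(timeout):
            raise TimeoutError("git sync has not completed; edit rejected")
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
```

The same check could live in a `tool.execute.before` plugin hook instead of the file tools themselves.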

Maximize Build-Time Work

Move everything possible to the image build step:

Full dependency installation
Database schema setup
Initial app and test suite runs (populates caches)
Build-time duration is invisible to users

Self-Spawning Agents

Agent-Spawned Sessions

Create tools that allow agents to spawn new sessions:

Research tasks across different repositories
Parallel subtask execution for large changes
Multiple smaller PRs from one major task

Frontier models are capable of containing themselves. The tools should:

Start a new session with specified parameters
Read status of any session (check-in capability)
Continue main work while sub-sessions run in parallel
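A minimal shape for these tools might look like the following. The names `SessionTools`, `start_session`, and `read_status` are hypothetical, and the in-memory dictionary stands in for whatever API actually launches sandboxes; starting a session returns immediately so the parent can keep working.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    session_id: str
    repo: str
    prompt: str
    parent_id: Optional[str]  # set when another agent spawned this session
    status: str = "running"


class SessionTools:
    """Tool surface exposed to the agent (in-memory stand-in)."""

    def __init__(self) -> None:
        self._sessions: dict[str, Session] = {}

    def start_session(self, repo: str, prompt: str, parent_id: Optional[str] = None) -> str:
        """Start a new session and return its ID without waiting for it."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = Session(session_id, repo, prompt, parent_id)
        return session_id

    def read_status(self, session_id: str) -> str:
        """Check in on any session by ID."""
        return self._sessions[session_id].status

    def mark_finished(self, session_id: str) -> None:
        # In a real system the sandbox would report this, not the caller.
        self._sessions[session_id].status = "finished"
```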

Prompt Engineering for Self-Spawning

Engineer prompts to guide when agents spawn sub-sessions:

Research tasks that require cross-repository exploration
Breaking monolithic changes into smaller PRs
Parallel exploration of different approaches

API Layer

Per-Session State Isolation

Each session requires its own isolated state storage:

Dedicated database per session (SQLite per session works well)
No session can impact another's performance
Handles hundreds of concurrent sessions
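With Python's standard `sqlite3` module, a database per session reduces to one file per session ID. The `events` table layout and the `open_session_db` helper below are assumptions chosen for illustration.

```python
import sqlite3
from pathlib import Path


def open_session_db(root: Path, session_id: str) -> sqlite3.Connection:
    """Open (creating if needed) the database file owned by one session."""
    if not session_id.isalnum():
        # Guard against path traversal through the session ID.
        raise ValueError("session_id must be alphanumeric")
    root.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(root / f"{session_id}.sqlite3")
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS events (
            id      INTEGER PRIMARY KEY AUTOINCREMENT,
            author  TEXT NOT NULL,
            kind    TEXT NOT NULL,
            payload TEXT NOT NULL
        )
        """
    )
    conn.commit()
    return conn


def append_event(conn: sqlite3.Connection, author: str, kind: str, payload: str) -> None:
    """Record one event; parameters are bound, never interpolated."""
    conn.execute(
        "INSERT INTO events (author, kind, payload) VALUES (?, ?, ?)",
        (author, kind, payload),
    )
    conn.commit()
```

Because each session has its own file, a long-running write in one session cannot lock another session's data.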

Real-Time Streaming

Agent work involves high-frequency updates:

Token streaming from model providers
Tool execution status updates
File change notifications

WebSocket connections with hibernation APIs reduce compute costs during idle periods while maintaining open connections.

Synchronization Across Clients

Build a single state system that synchronizes across:

Chat interfaces
Slack bots
Chrome extensions
Web interfaces
VS Code instances

All changes sync to the session state, enabling seamless client switching.
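One simple way to get this property is an append-only event log per session that every client subscribes to, with replay on connect. The sketch below is in-memory and synchronous; `SessionHub` and its method names are hypothetical, and a production system would deliver events over WebSockets instead of direct callbacks.

```python
from typing import Any, Callable

Event = dict[str, Any]
Listener = Callable[[Event], None]


class SessionHub:
    """Single source of truth for one session, shared by all clients."""

    def __init__(self) -> None:
        self._log: list[Event] = []
        self._listeners: dict[str, Listener] = {}

    def connect(self, client_id: str, listener: Listener) -> None:
        """Replay history to the new client, then subscribe it to updates."""
        for event in self._log:
            listener(event)
        self._listeners[client_id] = listener

    def disconnect(self, client_id: str) -> None:
        self._listeners.pop(client_id, None)

    def publish(self, event: Event) -> None:
        """Append to the log and fan out to every connected client."""
        self._log.append(event)
        for listener in list(self._listeners.values()):
            listener(event)
```

A user who starts in Slack and later opens the web interface sees the full history because the web client receives the replayed log on connect.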

Multiplayer Support

Why Multiplayer Matters

Multiplayer enables:

Teaching non-engineers to use AI effectively
Live QA sessions with multiple team members
Real-time PR review with immediate changes
Collaborative debugging sessions

Implementation Requirements

Data model must not tie sessions to single authors
Pass authorship info to each prompt
Attribute code changes to the prompting user
Share session links for instant collaboration

With proper synchronization architecture, multiplayer support is nearly free to add.
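A data model that satisfies these requirements keeps authorship on each prompt rather than on the session. The following is a sketch with hypothetical names (`Prompt`, `SharedSession`); the choice to attribute changes to the author of the most recent prompt is an assumption about how attribution would be resolved.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Prompt:
    author_name: str
    author_email: str
    text: str


@dataclass
class SharedSession:
    session_id: str
    # Deliberately no owner field: authorship lives on each prompt.
    prompts: list[Prompt] = field(default_factory=list)

    def add_prompt(self, prompt: Prompt) -> None:
        self.prompts.append(prompt)

    def participants(self) -> list[str]:
        """Distinct author emails, in order of first appearance."""
        seen: list[str] = []
        for prompt in self.prompts:
            if prompt.author_email not in seen:
                seen.append(prompt.author_email)
        return seen

    def commit_author(self) -> str:
        """Attribute the next change to whoever sent the latest prompt."""
        latest = self.prompts[-1]
        return f"{latest.author_name} <{latest.author_email}>"
```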

Authentication and Authorization

User-Based Commits

Use GitHub authentication to:

Obtain user tokens for PR creation
Open PRs on behalf of the user (not the app)
Prevent users from approving their own changes

Sandbox-to-API Flow

Sandbox pushes changes (updating git user config)
Sandbox sends event to API with branch name and session ID
API uses user's GitHub token to create PR
GitHub webhooks notify API of PR events

Client Implementations

Slack Integration

Slack is the most effective distribution channel for internal adoption:

Creates virality loop as team members see others using it
No syntax required, natural chat interface
Classify repository from message, thread context, and channel name

Build a classifier to determine which repository to work in:

Fast model with descriptions of available repositories
Include hints for common repositories
Allow "unknown" option for ambiguous cases
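The classifier can be as simple as a prompt sent to a fast model plus strict parsing of its answer. The sketch below covers only those two pieces and leaves the model call out; `build_classifier_prompt`, `parse_choice`, and the prompt wording are assumptions.

```python
from typing import Sequence


def build_classifier_prompt(
    message: str,
    channel: str,
    thread: Sequence[str],
    repos: dict[str, str],      # repository name -> short description
    hints: Sequence[str] = (),  # optional hints for common repositories
) -> str:
    """Assemble the text sent to the fast classification model."""
    lines = [
        "Pick the repository this request is about.",
        "Answer with exactly one repository name, or 'unknown' if it is ambiguous.",
        "",
        "Repositories:",
    ]
    lines += [f"- {name}: {desc}" for name, desc in repos.items()]
    if hints:
        lines += ["", "Hints:"] + [f"- {hint}" for hint in hints]
    lines += ["", f"Channel: {channel}", "Thread context:"]
    lines += [f"> {m}" for m in thread] or ["> (none)"]
    lines += ["", f"Message: {message}"]
    return "\n".join(lines)


def parse_choice(raw: str, repos: dict[str, str]) -> str:
    """Map the model's reply to a known repository, else 'unknown'."""
    by_lower = {name.lower(): name for name in repos}
    return by_lower.get(raw.strip().lower(), "unknown")
```

Treating any unrecognized reply as "unknown" lets the bot ask the user which repository they meant instead of guessing.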

Web Interface

Core features:

Works on desktop and mobile
Real-time streaming of agent work
Hosted VS Code instance running inside sandbox
Streamed desktop view for visual verification
Before/after screenshots for PRs

Statistics page showing:

Sessions resulting in merged PRs (primary metric)
Usage over time
Live "humans prompting" count (prompts in last 5 minutes)

Chrome Extension

For non-engineering users:

Sidebar chat interface with screenshot tool
DOM and React internals extraction instead of raw images
Reduces token usage while maintaining precision
Distribute via managed device policy (bypasses Chrome Web Store)

Practical Guidance

Follow-Up Message Handling

Decide how to handle messages sent during execution:

Queue approach: Messages wait until current prompt completes
Insert approach: Messages are processed immediately

Queueing is simpler to manage and lets users send thoughts on next steps while the agent works. Build a mechanism to stop the agent mid-execution when needed.
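The queue approach, together with a stop signal, might be sketched as follows. `PromptQueue` is a hypothetical class, and the handler is assumed to poll `should_stop()` between steps; how a real agent loop checks for cancellation depends on the framework.

```python
import threading
from collections import deque
from typing import Callable

# handle(author, text, should_stop) processes one prompt and returns a result.
Handler = Callable[[str, str, Callable[[], bool]], str]


class PromptQueue:
    def __init__(self) -> None:
        self._pending: deque = deque()
        self._stop = threading.Event()

    def submit(self, author: str, text: str) -> None:
        """Queue a message; it waits until the current prompt completes."""
        self._pending.append((author, text))

    def request_stop(self) -> None:
        """Ask the handler to abandon the prompt it is working on."""
        self._stop.set()

    def run(self, handle: Handler) -> list[str]:
        """Process prompts in arrival order until the queue is empty."""
        results = []
        while self._pending:
            self._stop.clear()  # a stop request applies to one prompt only
            author, text = self._pending.popleft()
            results.append(handle(author, text, self._stop.is_set))
        return results
```

Messages submitted while `run` is processing an earlier prompt are picked up on the next loop iteration, which gives the queueing behaviour described above.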

Metrics That Matter

Track metrics that indicate real value:

Sessions resulting in merged PRs (primary success metric)
Time from session start to first model response
PR approval rate and revision count
Agent-written code percentage across repositories
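The first two metrics are straightforward to compute from per-session records. The `SessionRecord` fields below are assumptions about what would be logged, and the use of the median for latency is an illustrative choice.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class SessionRecord:
    started_at: float                   # seconds since epoch
    first_response_at: Optional[float]  # None if the model never responded
    pr_merged: bool


def merged_pr_rate(records: list[SessionRecord]) -> float:
    """Fraction of sessions that resulted in a merged PR."""
    if not records:
        return 0.0
    return sum(r.pr_merged for r in records) / len(records)


def median_time_to_first_response(records: list[SessionRecord]) -> Optional[float]:
    """Median seconds from session start to first model response."""
    latencies = [
        r.first_response_at - r.started_at
        for r in records
        if r.first_response_at is not None
    ]
    return median(latencies) if latencies else None
```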

Adoption Strategy

Internal adoption patterns that work:

Work in public spaces (Slack channels) for visibility
Let the product create virality loops
Don't force usage over existing tools
Build to people's needs, not hypothetical requirements

Guidelines

Pre-build environment images on regular cadence (30 minutes is a good default)
Start warming sandboxes when users begin typing, not when they submit

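Five-Dimension Analysis

Clarity: 7/10

Innovation: 8/10

Practicality: 9/10

Completeness: 8/10

Maintainability: 6/10

Pros and Cons

Pros

Provides a comprehensive guide to scalable, sandboxed agent deployment.
Emphasizes key performance optimizations such as pre-warming.
Supports valuable multi-user and collaborative use cases.

Cons

Extremely complex; requires a large amount of custom infrastructure.
The README is a design document, not a ready-to-use skill or library.
Lacks concrete code, which leaves the implementation ambiguous.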