Auto-Pilot / 全自动
更新于 a month ago

hosted-agents

Mmuratcankoylan
7.4k
muratcankoylan/Agent-Skills-for-Context-Engineering/skills/hosted-agents
76
Agent 评分

💡 摘要

中文总结。

🎯 适合人群

AI基础设施工程师构建智能体平台的DevOps/SRE工程师AI编程工具的产品经理扩展协作式AI工作流的工程团队

🤖 AI 吐槽:看起来很能打,但别让配置把人劝退。

安全分析中风险

风险:Medium。建议检查:是否发起外网请求(SSRF/数据外发);API Key/Token 的获取、存储与泄露风险;文件读写范围与路径穿越风险。以最小权限运行,并在生产环境启用前审计代码与依赖。


name: hosted-agents description: This skill should be used when the user asks to "build background agent", "create hosted coding agent", "set up sandboxed execution", "implement multiplayer agent", or mentions background agents, sandboxed VMs, agent infrastructure, Modal sandboxes, self-spawning agents, or remote coding environments.

Hosted Agent Infrastructure

Hosted agents run in remote sandboxed environments rather than on local machines. When designed well, they provide unlimited concurrency, consistent execution environments, and multiplayer collaboration. The critical insight is that session speed should be limited only by model provider time-to-first-token, with all infrastructure setup completed before the user starts their session.

When to Activate

Activate this skill when:

  • Building background coding agents that run independently of user devices
  • Designing sandboxed execution environments for agent workloads
  • Implementing multiplayer agent sessions with shared state
  • Creating multi-client agent interfaces (Slack, Web, Chrome extensions)
  • Scaling agent infrastructure beyond local machine constraints
  • Building systems where agents spawn sub-agents for parallel work

Core Concepts

Hosted agents address the fundamental limitation of local agent execution: resource contention, environment inconsistency, and single-user constraints. By moving agent execution to remote sandboxed environments, teams gain unlimited concurrency, reproducible environments, and collaborative workflows.

The architecture consists of three layers: sandbox infrastructure for isolated execution, API layer for state management and client coordination, and client interfaces for user interaction across platforms. Each layer has specific design requirements that enable the system to scale.

Detailed Topics

Sandbox Infrastructure

The Core Challenge Spinning up full development environments quickly is the primary technical challenge. Users expect near-instant session starts, but development environments require cloning repositories, installing dependencies, and running build steps.

Image Registry Pattern Pre-build environment images on a regular cadence (every 30 minutes works well). Each image contains:

  • Cloned repository at a known commit
  • All runtime dependencies installed
  • Initial setup and build commands completed
  • Cached files from running app and test suite once

When starting a session, spin up a sandbox from the most recent image. The repository is at most 30 minutes out of date, making synchronization with the latest code much faster.

Snapshot and Restore Take filesystem snapshots at key points:

  • After initial image build (base snapshot)
  • When agent finishes making changes (session snapshot)
  • Before sandbox exit for potential follow-up

This enables instant restoration for follow-up prompts without re-running setup.

Git Configuration for Background Agents Since git operations are not tied to a specific user during image builds:

  • Generate GitHub app installation tokens for repository access during clone
  • Update git config's user.name and user.email when committing and pushing changes
  • Use the prompting user's identity for commits, not the app identity

Warm Pool Strategy Maintain a pool of pre-warmed sandboxes for high-volume repositories:

  • Sandboxes are ready before users start sessions
  • Expire and recreate pool entries as new image builds complete
  • Start warming sandbox as soon as user begins typing (predictive warm-up)

Agent Framework Selection

Server-First Architecture Choose an agent framework structured as a server first, with TUI and desktop apps as clients. This enables:

  • Multiple custom clients without duplicating agent logic
  • Consistent behavior across all interaction surfaces
  • Plugin systems for extending functionality
  • Event-driven architectures for real-time updates

Code as Source of Truth Select frameworks where the agent can read its own source code to understand behavior. This is underrated in AI development: having the code as source of truth prevents hallucination about the agent's own capabilities.

Plugin System Requirements The framework should support plugins that:

  • Listen to tool execution events (e.g., tool.execute.before)
  • Block or modify tool calls conditionally
  • Inject context or state at runtime

Speed Optimizations

Predictive Warm-Up Start warming the sandbox as soon as a user begins typing their prompt:

  • Clone latest changes in parallel with user typing
  • Run initial setup before user hits enter
  • For fast spin-up, sandbox can be ready before user finishes typing

Parallel File Reading Allow the agent to start reading files immediately, even if sync from latest base branch is not complete:

  • In large repositories, incoming prompts rarely modify recently-changed files
  • Agent can research immediately without waiting for git sync
  • Block file edits (not reads) until synchronization completes

Maximize Build-Time Work Move everything possible to the image build step:

  • Full dependency installation
  • Database schema setup
  • Initial app and test suite runs (populates caches)
  • Build-time duration is invisible to users

Self-Spawning Agents

Agent-Spawned Sessions Create tools that allow agents to spawn new sessions:

  • Research tasks across different repositories
  • Parallel subtask execution for large changes
  • Multiple smaller PRs from one major task

Frontier models are capable of containing themselves. The tools should:

  • Start a new session with specified parameters
  • Read status of any session (check-in capability)
  • Continue main work while sub-sessions run in parallel

Prompt Engineering for Self-Spawning Engineer prompts to guide when agents spawn sub-sessions:

  • Research tasks that require cross-repository exploration
  • Breaking monolithic changes into smaller PRs
  • Parallel exploration of different approaches

API Layer

Per-Session State Isolation Each session requires its own isolated state storage:

  • Dedicated database per session (SQLite per session works well)
  • No session can impact another's performance
  • Handles hundreds of concurrent sessions

Real-Time Streaming Agent work involves high-frequency updates:

  • Token streaming from model providers
  • Tool execution status updates
  • File change notifications

WebSocket connections with hibernation APIs reduce compute costs during idle periods while maintaining open connections.

Synchronization Across Clients Build a single state system that synchronizes across:

  • Chat interfaces
  • Slack bots
  • Chrome extensions
  • Web interfaces
  • VS Code instances

All changes sync to the session state, enabling seamless client switching.

Multiplayer Support

Why Multiplayer Matters Multiplayer enables:

  • Teaching non-engineers to use AI effectively
  • Live QA sessions with multiple team members
  • Real-time PR review with immediate changes
  • Collaborative debugging sessions

Implementation Requirements

  • Data model must not tie sessions to single authors
  • Pass authorship info to each prompt
  • Attribute code changes to the prompting user
  • Share session links for instant collaboration

With proper synchronization architecture, multiplayer support is nearly free to add.

Authentication and Authorization

User-Based Commits Use GitHub authentication to:

  • Obtain user tokens for PR creation
  • Open PRs on behalf of the user (not the app)
  • Prevent users from approving their own changes

Sandbox-to-API Flow

  1. Sandbox pushes changes (updating git user config)
  2. Sandbox sends event to API with branch name and session ID
  3. API uses user's GitHub token to create PR
  4. GitHub webhooks notify API of PR events

Client Implementations

Slack Integration The most effective distribution channel for internal adoption:

  • Creates virality loop as team members see others using it
  • No syntax required, natural chat interface
  • Classify repository from message, thread context, and channel name

Build a classifier to determine which repository to work in:

  • Fast model with descriptions of available repositories
  • Include hints for common repositories
  • Allow "unknown" option for ambiguous cases

Web Interface Core features:

  • Works on desktop and mobile
  • Real-time streaming of agent work
  • Hosted VS Code instance running inside sandbox
  • Streamed desktop view for visual verification
  • Before/after screenshots for PRs

Statistics page showing:

  • Sessions resulting in merged PRs (primary metric)
  • Usage over time
  • Live "humans prompting" count (prompts in last 5 minutes)

Chrome Extension For non-engineering users:

  • Sidebar chat interface with screenshot tool
  • DOM and React internals extraction instead of raw images
  • Reduces token usage while maintaining precision
  • Distribute via managed device policy (bypasses Chrome Web Store)

Practical Guidance

Follow-Up Message Handling

Decide how to handle messages sent during execution:

  • Queue approach: Messages wait until current prompt completes
  • Insert approach: Messages are processed immediately

Queueing is simpler to manage and lets users send thoughts on next steps while agent works. Build mechanism to stop agent mid-execution when needed.

Metrics That Matter

Track metrics that indicate real value:

  • Sessions resulting in merged PRs (primary success metric)
  • Time from session start to first model response
  • PR approval rate and revision count
  • Agent-written code percentage across repositories

Adoption Strategy

Internal adoption patterns that work:

  • Work in public spaces (Slack channels) for visibility
  • Let the product create virality loops
  • Don't force usage over existing tools
  • Build to people's needs, not hypothetical requirements

Guidelines

  1. Pre-build environment images on regular cadence (30 minutes is a good default)
  2. Start warming sandboxes when users begin typing, not when they submit 3
五维分析
清晰度7/10
创新性8/10
实用性9/10
完整性8/10
可维护性6/10
优缺点分析

优点

  • 提供了可扩展、沙盒化智能体部署的全面指南。
  • 强调了预加热等关键性能优化。
  • 支持有价值的多用户和协作用例。

缺点

  • 极其复杂,需要大量定制基础设施。
  • README是设计文档,而非即用型技能/库。
  • 缺乏具体代码,导致实现模糊不清。

相关技能

infra-skills

A
toolCo-Pilot / 辅助式
80/ 100

“看起来很能打,但别让配置把人劝退。”

pytorch

S
toolCode Lib / 代码库
92/ 100

“它是深度学习的瑞士军刀,但祝你好运能从47种安装方法里找到那个不会搞崩你系统的那一个。”

agno

S
toolCode Lib / 代码库
90/ 100

“它承诺成为智能体领域的Kubernetes,但得看开发者有没有耐心学习又一个编排层。”

免责声明:本内容来源于 GitHub 开源项目,仅供展示和评分分析使用。

版权归原作者所有 muratcankoylan.