Co-Pilot / Assisted
Updated 25 days ago

infra-skills

yzlnew
0.1k
yzlnew/infra-skills
Agent score: 80

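💡 Summary

A collection of specialized skills for AI infrastructure engineers that strengthen development and presentation work.

🎯 Intended Audience

AI infrastructure engineers, machine learning researchers, data scientists, technical presenters, software developers

🤖 AI roast: Looks like it can hold its own, but don't let the configuration scare people away.

Security analysis: medium risk

Risk: Medium. Recommended checks: whether it executes shell/command-line instructions; whether it makes outbound network requests (SSRF/data exfiltration); the scope of file reads and writes, and path traversal risk. Run with least privilege, and audit the code and dependencies before enabling it in production.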

AI Infrastructure Agent Skills

⚠️ WARNING: This project is under active development and is largely generated by LLMs without strict proofreading. Use with caution and verify all code before production use.

A collection of specialized agent skills for AI infrastructure engineers, covering both technical development (GPU kernels, distributed training, inference optimization) and soft skills (flowchart creation, presentation design).

Overview

This repository provides expert-level skills tailored for AI infrastructure engineering. Each skill packages domain knowledge, code examples, and best practices to transform Claude into a specialized assistant for specific frameworks and workflows—from writing high-performance CUDA kernels to creating professional technical presentations.

Construction Methodology (Unless Otherwise Specified)

  1. Knowledge Gathering: Use Gemini DeepResearch to collect comprehensive, up-to-date information on target frameworks
  2. Skill Development: Transform research into structured skills using skill-creator in Claude Code
  3. Validation: Test skill-generated code examples to ensure correctness
  4. Maintenance: Update skills regularly against the latest official documentation

Available Skills

TileLang Developer

Write high-performance GPU kernels using TileLang for NVIDIA, AMD, and Ascend hardware.

Capabilities:

  • Matrix multiplication (GEMM) kernels
  • FlashAttention implementations
  • DeepSeek MLA operators
  • Performance optimization (swizzle layouts, pipelining, warp specialization)
  • Cross-platform kernel development

Status: ✅ Complete

Megatron Memory Estimator

Estimate GPU memory usage for Megatron-based MoE and dense models. Built upon megatron_memory_estimator.

Capabilities:

  • Estimate memory from HuggingFace configs
  • Support for MoE models (DeepSeek-V3, Qwen, etc.)
  • Parallelism strategy comparison (TP/PP/EP/CP)
  • Memory optimization recommendations

Status: ✅ Complete

SLIME User

Guide for using SLIME (LLM post-training framework for RL Scaling). Built upon THUDM/slime.

Capabilities:

  • RL training setup and configuration (GRPO, GSPO, PPO, Reinforce++)
  • Multi-turn tool calling and agent workflows
  • Custom reward models and generation functions
  • Megatron and FSDP backend configuration
  • SGLang integration and optimization
  • Dynamic sampling and partial rollout
  • Multi-node distributed training

Status: ✅ Complete

Prompt to create this skill, with Sonnet 4.5:

Use skill-creator to create a skill called slime-user at this repo. slime is an LLM
post-training framework for RL Scaling. Its repo is https://github.com/THUDM/slime.

Skill creation procedure:

1. Git clone the latest repo
2. Analyze `docs/en`, understand basic structure and write a doc navigation guide for user
getting started or finding docs for advanced usage
3. Gather valuable examples from the docs and `examples` dir, write key ideas and script
path down for quick reference
4. Checkout some important source code, for example `slime/slime/utils/arguments.py` and
`slime/rollout/sglang_rollout.py`, provide its path and functions for a quick find.

TikZ Flowchart

Create professional flowcharts and architecture diagrams using LaTeX TikZ with standardized styles.

Capabilities:

  • Professional flowcharts with Google Material-like color palette
  • Standardized node types (data, memory, operation, kernel boxes)
  • Architecture diagrams and process flows
  • Grouping and layout best practices
  • Clean orthogonal edges and relative positioning

Example Output: QAT Flowchart

Status: ✅ Complete

Material You Slides

Create presentation slide decks using Material You (Material Design 3) design language.

Capabilities:

  • Self-contained HTML slides (1280x720) with M3 color tokens
  • Roboto typography with multiple weight support
  • Professional slide types (title, section divider, content)
  • Component library (cards, flow diagrams, metric cards, code blocks)
  • Rounded shapes and generous whitespace
  • Surface hierarchy without drop shadows
  • Structured layouts (columns, tables, lists, tags/chips)

Example Output: SLIME RL Training Slides

Status: ✅ Complete

Planned Skills

SGLang Developer

Development skill for SGLang (Structured Generation Language) runtime and optimization.

Planned capabilities:

  • SGLang runtime configuration
  • Custom sampling strategies
  • Performance tuning for LLM inference
  • Multi-GPU serving optimization

Status: 🚧 Planned

vLLM Developer

Skill for vLLM engine development and deployment.

Planned capabilities:

  • PagedAttention implementation
  • Custom scheduler development
  • Multi-LoRA serving
  • Quantization integration

Status: 🚧 Planned

Usage

Installing Skills

Skills are installed by placing the skill directory in Claude's skills path:

Natural Language: Ask Claude Code directly: "Help me install skills from https://github.com/yzlnew/infra-skills"

Personal (across all projects):

    # Clone and copy to personal skills directory
    git clone https://github.com/yzlnew/infra-skills.git
    mkdir -p ~/.claude/skills
    cp -r infra-skills/tilelang-developer ~/.claude/skills/
    cp -r infra-skills/megatron-memory-estimator ~/.claude/skills/
    cp -r infra-skills/slime-user ~/.claude/skills/
    cp -r infra-skills/tikz-flowchart ~/.claude/skills/
    cp -r infra-skills/material-you-slides ~/.claude/skills/

Project-level (for repository collaborators):

    # Clone and copy to project's skills directory
    cd your-project
    git clone https://github.com/yzlnew/infra-skills.git .claude/skills-repo
    mkdir -p .claude/skills
    cp -r .claude/skills-repo/tilelang-developer .claude/skills/
    cp -r .claude/skills-repo/megatron-memory-estimator .claude/skills/
    cp -r .claude/skills-repo/slime-user .claude/skills/
    cp -r .claude/skills-repo/tikz-flowchart .claude/skills/
    cp -r .claude/skills-repo/material-you-slides .claude/skills/

Skills automatically activate when relevant tasks are detected.
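
The copy steps above can also be scripted. The following is a minimal sketch, not part of the repository: `install_skills` and `SKILL_NAMES` are hypothetical names introduced here, and the skill directory names are simply those used in the copy commands.

```python
import shutil
from pathlib import Path

# Skill directory names taken from the copy commands above.
SKILL_NAMES = [
    "tilelang-developer",
    "megatron-memory-estimator",
    "slime-user",
    "tikz-flowchart",
    "material-you-slides",
]


def install_skills(repo_dir, skills_dir, names=SKILL_NAMES):
    """Copy each skill directory from a cloned repo into skills_dir.

    Returns the names that were copied; names with no matching
    directory in the clone are skipped. (Hypothetical helper.)
    """
    repo = Path(repo_dir).expanduser()
    dest = Path(skills_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`

    installed = []
    for name in names:
        src = repo / name
        if not src.is_dir():
            continue
        # dirs_exist_ok lets a re-run refresh an existing install (Python 3.8+).
        shutil.copytree(src, dest / name, dirs_exist_ok=True)
        installed.append(name)
    return installed
```

Under these assumptions, `install_skills("infra-skills", "~/.claude/skills")` would correspond to the personal install and `install_skills(".claude/skills-repo", ".claude/skills")` to the project-level one.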

Examples

TileLang Kernel Development:

    # User request: "Write a FP16 matrix multiplication kernel optimized for A100"
    # Claude loads tilelang-developer skill and generates:
    # - Complete TileLang kernel code
    # - Performance optimizations (swizzle, pipelining)
    # - Testing code
    # - Hardware-specific tuning recommendations
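
The output listed above includes testing code. A kernel test usually compares the kernel's result with a slow but obviously correct reference. The sketch below is one such reference in plain Python, written tile by tile to mirror how a GEMM kernel partitions its work. It is not TileLang code and is not taken from the skill; `tiled_matmul` and its `tile` parameter are hypothetical.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (M x K) by b (K x N), visiting the output one tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # tile rows of C
        for j0 in range(0, n, tile):      # tile columns of C
            for k0 in range(0, k, tile):  # reduction dimension, one K-tile at a time
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

The `min(...)` bounds handle matrix sizes that are not a multiple of the tile size, a case worth including in any kernel test.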

Megatron Memory Estimation:

    # User request: "Estimate memory for DeepSeek-V3 with TP=8, PP=4, EP=8"
    # Claude loads megatron-memory-estimator skill and provides:
    # - Detailed memory breakdown (model, optimizer, activations)
    # - Comparison across different parallelism strategies
    # - Memory optimization recommendations
    # - Hardware configuration suggestions
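
To give a feel for the weights-plus-optimizer term of such a breakdown, here is a back-of-envelope sketch for a dense model. It assumes mixed-precision Adam at 18 bytes per parameter (2 for 16-bit weights, 4 for fp32 gradients, 12 for fp32 master weights and the two Adam moments), dropping to 6 + 12/DP bytes when optimizer state is sharded across data-parallel ranks. These constants are a common rule of thumb and an assumption here, not the skill's or megatron_memory_estimator's actual method. The function name and parameters are hypothetical.

```python
def weight_and_optimizer_gib(num_params, tp=1, pp=1, dp=1, shard_optimizer=False):
    """Rough per-GPU memory (GiB) for weights, gradients and optimizer state.

    Assumes a dense model whose parameters split evenly across tensor-parallel
    (tp) and pipeline-parallel (pp) ranks. Activations are not included.
    """
    if min(tp, pp, dp) < 1:
        raise ValueError("parallel sizes must be >= 1")
    params_per_gpu = num_params / (tp * pp)
    # Rule-of-thumb byte counts (an assumption, see the text above).
    bytes_per_param = 6 + 12 / dp if shard_optimizer else 18
    return params_per_gpu * bytes_per_param / 2**30
```

This covers only one term of the breakdown. Activation memory depends on sequence length, micro-batch size and recomputation, and in MoE models the expert parameters are additionally divided across expert-parallel ranks, so a real estimate needs the full tool.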

SLIME RL Training Setup:

    # User request: "Help me set up GRPO training for Qwen3-4B with multi-turn tool calling"
    # Claude loads slime-user skill and provides:
    # - Environment setup instructions
    # - Custom generation function for tool calling
    # - Training script configuration
    # - Multi-node scaling guidance

TikZ Flowchart Creation:

    # User request: "Create a flowchart showing the FlashAttention-2 algorithm flow"
    # Claude loads tikz-flowchart skill and generates:
    # - Professional LaTeX TikZ diagram with standardized colors
    # - Data nodes (green), operation nodes (blue), memory nodes (orange)
    # - Clean layout with orthogonal edges
    # - Grouped kernel phases with proper styling

Material You Slides Creation:

    # User request: "Create a presentation deck about our AI infrastructure architecture"
    # Claude loads material-you-slides skill and generates:
    # - Self-contained HTML file with Material Design 3 styling
    # - Title slide with gradient background and branding
    # - Section dividers with large translucent numbers
    # - Content slides with cards, flow diagrams, and metric displays
    # - Responsive 1280x720 slides ready for presentation

Development

Testing Skills

Validate code examples in skills:

    # Run all tests from project root
    pytest

    # Run tests for specific skill
    pytest tests/tilelang-developer/

    # Run specific test file
    pytest tests/tilelang-developer/test_gemm.py
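
What the repository's tests actually check is not shown here. As one possible approach, the sketch below pulls the Python code blocks out of a skill's Markdown text and confirms that each one at least compiles. The helper names are hypothetical, and a syntax check is much weaker than running the examples.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep this sample readable
PY_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_python_blocks(markdown_text):
    """Return the source of every fenced python block in the text."""
    return [match.group(1) for match in PY_BLOCK.finditer(markdown_text)]


def syntax_errors(markdown_text):
    """Return (block_index, message) for each block that fails to compile."""
    errors = []
    for index, source in enumerate(extract_python_blocks(markdown_text)):
        try:
            compile(source, f"<block {index}>", "exec")
        except SyntaxError as exc:
            errors.append((index, exc.msg))
    return errors
```

A pytest test could then read a skill's SKILL.md and assert that `syntax_errors(...)` returns an empty list.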

Updating Skills

When frameworks release major updates:

  1. Update skill source files (SKILL.md, references/) with latest information
  2. Run validation tests to ensure examples are correct
  3. Commit and tag new version

Quality Standards

All skills must meet these criteria:

  • Accurate: Code examples must be tested and correct
  • Concise: Follow progressive disclosure (SKILL.md < 500 lines)
  • Complete: Include workflow, API reference, examples, and debugging
  • Current: Based on latest stable framework version
  • Clear: Explicit triggers in description for automatic activation
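
Two of these criteria, the SKILL.md length limit and the presence of a description, can be checked mechanically. The sketch below assumes SKILL.md opens with front matter delimited by two `---` lines and containing `name` and `description` fields. That layout is an assumption about the skill format, and `check_skill_md` is a hypothetical helper, not part of this repository.

```python
MAX_LINES = 500  # "SKILL.md < 500 lines" from the list above


def check_skill_md(text):
    """Return a list of problems found in the text of one SKILL.md file."""
    problems = []
    lines = text.splitlines()
    if len(lines) >= MAX_LINES:
        problems.append(f"{len(lines)} lines, expected fewer than {MAX_LINES}")

    # Assumed layout: front matter delimited by two '---' lines at the top.
    fields = {}
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()

    for required in ("name", "description"):
        if not fields.get(required):
            problems.append(f"missing front-matter field: {required}")
    return problems
```

The check only confirms that a description exists. Whether it names explicit triggers still needs a human read.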

Contributing

Skill Requests

Open an issue with:

  • Framework/tool name
  • Use cases and scenarios
  • Link to official documentation

Skill Improvements

  1. Fork the repository
  2. Update skill source files
  3. Run validation tests
  4. Submit PR with changelog

Roadmap

  • [x] TileLang developer skill
  • [x] Megatron memory estimator skill
  • [x] SLIME user skill
  • [x] TikZ flowchart skill
  • [x] Material You slides skill
  • [ ] SGLang developer skill
  • [ ] vLLM developer skill
  • [ ] Automated testing pipeline
  • [ ] Documentation update monitoring
  • [ ] Skill versioning system

Resources

License

Skills are provided as-is for development purposes. Generated code follows the license terms of the underlying frameworks.

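Five-Dimension Analysis

  • Clarity: 8/10
  • Innovation: 8/10
  • Practicality: 9/10
  • Completeness: 8/10
  • Maintainability: 7/10

Pros and Cons

Pros

  • Comprehensive skill set covering a range of AI tasks
  • Supports both technical and soft skills
  • Regularly updated against the latest frameworks

Cons

  • Under active development, so errors are possible
  • Documentation may lack depth in some areas
  • Functionality depends on external frameworks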

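Disclaimer: This content is sourced from an open-source GitHub project and is shown for display and rating analysis only.

Copyright belongs to the original author, yzlnew.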