Co-Pilot / Assistive
Updated a month ago

hugging-face-model-trainer

huggingface
1.0k
huggingface/skills/skills/hugging-face-model-trainer
Agent score: 80

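💡 Summary

This skill lets users train and fine-tune language models with TRL on Hugging Face's cloud infrastructure.

🎯 Who it's for

  • Data scientists who want to fine-tune models
  • AI researchers interested in reinforcement learning
  • Developers who need a cloud-based training solution
  • Educators teaching machine-learning concepts
  • Enterprises that want to deploy AI models without local resources

🤖 AI roast: Looks like it packs a punch, but don't let the configuration scare people off.

Security analysis: Critical risk

Risk: Critical. Recommended checks: whether it executes shell/command-line instructions; whether it makes outbound network requests (SSRF/data exfiltration); how API keys/tokens are obtained and stored, and the risk of leaking them; the scope of file reads/writes and path-traversal risk; dependency pinning and supply-chain risk. Run with least privilege, and audit the code and dependencies before enabling it in production.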


name: hugging-face-model-trainer
description: This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.
license: Complete terms in LICENSE.txt

TRL Training on Hugging Face Jobs

Overview

Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup is required: models train on cloud GPUs, and results are saved to the Hugging Face Hub as long as the job is configured to push them there.

TRL provides multiple training methods:

  • SFT (Supervised Fine-Tuning) - Standard instruction tuning
  • DPO (Direct Preference Optimization) - Alignment from preference data
  • GRPO (Group Relative Policy Optimization) - Online RL training
  • Reward Modeling - Train reward models for RLHF

For detailed TRL method documentation:

hf_doc_search("your query", product="trl")
hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer")  # SFT
hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer")  # DPO
# etc.

See also: references/training_methods.md for method overviews and selection guidance

When to Use This Skill

Use this skill when users want to:

  • Fine-tune language models on cloud GPUs without local infrastructure
  • Train with TRL methods (SFT, DPO, GRPO, etc.)
  • Run training jobs on Hugging Face Jobs infrastructure
  • Convert trained models to GGUF for local deployment (Ollama, LM Studio, llama.cpp)
  • Ensure trained models are permanently saved to the Hub
  • Use modern workflows with optimized defaults

Key Directives

When assisting with training jobs:

  1. ALWAYS use the hf_jobs() MCP tool - Submit jobs with hf_jobs("uv", {...}), NOT bash trl-jobs commands. The script parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it; pass the script content to hf_jobs() as a string. If the user asks to "train a model", "fine-tune", or something similar, you MUST create the training script AND submit the job immediately using hf_jobs().

  2. Always include Trackio - Every training script should include Trackio for real-time monitoring. Use example scripts in scripts/ as templates.

  3. Provide job details after submission - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.

  4. Use example scripts as templates - Reference scripts/train_sft_example.py, scripts/train_dpo_example.py, etc. as starting points.

Local Script Dependencies

To run scripts locally (like estimate_cost.py), install dependencies:

pip install -r requirements.txt

Prerequisites Checklist

Before starting any training job, verify:

Account & Authentication

  • Hugging Face account on a Pro, Team, or Enterprise plan (Jobs require a paid plan)
  • Authenticated login: Check with hf_whoami()
  • HF_TOKEN for Hub push ⚠️ CRITICAL - The training environment is ephemeral: results must be pushed to the Hub, or ALL training results are lost
  • Token must have write permissions
  • MUST pass secrets={"HF_TOKEN": "$HF_TOKEN"} in job config to make token available (the $HF_TOKEN syntax references your actual token value)
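The job-level half of these requirements can be captured in a small helper so the secret is never forgotten. This is a minimal sketch: `make_uv_job_config` is a hypothetical local function, not part of any Hugging Face library. It only assembles the dictionary that the examples in this document pass to `hf_jobs("uv", {...})`, always including the `HF_TOKEN` secret; it cannot verify that the token has write permission.

```python
def make_uv_job_config(script: str, flavor: str = "a10g-large", timeout: str = "2h") -> dict:
    """Hypothetical helper: build the config dict passed to hf_jobs("uv", ...)."""
    if not script.strip():
        raise ValueError("script must be inline Python code or a URL")
    return {
        "script": script,
        "flavor": flavor,
        "timeout": timeout,
        # "$HF_TOKEN" references your actual token value; without this secret
        # the job cannot push results to the Hub and they are lost.
        "secrets": {"HF_TOKEN": "$HF_TOKEN"},
    }


config = make_uv_job_config("print('hello from the job')")
# hf_jobs("uv", config)  # submit through the MCP tool
```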

Dataset Requirements

  • Dataset must exist on Hub or be loadable via datasets.load_dataset()
  • Format must match training method (SFT: "messages"/text/prompt-completion; DPO: chosen/rejected; GRPO: prompt-only)
  • ALWAYS validate unknown datasets before GPU training to prevent format failures (see Dataset Validation section below)
  • Size appropriate for hardware (Demo: 50-100 examples on t4-small; Production: 1K-10K+ on a10g-large/a100-large)
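Before paying for GPU time, the column check implied by these format rules can be run locally. This sketch encodes only the three formats listed above (reward modeling is left out because no format is given for it here); `REQUIRED_COLUMNS` and `check_format` are illustrative names. The column list would typically come from `load_dataset(name, split="train").column_names`.

```python
# Accepted column sets per method, taken from the format rules above.
# A dataset qualifies if it contains at least one of the listed sets.
REQUIRED_COLUMNS = {
    "sft": [{"messages"}, {"text"}, {"prompt", "completion"}],
    "dpo": [{"chosen", "rejected"}],
    "grpo": [{"prompt"}],
}


def check_format(method: str, columns) -> bool:
    """Return True if the dataset columns satisfy one accepted format for `method`."""
    cols = set(columns)
    return any(required <= cols for required in REQUIRED_COLUMNS[method.lower()])


print(check_format("sft", ["messages", "source"]))  # True
print(check_format("dpo", ["prompt", "chosen"]))    # False: no "rejected" column
```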

⚠️ Critical Settings

  • Timeout must exceed expected training time - The default of 30 min is TOO SHORT for most training runs. Recommended minimum: 1-2 hours. If the timeout is exceeded, the job fails and all progress is lost.
  • Hub push must be enabled - Config: push_to_hub=True, hub_model_id="username/model-name"; Job: secrets={"HF_TOKEN": "$HF_TOKEN"}
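A quick pre-submission sanity check for the timeout rule. It assumes timeout strings are a number with an optional `s`/`m`/`h`/`d` suffix, as in the `"2h"` example in this document; check the Jobs documentation for the exact accepted format. The 25% safety margin is an arbitrary choice, not a Hugging Face recommendation.

```python
import re

# Seconds per unit suffix; a bare number is treated as seconds (an assumption).
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def timeout_seconds(timeout: str) -> float:
    """Parse strings like "30m", "2h" or "1.5h" into seconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([smhd]?)\s*", timeout)
    if match is None:
        raise ValueError(f"unrecognised timeout: {timeout!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit or "s"]


def timeout_is_safe(timeout: str, expected_hours: float, margin: float = 1.25) -> bool:
    """True if the timeout covers the expected training time plus a safety margin."""
    return timeout_seconds(timeout) >= expected_hours * 3600 * margin


print(timeout_is_safe("30m", expected_hours=1.5))  # False: the default is too short
print(timeout_is_safe("2h", expected_hours=1.5))   # True
```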

Asynchronous Job Guidelines

⚠️ IMPORTANT: Training jobs run asynchronously and can take hours

Action Required

When user requests training:

  1. Create the training script with Trackio included (use scripts/train_sft_example.py as template)
  2. Submit immediately using hf_jobs() MCP tool with script content inline - don't save to file unless user requests
  3. Report submission with job ID, monitoring URL, and estimated time
  4. Wait for user to request status checks - don't poll automatically

Ground Rules

  • Jobs run in background - Submission returns immediately; training continues independently
  • Initial logs delayed - Can take 30-60 seconds for logs to appear
  • User checks status - Wait for user to request status updates
  • Avoid polling - Check logs only on user request; provide monitoring links instead

After Submission

Provide to user:

  • ✅ Job ID and monitoring URL
  • ✅ Expected completion time
  • ✅ Trackio dashboard URL
  • ✅ Note that user can request status checks later

Example Response:

✅ Job submitted successfully!

Job ID: abc123xyz
Monitor: https://huggingface.co/jobs/username/abc123xyz

Expected time: ~2 hours
Estimated cost: ~$10

The job is running in the background. Ask me to check status/logs when ready!
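The report above can be assembled mechanically once the job is submitted. A minimal sketch, assuming the monitoring URL follows the pattern in the example (`https://huggingface.co/jobs/<username>/<job_id>`) and that you supply the hourly price of the chosen hardware yourself: `hourly_rate_usd` is a placeholder input, and no real price is encoded. The cost estimate is simply expected hours × hourly rate; the Trackio dashboard URL is not derived here.

```python
def submission_report(job_id: str, username: str, hours: float, hourly_rate_usd: float) -> str:
    """Hypothetical helper: format the post-submission message for the user."""
    cost = hours * hourly_rate_usd  # hourly_rate_usd must be looked up for the flavor
    monitor_url = f"https://huggingface.co/jobs/{username}/{job_id}"
    return "\n".join([
        "✅ Job submitted successfully!",
        "",
        f"Job ID: {job_id}",
        f"Monitor: {monitor_url}",
        "",
        f"Expected time: ~{hours:g} hours",
        f"Estimated cost: ~${cost:.0f}",
        "",
        "The job is running in the background. Ask me to check status/logs when ready!",
    ])


print(submission_report("abc123xyz", "username", hours=2, hourly_rate_usd=5.0))
```

With `hours=2` and an assumed rate of $5/hour, this reproduces the ~$10 figure in the example response.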

Quick Start: Three Approaches

💡 Tip for Demos: For quick demos on smaller GPUs (t4-small), omit eval_dataset and eval_strategy to save ~40% memory. You'll still see training loss and learning progress.
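One way to apply this tip is to make the evaluation settings conditional. This is only a sketch: `sft_config_kwargs` is a hypothetical helper that returns keyword arguments for `SFTConfig`, using the parameter names from the UV script example in this document. For a demo run you would also skip passing `eval_dataset` to `SFTTrainer`.

```python
def sft_config_kwargs(hub_model_id: str, demo: bool) -> dict:
    """Hypothetical helper: SFTConfig keyword arguments, with eval disabled for demos."""
    kwargs = {
        "output_dir": hub_model_id.split("/")[-1],
        "push_to_hub": True,
        "hub_model_id": hub_model_id,
        "report_to": "trackio",
    }
    if not demo:
        # Periodic evaluation costs extra GPU memory, so it is only enabled
        # for full runs; demo runs still log training loss.
        kwargs.update(eval_strategy="steps", eval_steps=50)
    return kwargs


# Inside the job script, e.g.:
# args = SFTConfig(**sft_config_kwargs("username/my-model", demo=True))
```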

Sequence Length Configuration

TRL config classes use max_length (not max_seq_length) to control tokenized sequence length:

# ✅ CORRECT - If you need to set sequence length
SFTConfig(max_length=512)   # Truncate sequences to 512 tokens
DPOConfig(max_length=2048)  # Longer context (2048 tokens)

# ❌ WRONG - Current TRL releases don't accept this parameter
SFTConfig(max_seq_length=512)  # TypeError!

Default behavior: max_length=1024 (truncates from right). This works well for most training.

When to override:

  • Longer context: Set higher (e.g., max_length=2048)
  • Memory constraints: Set lower (e.g., max_length=512)
  • Vision models: Set max_length=None (prevents cutting image tokens)

Usually you don't need to set this parameter at all - the examples below use the sensible default.
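The default behavior can be pictured with plain token-ID lists. This illustrates right-truncation semantics under the stated default; it is not TRL's internal code. With `max_length=1024`, a 1,500-token example keeps its first 1,024 tokens and loses the last 476, while `max_length=None` leaves it untouched.

```python
from typing import List, Optional, Sequence


def truncate_right(token_ids: Sequence[int], max_length: Optional[int] = 1024) -> List[int]:
    """Keep the first max_length tokens; None disables truncation entirely."""
    if max_length is None:
        return list(token_ids)
    return list(token_ids[:max_length])


example = list(range(1500))          # stand-in for a 1,500-token example
kept = truncate_right(example)       # default max_length=1024
dropped = len(example) - len(kept)   # tokens lost from the end
```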

Approach 1: UV Scripts (Recommended—Default Choice)

UV scripts use PEP 723 inline dependencies for clean, self-contained training. This is the primary approach for Claude Code.

hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio"]
# ///

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig
import trackio

dataset = load_dataset("trl-lib/Capybara", split="train")

# Create train/eval split for monitoring
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],
    peft_config=LoraConfig(r=16, lora_alpha=32),
    args=SFTConfig(
        output_dir="my-model",
        push_to_hub=True,
        hub_model_id="username/my-model",
        num_train_epochs=3,
        eval_strategy="steps",
        eval_steps=50,
        report_to="trackio",
        project="meaningful_project_name",  # Trackio project name
        run_name="meaningful_run_name",     # descriptive name for this specific training run (Trackio)
    )
)

trainer.train()
trainer.push_to_hub()
""",
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})

Benefits: Direct MCP tool usage, clean code, dependencies declared inline (PEP 723), no file saving required, full control.

When to use: Default choice for all training tasks in Claude Code, custom training logic, and any scenario requiring hf_jobs().
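The inline-dependency header used in the script above can also be generated from a list. A minimal sketch: `pep723_script` is a hypothetical helper that writes the comment block delimited by `# /// script` and `# ///`, as defined by PEP 723. It uses `json.dumps` to produce the quoted array, since a JSON array of plain strings is also a valid TOML array.

```python
import json


def pep723_script(body: str, dependencies: list) -> str:
    """Hypothetical helper: prefix `body` with PEP 723 inline script metadata."""
    header = "\n".join([
        "# /// script",
        f"# dependencies = {json.dumps(dependencies)}",
        "# ///",
    ])
    return f"{header}\n\n{body}"


script = pep723_script(
    "import trl\nprint(trl.__version__)",
    ["trl>=0.12.0", "peft>=0.7.0", "trackio"],
)
# hf_jobs("uv", {"script": script, "flavor": "a10g-large", "timeout": "2h",
#                "secrets": {"HF_TOKEN": "$HF_TOKEN"}})
```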

Working with Scripts

⚠️ Important: The script parameter accepts either inline code (as shown above) OR a URL. Local file paths do NOT work.

Why local paths don't work: Jobs run in isolated Docker containers without access to your local filesystem. Scripts must be:

  • Inline code (recommended for custom training)
  • Publicly accessible URLs
  • Private repo URLs (with HF_TOKEN)

Common mistakes:

# ❌ These will all fail
hf_jobs("uv", {"script": "train.py"})
hf_jobs("uv", {"script": "./scripts/train.py"})
hf_jobs("uv", {"script": "/path/to/train.py"})

Correct approaches:

# ✅ Inline code (recommended)
hf_jobs("uv", {"script": "# /// script\n# dependencies = [...]\n# ///\n\n<your code>"})

# ✅ From Hugging Face Hub
hf_jobs("uv", {"script": "https://huggingface.co/user/repo/resolve/main/train.py"})

# ✅ From GitHub
hf_jobs("uv", {"script": "https://raw.githubusercontent.com/user/repo/main/train.py"})

# ✅ From Gist
hf_jobs("uv", {"script": "https://gist.githubusercontent.com/user/id/raw/train.py"})
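A cheap guard against these common mistakes is to classify the `script` value before calling `hf_jobs()`. This is a heuristic sketch, not part of the Jobs API: the hypothetical `classify_script` treats anything starting with `http://` or `https://` as a URL, rejects a single-line value ending in `.py` as a local path, and otherwise assumes inline code.

```python
def classify_script(value: str) -> str:
    """Heuristically classify an hf_jobs "script" value as "url" or "inline"."""
    stripped = value.strip()
    if stripped.startswith(("https://", "http://")):
        return "url"
    if "\n" not in stripped and stripped.endswith(".py"):
        # e.g. "train.py" or "./scripts/train.py": the remote container
        # cannot see your local filesystem, so this would fail.
        raise ValueError(
            f"{value!r} looks like a local path; inline the code or upload it to the Hub"
        )
    return "inline"
```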

To use local scripts: Upload to HF Hub first:

huggingface-cli repo create my-training-scripts --type model
huggingface-cli upload my-training-scripts ./train.py train.py
# Use: https://huggingface.co/USERNAME/my-training-scripts/resolve/main/train.py
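The resolve URL in the comment above follows a fixed pattern, so it can be built rather than typed by hand. A minimal sketch with a hypothetical `hub_resolve_url` helper; for model repos, `huggingface_hub.hf_hub_url(repo_id, filename)` returns a URL of the same `resolve/<revision>/<filename>` shape.

```python
from urllib.parse import quote


def hub_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Hypothetical helper: raw-file URL for a file in a Hub model repo."""
    # Escape the revision fully (branch names may contain "/"), but keep "/"
    # in the filename so nested paths like "scripts/train.py" still work.
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


url = hub_resolve_url("USERNAME/my-training-scripts", "train.py")
# hf_jobs("uv", {"script": url, "flavor": "a10g-large", "timeout": "2h",
#                "secrets": {"HF_TOKEN": "$HF_TOKEN"}})
```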

Approach 2: TRL Maintained Scripts (Official Examples)

TRL provides battl

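Five-dimension analysis
Clarity 8/10
Innovation 8/10
Practicality 9/10
Completeness 8/10
Maintainability 7/10
Pros and cons

Pros

  • No local GPU needed for training
  • Supports multiple advanced training methods
  • Results are saved automatically to the Hugging Face Hub
  • Real-time monitoring via Trackio

Cons

  • Requires a paid Hugging Face account
  • Depends on cloud infrastructure
  • Large-scale training can be expensive
  • Complex setup for beginners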


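Disclaimer: This content comes from an open-source GitHub project and is shown for display and rating analysis only.

Copyright belongs to the original author, huggingface.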