elevenlabs-remotion-skill
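💡 Summary
Generate professional voiceovers with ElevenLabs AI, with customizable character presets.
🎯 Target audience
🤖 AI roast: "This skill requires an API key, which is a risk if exposed. Keep the .env.local file secure and out of version control."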
name: elevenlabs-remotion
description: Generate professional voiceovers using ElevenLabs AI. Use when the user needs to create voiceovers for videos, audio narration, or text-to-speech content. Supports multiple voices with character presets (narrator, salesperson, expert) for natural delivery. Includes single scene regeneration for fine-tuning.
allowed-tools: Bash(node:), Bash(npx:)
ElevenLabs Voiceover Generation
Generate professional AI voiceovers for Remotion videos using ElevenLabs API.
Prerequisites
- ELEVENLABS_API_KEY in .env.local
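How generate.js actually loads the key is not shown here. As a rough sketch only, the helper below (a hypothetical `readEnvValue`, not part of the skill) shows one way to pull a single key out of the text of a .env-style file, assuming simple `NAME=value` lines with optional quotes and `#` comments.

```typescript
// Hypothetical helper, not taken from generate.js: look up one key in the
// text of a .env-style file. Assumes plain NAME=value lines.
function readEnvValue(envText: string, key: string): string | undefined {
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    if (line.slice(0, eq).trim() !== key) continue;
    // Strip one pair of matching surrounding quotes, if present
    return line.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, "$2");
  }
  return undefined;
}
```

A caller could fail fast with a clear message when `readEnvValue(text, "ELEVENLABS_API_KEY")` returns `undefined`, rather than sending an unauthenticated request.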
Quick Start
    # Generate voiceover from text
    node .claude/skills/elevenlabs-remotion-skill/generate.js --text "Your text here" --output public/audio/voiceover.mp3

    # Generate with narrator style (more natural)
    node .claude/skills/elevenlabs-remotion-skill/generate.js --text "Your text" --character narrator --output voiceover.mp3

    # Generate scenes with request stitching
    node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes remotion/scenes.json --output-dir public/audio/project/

    # Regenerate a single scene
    node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes scenes.json --scene scene2 --new-text "Updated text"

    # List available voices and character presets
    node .claude/skills/elevenlabs-remotion-skill/generate.js --list-voices
    node .claude/skills/elevenlabs-remotion-skill/generate.js --list-characters
Character Presets
Use character presets for more natural voiceovers instead of literal screen text reading:
| Character | Description | Best For |
|-----------|-------------|----------|
| literal | Reads text exactly as written | Screen text, quotes |
| narrator | Professional storyteller, smooth, engaging | Explainers, documentaries |
| salesperson | Enthusiastic, persuasive, energetic | Marketing, ads |
| expert | Authoritative, confident, knowledgeable | Legal content, tutorials |
| conversational | Casual, friendly, natural | Social media, casual content |
| dramatic | Intense, emotional, impactful | Hooks, problem statements |
| calm | Soothing, reassuring, gentle | Trust-building, conclusions |
    # Use narrator style globally
    node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes scenes.json --character narrator --output-dir public/audio/

    # Or set per-scene in scenes.json
    {
      "scenes": [
        { "id": "scene1", "text": "Problem statement", "character": "dramatic" },
        { "id": "scene2", "text": "Solution", "character": "calm" }
      ]
    }
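The interaction between the global and per-scene settings can be sketched as a simple fallback chain. The precedence used here (per-scene value first, then the global value, then the documented default of literal) is an assumption based on the examples above, and `resolveCharacter` is a hypothetical name, not a function from generate.js.

```typescript
type Character =
  | "literal" | "narrator" | "salesperson" | "expert"
  | "conversational" | "dramatic" | "calm";

interface SceneLike {
  id: string;
  text: string;
  character?: Character;
}

// Assumed precedence: scene-level character, then the global one,
// then "literal" (the default listed for --character).
function resolveCharacter(scene: SceneLike, globalCharacter?: Character): Character {
  return scene.character ?? globalCharacter ?? "literal";
}
```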
Scene-Based Generation with Request Stitching
Generate multiple scenes with consistent prosody using ElevenLabs request stitching:
scenes.json Format
    {
      "name": "product-demo",
      "voice": "George",
      "character": "narrator",
      "scenes": [
        {
          "id": "scene1",
          "text": "Generic text-to-speech sounds robotic. Your brand deserves better.",
          "duration": 4.5,
          "character": "dramatic"
        },
        {
          "id": "scene2",
          "text": "With voice cloning, you can use your own voice for unlimited content.",
          "duration": 5.5
        },
        {
          "id": "scene3",
          "text": "Record a short sample. Clone it. Create professional voiceovers in minutes.",
          "duration": 6,
          "delay": 0.3
        }
      ]
    }
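For readers who want to sanity-check a scenes file before spending API credits, here is a minimal sketch of the format as TypeScript types plus a light validator. The field list mirrors the example above. Which fields are truly optional, and the duplicate-id check, are assumptions, and `parseScenesFile` and `expectedTotalDuration` are hypothetical names.

```typescript
interface Scene {
  id: string;
  text: string;
  duration?: number;  // expected length in seconds
  character?: string; // per-scene preset override
  delay?: number;     // appears in the example; exact semantics not documented here
}

interface ScenesFile {
  name: string;
  voice?: string;
  character?: string;
  scenes: Scene[];
}

// Parse and lightly validate a scenes.json string.
function parseScenesFile(json: string): ScenesFile {
  const data = JSON.parse(json) as ScenesFile;
  if (typeof data.name !== "string" || !Array.isArray(data.scenes)) {
    throw new Error("scenes file needs a name and a scenes array");
  }
  const seen = new Set<string>();
  for (const scene of data.scenes) {
    if (typeof scene.id !== "string" || typeof scene.text !== "string") {
      throw new Error("every scene needs a string id and text");
    }
    if (seen.has(scene.id)) throw new Error(`duplicate scene id: ${scene.id}`);
    seen.add(scene.id);
  }
  return data;
}

// Sum of the expected durations; scenes without one count as 0.
function expectedTotalDuration(file: ScenesFile): number {
  return file.scenes.reduce((sum, scene) => sum + (scene.duration ?? 0), 0);
}
```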
Generate All Scenes
    node .claude/skills/elevenlabs-remotion-skill/generate.js \
      --scenes remotion/product-demo-scenes.json \
      --output-dir public/audio/product-demo/
This creates:
- product-demo-scene1.mp3 through sceneN.mp3
- product-demo-combined.mp3 (all scenes stitched)
- product-demo-info.json (metadata with durations)
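The naming pattern in the list above can be expressed as a small helper, which is handy when a composition needs to reference the generated files. This is a sketch inferred from the listed file names only. `outputFiles` is a hypothetical name, and the real script may build its paths differently.

```typescript
// Derive the expected output file names from the project name and scene ids,
// following the "<name>-<sceneId>.mp3" pattern listed above.
function outputFiles(projectName: string, sceneIds: string[]) {
  return {
    scenes: sceneIds.map((id) => `${projectName}-${id}.mp3`),
    combined: `${projectName}-combined.mp3`,
    info: `${projectName}-info.json`,
  };
}
```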
Single Scene Regeneration
If a scene starts too early, has wrong timing, or needs different text:
    # Regenerate scene2 with new text
    node .claude/skills/elevenlabs-remotion-skill/generate.js \
      --scenes remotion/scenes.json \
      --scene scene2 \
      --new-text "Updated scene 2 text" \
      --output-dir public/audio/project/

    # Regenerate scene3 with different character
    node .claude/skills/elevenlabs-remotion-skill/generate.js \
      --scenes remotion/scenes.json \
      --scene scene3 \
      --character salesperson \
      --output-dir public/audio/project/

    # Just regenerate (same text, same character)
    node .claude/skills/elevenlabs-remotion-skill/generate.js \
      --scenes remotion/scenes.json \
      --scene scene1 \
      --output-dir public/audio/project/

    # Embed a thumbnail into an MP4 video
    node .claude/skills/elevenlabs-remotion-skill/generate.js \
      --embed-thumbnail public/videos/my-video.mp4 \
      --thumbnail public/videos/my-thumbnail.png \
      --output public/videos/my-video-with-thumb.mp4
The tool automatically:
- Uses request stitching from previous scenes for consistent prosody
- Updates the info.json file with new metadata
- Updates scenes.json if --new-text is provided
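To illustrate what request stitching involves, the sketch below builds a text-to-speech request body that points back at earlier generations. The field names (`model_id`, `voice_settings`, `similarity_boost`, `previous_request_ids`) and the cap of three ids reflect my understanding of the ElevenLabs API and should be checked against its current reference. `buildStitchedBody` is a hypothetical helper, not the code in generate.js.

```typescript
interface VoiceSettings {
  stability: number;
  similarity_boost: number;
  style: number;
}

interface TtsRequestBody {
  text: string;
  model_id: string;
  voice_settings: VoiceSettings;
  previous_request_ids: string[];
}

// Assumption: the API accepts at most three previous request ids.
const MAX_PREVIOUS_IDS = 3;

// Build the body for one scene, conditioning on the most recent earlier
// requests so prosody carries over from scene to scene.
function buildStitchedBody(
  text: string,
  modelId: string,
  settings: VoiceSettings,
  earlierRequestIds: string[],
): TtsRequestBody {
  return {
    text,
    model_id: modelId,
    voice_settings: settings,
    previous_request_ids: earlierRequestIds.slice(-MAX_PREVIOUS_IDS),
  };
}
```

When regenerating a single scene, the same idea applies: pass the request ids of the scenes that come before it so the new take matches its neighbors.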
Thumbnail Embedding
Embed a thumbnail image into MP4 videos so platforms like Twitter, YouTube, and video players display your custom thumbnail instead of the first frame.
Embed Thumbnail into Video
    # Basic usage - outputs to video-thumb.mp4
    node .claude/skills/elevenlabs-remotion-skill/generate.js \
      --embed-thumbnail public/videos/promo.mp4 \
      --thumbnail public/videos/thumbnail.png

    # Custom output path
    node .claude/skills/elevenlabs-remotion-skill/generate.js \
      --embed-thumbnail public/videos/promo.mp4 \
      --thumbnail public/videos/thumbnail.png \
      --output public/videos/promo-final.mp4
Workflow with Remotion
    # 1. Render your video
    npx remotion render MyVideo public/videos/my-video.mp4

    # 2. Render your thumbnail (use Still composition)
    npx remotion still MyVideoThumbnail public/videos/my-thumbnail.png

    # 3. Embed the thumbnail
    node .claude/skills/elevenlabs-remotion-skill/generate.js \
      --embed-thumbnail public/videos/my-video.mp4 \
      --thumbnail public/videos/my-thumbnail.png \
      --output public/videos/my-video-final.mp4
Supported Formats
- Video: MP4 (H.264/H.265)
- Thumbnail: PNG, JPG, JPEG
The embedding uses ffmpeg's -disposition:v:1 attached_pic flag to set the thumbnail as an attached picture, which most video players and platforms recognize.
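For reference, an ffmpeg invocation using that flag might be assembled roughly as below. This is a sketch of a common recipe, not necessarily the exact arguments generate.js passes. It assumes the source video has a single video stream, so the thumbnail becomes video stream 1 in the output, and `buildEmbedThumbnailArgs` is a hypothetical name.

```typescript
// Build an argument list for ffmpeg that copies all streams from the video,
// adds the image as a second video stream, and marks it as an attached picture.
function buildEmbedThumbnailArgs(video: string, thumbnail: string, output: string): string[] {
  return [
    "-i", video,       // input 0: the rendered video
    "-i", thumbnail,   // input 1: the thumbnail image
    "-map", "0",       // keep every stream from the video
    "-map", "1",       // add the image stream
    "-c", "copy",      // no re-encoding
    "-disposition:v:1", "attached_pic",
    output,
  ];
}
```

The array form is meant for a process-spawning call that takes arguments separately, which avoids shell quoting problems with paths that contain spaces.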
Timing Validation
The skill automatically validates timing after generation using ffprobe:
What It Checks
| Check | Threshold | Description |
|-------|-----------|-------------|
| Duration mismatch | >15% | Warns if actual differs from expected duration |
| Leading silence | >200ms | Audio starts late (voiceover delayed) |
| Trailing silence | >500ms | Unnecessary silence at end |
| Speaking rate | 2-4.5 wps | Optimal ~3 words/second |
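The thresholds in the table can be applied to measured values along these lines. This is an illustrative sketch: the measurements themselves would come from ffprobe, and computing the speaking rate as word count divided by total actual duration is an assumption (the tool may exclude silence). `validateTiming` and its warning codes are hypothetical names.

```typescript
interface TimingInput {
  text: string;
  expectedDuration: number; // seconds
  actualDuration: number;   // seconds
  leadingSilence: number;   // seconds
  trailingSilence: number;  // seconds
}

// Return a warning code for every threshold from the table that is exceeded.
function validateTiming(m: TimingInput): string[] {
  const warnings: string[] = [];
  const mismatch = Math.abs(m.actualDuration - m.expectedDuration) / m.expectedDuration;
  if (mismatch > 0.15) warnings.push("duration-mismatch");
  if (m.leadingSilence > 0.2) warnings.push("leading-silence");
  if (m.trailingSilence > 0.5) warnings.push("trailing-silence");
  // Assumed definition: whitespace-separated words per second of audio
  const words = m.text.trim().split(/\s+/).filter((w) => w.length > 0).length;
  const wordsPerSecond = words / m.actualDuration;
  if (wordsPerSecond < 2 || wordsPerSecond > 4.5) warnings.push("speaking-rate");
  return warnings;
}
```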
Validate Existing Audio
    # Validate all scenes in a project
    node .claude/skills/elevenlabs-remotion-skill/generate.js --validate public/audio/product-demo/
Output example:
🔍 Validating product-demo (6 scenes)
❌ scene1: 3.00s (expected: 4.5s)
❌ Audio 1.50s shorter than expected
👍 8 words @ 3.1 words/sec
⚠️ scene2: 6.35s (expected: 5.5s)
⚠️ Leading silence: 235ms (may start late)
🐢 10 words @ 1.8 words/sec
✅ scene4: 4.36s (expected: 4s)
👍 9 words @ 2.3 words/sec
📊 Total duration: 30.80s (expected: 30.00s)
Updated info.json
After validation, the info.json includes actual measurements:
    {
      "scenes": [
        {
          "id": "scene1",
          "duration": 4.5,
          "actualDuration": 3.0,
          "leadingSilence": 0.05,
          "wordsPerSecond": 3.1
        }
      ]
    }
Use actualDuration in your Remotion composition for precise sync.
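One way to turn those measurements into Remotion timings is sketched below: each scene gets a start frame and a length in frames, preferring actualDuration and falling back to the planned duration. The field names follow the info.json example. `toSequenceTimings` is a hypothetical helper, and rounding to the nearest frame is a design choice rather than something the skill prescribes.

```typescript
interface SceneInfo {
  id: string;
  duration: number;        // planned seconds
  actualDuration?: number; // measured seconds, present after validation
}

interface SceneTiming {
  id: string;
  from: number;            // start frame
  durationInFrames: number;
}

// Lay scenes out back to back, using measured durations when available.
function toSequenceTimings(scenes: SceneInfo[], fps: number): SceneTiming[] {
  const timings: SceneTiming[] = [];
  let from = 0;
  for (const scene of scenes) {
    const seconds = scene.actualDuration ?? scene.duration;
    // A sequence needs at least one frame
    const durationInFrames = Math.max(1, Math.round(seconds * fps));
    timings.push({ id: scene.id, from, durationInFrames });
    from += durationInFrames;
  }
  return timings;
}
```

The resulting `from` and `durationInFrames` values map directly onto the props of a Remotion `Sequence`.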
Options
| Option | Description | Default |
|--------|-------------|---------|
| --text, -t | Text to convert to speech | Required (or --file/--scenes) |
| --file, -f | Read text from file | - |
| --output, -o | Output file path | output.mp3 |
| --output-dir | Output directory for scenes | public/audio |
| --voice, -v | Voice name or ID | George |
| --model, -m | Model ID | eleven_multilingual_v2 |
| --character, -c | Character preset | literal |
| --scenes | JSON file with scenes | - |
| --scene | Regenerate single scene ID | - |
| --new-text | New text for scene regen | - |
| --validate | Validate existing audio dir | - |
| --skip-validation | Skip auto-validation | false |
| --embed-thumbnail | Video file to embed thumbnail into | - |
| --thumbnail | Thumbnail image file (PNG/JPG) | - |
| --stability | Voice stability (0-1) | varies by character |
| --similarity | Voice similarity (0-1) | varies by character |
| --style | Style exaggeration (0-1) | varies by character |
| --no-combined | Skip combined file | false |
Recommended Voices
| Voice | Style | Best For |
|-------|-------|----------|
| George | Warm, captivating British | Narration, explainers |
| Antoni | Professional, warm | Legal content, tutorials |
| Arnold | Authoritative, deep | Corporate, serious topics |
| Josh | Friendly, conversational | Marketing, casual content |
Integration with Remotion
After generating scene voiceovers, use them in your composition:
    import React from "react";
    import { Audio, Sequence, staticFile, useVideoConfig } from "remotion";

    // Scene1Visual and Scene2Visual stand for your own scene components.

    // Use individual scene audio files for precise sync
    const SCENE_DURATIONS = {
      scene1: 4.5, // From info.json
      scene2: 5.5,
      scene3: 8.0,
    };

    export const VideoWithVoiceover: React.FC = () => {
      const { fps } = useVideoConfig();
      // Convert seconds to frames at the composition's frame rate
      const scene1Frames = Math.round(SCENE_DURATIONS.scene1 * fps);
      const scene2Frames = Math.round(SCENE_DURATIONS.scene2 * fps);

      return (
        <>
          <Sequence from={0} durationInFrames={scene1Frames}>
            <Audio src={staticFile("audio/project/project-scene1.mp3")} />
            <Scene1Visual />
          </Sequence>
          <Sequence from={scene1Frames} durationInFrames={scene2Frames}>
            <Audio src={staticFile("audio/project/project-scene2.mp3")} />
            <Scene2Visual />
          </Sequence>
        </>
      );
    };
Tips for Best Results
- Use character presets: Don't read screen text literally; use narrator or expert for natural flow
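Pros
- Supports multiple voice styles and character presets
- Automatic timing validation to ensure audio quality
- Easy integration with Remotion video projects
Cons
- Requires an ElevenLabs API key
- Depends on an external service for voice generation
- Limited to the supported audio and video formats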
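Related skills
Disclaimer: This content comes from an open-source GitHub project and is provided for display and rating analysis only.
Copyright belongs to the original author, Maartenlouis.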
