Co-Pilot / Assistive

Updated 24 days ago

elevenlabs-remotion-skill

Name: elevenlabs-remotion-skill
Rating: 4.1 (1 review)
Author: Maartenlouis

maartenlouis/elevenlabs-remotion-skill

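💡 Summary

Generate professional voiceovers with ElevenLabs AI, with support for customizable character presets.

🎯 Who it is for

Video content creators
Marketers producing promotional videos
Educators creating teaching materials
Developers integrating voiceovers into applications
Social media managers enhancing video content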

Security analysis: medium risk

This skill requires an API key, which is a risk if it is exposed. Keep the .env.local file secure and out of version control.

name: elevenlabs-remotion
description: Generate professional voiceovers using ElevenLabs AI. Use when the user needs to create voiceovers for videos, audio narration, or text-to-speech content. Supports multiple voices with character presets (narrator, salesperson, expert) for natural delivery. Includes single scene regeneration for fine-tuning.
allowed-tools: Bash(node:), Bash(npx:)

ElevenLabs Voiceover Generation

Generate professional AI voiceovers for Remotion videos using the ElevenLabs API.

Prerequisites

ELEVENLABS_API_KEY in .env.local
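To confirm the key is present before running anything, a small check along these lines may help. This is a minimal sketch: `readApiKey` is a hypothetical helper, not part of the skill, and how generate.js actually loads .env.local is not documented here.

```typescript
// Hypothetical helper: pull ELEVENLABS_API_KEY out of the text of a .env.local file.
// This is NOT the skill's own loader; it only illustrates the expected file shape.
function readApiKey(envText: string): string | undefined {
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a NAME=value line
    const name = line.slice(0, eq).trim();
    if (name !== "ELEVENLABS_API_KEY") continue;
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, "$2");
    return value === "" ? undefined : value;
  }
  return undefined;
}

// Example with a made-up placeholder value.
const sampleEnv = "# local secrets, keep out of version control\nELEVENLABS_API_KEY=\"placeholder\"\n";
console.log(readApiKey(sampleEnv) !== undefined ? "key found" : "key missing");
```

The function only reports whether a value exists; it never prints the key itself, which keeps it safe to run in shared terminals.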

Quick Start

# Generate voiceover from text
node .claude/skills/elevenlabs-remotion-skill/generate.js --text "Your text here" --output public/audio/voiceover.mp3

# Generate with narrator style (more natural)
node .claude/skills/elevenlabs-remotion-skill/generate.js --text "Your text" --character narrator --output voiceover.mp3

# Generate scenes with request stitching
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes remotion/scenes.json --output-dir public/audio/project/

# Regenerate a single scene
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes scenes.json --scene scene2 --new-text "Updated text"

# List available voices and character presets
node .claude/skills/elevenlabs-remotion-skill/generate.js --list-voices
node .claude/skills/elevenlabs-remotion-skill/generate.js --list-characters

Character Presets

Use a character preset for a more natural delivery than a literal reading of on-screen text:

| Character | Description | Best For |
|-----------|-------------|----------|
| literal | Reads text exactly as written | Screen text, quotes |
| narrator | Professional storyteller, smooth, engaging | Explainers, documentaries |
| salesperson | Enthusiastic, persuasive, energetic | Marketing, ads |
| expert | Authoritative, confident, knowledgeable | Legal content, tutorials |
| conversational | Casual, friendly, natural | Social media, casual content |
| dramatic | Intense, emotional, impactful | Hooks, problem statements |
| calm | Soothing, reassuring, gentle | Trust-building, conclusions |

# Use narrator style globally
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes scenes.json --character narrator --output-dir public/audio/

# Or set per-scene in scenes.json
{
  "scenes": [
    { "id": "scene1", "text": "Problem statement", "character": "dramatic" },
    { "id": "scene2", "text": "Solution", "character": "calm" }
  ]
}
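How a per-scene preset interacts with the global one is not spelled out. The sketch below assumes the scene-level value wins, then the global value, then `literal` (the documented default for --character). `resolveCharacter` and `SceneLike` are hypothetical names used only for illustration.

```typescript
// The seven preset names listed in the Character Presets table.
const CHARACTERS = [
  "literal", "narrator", "salesperson", "expert",
  "conversational", "dramatic", "calm",
] as const;
type Character = (typeof CHARACTERS)[number];

interface SceneLike {
  id: string;
  text: string;
  character?: string;
}

// Type guard: true only for one of the known preset names.
function isCharacter(value: string | undefined): value is Character {
  return value !== undefined && (CHARACTERS as readonly string[]).indexOf(value) !== -1;
}

// ASSUMED precedence: scene-level preset, then the global preset, then "literal".
function resolveCharacter(scene: SceneLike, globalCharacter?: string): Character {
  if (isCharacter(scene.character)) return scene.character;
  if (isCharacter(globalCharacter)) return globalCharacter;
  return "literal";
}

const demoScenes: SceneLike[] = [
  { id: "scene1", text: "Problem statement", character: "dramatic" },
  { id: "scene2", text: "Solution" },
];
console.log(demoScenes.map((s) => `${s.id}: ${resolveCharacter(s, "narrator")}`));
```

Unknown preset names fall through to the next level rather than throwing, which is a design choice of this sketch and may differ from what the tool does.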

Scene-Based Generation with Request Stitching

Generate multiple scenes with consistent prosody using ElevenLabs request stitching:

scenes.json Format

{
  "name": "product-demo",
  "voice": "George",
  "character": "narrator",
  "scenes": [
    {
      "id": "scene1",
      "text": "Generic text-to-speech sounds robotic. Your brand deserves better.",
      "duration": 4.5,
      "character": "dramatic"
    },
    {
      "id": "scene2",
      "text": "With voice cloning, you can use your own voice for unlimited content.",
      "duration": 5.5
    },
    {
      "id": "scene3",
      "text": "Record a short sample. Clone it. Create professional voiceovers in minutes.",
      "duration": 6,
      "delay": 0.3
    }
  ]
}
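The shape of this file can be written down as types, which makes it easy to catch mistakes before spending API credits. This is a sketch inferred from the example above only: which fields are optional, and the units of `duration` and `delay` (seconds assumed), are assumptions, and `checkScenesFile` is a hypothetical helper.

```typescript
interface Scene {
  id: string;
  text: string;
  duration?: number;  // expected length; seconds assumed
  character?: string; // optional per-scene preset
  delay?: number;     // meaning not documented in the source; seconds assumed
}

interface ScenesFile {
  name?: string;
  voice?: string;
  character?: string;
  scenes: Scene[];
}

// Returns a list of human-readable problems; an empty list means the file looks sane.
function checkScenesFile(file: ScenesFile): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const scene of file.scenes) {
    if (seen.has(scene.id)) problems.push(`duplicate scene id: ${scene.id}`);
    seen.add(scene.id);
    if (scene.text.trim() === "") problems.push(`${scene.id}: empty text`);
    if (scene.duration !== undefined && !(scene.duration > 0)) {
      problems.push(`${scene.id}: duration must be positive`);
    }
  }
  return problems;
}

const parsed: ScenesFile = JSON.parse(
  '{"name":"product-demo","scenes":[{"id":"scene1","text":"Hello.","duration":4.5}]}'
);
console.log(checkScenesFile(parsed)); // prints []
```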

Generate All Scenes

node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --scenes remotion/product-demo-scenes.json \
  --output-dir public/audio/product-demo/

This creates:

product-demo-scene1.mp3 through product-demo-sceneN.mp3
product-demo-combined.mp3 (all scenes stitched)
product-demo-info.json (metadata with durations)
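The naming pattern above can be reproduced in a few lines, for example to check that every file exists before rendering. This sketch assumes the prefix comes from the `name` field in scenes.json and that --no-combined simply omits the combined file; `expectedOutputs` is a hypothetical helper.

```typescript
// Builds the file names a run is expected to produce, following the
// <name>-<sceneId>.mp3 / <name>-combined.mp3 / <name>-info.json pattern listed above.
function expectedOutputs(projectName: string, sceneIds: string[], combined = true): string[] {
  const files = sceneIds.map((id) => `${projectName}-${id}.mp3`);
  if (combined) files.push(`${projectName}-combined.mp3`);
  files.push(`${projectName}-info.json`);
  return files;
}

console.log(expectedOutputs("product-demo", ["scene1", "scene2", "scene3"]));
```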

Single Scene Regeneration

If a scene starts too early, has wrong timing, or needs different text:

# Regenerate scene2 with new text
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --scenes remotion/scenes.json \
  --scene scene2 \
  --new-text "Updated scene 2 text" \
  --output-dir public/audio/project/

# Regenerate scene3 with different character
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --scenes remotion/scenes.json \
  --scene scene3 \
  --character salesperson \
  --output-dir public/audio/project/

# Just regenerate (same text, same character)
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --scenes remotion/scenes.json \
  --scene scene1 \
  --output-dir public/audio/project/

# Embed a thumbnail into an MP4 video
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --embed-thumbnail public/videos/my-video.mp4 \
  --thumbnail public/videos/my-thumbnail.png \
  --output public/videos/my-video-with-thumb.mp4

When regenerating a single scene, the tool automatically:

Uses request stitching from previous scenes for consistent prosody
Updates the info.json file with new metadata
Updates scenes.json if --new-text is provided
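For context, request stitching in the ElevenLabs text-to-speech API works, to my understanding, by passing the ids of earlier requests in a `previous_request_ids` field (at most three), with each id taken from the `request-id` header of an earlier response. The sketch below only builds such a request body; whether generate.js does it this way is an assumption, and `buildStitchedBody` is a hypothetical name.

```typescript
interface TtsBody {
  text: string;
  model_id: string;
  previous_request_ids?: string[];
}

// Builds a request body that asks the API to continue the prosody of earlier scenes.
// Only the three most recent ids are kept, matching the API limit as I understand it.
function buildStitchedBody(
  text: string,
  earlierRequestIds: string[],
  modelId = "eleven_multilingual_v2", // the default model listed under Options
): TtsBody {
  const body: TtsBody = { text, model_id: modelId };
  const recent = earlierRequestIds.slice(-3);
  if (recent.length > 0) body.previous_request_ids = recent;
  return body;
}

// Regenerating scene3 after scene1 and scene2 (ids here are placeholders).
console.log(JSON.stringify(buildStitchedBody("Updated scene 3 text", ["req-1", "req-2"])));
```

Keeping the body builder separate from the HTTP call makes it easy to inspect what would be sent without spending credits.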

Thumbnail Embedding

Embed a thumbnail image into MP4 videos so platforms like Twitter, YouTube, and video players display your custom thumbnail instead of the first frame.

Embed Thumbnail into Video

# Basic usage - outputs to video-thumb.mp4
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --embed-thumbnail public/videos/promo.mp4 \
  --thumbnail public/videos/thumbnail.png

# Custom output path
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --embed-thumbnail public/videos/promo.mp4 \
  --thumbnail public/videos/thumbnail.png \
  --output public/videos/promo-final.mp4

Workflow with Remotion

# 1. Render your video
npx remotion render MyVideo public/videos/my-video.mp4

# 2. Render your thumbnail (use Still composition)
npx remotion still MyVideoThumbnail public/videos/my-thumbnail.png

# 3. Embed the thumbnail
node .claude/skills/elevenlabs-remotion-skill/generate.js \
  --embed-thumbnail public/videos/my-video.mp4 \
  --thumbnail public/videos/my-thumbnail.png \
  --output public/videos/my-video-final.mp4

Supported Formats

Video: MP4 (H.264/H.265)
Thumbnail: PNG, JPG, JPEG

The embedding uses ffmpeg's -disposition:v:1 attached_pic flag to set the thumbnail as an attached picture, which most video players and platforms recognize.
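As an illustration of that flag, the argument list for such an ffmpeg call might look like the sketch below. Only `-disposition:v:1 attached_pic` comes from the description above; the stream mapping and `-c copy` are my assumptions about a typical invocation, and `thumbnailArgs` is a hypothetical helper.

```typescript
// Builds ffmpeg arguments that copy all streams from the video, add the image as a
// second video stream, and mark that stream as the attached picture.
// ASSUMPTION: the input video has exactly one video stream, so the image lands at v:1.
function thumbnailArgs(video: string, thumbnail: string, output: string): string[] {
  return [
    "-i", video,      // input 0: the rendered video
    "-i", thumbnail,  // input 1: the thumbnail image
    "-map", "0",      // keep every stream from the video
    "-map", "1",      // add the image as an extra stream
    "-c", "copy",     // no re-encoding
    "-disposition:v:1", "attached_pic",
    output,
  ];
}

console.log(["ffmpeg", ...thumbnailArgs("promo.mp4", "thumbnail.png", "promo-final.mp4")].join(" "));
```

Passing the arguments as an array (for example to a process-spawning call) avoids shell quoting problems with paths that contain spaces.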

Timing Validation

The skill automatically validates timing after generation using ffprobe:

What It Checks

| Check | Threshold | Description |
|-------|-----------|-------------|
| Duration mismatch | >15% | Warns if actual differs from expected duration |
| Leading silence | >200ms | Audio starts late (voiceover delayed) |
| Trailing silence | >500ms | Unnecessary silence at end |
| Speaking rate | 2-4.5 wps | Optimal ~3 words/second |
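The thresholds in the table can be expressed as a small pure function. This is a sketch, not the skill's validator: it assumes the mismatch is measured relative to the expected duration and that the speaking rate is word count divided by the full clip length, and the names `Measurement` and `timingWarnings` are hypothetical.

```typescript
interface Measurement {
  expectedDuration: number; // seconds
  actualDuration: number;   // seconds
  leadingSilence: number;   // seconds
  trailingSilence: number;  // seconds
  wordCount: number;
}

// Applies the four documented thresholds and returns the names of the checks that fail.
function timingWarnings(m: Measurement): string[] {
  const warnings: string[] = [];
  const mismatch = Math.abs(m.actualDuration - m.expectedDuration) / m.expectedDuration;
  if (mismatch > 0.15) warnings.push("duration mismatch");
  if (m.leadingSilence > 0.2) warnings.push("leading silence");
  if (m.trailingSilence > 0.5) warnings.push("trailing silence");
  const wordsPerSecond = m.wordCount / m.actualDuration;
  if (wordsPerSecond < 2 || wordsPerSecond > 4.5) warnings.push("speaking rate");
  return warnings;
}

// A 3.0 s clip against an expected 4.5 s is off by a third, well past the 15% limit.
console.log(timingWarnings({
  expectedDuration: 4.5, actualDuration: 3.0,
  leadingSilence: 0.05, trailingSilence: 0, wordCount: 8,
})); // prints [ 'duration mismatch' ]
```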

Validate Existing Audio

# Validate all scenes in a project
node .claude/skills/elevenlabs-remotion-skill/generate.js --validate public/audio/product-demo/

Output example:

🔍 Validating product-demo (6 scenes)

❌ scene1: 3.00s (expected: 4.5s)
   ❌ Audio 1.50s shorter than expected
   👍 8 words @ 3.1 words/sec
⚠️ scene2: 6.35s (expected: 5.5s)
   ⚠️ Leading silence: 235ms (may start late)
   🐢 10 words @ 1.8 words/sec
✅ scene4: 4.36s (expected: 4s)
   👍 9 words @ 2.3 words/sec

📊 Total duration: 30.80s (expected: 30.00s)

Updated info.json

After validation, the info.json includes actual measurements:

{
  "scenes": [
    {
      "id": "scene1",
      "duration": 4.5,
      "actualDuration": 3.0,
      "leadingSilence": 0.05,
      "wordsPerSecond": 3.1
    }
  ]
}

Use actualDuration in your Remotion composition for precise sync.
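One way to do that is to turn the info.json scenes into `from` and `durationInFrames` values for each Sequence. The sketch below prefers `actualDuration` and falls back to `duration`; `sequenceTimings` is a hypothetical helper and the fallback rule is my assumption.

```typescript
interface InfoScene {
  id: string;
  duration: number;        // expected seconds
  actualDuration?: number; // measured seconds, present after validation
}

interface SequenceTiming {
  id: string;
  from: number;             // start frame
  durationInFrames: number; // length in frames
}

// Lays scenes out back to back, converting seconds to frames at the given fps.
function sequenceTimings(scenes: InfoScene[], fps: number): SequenceTiming[] {
  let from = 0;
  return scenes.map((scene) => {
    const seconds = scene.actualDuration ?? scene.duration;
    // Guard against a zero-length sequence for very short clips.
    const durationInFrames = Math.max(1, Math.round(seconds * fps));
    const timing: SequenceTiming = { id: scene.id, from, durationInFrames };
    from += durationInFrames;
    return timing;
  });
}

console.log(sequenceTimings(
  [{ id: "scene1", duration: 4.5, actualDuration: 3.0 }, { id: "scene2", duration: 5.5 }],
  30,
));
```

Accumulating rounded frame counts, rather than rounding cumulative seconds, keeps each scene's start aligned with the end of the previous one.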

Options

| Option | Description | Default |
|--------|-------------|---------|
| --text, -t | Text to convert to speech | Required (or --file/--scenes) |
| --file, -f | Read text from file | - |
| --output, -o | Output file path | output.mp3 |
| --output-dir | Output directory for scenes | public/audio |
| --voice, -v | Voice name or ID | George |
| --model, -m | Model ID | eleven_multilingual_v2 |
| --character, -c | Character preset | literal |
| --scenes | JSON file with scenes | - |
| --scene | Regenerate single scene ID | - |
| --new-text | New text for scene regen | - |
| --validate | Validate existing audio dir | - |
| --skip-validation | Skip auto-validation | false |
| --embed-thumbnail | Video file to embed thumbnail into | - |
| --thumbnail | Thumbnail image file (PNG/JPG) | - |
| --stability | Voice stability (0-1) | varies by character |
| --similarity | Voice similarity (0-1) | varies by character |
| --style | Style exaggeration (0-1) | varies by character |
| --no-combined | Skip combined file | false |

Recommended Voices

| Voice | Style | Best For |
|-------|-------|----------|
| George | Warm, captivating British | Narration, explainers |
| Antoni | Professional, warm | Legal content, tutorials |
| Arnold | Authoritative, deep | Corporate, serious topics |
| Josh | Friendly, conversational | Marketing, casual content |

Integration with Remotion

After generating scene voiceovers, use them in your composition:

import React from "react";
import { Audio, Sequence, staticFile, useVideoConfig } from "remotion";

// Use individual scene audio files for precise sync
const SCENE_DURATIONS = {
  scene1: 4.5,  // From info.json
  scene2: 5.5,
  scene3: 8.0,
};

export const VideoWithVoiceover: React.FC = () => {
  const { fps } = useVideoConfig();

  const scene1Frames = Math.round(SCENE_DURATIONS.scene1 * fps);
  const scene2Frames = Math.round(SCENE_DURATIONS.scene2 * fps);

  return (
    <>
      <Sequence from={0} durationInFrames={scene1Frames}>
        <Audio src={staticFile("audio/project/project-scene1.mp3")} />
        <Scene1Visual />
      </Sequence>

      <Sequence from={scene1Frames} durationInFrames={scene2Frames}>
        <Audio src={staticFile("audio/project/project-scene2.mp3")} />
        <Scene2Visual />
      </Sequence>
    </>
  );
};

Tips for Best Results

Use character presets: Don't read screen text literally; use narrator or expert for natural flow.

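Five-dimension analysis

Clarity: 8/10

Innovation: 7/10

Practicality: 9/10

Completeness: 9/10

Maintainability: 8/10

Pros and cons

Pros

Supports multiple voice styles and character presets
Automatic timing validation to ensure audio quality
Easy integration with Remotion video projects

Cons

Requires an ElevenLabs API key
Depends on an external service for voice generation
Limited to the supported audio and video formats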