youtube-transcript
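💡 Summary
Downloads YouTube video subtitles with yt-dlp and falls back to Whisper transcription when no subtitles are available.
🎯 Target audience
🤖 AI roast: "This is a well-documented yt-dlp wrapper, but you still have to babysit its external dependencies like a sysadmin."
Risks: executes arbitrary shell commands taken from an untrusted README, installs packages through pip/apt/brew (supply-chain risk), and downloads and processes arbitrary user-supplied URLs. Mitigations: run in a sandboxed container; use pinned, verified versions of yt-dlp and whisper; validate YouTube URLs before processing.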
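The URL-validation mitigation can be sketched as follows. This is a minimal illustration, not part of the original skill: the `is_youtube_url` helper and the `ALLOWED_HOSTS` set are hypothetical, and the check deliberately accepts only standard watch links and `youtu.be` short links (it rejects other YouTube URL shapes such as Shorts).

```python
from urllib.parse import urlparse, parse_qs

# Hostnames accepted by this sketch; an assumption, not an exhaustive list.
ALLOWED_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True for http(s) links to an accepted host that carry a video ID."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        return False
    if host == "youtu.be":
        # Short links carry the video ID in the path.
        return len(parsed.path.strip("/")) > 0
    # Standard watch links carry the video ID in the "v" query parameter.
    return bool(parse_qs(parsed.query).get("v"))
```

Running this check before any yt-dlp call keeps non-HTTP schemes and unrelated hosts out of the shell commands.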
name: youtube-transcript
description: Download YouTube video transcripts when user provides a YouTube URL or asks to download/get/fetch a transcript from YouTube. Also use when user wants to transcribe or get captions/subtitles from a YouTube video.
allowed-tools: Bash,Read,Write
YouTube Transcript Downloader
This skill helps download transcripts (subtitles/captions) from YouTube videos using yt-dlp.
When to Use This Skill
Activate this skill when the user:
- Provides a YouTube URL and wants the transcript
- Asks to "download transcript from YouTube"
- Wants to "get captions" or "get subtitles" from a video
- Asks to "transcribe a YouTube video"
- Needs text content from a YouTube video
How It Works
Priority Order:
- Check if yt-dlp is installed - install if needed
- List available subtitles - see what's actually available
- Try manual subtitles first (--write-sub) - highest quality
- Fallback to auto-generated (--write-auto-sub) - usually available
- Last resort: Whisper transcription - if no subtitles exist (requires user confirmation)
- Confirm the download and show the user where the file is saved
- Optionally clean up the VTT format if the user wants plain text
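The priority order can be expressed as a small decision function. This is only a sketch: `choose_strategy` and its return labels are hypothetical names, and the availability flags are assumed to come from an earlier subtitle listing.

```python
def choose_strategy(has_manual: bool, has_auto: bool, whisper_confirmed: bool) -> str:
    """Pick the transcript source following the skill's priority order."""
    if has_manual:
        return "manual"        # --write-sub: highest quality
    if has_auto:
        return "auto"          # --write-auto-sub: usually available
    if whisper_confirmed:
        return "whisper"       # last resort, only after the user agrees
    return "ask_user"          # no subtitles and no confirmation yet
```

Keeping the decision separate from the download commands makes the "requires user confirmation" rule explicit: Whisper is never selected unless the caller passes the confirmation flag.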
Installation Check
IMPORTANT: Always check if yt-dlp is installed first:
which yt-dlp || command -v yt-dlp
If Not Installed
Attempt automatic installation based on the system:
macOS (Homebrew):
brew install yt-dlp
Linux (apt/Debian/Ubuntu):
sudo apt update && sudo apt install -y yt-dlp
Alternative (pip - works on all systems):
pip3 install yt-dlp # or python3 -m pip install yt-dlp
If installation fails: Inform the user they need to install yt-dlp manually and provide them with installation instructions from https://github.com/yt-dlp/yt-dlp#installation
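The install decision above could be sketched like this. The `pick_install_commands` helper is hypothetical; it only decides which commands to run, mirroring the brew, apt, pip order used in this skill, and takes the lookup function as a parameter so it can be exercised without touching the system.

```python
import shutil
from typing import Callable, List, Optional


def pick_install_commands(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[List[str]]:
    """Return the commands needed to install yt-dlp, or [] if it is already present."""
    if which("yt-dlp"):
        return []
    if which("brew"):
        return [["brew", "install", "yt-dlp"]]
    if which("apt"):
        return [["sudo", "apt", "update"],
                ["sudo", "apt", "install", "-y", "yt-dlp"]]
    # Fallback that should work wherever Python and pip are available.
    return [["python3", "-m", "pip", "install", "yt-dlp"]]
```

Each returned command is an argument list suitable for `subprocess.run(cmd, check=True)`; if any of them fails, the manual-installation message above still applies.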
Check Available Subtitles
ALWAYS do this first before attempting to download:
yt-dlp --list-subs "YOUTUBE_URL"
This shows what subtitle types are available without downloading anything. Look for:
- Manual subtitles (better quality)
- Auto-generated subtitles (usually available)
- Available languages
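For a script that needs the same information in machine-readable form, one option is to read yt-dlp's JSON metadata instead of parsing the `--list-subs` table. The sketch below assumes the metadata exposes manual tracks under `subtitles` and auto-generated tracks under `automatic_captions`; `fetch_info` and `subtitle_languages` are hypothetical helper names.

```python
import json
import subprocess
from typing import Dict, List


def fetch_info(url: str) -> dict:
    """Ask yt-dlp for video metadata as JSON without downloading media."""
    result = subprocess.run(
        ["yt-dlp", "--dump-single-json", "--skip-download", url],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def subtitle_languages(info: dict) -> Dict[str, List[str]]:
    """Summarize which subtitle languages exist, split by manual vs auto-generated."""
    return {
        "manual": sorted(info.get("subtitles") or {}),
        "auto": sorted(info.get("automatic_captions") or {}),
    }
```

Calling `subtitle_languages(fetch_info(url))` gives the two language lists that the download strategy needs to choose between `--write-sub` and `--write-auto-sub`.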
Download Strategy
Option 1: Manual Subtitles (Preferred)
Try this first - highest quality, human-created:
yt-dlp --write-sub --skip-download --output "OUTPUT_NAME" "YOUTUBE_URL"
Option 2: Auto-Generated Subtitles (Fallback)
If manual subtitles aren't available:
yt-dlp --write-auto-sub --skip-download --output "OUTPUT_NAME" "YOUTUBE_URL"
Both commands create a .vtt file (WebVTT subtitle format).
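Because yt-dlp may report success even when no subtitle track was written (an assumption worth verifying against the installed version), it can be safer to confirm that a `.vtt` file actually exists before moving on. A minimal sketch, with `find_subtitle_files` as a hypothetical helper:

```python
import glob
from pathlib import Path
from typing import List


def find_subtitle_files(output_name: str, directory: str = ".") -> List[Path]:
    """Return files named {output_name}.{language_code}.vtt found in directory."""
    # glob.escape keeps characters like '[' in the output name from acting as wildcards.
    pattern = glob.escape(output_name) + ".*.vtt"
    return sorted(Path(directory).glob(pattern))
```

An empty result after the `--write-sub` attempt is the signal to try `--write-auto-sub`, and an empty result after that is the signal to offer Whisper.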
Option 3: Whisper Transcription (Last Resort)
ONLY use this if both manual and auto-generated subtitles are unavailable.
Step 1: Show File Size and Ask for Confirmation
# Get audio file size estimate
yt-dlp --print "%(filesize,filesize_approx)s" -f "bestaudio" "YOUTUBE_URL"
# Or get duration to estimate
yt-dlp --print "%(duration)s %(title)s" "YOUTUBE_URL"
IMPORTANT: Display the file size to the user and ask: "No subtitles are available. I can download the audio (approximately X MB) and transcribe it using Whisper. Would you like to proceed?"
Wait for user confirmation before continuing.
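The size printed by yt-dlp is a byte count, and it can come back as the placeholder `NA` when the size is unknown. The sketch below turns that raw value into the confirmation message; `bytes_to_mb` and `confirmation_prompt` are hypothetical helpers.

```python
from typing import Optional


def bytes_to_mb(raw: str) -> Optional[float]:
    """Convert a printed byte count to megabytes; return None if it is not numeric."""
    try:
        size = float(raw.strip())
    except ValueError:
        return None  # e.g. the "NA" placeholder
    return round(size / (1024 * 1024), 1)


def confirmation_prompt(raw_size: str) -> str:
    """Build the question shown to the user before downloading audio."""
    mb = bytes_to_mb(raw_size)
    size_text = f"approximately {mb} MB" if mb is not None else "size unknown"
    return (
        "No subtitles are available. I can download the audio "
        f"({size_text}) and transcribe it using Whisper. Would you like to proceed?"
    )
```

Handling the non-numeric case up front avoids an arithmetic error when the size is missing, while still letting the user decide.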
Step 2: Check for Whisper Installation
command -v whisper
If not installed, ask user: "Whisper is not installed. Install it with pip install openai-whisper (requires ~1-3GB for models)? This is a one-time installation."
Wait for user confirmation before installing.
Install if approved:
pip3 install openai-whisper
Step 3: Download Audio Only
yt-dlp -x --audio-format mp3 --output "audio_%(id)s.%(ext)s" "YOUTUBE_URL"
Step 4: Transcribe with Whisper
# Auto-detect language (recommended)
whisper audio_VIDEO_ID.mp3 --model base --output_format vtt
# Or specify language if known
whisper audio_VIDEO_ID.mp3 --model base --language en --output_format vtt
Model Options (stick to base for now):
- tiny - fastest, least accurate (~1GB)
- base - good balance (~1GB) ← USE THIS
- small - better accuracy (~2GB)
- medium - very good (~5GB)
- large - best accuracy (~10GB)
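The Whisper invocation and the model choice can be wrapped in a small command builder. This is a sketch only: `build_whisper_command` is a hypothetical helper, and the accepted model names are limited to the five listed in this section.

```python
from typing import List, Optional

# Model names taken from the list in this section.
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")


def build_whisper_command(
    audio_path: str, model: str = "base", language: Optional[str] = None
) -> List[str]:
    """Build the whisper CLI arguments for a VTT transcription."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model}")
    cmd = ["whisper", audio_path, "--model", model, "--output_format", "vtt"]
    if language:
        # Omit --language to let Whisper auto-detect it.
        cmd += ["--language", language]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the user has confirmed the download.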
Step 5: Cleanup
After transcription completes, ask user: "Transcription complete! Would you like me to delete the audio file to save space?"
If yes:
rm audio_VIDEO_ID.mp3
Getting Video Information
Extract Video Title (for filename)
yt-dlp --print "%(title)s" "YOUTUBE_URL"
Use this to create meaningful filenames based on the video title. Clean the title for filesystem compatibility:
- Replace / with -
- Replace special characters that might cause issues
- Consider using sanitized version: $(yt-dlp --print "%(title)s" "URL" | tr '/' '-' | tr ':' '-')
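The same cleanup can be done in Python when more than a couple of characters need handling. The sketch below is one possible approach; `sanitize_title`, the set of stripped characters, the length cap, and the fallback name are all assumptions rather than part of the original skill.

```python
import re


def sanitize_title(title: str, max_length: int = 120) -> str:
    """Turn a video title into a filesystem-friendly file name stem."""
    cleaned = title.replace("/", "-").replace(":", "-")
    # Drop characters that are invalid on common filesystems (an assumed set).
    cleaned = re.sub(r'[\\?*"<>|]', "", cleaned)
    # Collapse whitespace runs, then remove any remaining control characters.
    cleaned = re.sub(r"\s+", " ", cleaned)
    cleaned = re.sub(r"[\x00-\x1f]", "", cleaned).strip(" .")
    return cleaned[:max_length] or "transcript"
```

For example, `sanitize_title('AC/DC: Live? "Yes"')` yields `AC-DC- Live Yes`, and a title made only of stripped characters falls back to `transcript`.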
Post-Processing
Convert to Plain Text (Recommended)
YouTube's auto-generated VTT files contain duplicate lines because captions are shown progressively with overlapping timestamps. Always deduplicate when converting to plain text while preserving the original speaking order.
python3 -c "
import sys, re
seen = set()
with open('transcript.en.vtt', 'r') as f:
    for line in f:
        line = line.strip()
        # Skip blank lines, the WebVTT header fields, and timestamp lines
        if line and not line.startswith('WEBVTT') and not line.startswith('Kind:') and not line.startswith('Language:') and '-->' not in line:
            # Strip inline tags, then decode the HTML entities yt-dlp leaves in the text
            clean = re.sub('<[^>]*>', '', line)
            clean = clean.replace('&amp;', '&').replace('&gt;', '>').replace('&lt;', '<')
            if clean and clean not in seen:
                print(clean)
                seen.add(clean)
" > transcript.txt
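The same deduplication can be written as a reusable function, which is easier to test than an inline one-liner. This is a sketch of the technique described above, not the skill's own code: `vtt_to_lines` is a hypothetical name, and it uses `html.unescape` as a broader replacement for the three explicit entity substitutions.

```python
import html
import re
from typing import Iterable, List

HEADER_PREFIXES = ("WEBVTT", "Kind:", "Language:")


def vtt_to_lines(vtt_lines: Iterable[str]) -> List[str]:
    """Return caption text in first-seen order, without timestamps, tags, or repeats."""
    seen = set()
    out = []
    for raw in vtt_lines:
        line = raw.strip()
        if not line or line.startswith(HEADER_PREFIXES) or "-->" in line:
            continue  # blank line, header field, or timestamp line
        # Remove inline tags such as <c> and <00:00:00.500>, then decode entities.
        clean = html.unescape(re.sub(r"<[^>]*>", "", line)).strip()
        if clean and clean not in seen:
            seen.add(clean)
            out.append(clean)
    return out
```

Like the one-liner, this drops every later repeat of a line, so a phrase the speaker genuinely says twice appears only once in the output; that trade-off is what removes the overlapping auto-caption duplicates.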
Complete Post-Processing with Video Title
# Get video title (tr -d deletes '?' and '"'; tr cannot translate to an empty set)
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "YOUTUBE_URL" | tr '/' '_' | tr ':' '-' | tr -d '?"')

# Find the VTT file
VTT_FILE=$(ls *.vtt | head -n 1)

# Convert with deduplication
python3 -c "
import sys, re
seen = set()
with open('$VTT_FILE', 'r') as f:
    for line in f:
        line = line.strip()
        if line and not line.startswith('WEBVTT') and not line.startswith('Kind:') and not line.startswith('Language:') and '-->' not in line:
            clean = re.sub('<[^>]*>', '', line)
            clean = clean.replace('&amp;', '&').replace('&gt;', '>').replace('&lt;', '<')
            if clean and clean not in seen:
                print(clean)
                seen.add(clean)
" > "${VIDEO_TITLE}.txt"
echo "✓ Saved to: ${VIDEO_TITLE}.txt"

# Clean up VTT file
rm "$VTT_FILE"
echo "✓ Cleaned up temporary VTT file"
Output Formats
- VTT format (.vtt): Includes timestamps and formatting, good for video players
- Plain text (.txt): Just the text content, good for reading or analysis
Tips
- The filename will be {output_name}.{language_code}.vtt (e.g., transcript.en.vtt)
- Most YouTube videos have auto-generated English subtitles
- Some videos may have multiple language options
- If auto-subtitles aren't available, try --write-sub instead for manual subtitles
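Given the naming pattern in the first tip, the language code can be recovered from a subtitle file name. A minimal sketch, with `split_subtitle_name` as a hypothetical helper that assumes the language code is the last dot-separated part before `.vtt`:

```python
from pathlib import Path
from typing import Optional, Tuple


def split_subtitle_name(filename: str) -> Optional[Tuple[str, str]]:
    """Split '{output_name}.{language_code}.vtt' into (output_name, language_code)."""
    path = Path(filename)
    if path.suffix != ".vtt":
        return None
    # path.stem is e.g. 'transcript.en'; the language code follows the last dot.
    name, sep, lang = path.stem.rpartition(".")
    if not sep or not name or not lang:
        return None
    return name, lang
```

This is useful when several languages were downloaded and the script has to pick one file, for instance preferring the entry whose language code is `en`.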
Complete Workflow Example
VIDEO_URL="https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Get video title for filename (tr -d deletes '?' and '"'; tr cannot translate to an empty set)
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "$VIDEO_URL" | tr '/' '_' | tr ':' '-' | tr -d '?"')
OUTPUT_NAME="transcript_temp"

# ============================================
# STEP 1: Check if yt-dlp is installed
# ============================================
if ! command -v yt-dlp &> /dev/null; then
    echo "yt-dlp not found, attempting to install..."
    if command -v brew &> /dev/null; then
        brew install yt-dlp
    elif command -v apt &> /dev/null; then
        sudo apt update && sudo apt install -y yt-dlp
    else
        pip3 install yt-dlp
    fi
fi

# ============================================
# STEP 2: List available subtitles
# ============================================
echo "Checking available subtitles..."
yt-dlp --list-subs "$VIDEO_URL"

# ============================================
# STEP 3: Try manual subtitles first
# ============================================
echo "Attempting to download manual subtitles..."
if yt-dlp --write-sub --skip-download --output "$OUTPUT_NAME" "$VIDEO_URL" 2>/dev/null; then
    echo "✓ Manual subtitles downloaded successfully!"
    ls -lh ${OUTPUT_NAME}.*
else
    # ============================================
    # STEP 4: Fallback to auto-generated
    # ============================================
    echo "Manual subtitles not available. Trying auto-generated..."
    if yt-dlp --write-auto-sub --skip-download --output "$OUTPUT_NAME" "$VIDEO_URL" 2>/dev/null; then
        echo "✓ Auto-generated subtitles downloaded successfully!"
        ls -lh ${OUTPUT_NAME}.*
    else
        # ============================================
        # STEP 5: Last resort - Whisper transcription
        # ============================================
        echo "⚠ No subtitles available for this video."

        # Get file size
        FILE_SIZE=$(yt-dlp --print "%(filesize_approx)s" -f "bestaudio" "$VIDEO_URL")
        DURATION=$(yt-dlp --print "%(duration)s" "$VIDEO_URL")
        TITLE=$(yt-dlp --print "%(title)s" "$VIDEO_URL")

        echo "Video: $TITLE"
        echo "Duration: $((DURATION / 60)) minutes"
        echo "Audio size: ~$((FILE_SIZE / 1024 / 1024)) MB"
        echo ""
        echo "Would you like to download and transcribe with Whisper? (y/n)"
        read -r RESPONSE

        if [[ "$RESPONSE" =~ ^[Yy]$ ]]; then
            # Check for Whisper
            if ! command -v whisper &> /dev/null; then
                echo "Whisper not installed. Install now? (requires ~1-3GB) (y/n)"
                read -r INSTALL_RESPONSE
                if [[ "$INSTALL_RESPONSE" =~ ^[Yy]$ ]]; then
                    pip3 install openai-whisper
                else
                    echo "Cannot proceed without Whisper. Exiting."
                    exit 1
                fi
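Pros
- Comprehensive multi-level fallback strategy (manual subtitles -> auto-generated subtitles -> Whisper transcription)
- Includes post-processing that cleans VTT files into plain text
- Provides clear user-confirmation steps before large downloads and installs
Cons
- Depends heavily on external tools (yt-dlp, Whisper) with complex installation paths
- The Whisper fallback requires substantial disk space and user patience
- The Bash script logic may be fragile across different user environments
Related skills
Disclaimer: This content comes from an open-source GitHub project and is shown for display and rating analysis only.
Copyright belongs to the original author, michalparkola.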
