💡 Summary
ListenHub turns text and URLs into podcasts, explainer videos, and images with minimal effort.
🎯 Who It's For
🤖 AI Take: "Looks capable, but don't let the configuration scare users away."
Risk: Medium. Suggested checks: whether it executes shell/command-line instructions; whether it makes outbound network requests (SSRF/data exfiltration); how API keys/tokens are obtained, stored, and protected from leaks; the scope of file reads/writes and path-traversal risk. Run with least privilege, and audit the code and dependencies before enabling in production.
```yaml
name: listenhub
description: |
  Explain anything — turn ideas into podcasts, explainer videos, or voice narration.
  Use when the user wants to "make a podcast", "create an explainer video", "read this aloud",
  "generate an image", or share knowledge in audio/visual form.
  Supports: topic descriptions, YouTube links, article URLs, plain text, and image prompts.
```
Four modes, one entry point:
- Podcast — Two-person dialogue, ideal for deep discussions
- Explain — Single narrator + AI visuals, ideal for product intros
- TTS/Flow Speech — Pure voice reading, ideal for articles
- Image Generation — AI image creation, ideal for creative visualization
Users don't need to remember APIs, modes, or parameters. Just say what you want.
⛔ Hard Constraints (Inviolable)
The scripts are the ONLY interface. Period.
```
┌─────────────────────────────────────────────────────────┐
│  AI Agent ──▶ ./scripts/*.sh ──▶ ListenHub API          │
│                     ▲                                   │
│                     │                                   │
│           This is the ONLY path.                        │
│           Direct API calls are FORBIDDEN.               │
└─────────────────────────────────────────────────────────┘
```
MUST:
- Execute functionality ONLY through provided scripts in `**/skills/listenhub/scripts/`
- Pass user intent as script arguments exactly as documented
- Trust script outputs; do not second-guess internal logic
MUST NOT:
- Write curl commands to ListenHub/Marswave API directly
- Construct JSON bodies for API calls manually
- Guess or fabricate speakerIds, endpoints, or API parameters
- Assume API structure based on patterns or web searches
- Hallucinate features not exposed by existing scripts
Why: The API is proprietary. Endpoints, parameters, and speakerIds are NOT publicly documented. Web searches will NOT find this information. Any attempt to bypass scripts will produce incorrect, non-functional code.
Script Location
Scripts are located at **/skills/listenhub/scripts/ relative to your working context.
Different AI clients use different dot-directories:
- Claude Code: `.claude/skills/listenhub/scripts/`
- Other clients: may vary (`.cursor/`, `.windsurf/`, etc.)
Resolution: Use glob pattern **/skills/listenhub/scripts/*.sh to locate scripts reliably, or resolve from the SKILL.md file's own path.
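The resolution step above can be sketched as a small helper. This is an illustrative example, not part of the shipped scripts; the function name `resolve_scripts_dir` is hypothetical, and it simply returns the first matching directory under a given root:

```sh
resolve_scripts_dir() {
  # Search for the scripts directory under the given root (default: current dir).
  # First match wins; if multiple clients' dot-directories exist, prefer
  # resolving from the SKILL.md path instead, as noted above.
  find "${1:-.}" -type d -path '*/skills/listenhub/scripts' 2>/dev/null | head -n 1
}
```

Usage: `SCRIPTS=$(resolve_scripts_dir "$HOME/project")`, then invoke `$SCRIPTS/create-podcast.sh …`.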
Private Data (Cannot Be Searched)
The following are internal implementation details that AI cannot reliably know:
| Category | Examples | How to Obtain |
|----------|----------|---------------|
| API Base URL | api.marswave.ai/... | ✗ Cannot — internal to scripts |
| Endpoints | podcast/episodes, etc. | ✗ Cannot — internal to scripts |
| Speaker IDs | cozy-man-english, etc. | ✓ Call get-speakers.sh |
| Request schemas | JSON body structure | ✗ Cannot — internal to scripts |
| Response formats | Episode ID, status codes | ✓ Documented per script |
Rule: If information is not in this SKILL.md or retrievable via a script (like get-speakers.sh), assume you don't know it.
Design Philosophy
Hide complexity, reveal magic.
Users don't need to know: Episode IDs, API structure, polling mechanisms, credits, endpoint differences. Users only need: Say idea → wait a moment → get the link.
Environment
ListenHub API Key
API key stored in $LISTENHUB_API_KEY. Check on first use:
```sh
source ~/.zshrc 2>/dev/null; [ -n "$LISTENHUB_API_KEY" ] && echo "ready" || echo "need_setup"
```
If setup needed, guide user:
- Visit https://listenhub.ai/zh/settings/api-keys
- Paste key (only the `lh_sk_...` part)
- Auto-save to ~/.zshrc
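The save step could look like the sketch below. The helper name `save_listenhub_key` is hypothetical (the actual guidance flow is handled conversationally); it validates the documented `lh_sk_` prefix and appends an export line to the rc file:

```sh
save_listenhub_key() {
  # $1: key pasted by the user; $2: optional rc file (defaults to ~/.zshrc)
  case "$1" in
    lh_sk_*) ;;  # keys are expected to start with lh_sk_ per the setup notes above
    *) echo "unexpected key format" >&2; return 1 ;;
  esac
  printf 'export LISTENHUB_API_KEY="%s"\n' "$1" >> "${2:-$HOME/.zshrc}"
}
```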
Labnana API Key (for Image Generation)
API key stored in $LABNANA_API_KEY, output path in $LABNANA_OUTPUT_DIR.
On first image generation, the script auto-guides configuration:
- Visit https://labnana.com/api-keys (requires subscription)
- Paste API key
- Configure output path (default: ~/Downloads)
- Auto-save to shell rc file
Security: Never expose full API keys in output.
Mode Detection
Auto-detect mode from user input:
→ Podcast (Two-person dialogue)
- Keywords: "podcast", "chat about", "discuss", "debate", "dialogue"
- Use case: Topic exploration, opinion exchange, deep analysis
- Feature: Two voices, interactive feel
→ Explain (Explainer video)
- Keywords: "explain", "introduce", "video", "explainer", "tutorial"
- Use case: Product intro, concept explanation, tutorials
- Feature: Single narrator + AI-generated visuals, can export video
→ TTS (Text-to-speech)
- Keywords: "read aloud", "convert to speech", "tts", "voice"
- Use case: Article to audio, note review, document narration
- Feature: Fastest (1-2 min), pure audio
→ Image Generation
- Keywords: "generate image", "draw", "create picture", "visualize"
- Use case: Creative visualization, concept art, illustrations
- Feature: AI image generation via Labnana API, multiple resolutions and aspect ratios
Default: If unclear, ask user which format they prefer.
Explicit override: User can say "make it a podcast" / "I want explainer video" / "just voice" / "generate image" to override auto-detection.
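The keyword matching above can be approximated with a simple case statement. This is only a sketch of the detection heuristic (the function name `detect_mode` is hypothetical, and branch order matters: an input mentioning both "podcast" and "video" resolves to the first matching branch):

```sh
detect_mode() {
  # Lowercase the input, then match against the keyword lists documented above.
  input=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$input" in
    *podcast*|*discuss*|*debate*|*dialogue*) echo podcast ;;
    *explain*|*video*|*tutorial*)            echo explain ;;
    *"read aloud"*|*tts*|*"to speech"*)      echo tts ;;
    *image*|*draw*|*picture*|*visualize*)    echo image ;;
    *)                                       echo ask ;;  # unclear: ask the user
  esac
}
```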
Interaction Flow
Step 1: Receive input + detect mode
→ Got it! Preparing...
Mode: Two-person podcast
Topic: Latest developments in Manus AI
For URLs, identify type:
- `youtu.be/XXX` → convert to `https://www.youtube.com/watch?v=XXX`
- Other URLs → use directly
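The URL normalization step could be sketched as follows (the helper name `normalize_url` is hypothetical, and this simple version does not handle short links with extra query parameters):

```sh
normalize_url() {
  case "$1" in
    *youtu.be/*)
      # Expand a youtu.be short link to the canonical watch URL.
      printf 'https://www.youtube.com/watch?v=%s\n' "${1##*youtu.be/}" ;;
    *)
      # Any other URL passes through unchanged.
      printf '%s\n' "$1" ;;
  esac
}
```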
Step 2: Submit generation
→ Generation submitted
Estimated time:
• Podcast: 2-3 minutes
• Explain: 3-5 minutes
• TTS: 1-2 minutes
You can:
• Wait and ask "done yet?"
• Check listenhub.ai/zh/app/library
• Do other things, ask later
Internally remember Episode ID for status queries.
Step 3: Query status
When user says "done yet?" / "ready?" / "check status":
- Success: Show result + next options
- Processing: "Still generating, wait another minute?"
- Failed: "Generation failed, content might be unparseable. Try another?"
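The status-checking loop can be sketched generically. Because check-status.sh's exact output format is internal to the scripts, this hypothetical helper takes a wrapper command that prints one of `success` / `processing` / `failed`:

```sh
poll_status() {
  # $1: a command printing one of: success | processing | failed
  # $2: max attempts (default 30); $3: seconds between polls (default 10)
  attempts=0
  while [ "$attempts" -lt "${2:-30}" ]; do
    state=$("$1")
    if [ "$state" != "processing" ]; then
      echo "$state"   # terminal state: success or failed
      return 0
    fi
    attempts=$((attempts + 1))
    sleep "${3:-10}"
  done
  echo "timeout"
  return 1
}
```

Usage: define a wrapper such as `my_check() { … $SCRIPTS/check-status.sh "$EPISODE_ID" podcast …; }` that maps the script's output to one of the three states, then call `poll_status my_check`.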
Step 4: Show results
Podcast result:
✓ Podcast generated!
"{title}"
Listen: https://listenhub.ai/zh/app/library
Duration: ~{duration} minutes
Need to download? Just say so.
Explain result:
✓ Explainer video generated!
"{title}"
Watch: https://listenhub.ai/zh/app/explainer-video/slides/{episodeId}
Duration: ~{duration} minutes
Need to download audio? Just say so.
Image result:
✓ Image generated!
~/Downloads/labnana-{timestamp}.jpg
Important: Prioritize web experience. Only provide download URLs when user explicitly requests.
Script Reference
All scripts are curl-based (no extra dependencies). Locate via **/skills/listenhub/scripts/*.sh.
⚠️ Long-running Tasks: Generation may take 1-5 minutes. Use your CLI client's native background execution feature:
- Claude Code: set `run_in_background: true` in the Bash tool
- Other CLIs: use built-in async/background job management if available
Invocation pattern: $SCRIPTS/script-name.sh [args]
Where $SCRIPTS = resolved path to **/skills/listenhub/scripts/
Podcast (One-Stage)
```sh
$SCRIPTS/create-podcast.sh "query" [mode] [source_url]
# mode: quick (default) | deep | debate
# source_url: optional URL for content analysis

# Examples:
$SCRIPTS/create-podcast.sh "The future of AI development" deep
$SCRIPTS/create-podcast.sh "Analyze this article" deep "https://example.com/article"
```
Podcast (Two-Stage: Text → Audio)
For advanced workflows requiring script editing between generation:
```sh
# Stage 1: Generate text content
$SCRIPTS/create-podcast-text.sh "query" [mode] [source_url]
# Returns: episode_id + scripts array

# Stage 2: Generate audio from text
$SCRIPTS/create-podcast-audio.sh "<episode-id>" [modified_scripts.json]
# Without scripts file: uses original scripts
# With scripts file: uses modified scripts
```
Speech (Multi-Speaker)
```sh
$SCRIPTS/create-speech.sh <scripts_json_file>
# Or pipe:
echo '{"scripts":[...]}' | $SCRIPTS/create-speech.sh -

# scripts.json format:
# {
#   "scripts": [
#     {"content": "Script content here", "speakerId": "speaker-id"},
#     ...
#   ]
# }
```
Get Available Speakers
```sh
$SCRIPTS/get-speakers.sh [language]
# language: zh (default) | en
```
Response structure (for AI parsing):
```json
{
  "code": 0,
  "data": {
    "items": [
      {
        "name": "Yuanye",
        "speakerId": "cozy-man-english",
        "gender": "male",
        "language": "zh"
      }
    ]
  }
}
```
Usage: When user requests specific voice characteristics (gender, style), call this script first to discover available speakerId values. NEVER hardcode or assume speakerIds.
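Extracting a speakerId from the response above could be sketched without extra dependencies, in keeping with the curl-only design. This naive text match (the helper name `first_speaker_id` is hypothetical) is fragile compared to a real JSON parser such as jq, but works for the documented response shape:

```sh
first_speaker_id() {
  # Reads get-speakers.sh JSON on stdin; prints the first speakerId found.
  sed -n 's/.*"speakerId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

Usage: `speaker=$($SCRIPTS/get-speakers.sh en | first_speaker_id)`.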
Explain
```sh
$SCRIPTS/create-explainer.sh "<topic>" [mode]
# mode: info (default) | story

# Generate video file (optional)
$SCRIPTS/generate-video.sh "<episode-id>"
```
TTS
```sh
$SCRIPTS/create-tts.sh "<text>" [mode]
# mode: smart (default) | direct
```
Image Generation
```sh
$SCRIPTS/generate-image.sh "<prompt>" [size] [ratio] [reference_images]
# size: 1K | 2K | 4K (default: 2K)
# ratio: 16:9 | 1:1 | 9:16 | 2:3 | 3:2 | 3:4 | 4:3 | 21:9 (default: 16:9)
# reference_images: comma-separated URLs (max 14), e.g. "url1,url2"
#   - Provides visual guidance for style, composition, or content
#   - Supports jpg, png, gif, webp, bmp formats
#   - URLs must be publicly accessible
```
Check Status
```sh
$SCRIPTS/check-status.sh "<episode-id>" <type>
# type: podcast | explainer | tts
```
Language Adaptation
Automatic Language Detection: Adapt output language based on user input and context.
Detection Rules:
- User Input Language: If user writes in Chinese, respond in Chinese. If user writes in English, respond in English.
- **Context
Pros
- User-friendly interface with simple commands.
- Supports multiple content formats, including podcasts and images.
- Automatic mode detection makes it easy to use.
Cons
- Functionality is limited to the predefined scripts.
- No direct API access for advanced users.
- Operation depends on external API keys.
Disclaimer: This content is sourced from a GitHub open-source project and is presented for display and rating analysis only.
Copyright belongs to the original author, marswaveai.
