Summary
ListenHub transforms text and URLs into podcasts, explainer videos, and images effortlessly.
AI Roast: "Powerful, but the setup might scare off the impatient."
Risk: Medium. Review: shell/CLI command execution; outbound network access (SSRF, data egress); API keys/tokens handling and storage; filesystem read/write scope and path traversal. Run with least privilege and audit before enabling in production.
name: listenhub
description: |
  Explain anything: turn ideas into podcasts, explainer videos, or voice narration.
  Use when the user wants to "make a podcast", "create an explainer video", "read this aloud", "generate an image", or share knowledge in audio/visual form.
  Supports: topic descriptions, YouTube links, article URLs, plain text, and image prompts.
Four modes, one entry point:
- Podcast: Two-person dialogue, ideal for deep discussions
- Explain: Single narrator + AI visuals, ideal for product intros
- TTS/Flow Speech: Pure voice reading, ideal for articles
- Image Generation: AI image creation, ideal for creative visualization
Users don't need to remember APIs, modes, or parameters. Just say what you want.
Hard Constraints (Inviolable)
The scripts are the ONLY interface. Period.
┌─────────────────────────────────────────────────────────┐
│  AI Agent ──▶ ./scripts/*.sh ──▶ ListenHub API          │
│                     ▲                                   │
│                     │                                   │
│          This is the ONLY path.                         │
│          Direct API calls are FORBIDDEN.                │
└─────────────────────────────────────────────────────────┘
MUST:
- Execute functionality ONLY through provided scripts in **/skills/listenhub/scripts/
- Pass user intent as script arguments exactly as documented
- Trust script outputs; do not second-guess internal logic
MUST NOT:
- Write curl commands to ListenHub/Marswave API directly
- Construct JSON bodies for API calls manually
- Guess or fabricate speakerIds, endpoints, or API parameters
- Assume API structure based on patterns or web searches
- Hallucinate features not exposed by existing scripts
Why: The API is proprietary. Endpoints, parameters, and speakerIds are NOT publicly documented. Web searches will NOT find this information. Any attempt to bypass scripts will produce incorrect, non-functional code.
Script Location
Scripts are located at **/skills/listenhub/scripts/ relative to your working context.
Different AI clients use different dot-directories:
- Claude Code: .claude/skills/listenhub/scripts/
- Other clients: may vary (.cursor/, .windsurf/, etc.)
Resolution: Use glob pattern **/skills/listenhub/scripts/*.sh to locate scripts reliably, or resolve from the SKILL.md file's own path.
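One way to apply the glob resolution above from a shell is sketched here. `find_listenhub_scripts` is a hypothetical helper name (it is not part of the skill), and the sketch assumes the scripts directory sits somewhere below the directory you search from.

```shell
# Hypothetical helper (not shipped with the skill): print the first directory
# below ROOT whose path ends in skills/listenhub/scripts.
find_listenhub_scripts() {
  lh_root="${1:-.}"
  # find descends into dot-directories such as .claude/ or .cursor/ by default.
  find "$lh_root" -type d -path '*/skills/listenhub/scripts' 2>/dev/null | head -n 1
}

# Usage (prints nothing if the directory is not found):
#   SCRIPTS="$(find_listenhub_scripts .)"
#   [ -n "$SCRIPTS" ] || echo "listenhub scripts not found" >&2
```

Taking only the first match keeps the result unambiguous when several clients have installed the skill side by side; which copy wins in that case is left to `find`'s traversal order.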
Private Data (Cannot Be Searched)
The following are internal implementation details that AI cannot reliably know:
| Category | Examples | How to Obtain |
|----------|----------|---------------|
| API Base URL | api.marswave.ai/... | Cannot (internal to scripts) |
| Endpoints | podcast/episodes, etc. | Cannot (internal to scripts) |
| Speaker IDs | cozy-man-english, etc. | Call get-speakers.sh |
| Request schemas | JSON body structure | Cannot (internal to scripts) |
| Response formats | Episode ID, status codes | Documented per script |
Rule: If information is not in this SKILL.md or retrievable via a script (like get-speakers.sh), assume you don't know it.
Design Philosophy
Hide complexity, reveal magic.
Users don't need to know: Episode IDs, API structure, polling mechanisms, credits, endpoint differences.
Users only need to: say the idea → wait a moment → get the link.
Environment
ListenHub API Key
API key stored in $LISTENHUB_API_KEY. Check on first use:
source ~/.zshrc 2>/dev/null; [ -n "$LISTENHUB_API_KEY" ] && echo "ready" || echo "need_setup"
If setup needed, guide user:
- Visit https://listenhub.ai/zh/settings/api-keys
- Paste key (only the lh_sk_... part)
- Auto-save to ~/.zshrc
Labnana API Key (for Image Generation)
API key stored in $LABNANA_API_KEY, output path in $LABNANA_OUTPUT_DIR.
On first image generation, the script auto-guides configuration:
- Visit https://labnana.com/api-keys (requires subscription)
- Paste API key
- Configure output path (default: ~/Downloads)
- Auto-save to shell rc file
Security: Never expose full API keys in output.
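A minimal sketch of how a key could be echoed back for confirmation without revealing it. `mask_key` is a hypothetical helper, and showing six characters is an arbitrary choice made here so that the `lh_sk_` prefix stays visible; neither comes from the skill itself.

```shell
# Hypothetical helper: print only a short prefix of a key so logs and chat
# output never contain the full secret.
mask_key() {
  mk_key="$1"
  if [ -z "$mk_key" ]; then
    echo "(not set)"
    return 1
  fi
  # Keep the first 6 characters (for example the lh_sk_ prefix) and elide the rest.
  printf '%s...\n' "$(printf '%s' "$mk_key" | cut -c1-6)"
}

# Usage:
#   echo "ListenHub key: $(mask_key "$LISTENHUB_API_KEY")"
```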
Mode Detection
Auto-detect mode from user input:
Podcast (Two-person dialogue)
- Keywords: "podcast", "chat about", "discuss", "debate", "dialogue"
- Use case: Topic exploration, opinion exchange, deep analysis
- Feature: Two voices, interactive feel
Explain (Explainer video)
- Keywords: "explain", "introduce", "video", "explainer", "tutorial"
- Use case: Product intro, concept explanation, tutorials
- Feature: Single narrator + AI-generated visuals, can export video
TTS (Text-to-speech)
- Keywords: "read aloud", "convert to speech", "tts", "voice"
- Use case: Article to audio, note review, document narration
- Feature: Fastest (1-2 min), pure audio
Image Generation
- Keywords: "generate image", "draw", "create picture", "visualize"
- Use case: Creative visualization, concept art, illustrations
- Feature: AI image generation via Labnana API, multiple resolutions and aspect ratios
Default: If unclear, ask user which format they prefer.
Explicit override: User can say "make it a podcast" / "I want explainer video" / "just voice" / "generate image" to override auto-detection.
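The keyword lists above can be read as a simple substring match. The sketch below is illustrative only: `detect_mode` is a hypothetical name, and the order of the branches (podcast, then explain, then TTS, then image) is an assumption, since the source does not say which mode wins when several keywords appear.

```shell
# Hypothetical helper: map free-form user input to a mode name using the
# keyword lists from this section. Prints "ask" when nothing matches.
detect_mode() {
  # Lowercase the input so matching is case-insensitive.
  dm_input="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$dm_input" in
    *podcast*|*"chat about"*|*discuss*|*debate*|*dialogue*)
      echo podcast ;;
    *explain*|*introduce*|*video*|*tutorial*)   # "explainer" is covered by *explain*
      echo explain ;;
    *"read aloud"*|*"convert to speech"*|*tts*|*voice*)
      echo tts ;;
    *"generate image"*|*draw*|*"create picture"*|*visualize*)
      echo image ;;
    *)
      echo ask ;;   # unclear: ask the user which format they prefer
  esac
}
```

Plain substring matching will misfire on words that merely contain a keyword (for example "withdraw"), so treat the result as a first guess that the explicit override can correct.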
Interaction Flow
Step 1: Receive input + detect mode
Got it! Preparing...
Mode: Two-person podcast
Topic: Latest developments in Manus AI
For URLs, identify type:
- youtu.be/XXX → convert to https://www.youtube.com/watch?v=XXX
- Other URLs → use directly
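The URL rule above could look like this in shell. `normalize_url` is a hypothetical helper; dropping any query string after the video ID (such as a timestamp) is an assumption made here, not something the source specifies.

```shell
# Hypothetical helper: rewrite youtu.be short links to the full watch URL and
# pass every other URL through unchanged.
normalize_url() {
  case "$1" in
    https://youtu.be/*|http://youtu.be/*|youtu.be/*)
      nu_id="${1#*youtu.be/}"      # strip everything up to and including the host
      nu_id="${nu_id%%[?]*}"       # assumption: discard any query string
      printf 'https://www.youtube.com/watch?v=%s\n' "$nu_id" ;;
    *)
      printf '%s\n' "$1" ;;
  esac
}
```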
Step 2: Submit generation
Generation submitted
Estimated time:
• Podcast: 2-3 minutes
• Explain: 3-5 minutes
• TTS: 1-2 minutes
You can:
• Wait and ask "done yet?"
• Check listenhub.ai/zh/app/library
• Do other things, ask later
Internally remember Episode ID for status queries.
Step 3: Query status
When user says "done yet?" / "ready?" / "check status":
- Success: Show result + next options
- Processing: "Still generating, wait another minute?"
- Failed: "Generation failed, content might be unparseable. Try another?"
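The three outcomes above can be mapped to replies with a small dispatcher. This is only a sketch: the state names `success`, `processing`, and `failed` are assumptions for illustration, since the actual output format of check-status.sh is not documented here, and `status_message` is a hypothetical helper.

```shell
# Hypothetical helper: turn an (assumed) state name into the reply text
# described in Step 3.
status_message() {
  case "$1" in
    success)    echo "Done! Here is the result and what you can do next." ;;
    processing) echo "Still generating, wait another minute?" ;;
    failed)     echo "Generation failed, content might be unparseable. Try another?" ;;
    *)          echo "Unknown status: $1" ;;
  esac
}
```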
Step 4: Show results
Podcast result:
Podcast generated!
"{title}"
Listen: https://listenhub.ai/zh/app/library
Duration: ~{duration} minutes
Need to download? Just say so.
Explain result:
Explainer video generated!
"{title}"
Watch: https://listenhub.ai/zh/app/explainer-video/slides/{episodeId}
Duration: ~{duration} minutes
Need to download audio? Just say so.
Image result:
Image generated!
~/Downloads/labnana-{timestamp}.jpg
Important: Prioritize web experience. Only provide download URLs when user explicitly requests.
Script Reference
All scripts are curl-based (no extra dependencies). Locate via **/skills/listenhub/scripts/*.sh.
Long-running Tasks: Generation may take 1-5 minutes. Use your CLI client's native background execution feature:
- Claude Code: set run_in_background: true in Bash tool
- Other CLIs: use built-in async/background job management if available
Invocation pattern: $SCRIPTS/script-name.sh [args]
Where $SCRIPTS = resolved path to **/skills/listenhub/scripts/
Podcast (One-Stage)
$SCRIPTS/create-podcast.sh "query" [mode] [source_url]
# mode: quick (default) | deep | debate
# source_url: optional URL for content analysis

# Example:
$SCRIPTS/create-podcast.sh "The future of AI development" deep
$SCRIPTS/create-podcast.sh "Analyze this article" deep "https://example.com/article"
Podcast (Two-Stage: Text → Audio)
For advanced workflows requiring script editing between generation:
# Stage 1: Generate text content
$SCRIPTS/create-podcast-text.sh "query" [mode] [source_url]
# Returns: episode_id + scripts array

# Stage 2: Generate audio from text
$SCRIPTS/create-podcast-audio.sh "<episode-id>" [modified_scripts.json]
# Without scripts file: uses original scripts
# With scripts file: uses modified scripts
Speech (Multi-Speaker)
$SCRIPTS/create-speech.sh <scripts_json_file>
# Or pipe:
echo '{"scripts":[...]}' | $SCRIPTS/create-speech.sh -

# scripts.json format:
# {
#   "scripts": [
#     {"content": "Script content here", "speakerId": "speaker-id"},
#     ...
#   ]
# }
Get Available Speakers
$SCRIPTS/get-speakers.sh [language]
# language: zh (default) | en
Response structure (for AI parsing):
{
  "code": 0,
  "data": {
    "items": [
      {
        "name": "Yuanye",
        "speakerId": "cozy-man-english",
        "gender": "male",
        "language": "zh"
      }
    ]
  }
}
Usage: When user requests specific voice characteristics (gender, style), call this script first to discover available speakerId values. NEVER hardcode or assume speakerIds.
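As a rough illustration of discovering IDs instead of hardcoding them, the sketch below pulls every `speakerId` value out of a response shaped like the one above. `list_speaker_ids` is a hypothetical helper, and the text matching is deliberately simple and fragile; a proper JSON parser is the safer choice when one is available.

```shell
# Hypothetical helper: read a get-speakers.sh response on stdin and print one
# speakerId per line. Assumes the response shape shown in this section.
list_speaker_ids() {
  grep -o '"speakerId"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage:
#   $SCRIPTS/get-speakers.sh en | list_speaker_ids
```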
Explain
$SCRIPTS/create-explainer.sh "<topic>" [mode]
# mode: info (default) | story

# Generate video file (optional)
$SCRIPTS/generate-video.sh "<episode-id>"
TTS
$SCRIPTS/create-tts.sh "<text>" [mode]
# mode: smart (default) | direct
Image Generation
$SCRIPTS/generate-image.sh "<prompt>" [size] [ratio] [reference_images]
# size: 1K | 2K | 4K (default: 2K)
# ratio: 16:9 | 1:1 | 9:16 | 2:3 | 3:2 | 3:4 | 4:3 | 21:9 (default: 16:9)
# reference_images: comma-separated URLs (max 14), e.g. "url1,url2"
#   - Provides visual guidance for style, composition, or content
#   - Supports jpg, png, gif, webp, bmp formats
#   - URLs must be publicly accessible
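Before invoking the script, the documented argument values can be checked locally so that a typo fails fast. This is a sketch under the limits listed above; `valid_image_args` is a hypothetical helper, and whether generate-image.sh performs its own validation is not stated in the source.

```shell
# Hypothetical pre-flight check for generate-image.sh arguments.
# Returns 0 when size, ratio, and reference image count are within the
# documented limits, 1 otherwise (with a message on stderr).
valid_image_args() {
  vi_size="${1:-2K}"
  vi_ratio="${2:-16:9}"
  vi_refs="${3:-}"
  case "$vi_size" in
    1K|2K|4K) ;;
    *) echo "invalid size: $vi_size" >&2; return 1 ;;
  esac
  case "$vi_ratio" in
    16:9|1:1|9:16|2:3|3:2|3:4|4:3|21:9) ;;
    *) echo "invalid ratio: $vi_ratio" >&2; return 1 ;;
  esac
  if [ -n "$vi_refs" ]; then
    # Count non-empty comma-separated entries.
    vi_count="$(printf '%s\n' "$vi_refs" | tr ',' '\n' | grep -c .)"
    if [ "$vi_count" -gt 14 ]; then
      echo "too many reference images: $vi_count (max 14)" >&2
      return 1
    fi
  fi
  return 0
}

# Usage:
#   valid_image_args 4K 1:1 "url1,url2" && $SCRIPTS/generate-image.sh "<prompt>" 4K 1:1 "url1,url2"
```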
Check Status
$SCRIPTS/check-status.sh "<episode-id>" <type>
# type: podcast | explainer | tts
Language Adaptation
Automatic Language Detection: Adapt output language based on user input and context.
Detection Rules:
- User Input Language: If user writes in Chinese, respond in Chinese. If user writes in English, respond in English.
- **Context
Pros
- User-friendly interface with simple commands.
- Supports multiple content formats including podcasts and images.
- Automated mode detection for ease of use.
Cons
- Limited to predefined scripts for functionality.
- No direct API access for advanced users.
- Dependency on external API keys for operation.
Disclaimer: This content is sourced from GitHub open source projects for display and rating purposes only.
Copyright belongs to the original author marswaveai.
