💡 Summary
ListenHub turns text and URLs into podcasts, explainer videos, and images with minimal effort.
🎯 Who It's For
🤖 AI Take: "Looks capable, but don't let the configuration scare users away."
Risk: Medium. Suggested checks: whether it executes shell/command-line instructions; whether it makes outbound network requests (SSRF/data exfiltration); how API keys/tokens are obtained, stored, and protected from leaks; the scope of file reads/writes and path-traversal risk. Run with least privilege, and audit the code and dependencies before enabling in production.
```yaml
name: listenhub
description: |
  Explain anything — turn ideas into podcasts, explainer videos, or voice narration.
  Use when the user wants to "make a podcast", "create an explainer video", "read this aloud",
  "generate an image", or share knowledge in audio/visual form.
  Supports: topic descriptions, YouTube links, article URLs, plain text, and image prompts.
```
Four modes, one entry point:
- Podcast — Two-person dialogue, ideal for deep discussions
- Explain — Single narrator + AI visuals, ideal for product intros
- TTS/Flow Speech — Pure voice reading, ideal for articles
- Image Generation — AI image creation, ideal for creative visualization
Users don't need to remember APIs, modes, or parameters. Just say what you want.
⛔ Hard Constraints (Inviolable)
The scripts are the ONLY interface. Period.
```
┌─────────────────────────────────────────────────────────┐
│  AI Agent ──▶ ./scripts/*.sh ──▶ ListenHub API          │
│                     ▲                                   │
│                     │                                   │
│           This is the ONLY path.                        │
│           Direct API calls are FORBIDDEN.               │
└─────────────────────────────────────────────────────────┘
```
MUST:
- Execute functionality ONLY through provided scripts in `**/skills/listenhub/scripts/`
- Pass user intent as script arguments exactly as documented
- Trust script outputs; do not second-guess internal logic
MUST NOT:
- Write curl commands to ListenHub/Marswave API directly
- Construct JSON bodies for API calls manually
- Guess or fabricate speakerIds, endpoints, or API parameters
- Assume API structure based on patterns or web searches
- Hallucinate features not exposed by existing scripts
Why: The API is proprietary. Endpoints, parameters, and speakerIds are NOT publicly documented. Web searches will NOT find this information. Any attempt to bypass scripts will produce incorrect, non-functional code.
Script Location
Scripts are located at **/skills/listenhub/scripts/ relative to your working context.
Different AI clients use different dot-directories:
- Claude Code: `.claude/skills/listenhub/scripts/`
- Other clients: may vary (`.cursor/`, `.windsurf/`, etc.)
Resolution: Use glob pattern **/skills/listenhub/scripts/*.sh to locate scripts reliably, or resolve from the SKILL.md file's own path.
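The resolution step above can be sketched as a small helper. This is an illustrative example, not part of the shipped scripts; the function name `resolve_scripts_dir` is hypothetical, and it simply returns the first matching directory under a given root:

```sh
resolve_scripts_dir() {
  # Search for the scripts directory under the given root (default: current dir).
  # First match wins; if multiple clients' dot-directories exist, prefer
  # resolving from the SKILL.md path instead, as noted above.
  find "${1:-.}" -type d -path '*/skills/listenhub/scripts' 2>/dev/null | head -n 1
}
```

Usage: `SCRIPTS=$(resolve_scripts_dir "$HOME/project")`, then invoke `$SCRIPTS/create-podcast.sh …`.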
Private Data (Cannot Be Searched)
The following are internal implementation details that AI cannot reliably know:
| Category | Examples | How to Obtain |
|----------|----------|---------------|
| API Base URL | api.marswave.ai/... | ✗ Cannot — internal to scripts |
| Endpoints | podcast/episodes, etc. | ✗ Cannot — internal to scripts |
| Speaker IDs | cozy-man-english, etc. | ✓ Call get-speakers.sh |
| Request schemas | JSON body structure | ✗ Cannot — internal to scripts |
| Response formats | Episode ID, status codes | ✓ Documented per script |
Rule: If information is not in this SKILL.md or retrievable via a script (like get-speakers.sh), assume you don't know it.
Design Philosophy
Hide complexity, reveal magic.
Users don't need to know: Episode IDs, API structure, polling mechanisms, credits, endpoint differences. Users only need: Say idea → wait a moment → get the link.
Environment
ListenHub API Key
API key stored in $LISTENHUB_API_KEY. Check on first use:
```sh
source ~/.zshrc 2>/dev/null; [ -n "$LISTENHUB_API_KEY" ] && echo "ready" || echo "need_setup"
```
If setup needed, guide user:
- Visit https://listenhub.ai/zh/settings/api-keys
- Paste key (only the `lh_sk_...` part)
- Auto-save to ~/.zshrc
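The save step could look like the sketch below. The helper name `save_listenhub_key` is hypothetical (the actual guidance flow is handled conversationally); it validates the documented `lh_sk_` prefix and appends an export line to the rc file:

```sh
save_listenhub_key() {
  # $1: key pasted by the user; $2: optional rc file (defaults to ~/.zshrc)
  case "$1" in
    lh_sk_*) ;;  # keys are expected to start with lh_sk_ per the setup notes above
    *) echo "unexpected key format" >&2; return 1 ;;
  esac
  printf 'export LISTENHUB_API_KEY="%s"\n' "$1" >> "${2:-$HOME/.zshrc}"
}
```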
Labnana API Key (for Image Generation)
API key stored in $LABNANA_API_KEY, output path in $LABNANA_OUTPUT_DIR.
On first image generation, the script auto-guides configuration:
- Visit https://labnana.com/api-keys (requires subscription)
- Paste API key
- Configure output path (default: ~/Downloads)
- Auto-save to shell rc file
Security: Never expose full API keys in output.
Mode Detection
Auto-detect mode from user input:
→ Podcast (Two-person dialogue)
- Keywords: "podcast", "chat about", "discuss", "debate", "dialogue"
- Use case: Topic exploration, opinion exchange, deep analysis
- Feature: Two voices, interactive feel
→ Explain (Explainer video)
- Keywords: "explain", "introduce", "video", "explainer", "tutorial"
- Use case: Product intro, concept explanation, tutorials
- Feature: Single narrator + AI-generated visuals, can export video
→ TTS (Text-to-speech)
- Keywords: "read aloud", "convert to speech", "tts", "voice"
- Use case: Article to audio, note review, document narration
- Feature: Fastest (1-2 min), pure audio
→ Image Generation
- Keywords: "generate image", "draw", "create picture", "visualize"
- Use case: Creative visualization, concept art, illustrations
- Feature: AI image generation via Labnana API, multiple resolutions and aspect ratios
Default: If unclear, ask user which format they prefer.
Explicit override: User can say "make it a podcast" / "I want explainer video" / "just voice" / "generate image" to override auto-detection.
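The keyword matching above can be approximated with a simple case statement. This is only a sketch of the detection heuristic (the function name `detect_mode` is hypothetical, and branch order matters: an input mentioning both "podcast" and "video" resolves to the first matching branch):

```sh
detect_mode() {
  # Lowercase the input, then match against the keyword lists documented above.
  input=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$input" in
    *podcast*|*discuss*|*debate*|*dialogue*) echo podcast ;;
    *explain*|*video*|*tutorial*)            echo explain ;;
    *"read aloud"*|*tts*|*"to speech"*)      echo tts ;;
    *image*|*draw*|*picture*|*visualize*)    echo image ;;
    *)                                       echo ask ;;  # unclear: ask the user
  esac
}
```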
Interaction Flow
Step 1: Receive input + detect mode
→ Got it! Preparing...
Mode: Two-person podcast
Topic: Latest developments in Manus AI
For URLs, identify type:
- `youtu.be/XXX` → convert to `https://www.youtube.com/watch?v=XXX`
- Other URLs → use directly
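The URL normalization step could be sketched as follows (the helper name `normalize_url` is hypothetical, and this simple version does not handle short links with extra query parameters):

```sh
normalize_url() {
  case "$1" in
    *youtu.be/*)
      # Expand a youtu.be short link to the canonical watch URL.
      printf 'https://www.youtube.com/watch?v=%s\n' "${1##*youtu.be/}" ;;
    *)
      # Any other URL passes through unchanged.
      printf '%s\n' "$1" ;;
  esac
}
```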
Step 2: Submit generation
→ Generation submitted
Estimated time:
• Podcast: 2-3 minutes
• Explain: 3-5 minutes
• TTS: 1-2 minutes
You can:
• Wait and ask "done yet?"
• Check listenhub.ai/zh/app/library
• Do other things, ask later
Internally remember Episode ID for status queries.
Step 3: Query status
When user says "done yet?" / "ready?" / "check status":
- Success: Show result + next options
- Processing: "Still generating, wait another minute?"
- Failed: "Generation failed, content might be unparseable. Try another?"
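The status-checking loop can be sketched generically. Because check-status.sh's exact output format is internal to the scripts, this hypothetical helper takes a wrapper command that prints one of `success` / `processing` / `failed`:

```sh
poll_status() {
  # $1: a command printing one of: success | processing | failed
  # $2: max attempts (default 30); $3: seconds between polls (default 10)
  attempts=0
  while [ "$attempts" -lt "${2:-30}" ]; do
    state=$("$1")
    if [ "$state" != "processing" ]; then
      echo "$state"   # terminal state: success or failed
      return 0
    fi
    attempts=$((attempts + 1))
    sleep "${3:-10}"
  done
  echo "timeout"
  return 1
}
```

Usage: define a wrapper such as `my_check() { … $SCRIPTS/check-status.sh "$EPISODE_ID" podcast …; }` that maps the script's output to one of the three states, then call `poll_status my_check`.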
Step 4: Show results
Podcast result:
✓ Podcast generated!
"{title}"
Listen: https://listenhub.ai/zh/app/library
Duration: ~{duration} minutes
Need to download? Just say so.
Explain result:
✓ Explainer video generated!
"{title}"
Watch: https://listenhub.ai/zh/app/explainer-video/slides/{episodeId}
Duration: ~{duration} minutes
Need to download audio? Just say so.
Image result:
✓ Image generated!
~/Downloads/labnana-{timestamp}.jpg
Important: Prioritize web experience. Only provide download URLs when user explicitly requests.
Script Reference
All scripts are curl-based (no extra dependencies). Locate via **/skills/listenhub/scripts/*.sh.
⚠️ Long-running Tasks: Generation may take 1-5 minutes. Use your CLI client's native background execution feature:
- Claude Code: set `run_in_background: true` in the Bash tool
- Other CLIs: use built-in async/background job management if available
Invocation pattern: $SCRIPTS/script-name.sh [args]
Where $SCRIPTS = resolved path to **/skills/listenhub/scripts/
Podcast (One-Stage)
```sh
$SCRIPTS/create-podcast.sh "query" [mode] [source_url]
# mode: quick (default) | deep | debate
# source_url: optional URL for content analysis

# Examples:
$SCRIPTS/create-podcast.sh "The future of AI development" deep
$SCRIPTS/create-podcast.sh "Analyze this article" deep "https://example.com/article"
```
Podcast (Two-Stage: Text → Audio)
For advanced workflows requiring script editing between generation:
```sh
# Stage 1: Generate text content
$SCRIPTS/create-podcast-text.sh "query" [mode] [source_url]
# Returns: episode_id + scripts array

# Stage 2: Generate audio from text
$SCRIPTS/create-podcast-audio.sh "<episode-id>" [modified_scripts.json]
# Without scripts file: uses original scripts
# With scripts file: uses modified scripts
```
Speech (Multi-Speaker)
```sh
$SCRIPTS/create-speech.sh <scripts_json_file>
# Or pipe:
echo '{"scripts":[...]}' | $SCRIPTS/create-speech.sh -

# scripts.json format:
# {
#   "scripts": [
#     {"content": "Script content here", "speakerId": "speaker-id"},
#     ...
#   ]
# }
```
Get Available Speakers
```sh
$SCRIPTS/get-speakers.sh [language]
# language: zh (default) | en
```
Response structure (for AI parsing):
```json
{
  "code": 0,
  "data": {
    "items": [
      {
        "name": "Yuanye",
        "speakerId": "cozy-man-english",
        "gender": "male",
        "language": "zh"
      }
    ]
  }
}
```
Usage: When user requests specific voice characteristics (gender, style), call this script first to discover available speakerId values. NEVER hardcode or assume speakerIds.
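Extracting a speakerId from the response above could be sketched without extra dependencies, in keeping with the curl-only design. This naive text match (the helper name `first_speaker_id` is hypothetical) is fragile compared to a real JSON parser such as jq, but works for the documented response shape:

```sh
first_speaker_id() {
  # Reads get-speakers.sh JSON on stdin; prints the first speakerId found.
  sed -n 's/.*"speakerId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

Usage: `speaker=$($SCRIPTS/get-speakers.sh en | first_speaker_id)`.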
Explain
```sh
$SCRIPTS/create-explainer.sh "<topic>" [mode]
# mode: info (default) | story

# Generate video file (optional)
$SCRIPTS/generate-video.sh "<episode-id>"
```
TTS
```sh
$SCRIPTS/create-tts.sh "<text>" [mode]
# mode: smart (default) | direct
```
Image Generation
```sh
$SCRIPTS/generate-image.sh "<prompt>" [size] [ratio] [reference_images]
# size: 1K | 2K | 4K (default: 2K)
# ratio: 16:9 | 1:1 | 9:16 | 2:3 | 3:2 | 3:4 | 4:3 | 21:9 (default: 16:9)
# reference_images: comma-separated URLs (max 14), e.g. "url1,url2"
#   - Provides visual guidance for style, composition, or content
#   - Supports jpg, png, gif, webp, bmp formats
#   - URLs must be publicly accessible
```
Check Status
```sh
$SCRIPTS/check-status.sh "<episode-id>" <type>
# type: podcast | explainer | tts
```
Language Adaptation
Automatic Language Detection: Adapt output language based on user input and context.
Detection Rules:
- User Input Language: If user writes in Chinese, respond in Chinese. If user writes in English, respond in English.
- **Context
Pros
- User-friendly interface with simple commands.
- Supports multiple content formats, including podcasts and images.
- Automatic mode detection makes it easy to use.
Cons
- Functionality is limited to the predefined scripts.
- No direct API access for advanced users.
- Operation depends on external API keys.
Disclaimer: This content is sourced from a GitHub open-source project and is presented for display and rating analysis only.
Copyright belongs to the original author, marswaveai.
