video-agent-skill
💡 Summary
A comprehensive AI content generation suite offering a range of text, image, and video processing models.
🎯 Who It's For
🤖 AI Take: "This README is like a Swiss Army knife: useful, but occasionally overwhelming."
The README relies on API keys for multiple services, which poses a risk if they are mismanaged. Use environment variables and never hard-code sensitive credentials.
AI Content Generation Suite
A comprehensive AI content generation package with multiple providers and services, consolidated into a single installable package.
⚡ Production-ready Python package with comprehensive CLI, parallel execution, and enterprise-grade architecture
🎬 Demo Video
Click to watch the complete demo of AI Content Generation Suite in action
🎨 Available AI Models
40+ AI models across 8 categories - showing top picks below. See full models reference for complete list.
Text-to-Image (Top Picks)
| Model | Cost | Best For |
|-------|------|----------|
| nano_banana_pro | $0.002 | Fast & high-quality |
| gpt_image_1_5 | $0.003 | GPT-powered generation |
Image-to-Video (Top Picks)
| Model | Cost | Best For |
|-------|------|----------|
| sora_2 | $0.40-1.20 | OpenAI quality |
| kling_2_6_pro | $0.50-1.00 | Professional quality |
Text-to-Video (Top Picks)
| Model | Cost | Best For |
|-------|------|----------|
| sora_2 | $0.40-1.20 | OpenAI quality |
| kling_2_6_pro | $0.35-1.40 | Quality + audio |
💡 Cost-Saving Tip: Use the `--mock` flag for FREE validation:

```shell
ai-content-pipeline generate-image --text "test" --mock
```
🏷️ Latest Release
What's New in v1.0.18
- ✅ Automated PyPI publishing via GitHub Actions
- 🔧 Consolidated setup files for cleaner package structure
- 🎯 All 40+ AI models with comprehensive parallel processing support
- 📦 Improved CI/CD workflow with skip-existing option
🚀 FLAGSHIP: AI Content Pipeline
The unified AI content generation pipeline with parallel execution support, multi-model integration, and YAML-based configuration.
Core Capabilities
- 🔄 Unified Pipeline Architecture - YAML/JSON-based configuration for complex multi-step workflows
- ⚡ Parallel Execution Engine - 2-3x performance improvement with thread-based parallel processing
- 🎯 Type-Safe Configuration - Pydantic models with comprehensive validation
- 💰 Cost Management - Real-time cost estimation and tracking across all services
- 📊 Rich Logging - Beautiful console output with progress tracking and performance metrics
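The "Type-Safe Configuration" idea above can be sketched as follows. This is a minimal stdlib illustration of validate-at-construction behavior; the actual package uses Pydantic models, and the field names here are assumptions, not the package's real schema:

```python
from dataclasses import dataclass, field

# Step types assumed for illustration; the real pipeline defines its own set.
VALID_STEP_TYPES = {"text_to_image", "image_to_video", "text_to_video", "parallel_group"}

@dataclass
class PipelineStep:
    """One pipeline step; invalid input fails at construction, not mid-run."""
    type: str
    model: str = "auto"
    params: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.type not in VALID_STEP_TYPES:
            raise ValueError(f"unknown step type: {self.type!r}")

step = PipelineStep(type="text_to_image", model="flux_dev")
```

Validating the whole config up front means a typo in step 5 of a YAML file is caught before step 1 spends any API credits.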
AI Service Integrations
- 🖼️ FAL AI - Text-to-image, image-to-image, text-to-video, video generation, avatar creation
- 🗣️ ElevenLabs - Professional text-to-speech with 20+ voice options
- 🎥 Google Vertex AI - Veo video generation and Gemini text generation
- 🔗 OpenRouter - Alternative TTS and chat completion services
Developer Experience
- 🛠️ Professional CLI - Comprehensive command-line interface with Click
- 📦 Modular Architecture - Clean separation of concerns with extensible design
- 🧪 Comprehensive Testing - Unit and integration tests with pytest
- 📚 Type Hints - Full type coverage for excellent IDE support
📦 Installation
Quick Start
```shell
# Install from PyPI
pip install video-ai-studio

# Or install in development mode
pip install -e .
```
🔑 API Keys Setup
After installation, you need to configure your API keys:
Download the example configuration:

```shell
# Option 1: Download from GitHub
curl -o .env https://raw.githubusercontent.com/donghaozhang/video-agent-skill/main/.env.example

# Option 2: Create manually
touch .env
```

Add your API keys to `.env`:

```shell
# Required for most functionality
FAL_KEY=your_fal_api_key_here

# Optional - add as needed
GEMINI_API_KEY=your_gemini_api_key_here
OPENROUTER_API_KEY=your_openrouter_api_key_here
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
```
Get API keys from:
- FAL AI: https://fal.ai/dashboard (required for most models)
- Google Gemini: https://makersuite.google.com/app/apikey
- OpenRouter: https://openrouter.ai/keys
- ElevenLabs: https://elevenlabs.io/app/settings
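A quick sanity check that the keys are actually visible to your process can save a confusing first run. This is a standalone sketch using only the standard library; the key names match the `.env` example above, but the helper itself is not part of the package:

```python
import os

REQUIRED = ["FAL_KEY"]  # needed for most models
OPTIONAL = ["GEMINI_API_KEY", "OPENROUTER_API_KEY", "ELEVENLABS_API_KEY"]

def check_keys(env=os.environ):
    """Return (missing_required, missing_optional) so setup problems surface early."""
    missing_req = [k for k in REQUIRED if not env.get(k)]
    missing_opt = [k for k in OPTIONAL if not env.get(k)]
    return missing_req, missing_opt

# Example with a fake environment; pass nothing to check the real one.
req, opt = check_keys({"FAL_KEY": "abc123"})
```

If `req` is non-empty, fix your `.env` (or shell exports) before running any pipeline.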
📋 Dependencies
The package installs core dependencies automatically. See requirements.txt for the complete list.
🛠️ Quick Start
Console Commands
```shell
# List all available AI models
ai-content-pipeline list-models

# Generate image from text
ai-content-pipeline generate-image --text "epic space battle" --model flux_dev

# Create video (text → image → video)
ai-content-pipeline create-video --text "serene mountain lake"

# Run custom pipeline from YAML config
ai-content-pipeline run-chain --config config.yaml --input "cyberpunk city"

# Create example configurations
ai-content-pipeline create-examples

# Shortened command alias
aicp --help
```
Python API
```python
from packages.core.ai_content_pipeline.pipeline.manager import AIPipelineManager

# Initialize manager
manager = AIPipelineManager()

# Quick video creation
result = manager.quick_create_video(
    text="serene mountain lake",
    image_model="flux_dev",
    video_model="auto"
)

# Run custom chain
chain = manager.create_chain_from_config("config.yaml")
result = manager.execute_chain(chain, "input text")
```
📚 Package Structure
Core Packages
- ai_content_pipeline - Main unified pipeline with parallel execution
Provider Packages
Google Services
- google-veo - Google Veo video generation (Vertex AI)
FAL AI Services
- fal-video - Video generation (MiniMax Hailuo-02, Kling Video 2.1)
- fal-text-to-video - Text-to-video (Hailuo Pro, Veo 3, Kling v2.6 Pro, Sora 2/Pro)
- fal-image-to-video - Image-to-video (Veo 3, Hailuo, Kling, Wan v2.6)
- fal-avatar - Avatar generation with TTS integration
- fal-text-to-image - Text-to-image (Imagen 4, Seedream v3, FLUX.1)
- fal-image-to-image - Image transformation (Luma Photon Flash)
- fal-video-to-video - Video processing (ThinkSound + Topaz)
Service Packages
- text-to-speech - ElevenLabs TTS integration (20+ voices)
- video-tools - Video processing utilities with AI analysis
🔧 Configuration
Environment Setup
Create a .env file in the project root:
```shell
# FAL AI API Configuration
FAL_KEY=your_fal_api_key

# Google Cloud Configuration (for Veo)
PROJECT_ID=your-project-id
OUTPUT_BUCKET_PATH=gs://your-bucket/veo_output/

# ElevenLabs Configuration
ELEVENLABS_API_KEY=your_elevenlabs_api_key

# Optional: Gemini for AI analysis
GEMINI_API_KEY=your_gemini_api_key

# Optional: OpenRouter for additional models
OPENROUTER_API_KEY=your_openrouter_api_key
```
YAML Pipeline Configuration
```yaml
name: "Text to Video Pipeline"
description: "Generate video from text prompt"
steps:
  - name: "generate_image"
    type: "text_to_image"
    model: "flux_dev"
    aspect_ratio: "16:9"
  - name: "create_video"
    type: "image_to_video"
    model: "kling_video"
    input_from: "generate_image"
    duration: 8
```
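The `input_from` chaining above can be sketched roughly like this. This is an illustrative reduction only; `run_step`, the dict-based step layout, and the string outputs are assumptions, not the package's real API:

```python
def execute_chain(steps, initial_input, run_step):
    """Run steps in order; each step consumes either the initial input
    or the stored output of the step named by its input_from field."""
    outputs = {}
    result = initial_input
    for step in steps:
        source = step.get("input_from")
        step_input = outputs[source] if source else result
        result = run_step(step, step_input)
        outputs[step["name"]] = result  # keep every output addressable by name
    return result

steps = [
    {"name": "generate_image", "type": "text_to_image"},
    {"name": "create_video", "type": "image_to_video", "input_from": "generate_image"},
]
# A stub run_step that just records the call chain as a string.
final = execute_chain(steps, "serene mountain lake",
                      lambda s, x: f"{s['type']}({x})")
# final == "image_to_video(text_to_image(serene mountain lake))"
```

Storing every step's output by name is what lets later steps reference any earlier step, not just the immediately preceding one.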
Parallel Execution
Enable parallel processing for 2-3x speedup:
```shell
# Enable parallel execution
PIPELINE_PARALLEL_ENABLED=true ai-content-pipeline run-chain --config config.yaml
```
Example parallel pipeline configuration:
```yaml
name: "Parallel Processing Example"
steps:
  - type: "parallel_group"
    steps:
      - type: "text_to_image"
        model: "flux_schnell"
        params:
          prompt: "A cat"
      - type: "text_to_image"
        model: "flux_schnell"
        params:
          prompt: "A dog"
      - type: "text_to_image"
        model: "flux_schnell"
        params:
          prompt: "A bird"
```
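The thread-based execution of a `parallel_group` can be sketched with `concurrent.futures`. This is a minimal illustration, not the package's actual engine; since the steps are I/O-bound API calls, running them in threads lets the waits overlap, which is where the claimed 2-3x speedup comes from:

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel_group(steps, run_step):
    """Run a parallel_group's steps concurrently and return results in step order."""
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = [pool.submit(run_step, s) for s in steps]
        return [f.result() for f in futures]  # f.result() blocks until done

prompts = ["A cat", "A dog", "A bird"]
steps = [{"type": "text_to_image", "model": "flux_schnell", "params": {"prompt": p}}
         for p in prompts]
# Stub run_step standing in for a real API call.
results = run_parallel_group(steps, lambda s: f"image:{s['params']['prompt']}")
```

Collecting `f.result()` in submission order keeps outputs aligned with the YAML step order even when the calls finish out of order.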
💰 Cost Management
Cost Estimation
Always estimate costs before running pipelines:
```shell
# Estimate cost for a pipeline
ai-content-pipeline estimate-cost --config config.yaml
```
Typical Costs
- Text-to-Image: $0.001-0.004 per image
- Image-to-Image: $0.01-0.05 per modification
- Text-to-Video: $0.08-6.00 per video (model dependent)
- Avatar Generation: $0.02-0.05 per video
- Text-to-Speech: Varies by usage (ElevenLabs pricing)
- Video Processing: $0.05-2.50 per video (model dependent)
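At its core, an estimate like the ranges above is just a sum of per-step prices. A minimal sketch; the price table and field names here are hypothetical, and real prices vary with model settings such as duration and resolution:

```python
# Hypothetical per-run prices in USD, for illustration only.
PRICE_TABLE = {
    "flux_schnell": 0.001,
    "flux_dev": 0.003,
    "kling_video": 0.50,
}

def estimate_cost(steps, prices=PRICE_TABLE):
    """Sum per-step prices so a pipeline's total can be checked before spending money."""
    return round(sum(prices[s["model"]] for s in steps), 4)

cost = estimate_cost([
    {"type": "text_to_image", "model": "flux_dev"},
    {"type": "image_to_video", "model": "kling_video"},
])
# cost == 0.503
```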
Cost-Conscious Usage
- Use cheaper models for prototyping (`flux_schnell`, `hailuo`)
- Test with small batches before large-scale generation
- Monitor costs with built-in tracking
🧪 Testing
```shell
# Quick tests
python tests/run_all_tests.py --quick
```
📋 See tests/README.md for complete testing guide.
💰 Cost Management
Estimation
- FAL AI Video: ~$0.05-0.10 per video
- FAL AI Text-to-Video: ~$0.08 (MiniMax) to $2.50-6.00 (Google Veo 3)
- FAL AI Avatar: ~$0.02-0.05 per video
- FAL AI Images: ~$0.001-0.01 per image
- Text-to-Speech: Varies by usage (ElevenLabs pricing)
Best Practices
- Always run `test_setup.py` first (FREE)
- Use cost estimation in the pipeline manager
- Start wi
Pros
- Supports a wide range of AI models for diverse content generation
- Offers parallel processing for better performance
- Comprehensive command-line interface that is easy to use
Cons
- Complex setup that requires multiple API keys
- Costs can be high depending on usage
- Steep learning curve for new users
Related Skills
Disclaimer: This content is sourced from an open-source GitHub project and is shown for demonstration and rating analysis only.
Copyright belongs to the original author, donghaozhang.

