whisper-transcription
About
This skill transcribes audio and video files to text using OpenAI's Whisper model. It's ideal for developers needing to generate subtitles, convert podcasts to text, or build searchable audio archives. Key capabilities include extracting quotes from interviews and repurposing multimedia content into written formats.
Quick Install
Claude Code
Recommended:
npx skills add guia-matthieu/clawfu-skills -a claude-code
Alternatives:
/plugin add https://github.com/guia-matthieu/clawfu-skills
git clone https://github.com/guia-matthieu/clawfu-skills.git ~/.claude/skills/whisper-transcription
Copy and paste one of these commands to install the skill in Claude Code.
Skill Documentation
Whisper Transcription
Transcribe any audio or video to text using OpenAI's Whisper model - the same technology powering ChatGPT voice features.
When to Use This Skill
- Podcast repurposing - Convert episodes to blog posts, show notes, social snippets
- Video subtitles - Generate SRT/VTT files for YouTube, social media
- Interview extraction - Pull quotes and insights from recorded calls
- Content audit - Make audio/video libraries searchable
- Translation - Transcribe and translate foreign language content
What Claude Does vs What You Decide
| Claude Does | You Decide |
|---|---|
| Structures production workflow | Final creative direction |
| Suggests technical approaches | Equipment and tool choices |
| Creates templates and checklists | Quality standards |
| Identifies best practices | Brand/voice decisions |
| Generates script outlines | Final script approval |
Dependencies
pip install openai-whisper torch ffmpeg-python click
# Also requires ffmpeg installed on system
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
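Because ffmpeg is a system-level requirement that pip does not install, a preflight check can save a confusing failure later. The following is a minimal sketch; the helper names `ffmpeg_available` and `check_ffmpeg` are hypothetical and not part of the skill's scripts.

```python
import shutil
import sys


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def check_ffmpeg() -> None:
    """Exit with an install hint when ffmpeg is missing (hypothetical helper)."""
    if not ffmpeg_available():
        sys.exit(
            "ffmpeg not found. Install it with `brew install ffmpeg` (macOS) "
            "or `sudo apt install ffmpeg` (Ubuntu)."
        )
```

Calling `check_ffmpeg()` at the start of a script stops early with a readable message instead of failing partway through a transcription.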
Commands
Transcribe Single File
python scripts/main.py transcribe audio.mp3 --model medium --output transcript.txt
python scripts/main.py transcribe video.mp4 --format srt --output subtitles.srt
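The internals of `scripts/main.py` are not shown here, so the following is only a sketch of what a single-file transcription might look like when written directly against the `openai-whisper` Python package. The function name `transcribe_to_text` and its parameters are assumptions for illustration; `whisper.load_model` and `model.transcribe` are the library's own calls.

```python
from pathlib import Path
from typing import Optional


def transcribe_to_text(audio_path: str, model_name: str = "medium",
                       output_path: Optional[str] = None) -> str:
    """Transcribe one file and optionally write the plain-text transcript."""
    # Imported lazily so this module can be loaded without the heavy dependency.
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(audio_path)   # dict with "text", "segments", "language"
    text = result["text"].strip()
    if output_path is not None:
        Path(output_path).write_text(text + "\n", encoding="utf-8")
    return text
```

For example, `transcribe_to_text("audio.mp3", "medium", "transcript.txt")` would roughly mirror the first command above. Whisper reads video containers through ffmpeg, so an `.mp4` path can be passed the same way.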
Batch Transcription
python scripts/main.py batch ./recordings/ --format txt --output ./transcripts/
Transcribe + Translate
python scripts/main.py translate foreign-audio.mp3 --to en
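Whisper's built-in translation task produces English output only, which is presumably why the example uses `--to en`. The sketch below shows one way a `--to` value could be mapped onto `model.transcribe` keyword arguments; `build_translate_options` is a hypothetical helper, not the script's actual code.

```python
from typing import Any, Dict, Optional


def build_translate_options(to: str = "en",
                            source_language: Optional[str] = None) -> Dict[str, Any]:
    """Map a target language onto keyword arguments for model.transcribe()."""
    if to.lower() != "en":
        # Whisper's "translate" task only targets English.
        raise ValueError("Whisper's built-in translate task only produces English output")
    options: Dict[str, Any] = {"task": "translate"}
    if source_language:
        # Skips automatic language detection when the source language is known.
        options["language"] = source_language
    return options
```

With a loaded model, usage would look like `model.transcribe("foreign-audio.mp3", **build_translate_options("en"))`.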
Extract Timestamps
python scripts/main.py timestamps podcast.mp3 --format json
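Whisper returns a list of segments, each carrying `start`, `end`, and `text` keys. A timestamps export can be sketched as a reduction of those segments to JSON; the exact layout written by the script is not documented here, so the field selection and rounding below are assumptions.

```python
import json
from typing import Any, Dict, Iterable


def segments_to_json(segments: Iterable[Dict[str, Any]]) -> str:
    """Serialize Whisper-style segments to a JSON array of start/end/text rows."""
    rows = [
        {
            "start": round(float(seg["start"]), 2),  # seconds from file start
            "end": round(float(seg["end"]), 2),
            "text": seg["text"].strip(),
        }
        for seg in segments
    ]
    return json.dumps(rows, ensure_ascii=False, indent=2)
```

Given a transcription result, `segments_to_json(result["segments"])` yields a string that can be written straight to a `.json` file.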
Examples
Example 1: Podcast to Blog Post
# Transcribe 1-hour podcast
python scripts/main.py transcribe episode-42.mp3 --model medium
# Output: episode-42.txt (full transcript with timestamps)
# Processing time: ~5 min for 1 hour audio on M1 Mac
Example 2: YouTube Subtitles
# Generate SRT for video upload
python scripts/main.py transcribe marketing-video.mp4 --format srt
# Output: marketing-video.srt
# Upload directly to YouTube/Vimeo
Example 3: Batch Process Interview Library
# Transcribe all recordings in folder
python scripts/main.py batch ./customer-interviews/ --model small --format txt
# Output: ./customer-interviews/*.txt (one per audio file)
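The batch behaviour described above (one output file per recording, written next to the input unless an output folder is given) can be sketched as a planning step that pairs each media file with its target path. The extension list and the function name `plan_batch` are assumptions, not taken from the script.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed set of media extensions; the real script may accept others.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".mov", ".mkv"}


def plan_batch(input_dir: str, output_dir: Optional[str] = None,
               fmt: str = "txt") -> List[Tuple[Path, Path]]:
    """Return (input, output) path pairs for every media file in a folder."""
    src = Path(input_dir)
    dst = Path(output_dir) if output_dir else src  # default: write alongside inputs
    jobs = []
    for path in sorted(src.iterdir()):
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS:
            jobs.append((path, dst / f"{path.stem}.{fmt}"))
    return jobs
```

Separating planning from transcription makes it easy to print the job list first and confirm nothing unexpected will be processed.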
Model Selection Guide
| Model | Speed | Accuracy | VRAM | Best For |
|---|---|---|---|---|
| tiny | Fastest | ~70% | 1GB | Quick drafts, short clips |
| base | Fast | ~80% | 1GB | Social media clips |
| small | Medium | ~85% | 2GB | Podcasts, interviews |
| medium | Slow | ~90% | 5GB | Professional transcripts |
| large | Slowest | ~95% | 10GB | Critical accuracy needs |
Recommendation: Start with small for most marketing content. Use medium for client deliverables.
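The VRAM column of the table can be turned into a simple lookup that picks the largest model fitting the available GPU memory. This is a sketch using the table's approximate figures; `largest_model_for` is a hypothetical helper, and real memory use also depends on audio length and decoding options.

```python
from typing import Optional

# Approximate VRAM requirements in GB, copied from the table above.
MODEL_VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def largest_model_for(vram_gb: float) -> Optional[str]:
    """Return the largest model whose listed VRAM need fits, or None if none fit."""
    best = None
    for name, needed in MODEL_VRAM_GB:  # ordered from smallest to largest
        if needed <= vram_gb:
            best = name
    return best
```

For a GPU with 8 GB this returns `medium`, which matches the recommendation for client deliverables; the choice can still be overridden with `--model` when speed matters more.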
Output Formats
| Format | Extension | Use Case |
|---|---|---|
| txt | .txt | Blog posts, analysis |
| srt | .srt | Video subtitles (YouTube) |
| vtt | .vtt | Web video subtitles |
| json | .json | Programmatic access |
| tsv | .tsv | Spreadsheet analysis |
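The subtitle formats differ mainly in timestamp notation: SRT uses a comma before the milliseconds, while VTT uses a period. A minimal sketch of converting Whisper-style segments to SRT follows; the function names are illustrative and not the script's own.

```python
from typing import Any, Dict, Iterable


def format_timestamp(seconds: float, decimal_marker: str = ",") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def segments_to_srt(segments: Iterable[Dict[str, Any]]) -> str:
    """Render segments (dicts with start, end, text) as SRT subtitle blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):  # SRT cues are numbered from 1
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)  # blank line between cues
```

Passing `decimal_marker="."` to `format_timestamp` gives the VTT timestamp style; a full VTT file additionally starts with a `WEBVTT` header line.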
Performance Tips
- GPU acceleration - 10x faster with CUDA GPU
- Audio extraction - Script auto-extracts audio from video
- Chunking - Long files auto-split for memory efficiency
- Language detection - Automatic, or specify with --language
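Two of the tips above can be sketched in a few lines: choosing a CUDA device when one is available, and computing chunk boundaries for long files. How the script actually splits audio is not documented here, so the 10-minute default and both function names are assumptions.

```python
from typing import List, Tuple


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def chunk_bounds(duration: float, chunk_seconds: float = 600.0) -> List[Tuple[float, float]]:
    """Split [0, duration) into consecutive (start, end) windows in seconds."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_seconds, duration)  # last chunk may be shorter
        bounds.append((start, end))
        start = end
    return bounds
```

The device string can be passed to `whisper.load_model("small", device=pick_device())`, and a known language can be supplied as `model.transcribe(path, language="fr")` to skip detection.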
Skill Boundaries
What This Skill Does Well
- Structuring audio production workflows
- Providing technical guidance
- Creating quality checklists
- Suggesting creative approaches
What This Skill Cannot Do
- Replace audio engineering expertise
- Make subjective creative decisions
- Access or edit audio files directly
- Guarantee commercial success
Related Skills
- video-processing - Extract audio from video
- youtube-downloader - Download videos to transcribe
- content-repurposer - Transform transcripts to content
- podcast-production - Create podcasts
Skill Metadata
- Mode: cyborg
- Category: automation
- Subcategory: audio-processing
- Dependencies: openai-whisper, torch, ffmpeg-python
- Difficulty: beginner
- Time saved: 10+ hours/week
