
audiocraft-audio-generation

majiayu000
Updated 13 days ago
Tags: Meta, Multimodal, Audio Generation, Text-to-Music, Text-to-Audio, MusicGen

About

This Claude Skill enables audio generation using Meta's AudioCraft library, providing text-to-music (MusicGen) and text-to-sound (AudioGen) capabilities. Developers can use it to generate music from text descriptions, create sound effects, or perform melody-conditioned music generation. It supports stereo audio output and controllable generation with style transfer features.
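As a sketch of what the underlying library offers (not this skill's internal implementation), text-to-music generation with AudioCraft's MusicGen typically looks like the following; the checkpoint name, prompt, and output filename are illustrative choices:

```python
# Sketch of text-to-music generation with Meta's AudioCraft library.
# Assumes `pip install audiocraft` and a working PyTorch install.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained MusicGen checkpoint (the small variant, for speed).
model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)  # seconds of audio per clip

# Generate one clip per text description.
descriptions = ["upbeat acoustic folk with gentle percussion"]
wav = model.generate(descriptions)  # tensor of shape [batch, channels, samples]

# Write the first clip to disk with loudness normalization.
audio_write("folk_clip", wav[0].cpu(), model.sample_rate, strategy="loudness")
```

AudioGen follows the same pattern for sound effects (loading an AudioGen checkpoint instead), and melody-conditioned generation uses a melody-capable MusicGen variant together with a reference audio tensor.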

Quick Install

Claude Code (Recommended)
npx skills add majiayu000/claude-skill-registry -a claude-code

Plugin Command (Alternative)
/plugin add https://github.com/majiayu000/claude-skill-registry

Git Clone (Alternative)
git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/audiocraft-audio-generation

Copy and paste one of the commands above into Claude Code to install this skill.

GitHub Repository

majiayu000/claude-skill-registry
Path: skills/audiocraft

Related Skills

blip-2-vision-language

Design

BLIP-2 is a vision-language framework that connects a frozen image encoder with a large language model for multimodal tasks. Use it for zero-shot image captioning, visual question answering, or image-text retrieval without task-specific fine-tuning. It's ideal for developers needing to add state-of-the-art visual understanding to LLM-based applications.

View skill

stable-diffusion-image-generation

Meta

This skill enables text-to-image generation and image manipulation using Stable Diffusion via HuggingFace Diffusers. It supports image generation from prompts, image-to-image translation, inpainting, and custom pipeline creation. Developers should use it when building applications requiring AI-powered visual content generation or editing.

View skill

audiocraft-audio-generation

Meta

This Claude Skill provides text-to-music and text-to-audio generation using Meta's AudioCraft PyTorch library. It enables developers to generate music from descriptions, create sound effects, and perform melody-conditioned music generation. Key capabilities include using the MusicGen and AudioGen models for controllable, high-quality stereo audio output.

View skill

whisper

Other

Whisper is OpenAI's multilingual speech recognition model for transcription and translation across 99 languages. It handles tasks like speech-to-text, podcast transcription, and processing noisy or multilingual audio. Developers should use it for robust, production-ready automatic speech recognition (ASR).

View skill