
segment-anything-model

davila7
Updated 4 days ago
Tags: Meta, Multimodal, Image Segmentation, Computer Vision, SAM, Zero-Shot

About

The segment-anything-model skill performs zero-shot image segmentation with Meta's Segment Anything Model (SAM). Developers can isolate objects with prompts such as points or bounding boxes, or automatically generate masks for every object in an image. It suits annotation tools, training-data generation, and images from new domains where no task-specific training is available. It handles interactive prompts and works out of the box as a segmentation stage in computer vision pipelines.
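
The two modes described above (prompted segmentation and automatic mask generation) might look roughly like the sketch below. It assumes the skill relies on Meta's `segment_anything` Python package, which this page does not confirm. The helper names (`xywh_to_xyxy`, `pick_best_mask`, `segment_with_prompts`, `segment_everything`), the `checkpoint_path` argument, and the default `vit_b` model type are illustrative choices, not part of the skill itself. The SAM imports are deferred so that the small pure-Python helpers can be used without the package or a checkpoint installed.

```python
from typing import List, Sequence


def xywh_to_xyxy(box: Sequence[float]) -> List[float]:
    """Convert an [x, y, width, height] box to [x0, y0, x1, y1].

    SamPredictor.predict expects box prompts in XYXY pixel coordinates.
    """
    x, y, w, h = box
    return [x, y, x + w, y + h]


def pick_best_mask(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate mask."""
    return max(range(len(scores)), key=lambda i: scores[i])


def segment_with_prompts(image, checkpoint_path, point=None, box_xyxy=None,
                         model_type="vit_b"):
    """Prompted segmentation (hypothetical helper).

    `image` is assumed to be an RGB uint8 array of shape (H, W, 3);
    `checkpoint_path` is assumed to point at a downloaded SAM checkpoint
    matching `model_type`.
    """
    # Imported lazily: these need numpy, torch, and segment_anything installed.
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    predictor = SamPredictor(sam)
    predictor.set_image(image)

    masks, scores, _ = predictor.predict(
        # A single foreground point (label 1), if one was given.
        point_coords=None if point is None else np.array([point], dtype=np.float32),
        point_labels=None if point is None else np.array([1]),
        box=None if box_xyxy is None else np.array(box_xyxy, dtype=np.float32),
        multimask_output=True,  # return several candidates with quality scores
    )
    return masks[pick_best_mask(scores.tolist())]


def segment_everything(image, checkpoint_path, model_type="vit_b"):
    """Automatic mask generation for the whole image (hypothetical helper)."""
    from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    # Each result is a dict with keys such as "segmentation", "area", "bbox".
    return SamAutomaticMaskGenerator(sam).generate(image)
```

With multiple candidate masks requested, choosing the highest-scoring one is a simple default; an annotation tool might instead show all candidates and let the user pick.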

Quick Install

Claude Code

Recommended (primary):
npx skills add davila7/claude-code-templates -a claude-code
Alternative (plugin command):
/plugin add https://github.com/davila7/claude-code-templates
Alternative (git clone):
git clone https://github.com/davila7/claude-code-templates.git ~/.claude/skills/segment-anything-model

Copy and paste this command in Claude Code to install this skill

GitHub Repository

davila7/claude-code-templates
Path: cli-tool/components/skills/ai-research/multimodal-segment-anything
Topics: anthropic, anthropic-claude, claude, claude-code

Related Skills

blip-2-vision-language

Design

BLIP-2 is a vision-language framework that connects a frozen image encoder with a large language model for multimodal tasks. Use it for zero-shot image captioning, visual question answering, or image-text retrieval without task-specific fine-tuning. It's ideal for developers needing to add state-of-the-art visual understanding to LLM-based applications.


stable-diffusion-image-generation

Meta

This skill enables text-to-image generation and image manipulation using Stable Diffusion via HuggingFace Diffusers. It supports image generation from prompts, image-to-image translation, inpainting, and custom pipeline creation. Developers should use it when building applications requiring AI-powered visual content generation or editing.


audiocraft-audio-generation

Meta

This Claude Skill provides text-to-music and text-to-audio generation using Meta's AudioCraft PyTorch library. It enables developers to generate music from descriptions, create sound effects, and perform melody-conditioned music generation. Key capabilities include using the MusicGen and AudioGen models for controllable, high-quality stereo audio output.


whisper

Other

Whisper is OpenAI's multilingual speech recognition model for transcription and translation across 99 languages. It handles tasks like speech-to-text, podcast transcription, and processing noisy or multilingual audio. Developers should use it for robust, production-ready automatic speech recognition (ASR).
