llamaguard
About
LlamaGuard is a specialized 7-8B parameter model from Meta for classifying LLM inputs and outputs across six safety categories like violence and hate speech. It offers 94-95% accuracy and integrates with common deployment tools like vLLM and Hugging Face, as well as NeMo Guardrails. Use this skill to add a robust, dedicated moderation layer to filter unsafe content in your AI applications.
Quick Install
Claude Code
Recommendednpx skills add davila7/claude-code-templates -a claude-code/plugin add https://github.com/davila7/claude-code-templatesgit clone https://github.com/davila7/claude-code-templates.git ~/.claude/skills/llamaguardCopy and paste this command in Claude Code to install this skill
GitHub Repository
Related Skills
nemo-guardrails
TestingNeMo Guardrails is a runtime safety framework for LLM applications that adds programmable guardrails. It provides key safety features like jailbreak detection, input/output validation, and hallucination detection using the Colang 2.0 DSL. Use it to enforce safety and compliance rules in production LLM deployments.
clip
OtherCLIP is a vision-language model for zero-shot image classification and cross-modal retrieval, requiring no fine-tuning. It excels at general-purpose tasks like image-text matching, semantic search, and content moderation. Developers can use it for vision-language applications by providing image and text pairs for similarity scoring.
constitutional-ai
OtherConstitutional AI trains models to be harmless using a two-phase method of self-critique/revision and reinforcement learning from AI feedback (RLAIF). It's designed for safety alignment, enabling models to reduce harmful outputs without relying on human labels. Developers can use this skill to implement the core safety system that powers Claude.
constitutional-ai
OtherThis skill implements Anthropic's Constitutional AI method for training harmless AI models through self-critique and revision. It provides a two-phase approach using supervised learning with AI self-critique followed by RLAIF (Reinforcement Learning from AI Feedback) for safety alignment. Use it to reduce harmful outputs in your Claude applications without requiring human-labeled harmful data.
