content-evaluation-framework
About
The `content-evaluation-framework` skill provides a systematic, rubric-based tool for developers to evaluate educational content quality across six categories: five weighted scoring categories plus a pass/fail constitution-compliance gate. It outputs multi-tier assessments with quantified scores and is designed for use during iterative drafting, final reviews, or on-demand quality checks.
Quick Install
Claude Code
Recommended: `/plugin add https://github.com/majiayu000/claude-skill-registry`
Or clone manually: `git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/content-evaluation-framework`
Copy and paste the recommended command into Claude Code to install this skill.
Documentation
Content Evaluation Framework
This skill provides a comprehensive, systematic rubric for evaluating educational book chapters and lessons with quantifiable quality standards.
Constitution Alignment: v4.0.1 emphasizing:
- Principle 1: Specification Primacy ("Specs Are the New Syntax")
- Section IIa: Panaversity 4-Layer Teaching Method
- Section IIb: AI Three Roles Framework (bidirectional co-learning)
- 8 Foundational Principles: Including Factual Accuracy, Coherent Structure, Progressive Complexity
- Nine Pillars (Section I): AI CLI, Markdown, MCP, AI-First IDEs, Cross-Platform, TDD, SDD, Composable Skills, Cloud-Native
Purpose
Evaluate educational content across 6 categories (5 weighted, plus a pass/fail constitution gate) to ensure:
- Technical correctness and code quality
- Effective pedagogical design and learning outcomes
- Clear, accessible writing for target audience
- Proper structure and organization
- AI-augmented learning principles (learning WITH AI, not generating FROM AI)
- Constitution compliance and standards adherence
When to Use This Skill
Invoke this evaluation framework at multiple checkpoints:
- During Iterative Drafting - Mid-process quality checks to catch issues early
- After Lesson/Chapter Completion - Comprehensive evaluation before moving to next content unit
- On-Demand Review Requests - When user explicitly asks for quality assessment
- Before Validation Phase - Part of the SDD Validate phase workflow for final sign-off
Evaluation Methodology
Scoring System
Multi-Tier Assessment:
- Excellent (90-100%) - Exceeds standards, exemplary quality
- Good (75-89%) - Meets all standards with minor improvements possible
- Needs Work (50-74%) - Meets some standards but requires significant revision
- Insufficient (<50%) - Does not meet minimum standards, requires major rework
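For illustration only, these bands can be read as a simple score-to-tier mapping. The following is a minimal sketch; the function name and exact boundary handling are illustrative, not part of the skill's interface:

```python
def score_to_tier(score: float) -> str:
    """Map a 0-100 percentage score to its assessment tier (sketch of the bands above)."""
    if score >= 90:
        return "Excellent"
    if score >= 75:
        return "Good"
    if score >= 50:
        return "Needs Work"
    return "Insufficient"
```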
Weighted Categories
The evaluation uses 6 categories with the following weights:
| Category | Weight | Focus Area |
|---|---|---|
| Technical Accuracy | 30% | Code correctness, type hints, explanations, examples work as stated |
| Pedagogical Effectiveness | 25% | Show-then-explain pattern, progressive complexity, quality exercises |
| Writing Quality | 20% | Readability (Flesch-Kincaid 8-10), voice, clarity, grade-level appropriateness |
| Structure & Organization | 15% | Learning objectives met, logical flow, appropriate length, transitions |
| AI-First Teaching | 10% | Co-learning partnership demonstrated, Three Roles Framework shown, Nine Pillars aligned, Specs-As-Syntax emphasized |
| Constitution Compliance | Pass/Fail | Must pass all non-negotiable constitutional requirements including Nine Pillars alignment (gate) |
Total Weighted Score Calculation:
Final Score = (Technical × 0.30) + (Pedagogical × 0.25) + (Writing × 0.20) +
(Structure × 0.15) + (AI-First × 0.10)
Constitution Compliance: Must achieve "Pass" status. If "Fail," content cannot proceed regardless of weighted score.
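A minimal sketch of this calculation, assuming each category is scored as a 0-100 percentage; the dictionary keys and function name are hypothetical, chosen only to mirror the table above:

```python
# Illustrative only: weights mirror the table above; category keys are hypothetical.
WEIGHTS = {
    "technical": 0.30,
    "pedagogical": 0.25,
    "writing": 0.20,
    "structure": 0.15,
    "ai_first": 0.10,
}

def final_score(category_scores: dict[str, float], constitution_pass: bool) -> float | None:
    """Return the weighted 0-100 score, or None when the constitution gate fails."""
    if not constitution_pass:
        return None  # gate: content cannot proceed regardless of weighted score
    return sum(category_scores[name] * weight for name, weight in WEIGHTS.items())
```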
How to Conduct an Evaluation
Step 1: Prepare Context
Before evaluation, gather:
- Content being evaluated (lesson.md, chapter.md, or section file)
- Relevant spec, plan, and tasks files from `specs/<feature>/`
- Constitution file (`.specify/memory/constitution.md`)
- Learning objectives and success criteria for the content unit
- Output style template used (`.claude/output-styles/lesson.md` or similar)
Step 2: Load Detailed Rubric
Read the detailed tier criteria for each category:
Read: references/rubric-details.md
This file contains specific criteria defining Excellent/Good/Needs Work/Insufficient for each of the 6 categories.
Step 3: Evaluate Constitution Compliance First
Constitution compliance is a gate - if content fails constitutional requirements, it cannot proceed.
Use the constitution checklist:
Read: references/constitution-checklist.md
Assess all non-negotiable principles and requirements. Mark as Pass or Fail with specific violations noted.
If Constitution Compliance = Fail: Stop evaluation and report violations immediately. Content must be revised before proceeding.
If Constitution Compliance = Pass: Continue to weighted category evaluation.
Step 4: Score Each Weighted Category
For each of the 5 weighted categories (Technical Accuracy, Pedagogical Effectiveness, Writing Quality, Structure & Organization, AI-First Teaching):
- Review specific criteria from `rubric-details.md` for that category
- Assess content against criteria for each tier
- Assign tier (Excellent/Good/Needs Work/Insufficient) with score range
- Record specific evidence - Quote examples, note line numbers, cite specific passages
- Provide improvement recommendations - Concrete, actionable feedback
Step 5: Calculate Weighted Score
Apply the weighted formula:
Final Score = (Technical × 0.30) + (Pedagogical × 0.25) + (Writing × 0.20) +
(Structure × 0.15) + (AI-First × 0.10)
Convert tier scores to numeric values:
- Excellent: 95%
- Good: 82%
- Needs Work: 62%
- Insufficient: 40%
(Or use specific numeric score within tier range if warranted)
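Building on the earlier sketch, the tier labels could be converted to these default numeric values before applying the weighted formula. Everything below is hypothetical example data, not output of the skill:

```python
# Illustrative defaults from the list above; a specific in-range score may be used instead.
TIER_DEFAULTS = {
    "Excellent": 95.0,
    "Good": 82.0,
    "Needs Work": 62.0,
    "Insufficient": 40.0,
}

# Hypothetical per-category tier assignments for a lesson under review.
category_tiers = {
    "technical": "Good",
    "pedagogical": "Excellent",
    "writing": "Good",
    "structure": "Good",
    "ai_first": "Needs Work",
}

category_scores = {name: TIER_DEFAULTS[tier] for name, tier in category_tiers.items()}
# Reusing the final_score sketch above:
# final_score(category_scores, constitution_pass=True) == 83.25 for these example tiers.
```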
Step 6: Generate Evaluation Report
Use the structured evaluation template:
Read: references/evaluation-template.md
Complete all sections:
- Executive Summary - Overall score, tier, pass/fail status
- Category Scores - Table showing each category score, tier, and weight contribution
- Detailed Findings - Evidence-based assessment for each category
- Strengths - What the content does well (specific examples)
- Areas for Improvement - Prioritized list of issues with recommendations
- Constitution Compliance Status - Pass/Fail with specific principle checks
- Actionable Next Steps - Concrete tasks to improve content
Step 7: Communicate Results
Present evaluation report with:
- Clear verdict - Pass/Fail and overall quality tier
- Evidence-based feedback - Specific quotes and line numbers
- Prioritized improvements - Most critical issues first
- Encouragement - Acknowledge strengths and effort
Evaluation Best Practices
Be Objective and Evidence-Based
- Quote specific passages from content being evaluated
- Reference line numbers or section headers
- Compare against objective rubric criteria, not subjective preference
- Use concrete metrics where possible (word count, readability scores, etc.)
Focus on Standards, Not Perfection
- Content rated "Good" (75-89%) is publication-ready with minor polish
- Content rated "Excellent" (90-100%) exceeds standards but is not required
- Focus improvements on moving "Needs Work" → "Good" before "Good" → "Excellent"
Provide Actionable Feedback
- Don't just say "improve clarity" - specify which sentences are unclear and suggest rewrites
- Don't just say "add examples" - suggest specific example types that would help
- Prioritize recommendations: critical (blocking issues) → important → nice-to-have
Respect the Learning Journey
- Recognize iterative improvement - drafts evolve through multiple passes
- Celebrate progress and strengths
- Frame criticism constructively as opportunities for growth
- Remember: the goal is helping create excellent educational content, not gatekeeping
Quality Gates and Thresholds
Minimum Acceptance Threshold
- Constitution Compliance: MUST be Pass (gate)
- Overall Weighted Score: MUST be ≥ 75% (Good or better)
- No category below 50%: Each individual category must achieve at least "Needs Work" tier
Recommended for Publication
- Constitution Compliance: Pass
- Overall Weighted Score: ≥ 82% (Good tier)
- Technical Accuracy: ≥ 75% (Good tier) - Critical for credibility
- Pedagogical Effectiveness: ≥ 75% (Good tier) - Critical for learning outcomes
Exemplary Content (Optional)
- Overall Weighted Score: ≥ 90% (Excellent tier)
- At least 3 categories at Excellent tier
- No categories below Good tier
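As an illustration of the minimum acceptance threshold only (the publication and exemplary checks would follow the same pattern), here is a sketch assuming the same hypothetical category names and weights as the earlier examples:

```python
def meets_minimum_threshold(category_scores: dict[str, float],
                            constitution_pass: bool,
                            weights: dict[str, float]) -> bool:
    """Sketch of the minimum acceptance gate described above (names are illustrative)."""
    if not constitution_pass:
        return False  # constitution compliance is a hard gate
    if any(score < 50 for score in category_scores.values()):
        return False  # every category must reach at least the "Needs Work" tier
    weighted = sum(category_scores[name] * w for name, w in weights.items())
    return weighted >= 75  # overall score must reach "Good" or better
```

In use, the hypothetical `WEIGHTS` mapping from the earlier sketch would be passed as the `weights` argument.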
Common Evaluation Scenarios
Scenario 1: Mid-Draft Check (Iterative)
Context: Writer requests feedback on a partial draft.
Approach:
- Focus on foundational issues (structure, learning objectives, concept scaffolding)
- Flag critical issues early (technical errors, constitution violations)
- Provide guidance for remaining sections
- Don't expect polish - prioritize content completeness and correctness
Scenario 2: Completion Review
Context: Writer believes content is complete and ready for validation.
Approach:
- Conduct full evaluation across all 6 categories
- Calculate final weighted score
- Check all quality gates and thresholds
- Provide comprehensive report with prioritized improvements
- Determine if content meets publication standards
Scenario 3: Pre-Validation Quality Gate
Context: Content enters the SDD Validate phase.
Approach:
- Verify constitution compliance (gate)
- Confirm minimum acceptance threshold (≥75%)
- Validate all category scores meet minimums
- Generate pass/fail recommendation with evidence
- If fails gate: return to implementation with specific revision tasks
Scenario 4: On-Demand Spot Check
Context: User asks "How's this looking?" for a specific section.
Approach:
- Evaluate relevant categories for that section (may not be all 6)
- Provide quick feedback on specific concerns
- Highlight any critical issues
- Suggest improvements without full formal report
- Use judgment on depth based on context
Resources and References
This skill includes detailed reference materials:
- `references/rubric-details.md` - Comprehensive tier criteria for all 6 categories with specific indicators
- `references/constitution-checklist.md` - Pass/Fail checklist for constitutional compliance evaluation
- `references/evaluation-template.md` - Structured template for consistent evaluation reports
Load these references as needed during evaluation to ensure consistency and thoroughness.
Example Evaluation Flow
User Request: "Please evaluate this lesson draft: apps/learn-app/docs/chapter-3/lesson-2.md"
Evaluation Process:
- Read content: `apps/learn-app/docs/chapter-3/lesson-2.md`
- Load context: spec, plan, constitution, learning objectives
- Check constitution compliance: `references/constitution-checklist.md` - Result: Pass (all non-negotiables met)
- Load detailed rubric: `references/rubric-details.md`
- Evaluate each category:
- Technical Accuracy: Good (80%) - Code works, minor type hint gaps
- Pedagogical Effectiveness: Excellent (92%) - Strong scaffolding, great exercises
- Writing Quality: Good (78%) - Clear writing, minor readability improvements
- Structure & Organization: Good (85%) - Good flow, all LOs met
- AI-First Teaching: Needs Work (65%) - AI exercises present but weak guidance
- Calculate weighted score:
- (80 × 0.30) + (92 × 0.25) + (78 × 0.20) + (85 × 0.15) + (65 × 0.10) = 81.85%
- Final Tier: Good (81.85%)
- Load template: `references/evaluation-template.md`
- Generate report with findings, strengths, improvements, next steps
- Communicate verdict: "Good (81.85%) - Ready for publication with minor improvements to AI-First Teaching section"
Use this skill to maintain consistent, objective, evidence-based quality standards for all educational content.
