context-engineering

majiayu000
Testing, AI

About

This skill helps developers monitor and optimize Claude's context window usage to prevent failures and reduce costs. It provides tools for checking usage limits, debugging issues, and implementing efficient memory or agent architectures. Use it when building LLM pipelines where context constraints impact performance or latency.

Quick Install

Claude Code

Plugin command (recommended):
/plugin add https://github.com/majiayu000/claude-skill-registry

Git clone (alternative):
git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/context-engineering

Paste the plugin command into Claude Code, or run the git clone command in a terminal, to install this skill.

Documentation

Context Engineering

Context engineering curates the smallest high-signal token set for LLM tasks. The goal: maximize reasoning quality while minimizing token usage.

When to Activate

  • Designing/debugging agent systems
  • Context limits constrain performance
  • Optimizing cost/latency
  • Building multi-agent coordination
  • Implementing memory systems
  • Evaluating agent performance
  • Developing LLM-powered pipelines

Core Principles

  1. Context quality > quantity - High-signal tokens beat exhaustive content
  2. Attention is finite - Recall follows a U-shaped curve that favors the beginning and end of the context
  3. Progressive disclosure - Load information just-in-time
  4. Isolation prevents degradation - Partition work across sub-agents
  5. Measure before optimizing - Know your baseline

IMPORTANT:

  • Sacrifice grammar for the sake of concision.
  • Ensure token efficiency while maintaining high quality.
  • Pass these rules to subagents.

Quick Reference

Topic | When to Use | Reference
Fundamentals | Understanding context anatomy, attention mechanics | context-fundamentals.md
Degradation | Debugging failures, lost-in-middle, poisoning | context-degradation.md
Optimization | Compaction, masking, caching, partitioning | context-optimization.md
Compression | Long sessions, summarization strategies | context-compression.md
Memory | Cross-session persistence, knowledge graphs | memory-systems.md
Multi-Agent | Coordination patterns, context isolation | multi-agent-patterns.md
Evaluation | Testing agents, LLM-as-Judge, metrics | evaluation.md
Tool Design | Tool consolidation, description engineering | tool-design.md
Pipelines | Project development, batch processing | project-development.md
Runtime Awareness | Usage limits, context window monitoring | runtime-awareness.md
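The table lends itself to progressive disclosure: load a reference file only when the task touches its topic. The sketch below illustrates that idea. The keyword sets, the `select_references`/`load_references` helpers, and the assumption that the files sit together in one directory are all hypothetical, not part of the skill.

```python
from pathlib import Path

# Hypothetical trigger keywords per reference file (illustrative only).
REFERENCE_INDEX = {
    "context-fundamentals.md": {"anatomy", "attention"},
    "context-degradation.md": {"lost-in-middle", "poisoning", "failure"},
    "context-optimization.md": {"compaction", "masking", "caching", "partitioning"},
    "context-compression.md": {"summarization", "long session"},
    "memory-systems.md": {"memory", "knowledge graph", "cross-session"},
    "multi-agent-patterns.md": {"multi-agent", "coordination", "isolation"},
    "evaluation.md": {"evaluation", "llm-as-judge", "metrics"},
    "tool-design.md": {"tool", "description"},
    "project-development.md": {"pipeline", "batch"},
    "runtime-awareness.md": {"usage limit", "context window"},
}


def select_references(task: str) -> list[str]:
    """Return only the reference files whose keywords appear in the task text."""
    text = task.lower()
    return [name for name, keywords in REFERENCE_INDEX.items()
            if any(keyword in text for keyword in keywords)]


def load_references(task: str, ref_dir: Path) -> dict[str, str]:
    """Read matching files just-in-time; unmatched files never enter context."""
    return {name: (ref_dir / name).read_text(encoding="utf-8")
            for name in select_references(task)
            if (ref_dir / name).exists()}
```

For example, `select_references("Debug a lost-in-middle failure")` selects only `context-degradation.md`. The other nine files cost zero tokens.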

Key Metrics

  • Token utilization: Warning at 70%, trigger optimization at 80%
  • Token usage: Explains ~80% of agent performance variance
  • Multi-agent cost: ~15x the tokens of a single chat interaction
  • Compaction target: 50-70% reduction, <5% quality loss
  • Cache hit target: 70%+ for stable workloads
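Those thresholds can be encoded as simple checks. The function names below are hypothetical. The checks compare scaled integers rather than floating-point ratios, so boundary values such as exactly 70% behave predictably. Token counts are assumed to come from whatever usage reporting your client provides.

```python
def utilization_status(used_tokens: int, window_tokens: int) -> str:
    """Classify context utilization: warning at 70%, optimize at 80%."""
    if 100 * used_tokens >= 80 * window_tokens:
        return "optimize"
    if 100 * used_tokens >= 70 * window_tokens:
        return "warning"
    return "ok"


def compaction_on_target(before_tokens: int, after_tokens: int) -> bool:
    """True if compaction removed between 50% and 70% of the tokens."""
    removed = before_tokens - after_tokens
    return 50 * before_tokens <= 100 * removed <= 70 * before_tokens


def cache_hit_on_target(cached_tokens: int, total_input_tokens: int) -> bool:
    """True if at least 70% of input tokens were served from the cache."""
    return 100 * cached_tokens >= 70 * total_input_tokens
```

With a 200,000-token window, 150,000 used tokens (75%) yields `"warning"` and 170,000 (85%) yields `"optimize"`.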

Four-Bucket Strategy

  1. Write: Save context externally (scratchpads, files)
  2. Select: Pull only relevant context (retrieval, filtering)
  3. Compress: Reduce tokens while preserving info (summarization)
  4. Isolate: Split across sub-agents (partitioning)
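The Select bucket can be sketched as "rank by relevance, then fill a token budget." This example makes two simplifying assumptions. It measures relevance as plain word overlap with the query, and it estimates tokens by counting whitespace-separated words. Production systems would more likely use embeddings for retrieval and the model's own tokenizer for counting. `select_context` and `rough_tokens` are hypothetical helpers.

```python
import re


def rough_tokens(text: str) -> int:
    """Crude token estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


def select_context(query: str, chunks: list[str], budget: int) -> list[str]:
    """Keep the chunks sharing the most words with the query, within a token budget."""
    query_terms = set(re.findall(r"\w+", query.lower()))

    def overlap(chunk: str) -> int:
        return len(query_terms & set(re.findall(r"\w+", chunk.lower())))

    ranked = sorted(chunks, key=overlap, reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        if overlap(chunk) == 0:
            break  # every remaining chunk is unrelated to the query
        cost = rough_tokens(chunk)
        if used + cost > budget:
            continue  # skip chunks that would overflow the budget
        selected.append(chunk)
        used += cost
    return selected
```

Chunks with no overlap are dropped even when budget remains. The point of Select is to leave irrelevant tokens out, not to fill the window.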

Anti-Patterns

  • Exhaustive context over curated context
  • Critical info in middle positions
  • No compaction triggers before limits
  • Single agent for parallelizable tasks
  • Tools without clear descriptions

Guidelines

  1. Place critical info at beginning/end of context
  2. Implement compaction at 70-80% utilization
  3. Use sub-agents for context isolation, not role-play
  4. Design tools with a 4-question framework (what, when, inputs, returns)
  5. Optimize for tokens-per-task, not tokens-per-request
  6. Validate with probe-based evaluation
  7. Monitor KV-cache hit rates in production
  8. Start minimal, add complexity only when proven necessary

Runtime Awareness

The system automatically injects usage awareness via a PostToolUse hook:

<usage-awareness>
Claude Usage Limits: 5h=45%, 7d=32%
Context Window Usage: 67%
</usage-awareness>

Thresholds:

  • 70%: WARNING - consider optimization/compaction
  • 90%: CRITICAL - immediate action needed
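A consumer of the injected block might parse it and map each percentage onto these thresholds. This sketch assumes the hook output matches the sample format exactly (`5h=NN%`, `7d=NN%`, `Context Window Usage: NN%`). The real format may differ. `parse_usage_awareness` and `severity` are hypothetical helpers.

```python
import re

# Checked from most to least severe.
THRESHOLDS = ((90, "CRITICAL"), (70, "WARNING"))


def parse_usage_awareness(block: str) -> dict[str, int]:
    """Extract usage percentages from a <usage-awareness> block."""
    usage = {key: int(value)
             for key, value in re.findall(r"(5h|7d)=(\d+)%", block)}
    match = re.search(r"Context Window Usage:\s*(\d+)%", block)
    if match:
        usage["context"] = int(match.group(1))
    return usage


def severity(percent: int) -> str:
    """Map a usage percentage to OK / WARNING / CRITICAL."""
    for limit, label in THRESHOLDS:
        if percent >= limit:
            return label
    return "OK"
```

Applied to the sample above, this yields `{"5h": 45, "7d": 32, "context": 67}`, and every value maps to `"OK"`.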

Data Sources:

  • Usage limits: Anthropic OAuth API (https://api.anthropic.com/api/oauth/usage)
  • Context window: Statusline temp file (/tmp/ck-context-{session_id}.json)

GitHub Repository

majiayu000/claude-skill-registry
Path: skills/context-engineering

Related Skills

sglang

Meta

SGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.

evaluating-llms-harness

Testing

This Claude Skill runs the lm-evaluation-harness to benchmark LLMs across 60+ standardized academic tasks like MMLU and GSM8K. It's designed for developers to compare model quality, track training progress, or report academic results. The tool supports various backends including HuggingFace and vLLM models.

langchain

Meta

LangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.

llamaguard

Other

LlamaGuard is Meta's 7-8B parameter model for moderating LLM inputs and outputs across six safety categories like violence and hate speech. It offers 94-95% accuracy and can be deployed using vLLM, Hugging Face, or Amazon SageMaker. Use this skill to easily integrate content filtering and safety guardrails into your AI applications.