
when-profiling-performance-use-performance-profiler

DNYoussef
Category: Other. Tags: performance, profiling, optimization, benchmarking, mece

About

This performance profiling skill helps developers identify and optimize application bottlenecks across CPU, memory, I/O, and network dimensions. It provides comprehensive analysis through baseline measurement, bottleneck detection, and root-cause investigation phases. Use this skill when you need systematic performance optimization with integration for tools such as perf, Instruments, or clinic.js.

Documentation

Performance Profiler Skill

Overview

When profiling performance, use performance-profiler to measure, analyze, and optimize application performance across CPU, memory, I/O, and network dimensions.

MECE Breakdown

Mutually Exclusive Components:

  1. Baseline Phase: Establish current performance metrics
  2. Detection Phase: Identify bottlenecks and hot paths
  3. Analysis Phase: Root cause analysis and impact assessment
  4. Optimization Phase: Generate and prioritize recommendations
  5. Implementation Phase: Apply optimizations with agent assistance
  6. Validation Phase: Benchmark improvements and verify gains

Collectively Exhaustive Coverage:

  • CPU Profiling: Function execution time, hot paths, call graphs
  • Memory Profiling: Heap usage, allocations, leaks, garbage collection
  • I/O Profiling: File system, database, network latency
  • Network Profiling: Request timing, bandwidth, connection pooling
  • Concurrency: Thread utilization, lock contention, async operations
  • Algorithm Analysis: Time complexity, space complexity
  • Cache Analysis: Hit rates, cache misses, invalidation patterns
  • Database: Query performance, N+1 problems, index usage

Features

Core Capabilities:

  • Multi-dimensional performance profiling (CPU, memory, I/O, network)
  • Automated bottleneck detection with prioritization
  • Real-time profiling and historical analysis
  • Flame graph generation for visual analysis
  • Memory leak detection and heap snapshots
  • Database query optimization
  • Algorithmic complexity analysis
  • A/B comparison of before/after optimizations
  • Production-safe profiling with minimal overhead
  • Integration with APM tools (New Relic, DataDog, etc.)

Profiling Modes:

  • Quick Scan: 30-second lightweight profiling
  • Standard: 5-minute comprehensive analysis
  • Deep: 30-minute detailed investigation
  • Continuous: Long-running production monitoring
  • Stress Test: Load-based profiling under high traffic

Usage

Slash Command:

/profile [path] [--mode quick|standard|deep] [--target cpu|memory|io|network|all]

Subagent Invocation:

Task("Performance Profiler", "Profile ./app with deep CPU and memory analysis", "performance-analyzer")

MCP Tool:

mcp__performance-profiler__analyze({
  project_path: "./app",
  profiling_mode: "standard",
  targets: ["cpu", "memory", "io"],
  generate_optimizations: true
})

Architecture

Phase 1: Baseline Measurement

  1. Establish current performance metrics
  2. Define performance budgets
  3. Set up monitoring infrastructure
  4. Capture baseline snapshots
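The baseline capture in steps 1 and 4 boils down to recording per-request latencies and summarizing them as percentiles. A minimal sketch, assuming a hypothetical `handler` request function you would replace with your own:

```python
import statistics
import time

def measure_baseline(handler, requests, warmup=10):
    """Capture per-request latencies and summarize avg/p50/p95/p99.

    `handler` is a hypothetical request callable; swap in your own.
    """
    for req in requests[:warmup]:  # warm caches before measuring
        handler(req)
    latencies_ms = []
    for req in requests:
        start = time.perf_counter()
        handler(req)
        latencies_ms.append((time.perf_counter() - start) * 1000)
    q = statistics.quantiles(latencies_ms, n=100)  # q[k-1] is the k-th percentile
    return {
        "avg_response_time_ms": statistics.fmean(latencies_ms),
        "p50_response_time_ms": q[49],
        "p95_response_time_ms": q[94],
        "p99_response_time_ms": q[98],
    }
```

Storing this dictionary alongside the run configuration gives the snapshot that later phases compare against.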

Phase 2: Bottleneck Detection

  1. CPU profiling (sampling or instrumentation)
  2. Memory profiling (heap analysis)
  3. I/O profiling (syscall tracing)
  4. Network profiling (packet analysis)
  5. Database profiling (query logs)

Phase 3: Root Cause Analysis

  1. Correlate metrics across dimensions
  2. Identify causal relationships
  3. Calculate performance impact
  4. Prioritize issues by severity
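Step 4 can be sketched as a simple scoring pass: weight each bottleneck's severity by its share of total time and sort. Field names and weights here are illustrative, chosen to mirror the report format later in this document:

```python
def prioritize(bottlenecks):
    """Order issues by estimated impact: severity weight x time share."""
    severity_weight = {"low": 1, "medium": 2, "high": 4, "critical": 8}

    def impact(b):
        return severity_weight[b["severity"]] * b["time_percent"]

    return sorted(bottlenecks, key=impact, reverse=True)
```

A high-severity issue consuming 20% of runtime then outranks a low-severity one consuming 50%, which matches the intuition that severity encodes more than raw time share.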

Phase 4: Optimization Generation

  1. Algorithmic improvements
  2. Caching strategies
  3. Parallelization opportunities
  4. Database query optimization
  5. Memory optimization
  6. Network optimization

Phase 5: Implementation

  1. Generate optimized code with coder agent
  2. Apply database optimizations
  3. Configure caching layers
  4. Implement parallelization

Phase 6: Validation

  1. Run benchmark suite
  2. Compare before/after metrics
  3. Verify no regressions
  4. Generate performance report
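Steps 2 and 3 of validation reduce to a metric-by-metric comparison with a regression tolerance. A sketch, assuming lower-is-better metrics and an illustrative 2% tolerance:

```python
def validate(before, after, max_regression_pct=2.0):
    """Compare before/after metrics; flag any regression beyond tolerance."""
    report = {}
    for key in before:
        delta_pct = (after[key] - before[key]) / before[key] * 100
        report[key] = {
            "before": before[key],
            "after": after[key],
            "change_percent": round(delta_pct, 1),
            "regression": delta_pct > max_regression_pct,
        }
    return report
```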

Output Formats

Performance Report:

{
  "project": "my-app",
  "profiling_mode": "standard",
  "duration_seconds": 300,
  "baseline": {
    "requests_per_second": 1247,
    "avg_response_time_ms": 123,
    "p95_response_time_ms": 456,
    "p99_response_time_ms": 789,
    "cpu_usage_percent": 67,
    "memory_usage_mb": 512,
    "error_rate_percent": 0.1
  },
  "bottlenecks": [
    {
      "type": "cpu",
      "severity": "high",
      "function": "processData",
      "time_percent": 34.5,
      "calls": 123456,
      "avg_time_ms": 2.3,
      "recommendation": "Optimize algorithm complexity from O(n²) to O(n log n)"
    }
  ],
  "optimizations": [...],
  "estimated_improvement": {
    "throughput_increase": "3.2x",
    "latency_reduction": "68%",
    "memory_reduction": "45%"
  }
}

Flame Graph:

Interactive SVG flame graph showing call stack with time proportions

Heap Snapshot:

Memory allocation breakdown with retention paths

Optimization Report:

Prioritized list of actionable improvements with code examples

Examples

Example 1: Quick CPU Profiling

/profile ./my-app --mode quick --target cpu

Example 2: Deep Memory Analysis

/profile ./my-app --mode deep --target memory --detect-leaks

Example 3: Full Stack Optimization

/profile ./my-app --mode standard --target all --optimize --benchmark

Example 4: Database Query Optimization

/profile ./my-app --mode standard --target io --database --explain-queries

Integration with Claude-Flow

Coordination Pattern:

// Step 1: Initialize profiling swarm
mcp__claude-flow__swarm_init({ topology: "star", maxAgents: 5 })

// Step 2: Spawn specialized agents
[Parallel Execution]:
  Task("CPU Profiler", "Profile CPU usage and identify hot paths in ./app", "performance-analyzer")
  Task("Memory Profiler", "Analyze heap usage and detect memory leaks", "performance-analyzer")
  Task("I/O Profiler", "Profile file system and database operations", "performance-analyzer")
  Task("Network Profiler", "Analyze network requests and identify slow endpoints", "performance-analyzer")
  Task("Optimizer", "Generate optimization recommendations based on profiling data", "optimizer")

// Step 3: Implementation agent applies optimizations
[Sequential Execution]:
  Task("Coder", "Implement recommended optimizations from profiling analysis", "coder")
  Task("Benchmarker", "Run benchmark suite and validate improvements", "performance-benchmarker")

Configuration

Default Settings:

{
  "profiling": {
    "sampling_rate_hz": 99,
    "stack_depth": 128,
    "include_native_code": false,
    "track_allocations": true
  },
  "thresholds": {
    "cpu_hot_path_percent": 10,
    "memory_leak_growth_mb": 10,
    "slow_query_ms": 100,
    "slow_request_ms": 1000
  },
  "optimization": {
    "auto_apply": false,
    "require_approval": true,
    "run_tests_before": true,
    "run_benchmarks_after": true
  },
  "output": {
    "flame_graph": true,
    "heap_snapshot": true,
    "call_tree": true,
    "recommendations": true
  }
}

Profiling Techniques

CPU Profiling:

  • Sampling: Periodic stack sampling (low overhead)
  • Instrumentation: Function entry/exit hooks (accurate but higher overhead)
  • Tracing: Event-based profiling
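The sampling-versus-instrumentation trade-off above can be seen with Python's stdlib `cProfile`, which is an instrumentation profiler: it hooks every function entry and exit, so it is accurate but adds overhead. (For low-overhead sampling, an external sampler such as py-spy or perf would be used instead.) The `hot_path` workload is a stand-in:

```python
import cProfile
import io
import pstats

def profile_cpu(fn, *args, top=10):
    """Instrumentation-style CPU profiling with stdlib cProfile."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)  # hot paths first
    return result, out.getvalue()

def hot_path(n):
    """Illustrative CPU-bound workload."""
    return sum(i * i for i in range(n))
```

Sorting by cumulative time surfaces the call-graph hot paths described above.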

Memory Profiling:

  • Heap Snapshots: Point-in-time memory state
  • Allocation Tracking: Record all allocations
  • Leak Detection: Compare snapshots over time
  • GC Analysis: Garbage collection patterns
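Leak detection by snapshot comparison, as described above, can be sketched with the stdlib `tracemalloc`: take a heap snapshot before and after a workload and rank what grew. The `leaky_cache` workload is an illustrative leak:

```python
import tracemalloc

def find_leaks(workload, top=5):
    """Diff heap snapshots around `workload` and return the top growers."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    workload()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Positive size_diff means memory retained across the workload
    return after.compare_to(before, "lineno")[:top]

leaky_cache = []

def workload():
    """Illustrative leak: objects accumulate in a module-level list."""
    leaky_cache.extend(bytearray(1024) for _ in range(1000))  # ~1 MB retained
```

Running the same comparison repeatedly over time (the "compare snapshots over time" point above) separates steady-state allocation from genuine growth.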

I/O Profiling:

  • Syscall Tracing: Track system calls (strace, dtrace)
  • File System: Monitor read/write operations
  • Database: Query logging and EXPLAIN ANALYZE
  • Network: Packet capture and request timing
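The database point above can be made concrete with a missing-index check. Note that `EXPLAIN ANALYZE` is PostgreSQL syntax; SQLite, used here so the sketch is self-contained, exposes the equivalent `EXPLAIN QUERY PLAN`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")

def query_plan(sql):
    """Return SQLite's plan text; the fourth column holds the detail."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

slow_plan = query_plan("SELECT * FROM orders WHERE user_id = 7")  # full scan
conn.execute("CREATE INDEX idx_orders_user ON orders(user_id)")
fast_plan = query_plan("SELECT * FROM orders WHERE user_id = 7")  # index search
```

Before the index, the plan reports a table scan; afterwards it reports a search using `idx_orders_user`.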

Concurrency Profiling:

  • Thread Analysis: CPU utilization per thread
  • Lock Contention: Identify blocking operations
  • Async Operations: Promise/callback timing

Performance Optimization Strategies

Algorithmic:

  • Reduce time complexity (O(n²) → O(n log n))
  • Use appropriate data structures
  • Eliminate unnecessary work
  • Memoization and dynamic programming
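Memoization is the cheapest of these wins when a function recomputes the same subproblems. A minimal sketch with the stdlib `functools.lru_cache`, using the classic Fibonacci example:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Memoization turns the naive O(2^n) recursion into O(n):
    each subproblem is computed once, then served from cache."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```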

Caching:

  • In-memory caching (Redis, Memcached)
  • CDN for static assets
  • HTTP caching headers
  • Query result caching
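Query-result caching with time-based invalidation can be sketched in a few lines; a production system would put this behind Redis or Memcached as listed above, but the logic is the same:

```python
import time

class TTLCache:
    """Minimal query-result cache with time-based invalidation (a sketch)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def get_or_compute(self, key, compute):
        entry = self._store.get(key)
        if entry is not None and time.monotonic() - entry[1] < self.ttl:
            return entry[0]  # fresh hit
        value = compute()    # miss or expired: recompute
        self._store[key] = (value, time.monotonic())
        return value
```

The TTL is the invalidation policy: stale reads are bounded by `ttl_seconds`, which is the usual trade-off against hit rate.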

Parallelization:

  • Multi-threading
  • Worker pools
  • Async I/O
  • Batching operations
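A worker pool overlapping blocking I/O, sketched with the stdlib `concurrent.futures` (the `fetch` function is a stand-in for a real network or database call; for CPU-bound work a `ProcessPoolExecutor` would sidestep the GIL):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    """Stand-in for a blocking I/O call (network fetch, DB query)."""
    return len(url)

urls = ["https://example.com/a", "https://example.com/bb"]
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() dispatches across workers but preserves input order
    results = list(pool.map(fetch, urls))
```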

Database:

  • Add missing indexes
  • Optimize queries
  • Reduce N+1 queries
  • Connection pooling
  • Read replicas

Memory:

  • Object pooling
  • Reduce allocations
  • Stream processing
  • Compression

Network:

  • Connection keep-alive
  • HTTP/2 or HTTP/3
  • Compression
  • Request batching
  • Rate limiting

Performance Budgets

Frontend:

  • Time to First Byte (TTFB): < 200ms
  • First Contentful Paint (FCP): < 1.8s
  • Largest Contentful Paint (LCP): < 2.5s
  • Time to Interactive (TTI): < 3.8s
  • Total Blocking Time (TBT): < 200ms
  • Cumulative Layout Shift (CLS): < 0.1

Backend:

  • API Response Time (p50): < 100ms
  • API Response Time (p95): < 500ms
  • API Response Time (p99): < 1000ms
  • Throughput: > 1000 req/s
  • Error Rate: < 0.1%
  • CPU Usage: < 70%
  • Memory Usage: < 80%
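Budgets like the backend table above can be enforced in CI as plain assertions against measured metrics. The metric keys below are illustrative shorthand for the table's names:

```python
# Illustrative budget table mirroring the backend values above
BACKEND_BUDGETS = {
    "p50_ms": 100,
    "p95_ms": 500,
    "p99_ms": 1000,
    "error_rate_percent": 0.1,
}

def check_budgets(metrics, budgets=BACKEND_BUDGETS):
    """Return the list of violated budgets; empty means all budgets met."""
    return [
        (name, metrics[name], limit)
        for name, limit in budgets.items()
        if metrics[name] > limit
    ]
```

Failing the build when this list is non-empty is the regression gate mentioned in the best practices below.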

Database:

  • Query Time (p50): < 10ms
  • Query Time (p95): < 50ms
  • Query Time (p99): < 100ms
  • Connection Pool Utilization: < 80%

Best Practices

  1. Profile production workloads when possible
  2. Use production-like data volumes
  3. Profile under realistic load
  4. Measure multiple times for consistency
  5. Focus on p95/p99, not just averages
  6. Optimize bottlenecks in order of impact
  7. Always benchmark before and after
  8. Monitor for regressions in CI/CD
  9. Set up continuous profiling
  10. Track performance over time

Troubleshooting

Issue: High CPU usage but no obvious hot path

Solution: Check for excessive small function calls, increase sampling rate, or use instrumentation

Issue: Memory grows continuously

Solution: Run heap snapshot comparison to identify leak sources

Issue: Slow database queries

Solution: Use EXPLAIN ANALYZE, check for missing indexes, analyze query plans

Issue: High latency but low CPU

Solution: Profile I/O operations, check for blocking synchronous calls

See Also

  • PROCESS.md - Detailed step-by-step profiling workflow
  • README.md - Quick start guide
  • subagent-performance-profiler.md - Agent implementation details
  • slash-command-profile.sh - Command-line interface
  • mcp-performance-profiler.json - MCP tool schema

Quick Install

/plugin add https://github.com/DNYoussef/ai-chrome-extension/tree/main/when-profiling-performance-use-performance-profiler

Copy and paste this command into Claude Code to install this skill.

GitHub Repository

DNYoussef/ai-chrome-extension
Path: .claude/skills/performance/when-profiling-performance-use-performance-profiler
