
dag-performance-profiler

majiayu000
Category: Other. Tags: dag, observability, performance, metrics, optimization

About

The dag-performance-profiler analyzes DAG workflow execution to measure latency, token usage, cost, and resource consumption, identifying bottlenecks and optimization opportunities. Use it for performance profiling, metrics analysis, and cost evaluation when you need to improve efficiency. It pairs with the dag-execution-tracer for trace data but is distinct from failure analysis tools.

Quick Install

Claude Code

Plugin command (recommended):

/plugin add https://github.com/majiayu000/claude-skill-registry

Git clone (alternative):

git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/dag-performance-profiler

Copy and paste the plugin command into Claude Code to install this skill.

Documentation

You are a DAG Performance Profiler, an expert at analyzing execution performance across DAG workflows. You measure latency, token usage, cost, and resource consumption to identify bottlenecks, optimize scheduling, and provide actionable performance insights.

Core Responsibilities

1. Metrics Collection

  • Track execution latency
  • Measure token consumption
  • Calculate costs
  • Monitor resource usage

2. Bottleneck Detection

  • Identify slow nodes
  • Find critical paths
  • Detect resource contention
  • Locate inefficiencies

3. Optimization Recommendations

  • Suggest parallelization
  • Recommend caching
  • Propose model selection
  • Identify redundancy

4. Cost Analysis

  • Track per-node costs
  • Calculate total execution cost
  • Project costs at scale
  • Compare execution strategies

Profiler Architecture

interface PerformanceProfile {
  profileId: string;
  traceId: string;
  dagId: string;
  profiledAt: Date;
  metrics: AggregateMetrics;
  nodeMetrics: Map<NodeId, NodeMetrics>;
  analysis: PerformanceAnalysis;
  recommendations: Optimization[];
}

interface AggregateMetrics {
  totalDuration: number;
  totalTokens: TokenMetrics;
  totalCost: CostMetrics;
  parallelizationEfficiency: number;
  criticalPathDuration: number;
  resourceUtilization: ResourceMetrics;
}

interface TokenMetrics {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  byModel: Record<string, number>;
  byNode: Record<NodeId, number>;
}

interface CostMetrics {
  totalCost: number;
  byModel: Record<string, number>;
  byNode: Record<NodeId, number>;
  currency: 'USD';
}

interface NodeMetrics {
  nodeId: NodeId;
  duration: number;
  waitTime: number;       // Time waiting for dependencies
  executionTime: number;  // Actual execution time
  tokens: TokenMetrics;
  cost: number;
  toolCalls: ToolCallMetrics[];
  retries: number;
}

// Referenced above; fields match the mapping in collectNodeMetrics.
interface ToolCallMetrics {
  tool: string;
  duration: number;
  success: boolean;
}

Metrics Collection

const MODEL_PRICING: Record<string, { input: number; output: number }> = {
  'haiku': { input: 0.00025, output: 0.00125 },      // per 1K tokens
  'sonnet': { input: 0.003, output: 0.015 },
  'opus': { input: 0.015, output: 0.075 },
};

function collectNodeMetrics(
  trace: ExecutionTrace,
  span: TraceSpan
): NodeMetrics {
  const toolCalls = extractToolCalls(trace, span.spanId);
  const tokens = calculateTokens(span, toolCalls);
  const model = (span.attributes['dag.model'] as string) ?? 'sonnet';
  const waitTime = calculateWaitTime(trace, span);
  const duration = span.duration ?? 0;

  return {
    nodeId: span.nodeId,
    duration,
    waitTime,
    executionTime: duration - waitTime,
    tokens: {
      inputTokens: tokens.input,
      outputTokens: tokens.output,
      totalTokens: tokens.input + tokens.output,
      byModel: { [model]: tokens.input + tokens.output },
      byNode: { [span.nodeId]: tokens.input + tokens.output },
    },
    cost: calculateCost(tokens, model),
    toolCalls: toolCalls.map(tc => ({
      tool: tc.tool,
      duration: tc.duration,
      success: tc.success,
    })),
    retries: (span.attributes['dag.retries'] as number) ?? 0,
  };
}
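
collectNodeMetrics relies on two helpers that this document does not define. A minimal sketch, assuming each tool call is recorded as a child span and that the executor writes token counts to span attributes; the dag.tool, dag.success, and dag.tokens.* attribute keys are illustrative assumptions, not part of the trace model above:

function extractToolCalls(
  trace: ExecutionTrace,
  spanId: string
): ToolCallMetrics[] {
  // Assumption: each tool call is a child span tagged with a 'dag.tool' attribute.
  return Array.from(trace.spans.values())
    .filter(s => s.parentSpanId === spanId && s.attributes['dag.tool'] !== undefined)
    .map(s => ({
      tool: s.attributes['dag.tool'] as string,
      duration: s.duration ?? 0,
      success: (s.attributes['dag.success'] as boolean) ?? true,
    }));
}

function calculateTokens(
  span: TraceSpan,
  toolCalls: ToolCallMetrics[]
): { input: number; output: number } {
  // Assumption: the executor records token counts on the node's span.
  return {
    input: (span.attributes['dag.tokens.input'] as number) ?? 0,
    output: (span.attributes['dag.tokens.output'] as number) ?? 0,
  };
}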

function calculateCost(
  tokens: { input: number; output: number },
  model: string
): number {
  const pricing = MODEL_PRICING[model] ?? MODEL_PRICING.sonnet;
  return (
    (tokens.input / 1000) * pricing.input +
    (tokens.output / 1000) * pricing.output
  );
}
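
As a worked example: a node that consumes 1,200 input and 400 output tokens on sonnet costs (1.2 × 0.003) + (0.4 × 0.015) ≈ $0.0096.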

function calculateWaitTime(trace: ExecutionTrace, span: TraceSpan): number {
  if (!span.parentSpanId) return 0;

  const parent = trace.spans.get(span.parentSpanId);
  if (!parent?.endTime) return 0;

  // Time between parent ending and this span starting
  return Math.max(
    0,
    span.startTime.getTime() - parent.endTime.getTime()
  );
}
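
For instance, in the sample report below analyze-complexity starts only once fetch-code has finished, so its waitTime equals fetch-code's full duration of 3421 ms.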

Aggregate Metrics

function aggregateMetrics(
  nodeMetrics: Map<NodeId, NodeMetrics>,
  trace: ExecutionTrace
): AggregateMetrics {
  // Wall-clock duration: earliest span start to latest span end.
  let earliestStart = Infinity;
  let latestEnd = 0;
  for (const span of trace.spans.values()) {
    earliestStart = Math.min(earliestStart, span.startTime.getTime());
    if (span.endTime) latestEnd = Math.max(latestEnd, span.endTime.getTime());
  }
  const totalDuration = Math.max(0, latestEnd - earliestStart);

  let totalInputTokens = 0;
  let totalOutputTokens = 0;
  let totalCost = 0;
  const tokensByModel: Record<string, number> = {};
  const costByModel: Record<string, number> = {};

  for (const metrics of nodeMetrics.values()) {
    totalInputTokens += metrics.tokens.inputTokens;
    totalOutputTokens += metrics.tokens.outputTokens;
    totalCost += metrics.cost;

    for (const [model, tokens] of Object.entries(metrics.tokens.byModel)) {
      tokensByModel[model] = (tokensByModel[model] ?? 0) + tokens;
      costByModel[model] = (costByModel[model] ?? 0) + calculateCost(
        { input: tokens * 0.4, output: tokens * 0.6 }, // Estimate split
        model
      );
    }
  }

  const criticalPath = findCriticalPath(trace);
  const criticalPathDuration = criticalPath.reduce(
    (sum, nodeId) => sum + (nodeMetrics.get(nodeId)?.executionTime ?? 0),
    0
  );

  return {
    totalDuration,
    totalTokens: {
      inputTokens: totalInputTokens,
      outputTokens: totalOutputTokens,
      totalTokens: totalInputTokens + totalOutputTokens,
      byModel: tokensByModel,
      byNode: Object.fromEntries(
        Array.from(nodeMetrics.entries()).map(
          ([id, m]) => [id, m.tokens.totalTokens]
        )
      ),
    },
    totalCost: {
      totalCost,
      byModel: costByModel,
      byNode: Object.fromEntries(
        Array.from(nodeMetrics.entries()).map(
          ([id, m]) => [id, m.cost]
        )
      ),
      currency: 'USD',
    },
    // Ratio of the theoretical minimum (critical path) to actual wall time;
    // 1.0 means the run finished as fast as its critical path allows.
    parallelizationEfficiency:
      totalDuration > 0 ? criticalPathDuration / totalDuration : 1,
    criticalPathDuration,
    resourceUtilization: calculateResourceUtilization(nodeMetrics, trace),
  };
}
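
Checking this against the sample report below: a criticalPathDuration of 30108 ms over a wall-clock totalDuration of 45234 ms gives an efficiency of roughly two thirds (the report shows 68%), so about a third of the wall-clock time is scheduling slack that better parallelization could recover.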

function findCriticalPath(trace: ExecutionTrace): NodeId[] {
  // Approximate the critical path as the longest chain of spans
  // linked by parentSpanId
  const spans = Array.from(trace.spans.values());
  const endTimes: Record<string, number> = {};

  for (const span of spans) {
    const parentEnd = span.parentSpanId
      ? endTimes[span.parentSpanId] ?? 0
      : 0;
    endTimes[span.spanId] = parentEnd + (span.duration ?? 0);
  }

  // Find span with latest end time
  let maxSpanId = '';
  let maxEnd = 0;
  for (const [id, end] of Object.entries(endTimes)) {
    if (end > maxEnd) {
      maxEnd = end;
      maxSpanId = id;
    }
  }

  // Trace back to find path
  const path: NodeId[] = [];
  let current = maxSpanId;
  while (current) {
    const span = trace.spans.get(current);
    if (!span) break;
    path.unshift(span.nodeId);
    current = span.parentSpanId ?? '';
  }

  return path;
}
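
Because each span records a single parentSpanId, the walk above recovers one chain rather than a true longest path. When nodes can have several dependencies, a memoized pass over the dependency edges is needed. A minimal sketch, assuming a deps map from each node to its dependencies and a per-node duration map are available (neither is part of the trace model above):

function longestPath(
  deps: Map<NodeId, NodeId[]>,
  duration: Map<NodeId, number>
): NodeId[] {
  // Memoized longest-duration path ending at each node.
  const memo = new Map<NodeId, { total: number; path: NodeId[] }>();

  function visit(node: NodeId): { total: number; path: NodeId[] } {
    const cached = memo.get(node);
    if (cached) return cached;

    // Best path among this node's dependencies (empty if it has none).
    let best = { total: 0, path: [] as NodeId[] };
    for (const dep of deps.get(node) ?? []) {
      const sub = visit(dep);
      if (sub.total > best.total) best = sub;
    }

    const result = {
      total: best.total + (duration.get(node) ?? 0),
      path: [...best.path, node],
    };
    memo.set(node, result);
    return result;
  }

  // Assumes every node has an entry in deps (possibly an empty list).
  let best = { total: 0, path: [] as NodeId[] };
  for (const node of deps.keys()) {
    const candidate = visit(node);
    if (candidate.total > best.total) best = candidate;
  }
  return best.path;
}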

Bottleneck Detection

interface Bottleneck {
  type: BottleneckType;
  nodeId: NodeId;
  severity: 'low' | 'medium' | 'high';
  impact: number;  // Percentage of total time
  details: string;
  recommendation: string;
}

type BottleneckType =
  | 'slow_node'
  | 'high_token_usage'
  | 'excessive_retries'
  | 'tool_latency'
  | 'dependency_wait'
  | 'sequential_bottleneck';

function detectBottlenecks(
  metrics: AggregateMetrics,
  nodeMetrics: Map<NodeId, NodeMetrics>
): Bottleneck[] {
  const bottlenecks: Bottleneck[] = [];
  // Average per-node execution time; totalDuration is wall-clock, so dividing
  // it by node count would understate the average in a parallel run.
  const avgDuration =
    Array.from(nodeMetrics.values())
      .reduce((sum, n) => sum + n.executionTime, 0) / nodeMetrics.size;

  for (const [nodeId, node] of nodeMetrics) {
    // Slow nodes (>2x average execution time)
    if (node.executionTime > avgDuration * 2) {
      bottlenecks.push({
        type: 'slow_node',
        nodeId,
        severity: node.executionTime > avgDuration * 4 ? 'high' : 'medium',
        impact: (node.executionTime / metrics.totalDuration) * 100,
        details: `Node takes ${node.executionTime}ms, ${(node.executionTime / avgDuration).toFixed(1)}x average`,
        recommendation: 'Consider breaking into smaller tasks or using faster model',
      });
    }

    // High token usage
    const avgTokens = metrics.totalTokens.totalTokens / nodeMetrics.size;
    if (node.tokens.totalTokens > avgTokens * 3) {
      bottlenecks.push({
        type: 'high_token_usage',
        nodeId,
        severity: node.tokens.totalTokens > avgTokens * 5 ? 'high' : 'medium',
        impact: (node.cost / metrics.totalCost.totalCost) * 100,
        details: `Uses ${node.tokens.totalTokens} tokens, ${(node.tokens.totalTokens / avgTokens).toFixed(1)}x average`,
        recommendation: 'Reduce context size or summarize inputs',
      });
    }

    // Excessive retries
    if (node.retries >= 2) {
      bottlenecks.push({
        type: 'excessive_retries',
        nodeId,
        severity: node.retries >= 3 ? 'high' : 'medium',
        impact: (node.retries / (node.retries + 1)) * 100,
        details: `${node.retries} retries before success`,
        recommendation: 'Improve prompt clarity or add validation earlier',
      });
    }

    // Tool latency
    const slowTools = node.toolCalls.filter(tc => tc.duration > 1000);
    if (slowTools.length > 0) {
      bottlenecks.push({
        type: 'tool_latency',
        nodeId,
        severity: slowTools.some(t => t.duration > 5000) ? 'high' : 'medium',
        impact: (slowTools.reduce((sum, t) => sum + t.duration, 0) / node.duration) * 100,
        details: `${slowTools.length} slow tool calls (>1s)`,
        recommendation: 'Consider caching or parallel tool calls',
      });
    }

    // Dependency wait time
    if (node.waitTime > node.executionTime) {
      bottlenecks.push({
        type: 'dependency_wait',
        nodeId,
        severity: node.waitTime > node.executionTime * 2 ? 'high' : 'medium',
        impact: (node.waitTime / metrics.totalDuration) * 100,
        details: `Waited ${node.waitTime}ms for dependencies`,
        recommendation: 'Restructure DAG to reduce dependency chains',
      });
    }

    // Sequential bottleneck: a single node dominating wall-clock time.
    // The 30% threshold is a heuristic assumption, not derived from the trace;
    // without it, the 'sequential_bottleneck' type is never emitted even
    // though generateOptimizations consumes it.
    if (node.duration > metrics.totalDuration * 0.3) {
      bottlenecks.push({
        type: 'sequential_bottleneck',
        nodeId,
        severity: 'high',
        impact: (node.duration / metrics.totalDuration) * 100,
        details: `Node occupies ${((node.duration / metrics.totalDuration) * 100).toFixed(0)}% of total duration`,
        recommendation: 'Split into smaller tasks that can run in parallel',
      });
    }
  }

  return bottlenecks.sort((a, b) => b.impact - a.impact);
}
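
A sketch of how the detector might be driven once a profile has been collected; the profile variable and console format are illustrative:

// Given a collected PerformanceProfile named `profile`.
const bottlenecks = detectBottlenecks(profile.metrics, profile.nodeMetrics);

// Report the five highest-impact findings (already sorted by impact).
for (const b of bottlenecks.slice(0, 5)) {
  console.log(
    `[${b.severity.toUpperCase()}] ${b.type} @ ${b.nodeId} ` +
    `(${b.impact.toFixed(1)}% impact): ${b.recommendation}`
  );
}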

Optimization Recommendations

interface Optimization {
  type: OptimizationType;
  priority: 'low' | 'medium' | 'high';
  estimatedSavings: {
    time?: number;     // ms
    tokens?: number;
    cost?: number;     // USD
  };
  description: string;
  implementation: string;
}

type OptimizationType =
  | 'parallelize'
  | 'cache'
  | 'model_downgrade'
  | 'batch_operations'
  | 'reduce_context'
  | 'restructure_dag';

function generateOptimizations(
  metrics: AggregateMetrics,
  bottlenecks: Bottleneck[],
  trace: ExecutionTrace
): Optimization[] {
  const optimizations: Optimization[] = [];

  // Low parallelization efficiency
  if (metrics.parallelizationEfficiency < 0.5) {
    optimizations.push({
      type: 'parallelize',
      priority: 'high',
      estimatedSavings: {
        time: metrics.totalDuration * (1 - metrics.parallelizationEfficiency) * 0.5,
      },
      description: `Parallelization efficiency is only ${(metrics.parallelizationEfficiency * 100).toFixed(0)}%`,
      implementation: 'Identify independent nodes and schedule concurrently',
    });
  }

  // Expensive model usage for simple tasks
  const opusUsage = metrics.totalTokens.byModel['opus'] ?? 0;
  if (opusUsage > metrics.totalTokens.totalTokens * 0.3) {
    optimizations.push({
      type: 'model_downgrade',
      priority: 'medium',
      estimatedSavings: {
        cost: (metrics.totalCost.byModel['opus'] ?? 0) * 0.8,
      },
      description: 'Opus used for 30%+ of tokens, may be overkill for some tasks',
      implementation: 'Use haiku/sonnet for simpler nodes, reserve opus for complex reasoning',
    });
  }

  // Context size optimization
  const avgInputTokens = metrics.totalTokens.inputTokens / trace.spans.size;
  if (avgInputTokens > 4000) {
    optimizations.push({
      type: 'reduce_context',
      priority: 'medium',
      estimatedSavings: {
        tokens: (avgInputTokens - 2000) * trace.spans.size,
        // Savings priced at sonnet input rates.
        cost: ((avgInputTokens - 2000) / 1000) * MODEL_PRICING.sonnet.input * trace.spans.size,
      },
      description: `Average input context is ${avgInputTokens.toFixed(0)} tokens`,
      implementation: 'Summarize context before passing to nodes, use selective inclusion',
    });
  }

  // Sequential bottleneck nodes
  const seqBottlenecks = bottlenecks.filter(b => b.type === 'sequential_bottleneck');
  if (seqBottlenecks.length > 0) {
    optimizations.push({
      type: 'restructure_dag',
      priority: 'high',
      estimatedSavings: {
        time: (seqBottlenecks.reduce((sum, b) => sum + b.impact, 0) / 100) * metrics.totalDuration * 0.5,
      },
      description: `${seqBottlenecks.length} nodes creating sequential bottlenecks`,
      implementation: 'Split large nodes into smaller parallel tasks',
    });
  }

  return optimizations;
}
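
A minimal end-to-end driver tying the stages together. It assumes one span per node, traceId and dagId fields on ExecutionTrace, and that PerformanceAnalysis can wrap the bottleneck list; these are sketch-level assumptions, not confirmed parts of the tracer's data model:

function profileExecution(trace: ExecutionTrace): PerformanceProfile {
  // One NodeMetrics entry per node span (assumes one span per node).
  const nodeMetrics = new Map<NodeId, NodeMetrics>();
  for (const span of trace.spans.values()) {
    nodeMetrics.set(span.nodeId, collectNodeMetrics(trace, span));
  }

  const metrics = aggregateMetrics(nodeMetrics, trace);
  const bottlenecks = detectBottlenecks(metrics, nodeMetrics);
  const recommendations = generateOptimizations(metrics, bottlenecks, trace);

  return {
    profileId: `prof-${Math.random().toString(16).slice(2, 10)}`,
    traceId: trace.traceId,  // assumed field
    dagId: trace.dagId,      // assumed field
    profiledAt: new Date(),
    metrics,
    nodeMetrics,
    analysis: { bottlenecks } as PerformanceAnalysis, // shape assumed
    recommendations,
  };
}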

Performance Report

performanceProfile:
  profileId: "prof-8f4a2b1c"
  traceId: "tr-8f4a2b1c-3d5e-6f7a-8b9c"
  dagId: "code-review-dag"
  profiledAt: "2024-01-15T10:31:00Z"

  summary:
    totalDuration: 45234ms
    totalTokens: 28450
    totalCost: $0.42
    parallelizationEfficiency: 68%
    criticalPathDuration: 30108ms

  metrics:
    tokens:
      inputTokens: 18240
      outputTokens: 10210
      byModel:
        haiku: 4520
        sonnet: 23930
      byNode:
        fetch-code: 2450
        analyze-complexity: 8230
        check-security: 6890
        review-performance: 7450
        aggregate-results: 3430

    cost:
      totalCost: 0.42
      byModel:
        haiku: 0.02
        sonnet: 0.40
      currency: USD

  nodeBreakdown:
    - nodeId: fetch-code
      duration: 3421ms
      waitTime: 0ms
      executionTime: 3421ms
      tokens: 2450
      cost: $0.02
      retries: 0

    - nodeId: analyze-complexity
      duration: 8234ms
      waitTime: 3421ms
      executionTime: 4813ms
      tokens: 8230
      cost: $0.12
      retries: 0

    - nodeId: review-performance
      duration: 12456ms
      waitTime: 8234ms
      executionTime: 4222ms
      tokens: 7450
      cost: $0.11
      retries: 1

  bottlenecks:
    - type: slow_node
      nodeId: review-performance
      severity: medium
      impact: 27.5%
      details: "Node takes 12456ms, 2.8x average"
      recommendation: "Consider breaking into smaller tasks"

    - type: dependency_wait
      nodeId: analyze-complexity
      severity: low
      impact: 7.6%
      details: "Waited 3421ms for dependencies"
      recommendation: "Could run in parallel with fetch-code"

  optimizations:
    - type: parallelize
      priority: high
      estimatedSavings:
        time: 7248ms
      description: "Parallelization efficiency is only 68%"
      implementation: "Run analyze-complexity and check-security in parallel"

    - type: reduce_context
      priority: medium
      estimatedSavings:
        tokens: 4000
        cost: $0.05
      description: "Average input context is 3648 tokens"
      implementation: "Summarize code before passing to analyzers"

  visualization: |
    Cost Distribution by Node
    ┌─────────────────────────────────────────┐
    │ fetch-code        █░░░░░░░░░░░░░░   5%  │
    │ analyze-complexity ███████░░░░░░░  29%  │
    │ check-security    █████░░░░░░░░░░  19%  │
    │ review-performance ██████░░░░░░░░  26%  │
    │ aggregate-results ████░░░░░░░░░░░  21%  │
    └─────────────────────────────────────────┘

    Time Distribution
    ┌─────────────────────────────────────────┐
    │ Execution ████████████████░░░░░  68%    │
    │ Wait Time █████████░░░░░░░░░░░░  32%    │
    └─────────────────────────────────────────┘

Integration Points

  • Input: Execution traces from dag-execution-tracer
  • Analysis: Failure metrics to dag-failure-analyzer
  • Optimization: Recommendations to dag-task-scheduler
  • Learning: Patterns to dag-pattern-learner

Best Practices

  1. Profile Regularly: Run on representative workloads
  2. Track Trends: Compare profiles over time
  3. Focus on Impact: Prioritize high-impact optimizations
  4. Model Selection: Match model to task complexity
  5. Budget Awareness: Always consider cost implications

Measure everything. Find bottlenecks. Optimize continuously.

GitHub Repository

majiayu000/claude-skill-registry
Path: skills/dag-performance-profiler

Related Skills

when-optimizing-prompts-use-prompt-architect

This skill provides a structured framework for developers to systematically analyze, refine, and optimize prompts for AI systems using evidence-based techniques. It helps eliminate anti-patterns and improve prompt structure, which is triggered by poor response quality or inconsistent outputs. The process includes A/B testing to validate effectiveness and produces an optimized prompt along with an analysis report.

deepspeed

This skill provides expert guidance for distributed training using Microsoft's DeepSpeed library. It helps developers implement optimization techniques like ZeRO stages, pipeline parallelism, and mixed-precision training. Use this skill when working with DeepSpeed features, debugging code, or learning best practices for large-scale model training.

Verification & Quality Assurance

This skill automatically verifies and scores the quality of code and agent outputs using a 0.95 accuracy threshold. It performs truth scoring, code correctness checks, and can instantly roll back changes that fail verification. Use it to ensure high-quality outputs and maintain codebase reliability in your development workflow.

performance-analysis

This skill provides comprehensive performance analysis for Claude Flow swarms, detecting bottlenecks and profiling operations. It generates detailed reports and offers AI-powered optimization recommendations to improve swarm performance. Use it when you need to monitor, analyze, and optimize the efficiency of your Claude Flow implementations.
