deploying-monitoring-stacks

jeremylongshore
Updated 4 days ago
Meta, design, data

About

This skill generates production-ready configurations for deploying monitoring stacks like Prometheus, Grafana, and Datadog. Use it when you need to set up metric collection, visualization dashboards, and alerting rules. It provides infrastructure-aware configurations for Kubernetes, Docker, or bare metal environments.

Quick Install

Claude Code

Plugin command (recommended):
/plugin add https://github.com/jeremylongshore/claude-code-plugins-plus

Git clone (alternative):
git clone https://github.com/jeremylongshore/claude-code-plugins-plus.git ~/.claude/skills/deploying-monitoring-stacks

Paste the plugin command into Claude Code to install this skill.

Documentation

Prerequisites

Before using this skill, ensure:

  • Target infrastructure is identified (Kubernetes, Docker, bare metal)
  • Metric endpoints are reachable from the monitoring platform
  • A storage backend is configured for time-series data
  • Alert notification channels are defined (email, Slack, PagerDuty)
  • Resource requirements (CPU, memory, disk) are estimated for the expected scale
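For the storage and sizing prerequisites, the Prometheus storage documentation gives a rule of thumb: disk ≈ retention in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample after compression. The sketch below works one example under an assumed load of 10,000 samples/s and then requests a PersistentVolumeClaim with some headroom. The claim name, the load figure, and the 60Gi size are illustrative assumptions, not values from this skill. You can measure your real ingestion rate with the PromQL query `rate(prometheus_tsdb_head_samples_appended_total[1h])`, and the claim would be mounted at the directory passed to `--storage.tsdb.path`.

```yaml
# Worked sizing example (assumed load, adjust to your own measurements):
#   retention: 30 d = 30 × 86,400 s = 2,592,000 s
#   ingestion: 10,000 samples/s × 2 bytes/sample = 20,000 B/s
#   disk:      2,592,000 s × 20,000 B/s = 51.84 GB (about 48.3 GiB)
# Request extra headroom for the WAL, compaction, and growth.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data   # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 60Gi       # illustrative: ~48 GiB estimate plus headroom
```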

Instructions

  1. Select Platform: Choose Prometheus/Grafana, Datadog, or a hybrid approach
  2. Deploy Collectors: Install exporters and agents on the monitored systems
  3. Configure Scraping: Define metric collection endpoints and scrape intervals
  4. Set Up Storage: Configure retention policies and data compaction
  5. Create Dashboards: Build visualization panels for key metrics
  6. Define Alerts: Create alerting rules with appropriate thresholds
  7. Test Monitoring: Verify that metrics flow end to end and that alerts trigger
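Step 6 can be illustrated with a Prometheus rule file. This is a minimal sketch, assuming Prometheus loads it through a `rule_files` entry in prometheus.yml; the file, group, and alert names are hypothetical. It fires when a scrape target has been unreachable for five minutes, using the built-in `up` metric that Prometheus records for every target.

```yaml
# alerts.yml (hypothetical file name; list it under rule_files in prometheus.yml)
groups:
  - name: availability            # hypothetical group name
    rules:
      - alert: TargetDown          # hypothetical alert name
        # `up` is 0 when the most recent scrape of a target failed
        expr: up == 0
        # Only fire if the condition holds continuously for 5 minutes
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} target {{ $labels.instance }} is down"
```

For step 7, `promtool check rules alerts.yml` validates the syntax before you reload Prometheus. The 5-minute `for` window is a judgment call: shorter windows detect outages faster but page more often on brief blips.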

Output

Prometheus + Grafana (Kubernetes):

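# {baseDir}/monitoring/prometheus.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Keep only pods annotated with prometheus.io/scrape: "true"
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: 'true'
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      # Pod discovery via kubernetes_sd_configs needs RBAC: set
      # serviceAccountName to an account allowed to list/watch pods.
      containers:
      - name: prometheus
        image: prom/prometheus:latest  # pin a specific version in production
        args:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.path=/prometheus'
          - '--storage.tsdb.retention.time=30d'
        ports:
        - containerPort: 9090
        volumeMounts:
        # Mount the ConfigMap above so --config.file reads it
        - name: config
          mountPath: /etc/prometheus
        # TSDB data directory (matches --storage.tsdb.path)
        - name: data
          mountPath: /prometheus
      volumes:
      - name: config
        configMap:
          name: prometheus-config
      # emptyDir is lost when the pod is rescheduled; use a PVC in production
      - name: data
        emptyDir: {}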
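The `kubernetes-pods` job can only discover pods if Prometheus runs under a service account that may read them. A minimal sketch of that RBAC follows; the `prometheus` names and the `default` namespace are assumptions, so change the namespace to wherever Prometheus runs and add `serviceAccountName: prometheus` to the Deployment's pod spec.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus          # hypothetical name
  namespace: default        # assumed namespace of the Prometheus Deployment
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  # role: pod service discovery needs to read pods across namespaces
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: default      # must match the ServiceAccount's namespace
```

A ClusterRole is used rather than a namespaced Role because the pod role watches every namespace unless `kubernetes_sd_configs` is restricted with a `namespaces` block.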

Grafana Dashboard Configuration:

{
  "dashboard": {
    "title": "Application Metrics",
    "panels": [
      {
        "title": "CPU Usage",
        "type": "timeseries",
        "targets": [
          {
            "expr": "sum by (pod) (rate(container_cpu_usage_seconds_total{container!=\"\"}[5m]))",
            "legendFormat": "{{pod}}"
          }
        ]
      }
    ]
  }
}
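The JSON above uses the wrapper object that Grafana's dashboard HTTP API (`POST /api/dashboards/db`) accepts. If you would rather load dashboards from disk, Grafana's file-based provisioning can watch a directory of dashboard JSON files; file-provisioned dashboards are normally stored as the inner dashboard object, so you may need to unwrap it. A sketch of a provider definition, assuming the official Grafana image's default provisioning path (the provider name, folder, and dashboard directory are hypothetical):

```yaml
# /etc/grafana/provisioning/dashboards/default.yaml
apiVersion: 1
providers:
  - name: default                        # hypothetical provider name
    folder: Monitoring                   # Grafana folder the dashboards appear in
    type: file
    options:
      path: /var/lib/grafana/dashboards  # directory holding dashboard JSON files
```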

Error Handling

Metrics Not Appearing

  • Error: "No data points"
  • Solution: Verify that scrape targets are reachable and return metrics (check Status → Targets in the Prometheus UI for scrape errors)
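One common cause of missing data with the `kubernetes-pods` job is a pod that serves metrics on a non-default path or port. The fragment below extends the job's `relabel_configs` using the `prometheus.io/path` and `prometheus.io/port` annotations. This is a community convention rather than a Kubernetes built-in, and the rules are adapted from the example Kubernetes configuration in the Prometheus repository.

```yaml
relabel_configs:
  # Keep only pods annotated with prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: 'true'
  # Use prometheus.io/path as the metrics path when the annotation is set
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  # Rewrite the scrape address to <pod IP>:<prometheus.io/port>
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: '$1:$2'
    target_label: __address__
```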

High Cardinality

  • Error: "Too many time series"
  • Solution: Drop or aggregate high-cardinality labels (for example with metric_relabel_configs), or give Prometheus more memory
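To find the offending metrics, the Status → TSDB Status page in the Prometheus UI lists the series counts per metric name and label. Once identified, they can be filtered at ingestion time with `metric_relabel_configs`, which run after the scrape and before storage. A sketch, where the metric name, label name, and limit are hypothetical placeholders:

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    # ...existing kubernetes_sd_configs and relabel_configs...
    # Treat the whole scrape as failed if a target exposes more samples than this
    sample_limit: 50000
    metric_relabel_configs:
      # Drop one metric family entirely (hypothetical metric name)
      - source_labels: [__name__]
        regex: 'app_request_duration_seconds_bucket'
        action: drop
      # Remove an unbounded label such as a request ID (hypothetical label name)
      - regex: 'request_id'
        action: labeldrop
```

Be careful with `labeldrop`: if the remaining labels no longer identify a series uniquely, samples from different series collide, so dropping the whole metric is sometimes the safer choice.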

Alert Not Firing

  • Error: "Alert condition met but no notification"
  • Solution: Confirm that Prometheus is configured to send alerts to Alertmanager, and that Alertmanager's routes and notification receivers are correct
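If an alert shows as firing in Prometheus but nobody is notified, check two things: that prometheus.yml has an `alerting.alertmanagers` entry pointing at Alertmanager, and that Alertmanager routes the alert to a working receiver. A minimal alertmanager.yml sketch follows; the receiver names, Slack webhook, channel, and PagerDuty key are placeholders, and `severity` is assumed to be a label set by your alerting rules.

```yaml
# alertmanager.yml (all names, URLs, and keys are placeholders)
route:
  receiver: slack-default          # fallback for alerts no child route matches
  group_by: ['alertname', 'job']
  routes:
    # Send critical alerts to PagerDuty instead of Slack
    - matchers:
        - severity="critical"
      receiver: pagerduty-critical
receivers:
  - name: slack-default
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'
        channel: '#alerts'
        send_resolved: true
  - name: pagerduty-critical
    pagerduty_configs:
      - routing_key: 'REPLACE_WITH_EVENTS_V2_KEY'
```

`amtool check-config alertmanager.yml` validates the file before you restart Alertmanager.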

Dashboard Load Failure

  • Error: "Failed to load dashboard"
  • Solution: Verify the Grafana data source configuration (URL and access mode) and the user's permissions on the dashboard
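Defining the data source in a provisioning file, rather than by hand in the UI, makes a misconfigured URL easier to spot and reproduce. A sketch, assuming the official Grafana image's default provisioning directory and a Kubernetes Service named `prometheus` on port 9090 (the Deployment above does not create one, so that Service is an assumption):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                  # Grafana's backend queries Prometheus, not the browser
    url: http://prometheus:9090    # assumes a Service named "prometheus"
    isDefault: true
```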

Resources

GitHub Repository

jeremylongshore/claude-code-plugins-plus
Path: plugins/devops/monitoring-stack-deployer/skills/monitoring-stack-deployer
Tags: ai, automation, claude-code, devops, marketplace, mcp

Related Skills

content-collections

Meta

This skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.

creating-opencode-plugins

Meta

This skill provides the structure and API specifications for creating OpenCode plugins that hook into 25+ event types like commands, files, and LSP operations. It offers implementation patterns for JavaScript/TypeScript modules that intercept and extend the AI assistant's lifecycle. Use it when you need to build event-driven plugins for monitoring, custom handling, or extending OpenCode's capabilities.

langchain

Meta

LangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.

cloudflare-turnstile

Meta

This skill provides comprehensive guidance for implementing Cloudflare Turnstile as a CAPTCHA-alternative bot protection system. It covers integration for forms, login pages, API endpoints, and frameworks like React/Next.js/Hono, while handling invisible challenges that maintain user experience. Use it when migrating from reCAPTCHA, debugging error codes, or implementing token validation and E2E tests.
