when-debugging-ml-training-use-ml-training-debugger
About
This skill helps developers diagnose and fix common machine learning training issues like loss divergence, overfitting, and slow convergence. It provides systematic debugging to identify root causes and generate fixes for training problems. Use it when you encounter poor validation performance or training instability to restore model convergence.
Quick Install
Claude Code
Recommendednpx skills add DNYoussef/ai-chrome-extension -a claude-code/plugin add https://github.com/DNYoussef/ai-chrome-extensiongit clone https://github.com/DNYoussef/ai-chrome-extension.git ~/.claude/skills/when-debugging-ml-training-use-ml-training-debuggerCopy and paste this command in Claude Code to install this skill
GitHub Repository
Related Skills
when-optimizing-prompts-use-prompt-architect
OtherPrompt Architect is a framework for developers to systematically analyze, refine, and optimize prompts using evidence-based techniques. It helps improve AI response quality and consistency by identifying anti-patterns and validating changes through A/B testing. Use it when you need to refactor an underperforming prompt or design a new, effective one from scratch.
deepspeed
DesignThis skill provides expert guidance for distributed training using Microsoft's DeepSpeed library. It helps developers implement optimization techniques like ZeRO stages, pipeline parallelism, and mixed-precision training. Use this skill when working with DeepSpeed features, debugging code, or learning best practices for large-scale model training.
when-profiling-performance-use-performance-profiler
OtherThis skill provides comprehensive performance profiling to measure, analyze, and optimize application performance across CPU, memory, I/O, and network dimensions. It helps developers identify bottlenecks, perform root cause analysis, and implement optimizations using tools like perf, Instruments, and clinic.js. Use it when you need systematic performance improvement through baseline measurement, detection, and optimization phases.
performance-analysis
OtherThis skill provides comprehensive performance analysis and bottleneck detection for Claude Flow swarms, helping developers identify optimization opportunities. It offers real-time monitoring, profiling of swarm operations, and generates detailed reports with actionable recommendations. Use this skill when you need to diagnose performance issues and improve the efficiency of your Claude Code applications.
