
deepspeed

zechenzhangAGI
Updated 7 days ago
Design
deepspeed, distributed-training, zero, pipeline-parallelism, mixed-precision, optimization, microsoft, large-scale-training, fp16, fp8

About

This skill provides expert guidance for distributed training with Microsoft's DeepSpeed library. It helps developers implement optimization techniques such as ZeRO stages, pipeline parallelism, and mixed-precision training. Use this skill when working with DeepSpeed features, debugging distributed training code, or learning best practices for large-scale model training.
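As a rough illustration of the kind of setup this skill covers, the sketch below builds a minimal DeepSpeed-style configuration dictionary with a ZeRO stage and fp16 mixed precision enabled. The helper name `make_ds_config` and the example values are hypothetical and are not taken from the skill itself. The keys follow the DeepSpeed JSON config format, where the global batch size equals the per-GPU micro-batch size times the gradient accumulation steps times the number of GPUs.

```python
import json


def make_ds_config(micro_batch_per_gpu, grad_accum_steps, world_size,
                   zero_stage=2, use_fp16=True):
    """Build a minimal DeepSpeed-style config dict (hypothetical helper).

    The global batch size is derived from the other three values so the
    batch-size fields stay mutually consistent.
    """
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    return {
        # Global batch = micro-batch per GPU * accumulation steps * GPUs.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        # Mixed-precision training in fp16.
        "fp16": {"enabled": use_fp16},
        # Stage 1 shards optimizer state, stage 2 adds gradients,
        # stage 3 adds parameters.
        "zero_optimization": {"stage": zero_stage},
    }


if __name__ == "__main__":
    # Example values only: 2 GPUs, micro-batch 4, 8 accumulation steps.
    print(json.dumps(make_ds_config(4, 8, 2), indent=2))
```

The printed JSON could be saved to a file and passed to a DeepSpeed training run. A real configuration would typically also specify an optimizer and, for larger models, offloading options.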

Quick Install

Claude Code

Recommended (Primary)
npx skills add zechenzhangAGI/AI-research-SKILLs -a claude-code
Plugin Command (Alternative)
/plugin add https://github.com/zechenzhangAGI/AI-research-SKILLs
Git Clone (Alternative)
git clone https://github.com/zechenzhangAGI/AI-research-SKILLs.git ~/.claude/skills/deepspeed

Copy and paste this command in Claude Code to install this skill

GitHub Repository

zechenzhangAGI/AI-research-SKILLs
Path: 08-distributed-training/deepspeed
ai, ai-research, claude, claude-code, claude-skills, codex

Related Skills

when-optimizing-prompts-use-prompt-architect

Other

Prompt Architect is a framework for developers to systematically analyze, refine, and optimize prompts using evidence-based techniques. It helps improve AI response quality and consistency by identifying anti-patterns and validating changes through A/B testing. Use it when you need to refactor an underperforming prompt or design a new, effective one from scratch.


pytorch-fsdp

Design

This Claude Skill provides expert guidance for PyTorch Fully Sharded Data Parallel (FSDP) training, helping developers implement distributed training solutions. It covers key features like parameter sharding, mixed precision, CPU offloading, and FSDP2 for large-scale model training. Use this skill when working with FSDP APIs, debugging distributed training code, or learning best practices for sharded data parallelism.


performance-analysis

Other

This skill provides comprehensive performance analysis and bottleneck detection for Claude Flow swarms, helping developers identify optimization opportunities. It offers real-time monitoring, profiling of swarm operations, and generates detailed reports with actionable recommendations. Use this skill when you need to diagnose performance issues and improve the efficiency of your Claude Code applications.


when-profiling-performance-use-performance-profiler

Other

This skill provides comprehensive performance profiling to measure, analyze, and optimize application performance across CPU, memory, I/O, and network dimensions. It helps developers identify bottlenecks, perform root cause analysis, and implement optimizations using tools like perf, Instruments, and clinic.js. Use it when you need systematic performance improvement through baseline measurement, detection, and optimization phases.
