openrlhf-training

davila7

Updated 11 days ago


Tags: Design, Post-Training, OpenRLHF, RLHF, PPO, GRPO, RLOO, DPO, Ray, vLLM, Distributed Training, Large Models, ZeRO-3

About

OpenRLHF is a high-performance RLHF training framework for fine-tuning large language models (7B-70B+ parameters) with methods such as PPO, DPO, and GRPO. It uses Ray for its distributed architecture and vLLM for accelerated generation, and reports roughly 2x faster training than alternatives such as DeepSpeed-Chat. Use this skill when you need efficient, distributed RLHF training with optimized GPU resource sharing and ZeRO-3 support.

Quick Install

Claude Code (recommended, primary):

npx skills add davila7/claude-code-templates -a claude-code

Plugin command (alternative):

/plugin add https://github.com/davila7/claude-code-templates

Git clone (alternative):

git clone https://github.com/davila7/claude-code-templates.git ~/.claude/skills/openrlhf-training

Run one of the commands above to install this skill.

GitHub Repository

davila7/claude-code-templates

Path: cli-tool/components/skills/ai-research/post-training-openrlhf

Topics: anthropic, anthropic-claude, claude, claude-code

Related Skills

fine-tuning-with-trl

Category: Other

This skill enables fine-tuning of LLMs with TRL's training methods, including SFT, DPO, and PPO, for RLHF and preference alignment. It is designed for aligning models with human feedback and works with Hugging Face Transformers. Use it when you need to implement RLHF, optimize a model against a reward signal, or train from human preference data.


training-llms-megatron

Category: Design

This skill trains massive LLMs (2B-462B parameters) using NVIDIA's Megatron-Core framework for maximum GPU efficiency. Use it when training models with more than 1B parameters that need advanced parallelism such as tensor, pipeline, or expert parallelism. It is a production-ready framework proven on models such as Nemotron and LLaMA.


grpo-rl-training

Category: Design

This skill provides expert guidance for implementing GRPO (Group Relative Policy Optimization) reinforcement learning fine-tuning with the TRL library. It is designed for training models on tasks that require structured outputs, verifiable reasoning, or objective correctness metrics, such as coding or math. Key features include production-ready workflows for custom reward functions and for enforcing specific output formats.


gptq

Category: Other

GPTQ is a post-training quantization technique for LLMs, most commonly used at 4-bit precision, that enables roughly 4x memory reduction and 3-4x faster inference with minimal accuracy loss. It is well suited to deploying large models on consumer GPUs and integrates with Hugging Face Transformers and PEFT for QLoRA-style fine-tuning. Use it when you need to fit 70B+ parameter models on limited hardware while maintaining performance.
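The roughly 4x memory figure follows from storing each weight in 4 bits instead of 16. The back-of-the-envelope sketch below assumes an fp16 baseline and counts weights only: it ignores the small overhead of GPTQ's per-group scales and zero points, as well as activations and the KV cache. The helper weight_memory_gb is hypothetical, written only for illustration, and is not part of any GPTQ library.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Hypothetical helper: approximate weight-only memory in GB (1 GB = 1e9 bytes)."""
    # bits -> bytes (divide by 8), then bytes -> GB (divide by 1e9)
    return num_params * bits_per_weight / 8 / 1e9


params = 70e9  # a 70B-parameter model
fp16_gb = weight_memory_gb(params, 16)  # assumed unquantized fp16 baseline
int4_gb = weight_memory_gb(params, 4)   # 4-bit weights, quantization metadata ignored
ratio = fp16_gb / int4_gb

print(f"fp16: {fp16_gb:.0f} GB, 4-bit: {int4_gb:.0f} GB, reduction: {ratio:.1f}x")
```

Under these assumptions the weights shrink from about 140 GB to about 35 GB. Real GPTQ checkpoints come out slightly larger because the per-group quantization parameters are stored alongside the packed weights.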
