Back to Skills

huggingface-tokenizers

davila7
Updated 9 days ago
354 views
18,478
1,685
18,478
View on GitHub
DocumentsTokenizationHuggingFaceBPEWordPieceUnigramFast TokenizationRustCustom TokenizerAlignment TrackingProduction

About

This skill provides high-performance tokenization using HuggingFace's Rust-based library, processing 1GB of text in under 20 seconds. It supports BPE, WordPiece, and Unigram algorithms while enabling custom tokenizer training and alignment tracking. Use it when you need production-fast tokenization or to build custom tokenizers integrated with the transformers ecosystem.

Quick Install

Claude Code

Recommended
Primary
npx skills add davila7/claude-code-templates -a claude-code
Plugin CommandAlternative
/plugin add https://github.com/davila7/claude-code-templates
Git CloneAlternative
git clone https://github.com/davila7/claude-code-templates.git ~/.claude/skills/huggingface-tokenizers

Copy and paste this command in Claude Code to install this skill

GitHub Repository

davila7/claude-code-templates
Path: cli-tool/components/skills/ai-research/tokenization-huggingface-tokenizers
0
anthropicanthropic-claudeclaudeclaude-code

Related Skills

quantizing-models-bitsandbytes

Other

This skill quantizes LLMs to 8-bit or 4-bit precision using bitsandbytes, achieving 50-75% memory reduction with minimal accuracy loss. It's ideal for running larger models on limited GPU memory or accelerating inference, supporting formats like INT8, NF4, and FP4. The skill integrates with HuggingFace Transformers and enables QLoRA training and 8-bit optimizers.

View skill

weights-and-biases

Design

This skill integrates Weights & Biases for comprehensive ML experiment tracking and MLOps. It automatically logs metrics, visualizes training in real-time, and manages hyperparameter sweeps and model versions. Use it to compare runs, optimize models, and collaborate within team workspaces directly from your development environment.

View skill

fine-tuning-with-trl

Other

This skill enables fine-tuning of LLMs using TRL's reinforcement learning methods including SFT, DPO, and PPO for RLHF and preference alignment. It's designed for aligning models with human feedback and works with HuggingFace Transformers. Use it when you need to implement RLHF, optimize with rewards, or train from human preferences.

View skill

qdrant-vector-search

Meta

The qdrant-vector-search skill provides a high-performance vector similarity search engine for building production RAG systems. It enables fast nearest neighbor search, hybrid search with filtering, and scalable vector storage powered by Rust. Use it when you need low-latency semantic search with horizontal scaling capabilities and full data control.

View skill