# Unsloth Fine-tuning

Production fine-tuning workflows using the Unsloth framework for efficient LLM training.
**Collection Statistics**

Total Notebooks: 37
## Setup

Initial Unsloth configuration and environment setup.
| # | Notebook | Description |
|---|---|---|
| 1 | Unsloth Environment Verification | |
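The core of an environment-verification step can be sketched as a small import probe. This is a hypothetical helper, not the notebook's actual code; the package list is an assumption based on a typical Unsloth training stack (a real check would also inspect `torch.cuda.is_available()` and GPU memory).

```python
import importlib.util

def verify_environment(packages=("torch", "unsloth", "trl", "peft", "bitsandbytes")):
    """Return {package: importable?} for libraries a typical Unsloth run needs.

    Hypothetical helper: the default package list is an assumption, and the
    actual notebook may perform deeper checks (CUDA, driver, VRAM).
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}

# Probe a stdlib module (always present) and a deliberately missing package.
report = verify_environment(("json", "definitely_not_installed_xyz"))
```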
## Fast Inference

Quick model inference examples with Llama and Qwen models.
| # | Notebook | Description |
|---|---|---|
| 1 | Fast Inference Test: Llama-3.2-1B | |
| 2 | Fast Inference Test: Qwen3-4B | |
| 3 | Fast Inference Test: Qwen3-4B-Thinking-2507 | |
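Before generation, the chat messages must be rendered into the model's prompt format. The sketch below hand-assembles a Llama-3-style prompt purely for illustration; in a real run you would call `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`, which knows the exact special tokens for the loaded model.

```python
def format_llama3_prompt(messages):
    """Assemble a Llama-3-style chat prompt (illustrative sketch only).

    Real inference code should use tokenizer.apply_chat_template instead of
    hard-coding special tokens like this.
    """
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Open an assistant turn so the model generates the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_llama3_prompt([{"role": "user", "content": "Hello!"}])
```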
## Vision Training

Vision model fine-tuning with Ministral and Pixtral.
| # | Notebook | Description |
|---|---|---|
| 1 | Unsloth Vision Training Verification | |
| 2 | Unsloth Vision Training Verification (Pixtral) | |
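Vision fine-tuning data interleaves images and text inside each chat turn. The helper below is a hypothetical sketch of one user turn in the common Hugging Face multimodal message format; the notebooks' exact schema may differ.

```python
def build_vision_message(instruction, n_images=1):
    """Build one user turn in the interleaved image+text message format
    (sketch; field names follow the common Hugging Face multimodal chat
    convention and are an assumption, not the notebooks' verified schema)."""
    content = [{"type": "image"} for _ in range(n_images)]
    content.append({"type": "text", "text": instruction})
    return {"role": "user", "content": content}

msg = build_vision_message("Describe this image.")
```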
## SFT Training

Supervised Fine-Tuning for text and vision models.
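A common SFT detail is training only on the response tokens: the labels copy the input ids, but the prompt portion is set to the ignore index so it contributes no loss. A minimal sketch, not the notebooks' actual code:

```python
IGNORE_INDEX = -100  # PyTorch cross-entropy skips positions with this label

def mask_prompt_labels(input_ids, prompt_len, ignore_index=IGNORE_INDEX):
    """Blank out the prompt portion of the labels so SFT loss is computed
    on the response tokens only (illustrative helper)."""
    return [ignore_index] * prompt_len + list(input_ids[prompt_len:])

labels = mask_prompt_labels([11, 12, 13, 14, 15], prompt_len=3)
```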
## GRPO Training

Group Relative Policy Optimization (GRPO) training.
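GRPO samples a group of completions per prompt and scores each one against its group rather than a learned value function: the advantage is the reward normalized by the group mean and standard deviation. A minimal sketch of that advantage computation (the training loop itself is omitted):

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each completion's reward against its sampling group:
    (r - mean) / (std + eps). Sketch of GRPO's advantage step only."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]

advs = group_relative_advantages([1.0, 2.0, 3.0])
```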
## DPO Training

Direct Preference Optimization for alignment.
| # | Notebook | Description |
|---|---|---|
| 1 | DPO Training Test: Qwen3-4B | |
| 2 | DPO Training Test: Qwen3-4B-Thinking-2507 | |
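DPO optimizes a preference pair directly, without a separate reward model: the loss pushes the policy to favor the chosen response over the rejected one relative to a frozen reference model. A sketch of the per-pair objective (in practice one would use TRL's `DPOTrainer` rather than hand-rolling this):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair:
    -log(sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l)))).
    Sketch of the objective only; log-probs are summed over response tokens."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

With zero margin the loss is log 2; it shrinks as the policy prefers the chosen answer more strongly than the reference does.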
## Reward Training

Reward model training for RLHF.
| # | Notebook | Description |
|---|---|---|
| 1 | Reward Model Training Test: Qwen3-4B | |
| 2 | Reward Model Training Test: Qwen3-4B-Thinking-2507 | |
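Reward models for RLHF are typically trained with the Bradley-Terry pairwise objective: the scalar score of the chosen response should exceed that of the rejected one. A minimal sketch of the per-pair loss (real training would use TRL's `RewardTrainer`):

```python
import math

def pairwise_reward_loss(score_chosen, score_rejected):
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).
    Sketch of the objective for a single preference pair."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```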
## RLOO Training

REINFORCE Leave-One-Out (RLOO) training.
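RLOO is a variance-reduced REINFORCE: for each prompt it samples several completions, and each completion's baseline is the mean reward of the *other* samples. A minimal sketch of the advantage computation:

```python
def rloo_advantages(rewards):
    """REINFORCE Leave-One-Out advantages: each sample's baseline is the
    mean reward of the remaining samples for the same prompt.
    Sketch only; requires at least two samples per prompt."""
    n = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]

advs = rloo_advantages([1.0, 2.0, 3.0])
```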
## QLoRA Experiments

Quantized LoRA experiments including alpha scaling, rank comparison, and quantization.
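The alpha-scaling and rank-comparison experiments revolve around one formula: the effective weight is `W + scale * (B @ A)`, where standard LoRA sets `scale = alpha / rank` and rank-stabilized LoRA (rsLoRA) sets `scale = alpha / sqrt(rank)`. A small sketch for reasoning about how alpha and rank interact:

```python
import math

def lora_scaling(alpha, rank, use_rslora=False):
    """Scaling factor applied to the low-rank update B @ A.

    Standard LoRA: alpha / rank.  rsLoRA: alpha / sqrt(rank), which keeps
    the update magnitude stable as rank grows. Illustrative sketch only.
    """
    return alpha / math.sqrt(rank) if use_rslora else alpha / rank
```

For example, `alpha=16, rank=8` gives a scale of 2.0 under standard LoRA; doubling the rank to 16 halves the standard scale but shrinks the rsLoRA scale only by a factor of sqrt(2).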