SFT Training Test: Ministral (Text-Only)¶
Tests supervised fine-tuning (SFT) with Unsloth's optimized SFTTrainer on Ministral-3B in text-only mode.
Model Variant: Text-only (FastLanguageModel)
Expected Result: Under test (Ministral uses a multimodal architecture)
Key features tested:
- FastLanguageModel loading with 4-bit quantization (Ministral multimodal architecture)
- LoRA adapter configuration
- SFTTrainer with minimal synthetic dataset
- Post-training inference verification
Key Differences from Qwen:
- Uses unsloth/Ministral-3-3B-Reasoning-2512 (multimodal architecture)
- Chat template uses the multimodal content format: {"type": "text", "text": "..."}
- FastLanguageModel with a multimodal model (testing compatibility)
Important: This notebook includes a kernel shutdown cell at the end to release all GPU memory.
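For reference, a minimal sketch (illustrative strings only) of the difference between a plain-text chat message and the multimodal content structure Ministral's chat template expects, even for text-only data:

# Plain-text chat message (as used with text-only models such as Qwen)
plain_message = {"role": "user", "content": "What is machine learning?"}

# Multimodal content structure expected by Ministral's chat template,
# even when the content is text-only
multimodal_message = {
    "role": "user",
    "content": [{"type": "text", "text": "What is machine learning?"}],
}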
In [8]:
# Environment Setup
import os
from dotenv import load_dotenv
load_dotenv()
# Force text-based progress instead of HTML widgets
os.environ["TQDM_NOTEBOOK"] = "false"
# CRITICAL: Import unsloth FIRST for proper TRL patching
import unsloth
from unsloth import FastLanguageModel, is_bf16_supported
import torch
# Environment summary
gpu = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"
print(f"Environment: unsloth {unsloth.__version__}, PyTorch {torch.__version__}, {gpu}")
print(f"HF_TOKEN loaded: {'Yes' if os.environ.get('HF_TOKEN') else 'No'}")
Out[8]:
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Out[8]:
/opt/pixi/.pixi/envs/default/lib/python3.13/site-packages/trl/__init__.py:203: UserWarning: TRL currently supports vLLM versions: 0.10.2, 0.11.0, 0.11.1, 0.11.2. You have version 0.14.0rc1.dev201+gadcf682fc.cu130 installed. We recommend installing a supported version to avoid compatibility issues.
  if is_vllm_available():
Out[8]:
🦥 Unsloth Zoo will now patch everything to make training faster!
Out[8]:
Environment: unsloth 2025.12.10, PyTorch 2.9.1+cu130, NVIDIA GeForce RTX 4080 SUPER
HF_TOKEN loaded: Yes
In [9]:
# Load Ministral-3B with 4-bit quantization (using FastLanguageModel for text-only)
MODEL_NAME = "unsloth/Ministral-3-3B-Reasoning-2512"
print(f"\nLoading {MODEL_NAME.split('/')[-1]} with FastLanguageModel (text-only mode)...")
model, tokenizer = FastLanguageModel.from_pretrained(
    MODEL_NAME,
    max_seq_length=512,
    load_in_4bit=True,
    dtype=None,  # Auto-detect
)
print(f"Model loaded: {type(model).__name__}")
Out[9]:
Loading Ministral-3-3B-Reasoning-2512 with FastLanguageModel (text-only mode)...
Out[9]:
==((====))==  Unsloth 2025.12.10: Fast Ministral3 patching. Transformers: 5.0.0rc1. vLLM: 0.14.0rc1.dev201+gadcf682fc.cu130.
   \\   /|    NVIDIA GeForce RTX 4080 SUPER. Num GPUs = 1. Max memory: 15.568 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.1+cu130. CUDA: 8.9. CUDA Toolkit: 13.0. Triton: 3.5.1
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Out[9]:
Model loaded: Mistral3ForConditionalGeneration
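For comparison, a sketch (not executed here) of how the same checkpoint could be loaded in vision mode; this assumes FastVisionModel follows the same from_pretrained pattern, and the companion vision notebook covers that path in full:

# Sketch only, not executed in this notebook: vision-mode loading of the same
# checkpoint, assuming FastVisionModel mirrors FastLanguageModel.from_pretrained
from unsloth import FastVisionModel

vision_model, vision_processor = FastVisionModel.from_pretrained(
    MODEL_NAME,
    load_in_4bit=True,
)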
In [10]:
# Apply LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=42,
)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"LoRA applied: {trainable:,} trainable / {total:,} total ({100*trainable/total:.2f}%)")
Out[10]:
Unsloth: Making `model.base_model.model.model.vision_tower.transformer` require gradients
Out[10]:
LoRA applied: 33,751,040 trainable / 2,160,030,720 total (1.56%)
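As an optional sanity check (not part of the original test), the modules that actually received adapters can be recovered from the parameter names; this relies on PEFT's usual "lora_A" naming convention:

# Optional check: list the modules carrying LoRA adapters
# (assumes PEFT's "lora_A" parameter-name convention)
lora_targets = sorted({
    name.split(".lora_A")[0].split(".")[-1]
    for name, _ in model.named_parameters()
    if "lora_A" in name
})
print(f"Modules with LoRA adapters: {lora_targets}")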
In [11]:
# Create minimal synthetic instruction dataset (5 samples)
# Using Ministral's multimodal chat format for text-only content
from datasets import Dataset
# Synthetic instruction-response pairs for testing
synthetic_data = [
    {"instruction": "What is machine learning?",
     "response": "Machine learning is a subset of artificial intelligence where computers learn patterns from data."},
    {"instruction": "Explain Python in one sentence.",
     "response": "Python is a high-level programming language known for its readability and versatility."},
    {"instruction": "What is a neural network?",
     "response": "A neural network is a computational model inspired by biological neurons that processes information."},
    {"instruction": "Define supervised learning.",
     "response": "Supervised learning is training a model on labeled data to predict outcomes for new inputs."},
    {"instruction": "What is gradient descent?",
     "response": "Gradient descent is an optimization algorithm that minimizes loss by iteratively adjusting parameters."},
]
# Format as chat conversations using Ministral's multimodal format
def format_conversation(sample):
    # Ministral uses the multimodal format even for text: [{"type": "text", "text": "..."}]
    messages = [
        {"role": "user", "content": [{"type": "text", "text": sample["instruction"]}]},
        {"role": "assistant", "content": [{"type": "text", "text": sample["response"]}]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)}
dataset = Dataset.from_list(synthetic_data)
dataset = dataset.map(format_conversation, remove_columns=["instruction", "response"])
print(f"Dataset created: {len(dataset)} samples")
print(f"Sample: {dataset[0]['text'][:150]}...")
Out[11]:
Dataset created: 5 samples
Sample: <s>[SYSTEM_PROMPT]# HOW YOU SHOULD THINK AND ANSWER First draft your thinking process (inner monologue) until you arrive at a response. Format your r...
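An optional sanity check (not in the original test): confirm each formatted sample fits within max_seq_length=512, mirroring the processor-style call pattern used in the inference cell further down:

# Optional: token count per formatted sample; the first positional argument is
# the image slot, left as None for text-only input (same pattern as the
# inference cell below)
lengths = [
    tokenizer(None, sample["text"], add_special_tokens=False,
              return_tensors="pt")["input_ids"].shape[-1]
    for sample in dataset
]
print(f"Token lengths: {lengths} (max_seq_length=512)")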
In [ ]:
# SFT Training (minimal steps for testing)
from trl import SFTTrainer, SFTConfig
sft_config = SFTConfig(
    output_dir="outputs_sft_ministral_text_test",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    max_steps=3,  # Minimal steps for testing
    warmup_steps=1,
    learning_rate=2e-4,
    logging_steps=1,
    fp16=not is_bf16_supported(),
    bf16=is_bf16_supported(),
    optim="adamw_8bit",
    weight_decay=0.01,
    max_seq_length=512,
    seed=42,
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=sft_config,
)
print("Starting SFT training (3 steps)...")
try:
    trainer_stats = trainer.train()
    final_loss = trainer_stats.metrics.get('train_loss')
    # Format only if a numeric loss was reported; avoids a format error on the fallback value
    loss_str = f"{final_loss:.4f}" if isinstance(final_loss, (int, float)) else "N/A"
    print(f"Training completed. Final loss: {loss_str}")
    SFT_TEXT_SUPPORTED = True
except Exception as e:
    print(f"Training failed: {e}")
    SFT_TEXT_SUPPORTED = False
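A possible follow-up after a successful run (sketch only, not executed in this test): persist the LoRA adapters so they can be re-attached or merged later.

# Sketch only: save the trained LoRA adapters (and tokenizer files) for reuse
if SFT_TEXT_SUPPORTED:
    model.save_pretrained("outputs_sft_ministral_text_test/lora_adapters")
    tokenizer.save_pretrained("outputs_sft_ministral_text_test/lora_adapters")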
In [13]:
# Post-training inference test
FastLanguageModel.for_inference(model)
# Use Ministral's multimodal format for text
messages = [{"role": "user", "content": [{"type": "text", "text": "What is deep learning?"}]}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
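# The first positional (image) argument is left as None: the tokenizer for this
# multimodal checkpoint accepts an image slot before the text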
inputs = tokenizer(None, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Clean up BPE artifacts from Ministral tokenizer (Ġ=space, Ċ=newline)
response = response.replace('Ġ', ' ').replace('Ċ', '\n').strip()
# Clear success/failure banner
print("=" * 60)
if SFT_TEXT_SUPPORTED:
    print("SFT Training: SUPPORTED for Ministral (Text-Only)")
    print("Model: FastLanguageModel + Ministral-3-3B-Reasoning-2512")
else:
    print("SFT Training: NOT SUPPORTED for Ministral (Text-Only)")
    print("Reason: See error above")
print("=" * 60)
print(f"Sample generation:\n{response[-200:]}")
Out[13]:
============================================================
SFT Training: SUPPORTED for Ministral (Text-Only)
Model: FastLanguageModel + Ministral-3-3B-Reasoning-2512
============================================================
Sample generation:
earning. Deep learning is a branch of artificial intelligence that deals with artificial neural networks which are able to "learn" from data. But to explain this, I need to break it down into simpler
Test Complete¶
The SFT Training Pipeline test for Ministral (Text-Only) has completed. The kernel will now shut down to release all GPU memory.
What Was Verified¶
- FastLanguageModel loading with 4-bit quantization (Ministral-3B multimodal architecture)
- LoRA adapter configuration (r=16, all projection modules)
- Synthetic dataset creation with Ministral's multimodal chat format
- SFTTrainer training loop (3 steps)
- Post-training inference generation
Ministral Text-Only Notes¶
- Uses FastLanguageModel instead of FastVisionModel
- Chat format uses the [{"type": "text", "text": "..."}] structure
- No image data or vision collator required
Next Steps¶
- Compare with 03_SFT_Training_Ministral_Vision.ipynb for vision-mode training
In [14]:
# Shutdown kernel to release all GPU memory
import IPython
print("Shutting down kernel to release GPU memory...")
app = IPython.Application.instance()
app.kernel.do_shutdown(restart=False)
Out[14]:
Shutting down kernel to release GPU memory...
Out[14]:
{'status': 'ok', 'restart': False}