QLoRA Test: Target Modules Comparison - Qwen3-4B-Thinking¶
Compares different target_modules configurations to understand which layers LoRA adapters should be applied to.
Configurations tested:
- qv_only: ["q_proj", "v_proj"] - Minimal (query + value attention)
- attention_only: ["q_proj", "k_proj", "v_proj", "o_proj"] - All attention layers
- mlp_only: ["gate_proj", "up_proj", "down_proj"] - MLP/FFN layers only
- all_linear: All 7 modules - Maximum capacity
Measurements:
- Trainable parameters per configuration
- Final training loss
- Inference quality (reasoning preservation)
Key insight: Different target modules affect different model capabilities:
- Attention layers: Representation adaptation
- MLP layers: Knowledge injection/modification
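If you are unsure which module names an architecture exposes, you can list them from a loaded model rather than guessing. The sketch below is illustrative and not part of the benchmark: it assumes `model` is a Qwen3 model loaded as in the cells further down, and the helper name `list_linear_module_names` is made up for this example.

```python
import torch.nn as nn

def list_linear_module_names(model):
    """Collect the distinct suffixes of all nn.Linear submodules:
    these are the names you can pass to LoRA as target_modules."""
    names = set()
    for full_name, module in model.named_modules():
        # bitsandbytes' Linear4bit subclasses nn.Linear, so 4-bit models are covered
        if isinstance(module, nn.Linear):
            names.add(full_name.split(".")[-1])
    return sorted(names)

# print(list_linear_module_names(model))
# Expected to include q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
# (plus non-target layers such as lm_head)
```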
Important: This notebook includes a kernel shutdown cell at the end to release all GPU memory.
In [1]:
# Environment Setup
import os
from dotenv import load_dotenv
load_dotenv()
# Force text-based progress instead of HTML widgets
os.environ["TQDM_NOTEBOOK"] = "false"
# CRITICAL: Import unsloth FIRST for proper TRL patching
import unsloth
from unsloth import FastLanguageModel, is_bf16_supported
import torch
import gc
# Environment summary
gpu = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"
print(f"Environment: unsloth {unsloth.__version__}, PyTorch {torch.__version__}, {gpu}")
print(f"HF_TOKEN loaded: {'Yes' if os.environ.get('HF_TOKEN') else 'No'}")
Out[1]:
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Out[1]:
/opt/pixi/.pixi/envs/default/lib/python3.13/site-packages/trl/__init__.py:203: UserWarning: TRL currently supports vLLM versions: 0.10.2, 0.11.0, 0.11.1, 0.11.2. You have version 0.14.0rc1.dev201+gadcf682fc.cu130 installed. We recommend installing a supported version to avoid compatibility issues. if is_vllm_available():
Out[1]:
🦥 Unsloth Zoo will now patch everything to make training faster!
Out[1]:
Environment: unsloth 2025.12.10, PyTorch 2.9.1+cu130, NVIDIA GeForce RTX 4080 SUPER
HF_TOKEN loaded: Yes
In [2]:
# Benchmark Helper Functions
import subprocess
def measure_gpu_memory():
    """Measure current GPU memory usage in MB using nvidia-smi"""
    try:
        result = subprocess.run(
            ['nvidia-smi', '--query-gpu=memory.used', '--format=csv,noheader,nounits'],
            capture_output=True, text=True
        )
        return int(result.stdout.strip().split('\n')[0])
    except (OSError, ValueError, IndexError):
        # nvidia-smi missing or produced unparsable output
        return 0

def count_parameters(model):
    """Count trainable vs total parameters"""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return {
        "trainable": trainable,
        "total": total,
        "pct": 100 * trainable / total
    }

def cleanup_memory():
    """Force garbage collection and clear CUDA cache"""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.synchronize()
print("Benchmark functions defined.")
print(f"Initial GPU memory: {measure_gpu_memory()} MB")
Out[2]:
Benchmark functions defined.
Initial GPU memory: 1800 MB
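As a sanity check against the `nvidia-smi` figure, PyTorch's own allocator can be queried as well. This is an optional sketch, not used by the benchmark below; it only sees memory tracked by the current process, so it will typically read lower than `nvidia-smi`, which also counts CUDA context overhead and other processes.

```python
def measure_torch_memory():
    """Return (allocated_mb, reserved_mb) as tracked by PyTorch's CUDA allocator."""
    if not torch.cuda.is_available():
        return 0, 0
    allocated_mb = int(torch.cuda.memory_allocated() / 1024**2)
    reserved_mb = int(torch.cuda.memory_reserved() / 1024**2)
    return allocated_mb, reserved_mb

# allocated, reserved = measure_torch_memory()
# print(f"Torch allocator: {allocated} MB allocated, {reserved} MB reserved")
```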
In [3]:
# Create minimal synthetic instruction dataset with thinking content (5 samples)
from datasets import Dataset
synthetic_data = [
{
"instruction": "What is machine learning?",
"thinking": "What is the user asking here? They want to understand machine learning. What are the key concepts I should cover? It's a subset of AI... and it involves learning from data. How should I keep this accessible? Short and clear definition.",
"response": "Machine learning is a subset of artificial intelligence where computers learn patterns from data."
},
{
"instruction": "Explain Python in one sentence.",
"thinking": "One sentence only - what's most important about Python? Its readability and versatility are the defining features. How do I capture both in one sentence?",
"response": "Python is a high-level programming language known for its readability and versatility."
},
{
"instruction": "What is a neural network?",
"thinking": "How do I explain neural networks simply? What's the core concept? They're inspired by biological neurons... they process information in layers. Should I mention deep learning? Maybe keep it basic for now.",
"response": "A neural network is a computational model inspired by biological neurons that processes information through connected layers."
},
{
"instruction": "Define supervised learning.",
"thinking": "What makes supervised learning 'supervised'? It's the labels! The data has known outputs. How do I explain this clearly? Focus on the training process with labeled data.",
"response": "Supervised learning is training a model on labeled data to predict outcomes for new inputs."
},
{
"instruction": "What is gradient descent?",
"thinking": "This is a bit technical. What's the intuition behind gradient descent? It's like walking downhill to find the lowest point. The gradient tells us which direction is 'down'. Keep it conceptual.",
"response": "Gradient descent is an optimization algorithm that minimizes loss by iteratively adjusting parameters in the direction of steepest descent."
},
]
print(f"Synthetic dataset prepared: {len(synthetic_data)} samples")
Out[3]:
Synthetic dataset prepared: 5 samples
In [4]:
# Target Module Configurations
TARGET_CONFIGS = {
    "qv_only": {
        "modules": ["q_proj", "v_proj"],
        "description": "Query + Value only (minimal)",
    },
    "attention_only": {
        "modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
        "description": "All attention layers",
    },
    "mlp_only": {
        "modules": ["gate_proj", "up_proj", "down_proj"],
        "description": "MLP/FFN layers only",
    },
    "all_linear": {
        "modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
        "description": "All linear layers (maximum)",
    },
}

print("Target module configurations:")
for name, config in TARGET_CONFIGS.items():
    print(f" - {name}: {config['description']}")
    print(f"   Modules: {config['modules']}")
Out[4]:
Target module configurations:
- qv_only: Query + Value only (minimal)
Modules: ['q_proj', 'v_proj']
- attention_only: All attention layers
Modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj']
- mlp_only: MLP/FFN layers only
Modules: ['gate_proj', 'up_proj', 'down_proj']
- all_linear: All linear layers (maximum)
Modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
In [ ]:
# Target Modules Comparison Loop
from trl import SFTTrainer, SFTConfig
MODEL_NAME = "unsloth/Qwen3-4B-Thinking-2507-unsloth-bnb-4bit"
FIXED_RANK = 16
THINK_END_TOKEN_ID = 151668 # </think> token for Qwen3-Thinking models
results = []
for config_name, config in TARGET_CONFIGS.items():
    print(f"\n{'='*60}")
    print(f"Testing: {config['description']}")
    print(f"Modules: {config['modules']}")
    print(f"{'='*60}")

    # Cleanup and measure baseline
    cleanup_memory()

    # Load fresh model
    print(f"Loading model...")
    model, tokenizer = FastLanguageModel.from_pretrained(
        MODEL_NAME,
        max_seq_length=512,
        load_in_4bit=True,
        dtype=None,
    )

    # Apply LoRA with current target modules
    print(f"Applying LoRA with r={FIXED_RANK}...")
    model = FastLanguageModel.get_peft_model(
        model,
        r=FIXED_RANK,
        lora_alpha=FIXED_RANK,
        lora_dropout=0,
        target_modules=config["modules"],
        bias="none",
        use_gradient_checkpointing="unsloth",
        random_state=42,
    )
    params = count_parameters(model)
    print(f"Trainable: {params['trainable']:,} ({params['pct']:.2f}%)")

    # Format dataset
    def format_conversation(sample):
        assistant_content = f"<think>\n{sample['thinking']}\n</think>\n\n{sample['response']}"
        messages = [
            {"role": "user", "content": sample["instruction"]},
            {"role": "assistant", "content": assistant_content}
        ]
        return {"text": tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)}

    dataset = Dataset.from_list(synthetic_data)
    dataset = dataset.map(format_conversation, remove_columns=["instruction", "thinking", "response"])

    # Training config
    sft_config = SFTConfig(
        output_dir=f"outputs_qlora_target_think/{config_name}",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=1,
        max_steps=3,
        warmup_steps=1,
        learning_rate=2e-4,
        logging_steps=1,
        fp16=not is_bf16_supported(),
        bf16=is_bf16_supported(),
        optim="adamw_8bit",
        weight_decay=0.01,
        max_seq_length=512,
        seed=42,
        report_to="none",
    )
    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        args=sft_config,
    )

    # Train
    print(f"Training (3 steps)...")
    trainer_stats = trainer.train()
    final_loss = trainer_stats.metrics.get('train_loss', 0)
    print(f"Final loss: {final_loss:.4f}")

    # --- Inference verification with think token parsing ---
    print(f"Testing inference...")
    FastLanguageModel.for_inference(model)
    messages = [{"role": "user", "content": "What is deep learning?"}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=100,
            temperature=0.6,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id,
        )

    # Token-based parsing for think tokens
    generated_ids = outputs[0][inputs["input_ids"].shape[1]:].tolist()
    if THINK_END_TOKEN_ID in generated_ids:
        end_idx = generated_ids.index(THINK_END_TOKEN_ID)
        thinking = tokenizer.decode(generated_ids[:end_idx], skip_special_tokens=True).strip()
        response = tokenizer.decode(generated_ids[end_idx + 1:], skip_special_tokens=True).strip()
        think_ok = True
        print(f"✓ Think token found at position {end_idx}. Response: {response[:80]}...")
    else:
        thinking, response, think_ok = "", "", False
        response = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()
        print(f"⚠ No </think> token found. Output: {response[:100]}...")
    # --- End inference verification ---

    # Store results
    results.append({
        "config": config_name,
        "description": config["description"],
        "num_modules": len(config["modules"]),
        "trainable_params": params["trainable"],
        "trainable_pct": params["pct"],
        "final_loss": final_loss,
        "sample_response": response[:200],
        "think_token_found": think_ok,
    })

    # Cleanup
    del model, tokenizer, trainer, dataset
    cleanup_memory()
print(f"\n{'='*60}")
print("All target module configurations tested!")
print(f"{'='*60}")
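One caveat about the loop above: `THINK_END_TOKEN_ID = 151668` is hardcoded. A slightly more robust pattern is to derive the id from the tokenizer, since Qwen3-Thinking registers `</think>` as a single token. A minimal sketch, assuming a `tokenizer` loaded as in the loop (for example, captured before the cleanup `del`):

```python
# Derive the </think> token id instead of hardcoding it.
think_end_id = tokenizer.convert_tokens_to_ids("</think>")
assert isinstance(think_end_id, int), "tokenizer does not know </think> as a single token"
print(f"</think> token id: {think_end_id}")  # 151668 for Qwen3-Thinking-2507
```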
In [6]:
# Results Visualization
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(results)
print("\n" + "="*60)
print("Target Modules Comparison Results")
print("="*60)
print(df[["config", "num_modules", "trainable_params", "trainable_pct", "final_loss"]].to_string(index=False))
# Create visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
colors = ['#3498db', '#2ecc71', '#e74c3c', '#9b59b6']
# Plot 1: Trainable Parameters
bars1 = axes[0].bar(df['config'], df['trainable_params'] / 1e6, color=colors)
axes[0].set_title('Trainable Parameters by Target Config', fontsize=12)
axes[0].set_xlabel('Target Module Configuration')
axes[0].set_ylabel('Parameters (Millions)')
axes[0].tick_params(axis='x', rotation=45)
for bar, val in zip(bars1, df['trainable_params']):
    axes[0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5,
                 f'{val/1e6:.1f}M', ha='center', va='bottom')
# Plot 2: Final Loss
bars2 = axes[1].bar(df['config'], df['final_loss'], color=colors)
axes[1].set_title('Final Training Loss by Target Config', fontsize=12)
axes[1].set_xlabel('Target Module Configuration')
axes[1].set_ylabel('Loss')
axes[1].tick_params(axis='x', rotation=45)
for bar, val in zip(bars2, df['final_loss']):
    axes[1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.05,
                 f'{val:.4f}', ha='center', va='bottom')
plt.tight_layout()
plt.savefig('outputs_qlora_target_think/target_modules_comparison.png', dpi=150)
plt.show()
print("\nVisualization saved to outputs_qlora_target_think/target_modules_comparison.png")
# Show sample responses
print("\n" + "="*60)
print("Sample Responses by Configuration")
print("="*60)
for _, row in df.iterrows():
    print(f"\n[{row['config']}]")
    print(f"{row['sample_response'][:150]}...")
Out[6]:
============================================================
Target Modules Comparison Results
============================================================
config num_modules trainable_params trainable_pct final_loss
qv_only 2 5898240 0.235985 3.088602
attention_only 4 11796480 0.470859 3.087648
mlp_only 3 21233664 0.844366 3.089303
all_linear 7 33030144 1.307325 3.074192
Out[6]:
Visualization saved to outputs_qlora_target_think/target_modules_comparison.png

============================================================
Sample Responses by Configuration
============================================================

[qv_only]
Okay, the user is asking "What is deep learning?" Hmm, this seems like a pretty basic question about AI, but I should be careful not to assume their l...

[attention_only]
Okay, the user is asking "What is deep learning?" Hmm, this seems like a pretty basic question about AI, but I should be careful not to assume their l...

[mlp_only]
Okay, the user is asking "What is deep learning?" Hmm, this seems like a pretty basic question about AI, but I should be careful not to assume their l...

[all_linear]
Okay, the user is asking "What is deep learning?" Hmm, this seems like a pretty basic question about AI, but I should be careful not to assume their l...
Analysis and Key Findings¶
Parameter Counts¶
| Config | Modules | Trainable Params | % of Total |
|---|---|---|---|
| qv_only | 2 | ~5.9M | 0.24% |
| attention_only | 4 | ~11.8M | 0.47% |
| mlp_only | 3 | ~21.2M | 0.84% |
| all_linear | 7 | ~33.0M | 1.31% |
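These counts can be reproduced analytically. LoRA adds two factors per adapted matrix, A of shape r × d_in and B of shape d_out × r, i.e. r · (d_in + d_out) trainable parameters, summed over all decoder layers. The sketch below reuses the TARGET_CONFIGS dictionary from earlier and plugs in the Qwen3-4B projection shapes (36 layers, hidden size 2560, 32 query heads and 8 KV heads of head dim 128, MLP intermediate size 9728); treat these dimensions as assumptions to verify against the model config. It also shows why mlp_only trains more parameters than attention_only despite touching fewer module types: the MLP matrices are simply larger.

```python
# Analytic LoRA parameter count: r * (d_in + d_out) per adapted matrix, per layer.
R = 16
N_LAYERS = 36       # assumed Qwen3-4B depth
HIDDEN = 2560       # assumed hidden size
Q_OUT = 32 * 128    # 32 query heads * head_dim 128
KV_OUT = 8 * 128    # 8 KV heads * head_dim 128 (GQA)
FFN = 9728          # assumed MLP intermediate size

SHAPES = {          # (d_in, d_out) per projection
    "q_proj": (HIDDEN, Q_OUT), "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT), "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, FFN), "up_proj": (HIDDEN, FFN), "down_proj": (FFN, HIDDEN),
}

def lora_param_count(modules, r=R, n_layers=N_LAYERS):
    return n_layers * sum(r * (d_in + d_out) for d_in, d_out in (SHAPES[m] for m in modules))

for name, cfg in TARGET_CONFIGS.items():
    print(f"{name:15s} {lora_param_count(cfg['modules']):>12,}")
# qv_only            5,898,240
# attention_only    11,796,480
# mlp_only          21,233,664
# all_linear        33,030,144
```

The computed totals match the measured trainable parameter counts in the results table above.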
Understanding Target Modules¶
Attention Layers (q, k, v, o):
- Control how the model attends to different parts of input
- Good for adapting representations and reasoning patterns
- Q+V only is a minimal effective configuration (used in original LoRA paper)
MLP Layers (gate, up, down):
- Store factual knowledge and learned patterns
- Good for knowledge injection or domain adaptation
- Can change what the model "knows" without changing how it reasons
When to Use Each¶
| Use Case | Best Config | Reasoning |
|---|---|---|
| Minimal fine-tuning | qv_only | Fastest, least parameters |
| Style/format adaptation | attention_only | Changes reasoning patterns |
| Knowledge injection | mlp_only | Updates stored knowledge |
| General fine-tuning | all_linear | Maximum capacity |
| Memory constrained | qv_only or attention_only | Smaller adapters |
Key Insight for Thinking Models¶
For Qwen3-4B-Thinking, the thinking/reasoning capability is primarily controlled by attention patterns. If you want to:
- Preserve thinking style: Use mlp_only (changes knowledge, not reasoning)
- Adapt thinking style: Use attention_only or all_linear
- Minimal intervention: Use qv_only
Recommendation¶
Default: all_linear - Provides maximum flexibility and capacity for most fine-tuning tasks.
Exception: If you specifically want to preserve the model's reasoning patterns while only updating knowledge, use mlp_only.
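For readers not using Unsloth, the same all_linear default can be expressed with plain PEFT. A hedged sketch of the equivalent adapter configuration (loading the 4-bit base model with transformers + bitsandbytes is assumed to happen separately):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # all_linear
)

# model = get_peft_model(base_model, lora_config)  # base_model: a 4-bit quantized causal LM
# model.print_trainable_parameters()
```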
In [7]:
# Shutdown kernel to release all GPU memory
import IPython
print("Shutting down kernel to release GPU memory...")
app = IPython.Application.instance()
app.kernel.do_shutdown(restart=False)
Out[7]:
Shutting down kernel to release GPU memory...
Out[7]:
{'status': 'ok', 'restart': False}