Importing HuggingFace Models into Ollama¶
This notebook demonstrates how to import GGUF models from HuggingFace into Ollama.
Model: Nous-Hermes-2-Mistral-7B-DPO¶
| Property | Value |
|---|---|
| Parameters | 7B |
| Architecture | Mistral (Llama-compatible) |
| Prompt Format | ChatML |
| License | Apache 2.0 |
| Quantization | Q4_K_M (4.37 GB) |
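Nous-Hermes-2 was fine-tuned with the ChatML prompt format, so under the hood every conversation is rendered roughly like the sketch below (illustrative layout only; the model's exact template string is shown later via ollama.show):

<|im_start|>system
You are Hermes 2, a helpful AI assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant

Ollama applies this template automatically, so none of the cells below need to construct it by hand.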
Prerequisites¶
- Ollama pod running: ujust ollama start (a quick connectivity check follows below)
- Internet connection for downloading from HuggingFace
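Before running any cells, you can confirm the pod is reachable from the shell. A quick sketch, assuming curl is available in your environment; /api/version is part of Ollama's REST API:

curl http://ollama:11434/api/version
# prints something like {"version":"..."} when the server is up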
1. Setup & Configuration¶
In [1]:
import os
import time
import ollama
# === Configuration ===
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")
HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"
print(f"Ollama host: {OLLAMA_HOST}")
print(f"HuggingFace model: {HF_MODEL}")
Ollama host: http://ollama:11434
HuggingFace model: hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M
2. Connection Health Check¶
In [2]:
def check_ollama_health() -> bool:
"""Check if Ollama server is running.
Returns:
bool: True if server is healthy
"""
try:
models = ollama.list()
print("✓ Ollama server is running!")
model_names = [m.get("model", "") for m in models.get("models", [])]
if model_names:
print(f" Currently installed: {len(model_names)} model(s)")
else:
print(" No models currently installed")
return True
except Exception as e:
print(f"✗ Cannot connect to Ollama server!")
print(f"Error: {e}")
print("To fix this, run: ujust ollama start")
return False
ollama_healthy = check_ollama_health()
def check_ollama_health() -> bool: """Check if Ollama server is running. Returns: bool: True if server is healthy """ try: models = ollama.list() print("✓ Ollama server is running!") model_names = [m.get("model", "") for m in models.get("models", [])] if model_names: print(f" Currently installed: {len(model_names)} model(s)") else: print(" No models currently installed") return True except Exception as e: print(f"✗ Cannot connect to Ollama server!") print(f"Error: {e}") print("To fix this, run: ujust ollama start") return False ollama_healthy = check_ollama_health()
✓ Ollama server is running!
  Currently installed: 2 model(s)
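The module-level ollama functions read the OLLAMA_HOST environment variable to find the server. To target a host explicitly, the client-object form is equivalent; a minimal sketch using the configuration above:

import ollama

client = ollama.Client(host=OLLAMA_HOST)  # explicit host instead of the env var
models = client.list()                    # same response shape as ollama.list()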
3. Pull Model from HuggingFace¶
Ollama can directly pull GGUF models from HuggingFace using the hf.co/ prefix.
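For reference, the same pull works from the command line; the tag after the colon picks the quantization file inside the GGUF repository:

ollama pull hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M

The cell below performs the same pull through the Python API so progress can be streamed and summarized.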
In [3]:
print(f"=== Pulling {HF_MODEL} ===")
print()
if not ollama_healthy:
print("⚠ Skipping - Ollama server not available")
print(" Run: ujust ollama start")
else:
try:
print("Downloading from HuggingFace...")
print()
last_status = ""
layer_count = 0
total_bytes = 0
for progress in ollama.pull(HF_MODEL, stream=True):
status = progress.get("status", "")
digest = progress.get("digest", "")
total = progress.get("total")
# Only print when status changes (not progress updates)
if status != last_status:
# Count completed layers
if last_status.startswith("pulling") and status != last_status:
layer_count += 1
if total:
total_bytes += total
# Print status changes (skip repetitive pulling messages)
if status == "pulling manifest":
print(f" {status}")
elif status.startswith("pulling") and digest:
short_digest = digest.split(":")[-1][:12] if ":" in digest else digest[:12]
size_mb = (total / 1024 / 1024) if total else 0
if size_mb > 100:
print(f" pulling {short_digest}... ({size_mb:.0f} MB)")
elif size_mb > 0:
print(f" pulling {short_digest}... ({size_mb:.1f} MB)")
elif status in ["verifying sha256 digest", "writing manifest", "success"]:
print(f" {status}")
last_status = status
print()
print(f"✓ Model pulled successfully!")
except Exception as e:
print(f"✗ Error pulling model: {e}")
print(f"=== Pulling {HF_MODEL} ===") print() if not ollama_healthy: print("⚠ Skipping - Ollama server not available") print(" Run: ujust ollama start") else: try: print("Downloading from HuggingFace...") print() last_status = "" layer_count = 0 total_bytes = 0 for progress in ollama.pull(HF_MODEL, stream=True): status = progress.get("status", "") digest = progress.get("digest", "") total = progress.get("total") # Only print when status changes (not progress updates) if status != last_status: # Count completed layers if last_status.startswith("pulling") and status != last_status: layer_count += 1 if total: total_bytes += total # Print status changes (skip repetitive pulling messages) if status == "pulling manifest": print(f" {status}") elif status.startswith("pulling") and digest: short_digest = digest.split(":")[-1][:12] if ":" in digest else digest[:12] size_mb = (total / 1024 / 1024) if total else 0 if size_mb > 100: print(f" pulling {short_digest}... ({size_mb:.0f} MB)") elif size_mb > 0: print(f" pulling {short_digest}... ({size_mb:.1f} MB)") elif status in ["verifying sha256 digest", "writing manifest", "success"]: print(f" {status}") last_status = status print() print(f"✓ Model pulled successfully!") except Exception as e: print(f"✗ Error pulling model: {e}")
=== Pulling hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M ===

Downloading from HuggingFace...

  pulling manifest
  pulling 3f99518c1e2c... (4166 MB)
  pulling a47b02e00552... (0.0 MB)
  pulling b78301c0df4d... (0.0 MB)
  pulling 05644d9257f4... (0.0 MB)
  verifying sha256 digest
  writing manifest
  success

✓ Model pulled successfully!
In [4]:
# Verify model is installed
print("=== Verify Model Installation ===")
models = ollama.list()
model_names = [m.get("model", "") for m in models.get("models", [])]
# Check for the HF model (may have different name format)
hf_model_installed = any("Nous-Hermes-2-Mistral-7B-DPO" in name or HF_MODEL in name for name in model_names)
if hf_model_installed:
print(f"✓ Model is installed")
for name in model_names:
if "Nous-Hermes" in name or "hf.co" in name:
print(f" Name: {name}")
else:
print("✗ Model not found in list")
print("Available models:")
for name in model_names:
print(f" - {name}")
=== Verify Model Installation ===
✓ Model is installed
  Name: hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M
4. Test Imported Model¶
In [5]:
print("=== Show Model Details ===")
try:
model_info = ollama.show(HF_MODEL)
print(f"Model: {HF_MODEL}")
print()
if "details" in model_info:
details = model_info["details"]
print("Details:")
print(f" Family: {details.get('family', 'N/A')}")
print(f" Parameter Size: {details.get('parameter_size', 'N/A')}")
print(f" Quantization: {details.get('quantization_level', 'N/A')}")
print()
print("Model file preview:")
modelfile = model_info.get("modelfile", "N/A")
print(f" {modelfile[:300]}..." if len(modelfile) > 300 else f" {modelfile}")
except Exception as e:
print(f"✗ Error: {e}")
print("=== Show Model Details ===") try: model_info = ollama.show(HF_MODEL) print(f"Model: {HF_MODEL}") print() if "details" in model_info: details = model_info["details"] print("Details:") print(f" Family: {details.get('family', 'N/A')}") print(f" Parameter Size: {details.get('parameter_size', 'N/A')}") print(f" Quantization: {details.get('quantization_level', 'N/A')}") print() print("Model file preview:") modelfile = model_info.get("modelfile", "N/A") print(f" {modelfile[:300]}..." if len(modelfile) > 300 else f" {modelfile}") except Exception as e: print(f"✗ Error: {e}")
=== Show Model Details ===
Model: hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M

Details:
  Family: llama
  Parameter Size: 7.24B
  Quantization: unknown

Model file preview:
  # Modelfile generated by "ollama show"
  # To build a new Modelfile based on this, replace FROM with:
  # FROM hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M

  FROM /home/jovian/.ollama/models/blobs/sha256-3f99518c1e2c1b2cee14c3cd7c110358ceb89cf2be0be0626d11ebd8571ff0ff
  TEMPLATE "<|im_start|...
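The preview above truncates the Modelfile. The full ChatML template (and any default sampling parameters) can be read from the same show() response; a short sketch assuming the dict-style response used throughout this notebook:

info = ollama.show(HF_MODEL)
print(info.get("template", ""))    # full prompt template (ChatML for this model)
print(info.get("parameters", ""))  # default sampling parameters, if the Modelfile sets any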
In [6]:
print("=== Generate Response ===")
try:
prompt = "What is the capital of France? Answer in one sentence."
print(f"Prompt: {prompt}")
print()
start_time = time.perf_counter()
result = ollama.generate(
model=HF_MODEL,
prompt=prompt
)
end_time = time.perf_counter()
print(f"Response: {result['response']}")
print()
print(f"Latency: {end_time - start_time:.2f}s")
print(f"Eval tokens: {result.get('eval_count', 'N/A')}")
if result.get('eval_count') and result.get('eval_duration'):
tokens_per_sec = result['eval_count'] / (result['eval_duration'] / 1e9)
print(f"Tokens/second: {tokens_per_sec:.1f}")
except Exception as e:
print(f"✗ Error: {e}")
print("=== Generate Response ===") try: prompt = "What is the capital of France? Answer in one sentence." print(f"Prompt: {prompt}") print() start_time = time.perf_counter() result = ollama.generate( model=HF_MODEL, prompt=prompt ) end_time = time.perf_counter() print(f"Response: {result['response']}") print() print(f"Latency: {end_time - start_time:.2f}s") print(f"Eval tokens: {result.get('eval_count', 'N/A')}") if result.get('eval_count') and result.get('eval_duration'): tokens_per_sec = result['eval_count'] / (result['eval_duration'] / 1e9) print(f"Tokens/second: {tokens_per_sec:.1f}") except Exception as e: print(f"✗ Error: {e}")
=== Generate Response ===
Prompt: What is the capital of France? Answer in one sentence.

Response: The capital of France is Paris.

Latency: 0.97s
Eval tokens: 8
Tokens/second: 149.7
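For longer generations it is often nicer to stream tokens as they arrive rather than block until the full response is ready. A minimal sketch with the same model (the prompt here is just an example):

for chunk in ollama.generate(model=HF_MODEL, prompt="Write a haiku about Paris.", stream=True):
    # each streamed chunk carries the next text fragment in "response"
    print(chunk.get("response", ""), end="", flush=True)
print()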
In [7]:
print("=== Chat Completion (ChatML) ===")
try:
# Nous-Hermes-2 uses ChatML format natively
response = ollama.chat(
model=HF_MODEL,
messages=[
{"role": "system", "content": "You are Hermes 2, a helpful AI assistant."},
{"role": "user", "content": "Explain quantum computing in two sentences."}
]
)
print(f"System: You are Hermes 2, a helpful AI assistant.")
print(f"User: Explain quantum computing in two sentences.")
print()
print(f"Hermes: {response['message']['content']}")
except Exception as e:
print(f"✗ Error: {e}")
print("=== Chat Completion (ChatML) ===") try: # Nous-Hermes-2 uses ChatML format natively response = ollama.chat( model=HF_MODEL, messages=[ {"role": "system", "content": "You are Hermes 2, a helpful AI assistant."}, {"role": "user", "content": "Explain quantum computing in two sentences."} ] ) print(f"System: You are Hermes 2, a helpful AI assistant.") print(f"User: Explain quantum computing in two sentences.") print() print(f"Hermes: {response['message']['content']}") except Exception as e: print(f"✗ Error: {e}")
=== Chat Completion (ChatML) ===
System: You are Hermes 2, a helpful AI assistant.
User: Explain quantum computing in two sentences.

Hermes: Quantum computing is a type of computing that utilizes the principles of quantum mechanics to perform operations on data using qubits, which can represent multiple values simultaneously, allowing for potentially faster and more efficient calculations than traditional binary-based computers. It has the potential to solve complex problems that are difficult or impossible for classical computers to handle.
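chat() is stateless: multi-turn conversations are built by appending each assistant reply to the messages list before the next call. A sketch continuing the exchange above (the follow-up question is just an example):

messages = [
    {"role": "system", "content": "You are Hermes 2, a helpful AI assistant."},
    {"role": "user", "content": "Explain quantum computing in two sentences."},
]
reply = ollama.chat(model=HF_MODEL, messages=messages)
messages.append(reply["message"])  # keep the assistant turn in the history
messages.append({"role": "user", "content": "Now explain it to a five-year-old."})
followup = ollama.chat(model=HF_MODEL, messages=messages)
print(followup["message"]["content"])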
5. Cleanup¶
Delete the model to free disk space (optional).
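From the shell, the equivalent is ollama rm with the full model name:

ollama rm hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M

The cell below shows the Python form, left commented out so a full notebook run does not force a 4+ GB re-download next time.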
In [8]:
print("=== Delete Model ===")
# Uncomment the lines below to delete the model
# print(f"Deleting '{HF_MODEL}'...")
# try:
# ollama.delete(HF_MODEL)
# print("✓ Model deleted successfully!")
# except Exception as e:
# print(f"✗ Error: {e}")
print("⚠ Deletion is commented out to preserve the model.")
print(f" To delete, uncomment the code above or run:")
print(f" ollama.delete('{HF_MODEL}')")
print("=== Delete Model ===") # Uncomment the lines below to delete the model # print(f"Deleting '{HF_MODEL}'...") # try: # ollama.delete(HF_MODEL) # print("✓ Model deleted successfully!") # except Exception as e: # print(f"✗ Error: {e}") print("⚠ Deletion is commented out to preserve the model.") print(f" To delete, uncomment the code above or run:") print(f" ollama.delete('{HF_MODEL}')")
=== Delete Model ===
⚠ Deletion is commented out to preserve the model.
To delete, uncomment the code above or run:
 ollama.delete('hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M')

Summary¶
This notebook demonstrated pulling a GGUF model directly from HuggingFace with the hf.co/ prefix, verifying the import, inspecting its ChatML template, and exercising it through both generate and chat calls.
Quick Reference¶
import ollama
# Pull any GGUF model from HuggingFace
HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"
# Download with progress
for progress in ollama.pull(HF_MODEL, stream=True):
print(progress.get("status"))
# Use like any other Ollama model
response = ollama.chat(
model=HF_MODEL,
messages=[{"role": "user", "content": "Hello!"}]
)