D1 02 Prompt templates and parsing
Prompt Templates, Few-Shot Learning & Output Parsing¶
This notebook demonstrates how to use LangChain's Prompt Templates, few-shot prompting techniques, and structured output parsing with a local open-source language model.
We will use the instruction-tuned model "NousResearch/Nous-Hermes-2-Mistral-7B-DPO" throughout, and explore:
- Prompt templates for reusability and clarity
- Few-shot prompting to guide the model with examples
- Structured output parsing using Pydantic
Bazzite-AI Setup Required
Run D0_00_Bazzite_AI_Setup.ipynb first to configure Ollama and verify GPU access.
Prompt Templates in LangChain¶
LangChain’s PromptTemplate lets you define reusable prompt structures with placeholders for dynamic input.
from langchain_core.prompts import PromptTemplate
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Explain the following topic in simple terms:\n\n{topic}"
)
print(prompt.format(topic="What is machine learning?"))
This is helpful for keeping prompts clean and consistent across inputs.
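Because the template is reusable, the same structure can be mapped over many inputs. A minimal sketch (the topic list here is just illustrative):
# Hypothetical list of topics, reusing the same template for each
topics = ["gradient descent", "overfitting", "tokenization"]

for t in topics:
    print(prompt.format(topic=t))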
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
from langchain_huggingface.llms import HuggingFacePipeline
from langchain_huggingface import ChatHuggingFace
from langchain_core.prompts import PromptTemplate
from langchain_core.prompts.chat import SystemMessagePromptTemplate, HumanMessagePromptTemplate, ChatPromptTemplate, AIMessagePromptTemplate
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel
from typing import List
import json
import re
import os
# Download model from HuggingFace (same base model as D1_01)
HF_LLM_MODEL = "NousResearch/Nous-Hermes-2-Mistral-7B-DPO"
# 4-bit quantization config for efficient loading
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(HF_LLM_MODEL)
# Load model with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    HF_LLM_MODEL,
    device_map="auto",
    quantization_config=quantization_config,
)

# Create text generation pipeline
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    return_full_text=False,
    eos_token_id=tokenizer.eos_token_id,
)
llm = HuggingFacePipeline(pipeline=text_pipeline)
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Device set to use cuda:0
prompt_template = PromptTemplate(
    input_variables=["topic"],
    template="Explain the following topic in simple terms:\n\n{topic}"
)
print(prompt_template.format(topic="What is machine learning?"))
Explain the following topic in simple terms:

What is machine learning?
response = llm.invoke(prompt_template.format(topic="What is machine learning?"))
print(response)
Machine learning is a subset of AI (Artificial Intelligence) that involves the development of algorithms and models that can learn from data and then make predictions and decisions without being explicitly programmed. It enables systems to improve and adapt their performance based on new data. In simpler terms, machine learning is the ability of a computer system or algorithm to improve its performance at a specific task over time, without being explicitly programmed to do so. This is achieved by using statistical techniques to analyze large datasets, identify patterns, and make predictions and decisions based on this analysis. For example, an algorithm trained on a large dataset of images can learn to recognize and classify objects within those images, such as detecting and identifying different animals or vehicles. This ability to learn and adapt on its own makes machine learning a powerful tool for solving complex problems and making decisions based on data.
Let's have a look at another example:
simplify_prompt = PromptTemplate(
    input_variables=["clause"],
    template="""
You are a legal assistant that simplifies complex legal clauses into plain, understandable English.

Clause:
{clause}

Simplified Explanation:
"""
)
legal_clause = (
"The lessee shall indemnify and hold harmless the lessor from any liabilities, damages, "
"or claims arising out of the use of the premises, except in cases of gross negligence."
)
formatted_prompt = simplify_prompt.format(clause=legal_clause)
response = llm.invoke(formatted_prompt)
print("Simplified:\n", response)
Simplified: The person who rents the property (lessee) must protect and compensate the property owner (lessor) from any losses, harm, or legal issues that may arise from using the property, except in cases where a very serious lack of care caused the problem.
# Ollama configuration (no API key needed!)
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://ollama:11434")
# === Model Configuration ===
HF_LLM_MODEL = "NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF"
OLLAMA_LLM_MODEL = f"hf.co/{HF_LLM_MODEL}:Q4_K_M"
print(f"Ollama host: {OLLAMA_HOST}")
print(f"Model: {OLLAMA_LLM_MODEL}")
Ollama host: http://ollama:11434
Model: hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M
from langchain_openai import ChatOpenAI
# Use Ollama as OpenAI-compatible endpoint (no API key required)
llm_ollama = ChatOpenAI(
    base_url=f"{OLLAMA_HOST}/v1",
    api_key="ollama",  # Ollama ignores this but LangChain requires it
    model=OLLAMA_LLM_MODEL,
    temperature=0.7,
    max_tokens=512
)
response = llm_ollama.invoke(formatted_prompt)
print("Simplified:\n", response)
Simplified:
content="The person renting the property (lessee) promises to protect and not hold responsible the owner of the property (lessor) for any issues or problems that might happen because of using the place, unless it's something very careless." additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 95, 'total_tokens': 145, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M', 'system_fingerprint': 'fp_ollama', 'id': 'chatcmpl-275', 'finish_reason': 'stop', 'logprobs': None} id='lc_run--019b66a9-c488-7a40-b47e-9c287eaf341e-0' usage_metadata={'input_tokens': 95, 'output_tokens': 50, 'total_tokens': 145, 'input_token_details': {}, 'output_token_details': {}}
Few Shot Prompt Template¶
Let's go over an example where you provide a short, hand-crafted conversation history to show the LLM chat bot a few examples, known as "few-shot prompts". We essentially prepend some example exchanges before sending the real message to the LLM. Be careful not to make the entire message too long, or you may hit context limits (though the latest models have quite large context windows).
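If you are unsure whether a prompt still fits, you can get a rough token count with the HuggingFace tokenizer loaded earlier; a small sketch (the helper name is ours):
# Rough context-budget check using the tokenizer from the setup cell
def count_tokens(text: str) -> int:
    return len(tokenizer.encode(text))

print(count_tokens("Explain the following topic in simple terms:\n\nWhat is machine learning?"))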
LangChain distinguishes between:
- PromptTemplates for simple string prompts
- MessagePromptTemplates for structured chat-style prompts using roles like system, user, assistant
So SystemMessagePromptTemplate helps build structured prompts that work with chat models.
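As a quick aside, ChatPromptTemplate also accepts plain (role, template) tuples, which is a compact equivalent to the explicit message prompt classes used below:
from langchain_core.prompts import ChatPromptTemplate

# Shorthand: role strings instead of SystemMessagePromptTemplate / HumanMessagePromptTemplate
quick_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that simplifies legal language."),
    ("human", "{legal_text}"),
])

print(quick_prompt.format_messages(legal_text="The lessee shall indemnify the lessor..."))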
Creating Example Inputs and Outputs
template = "You are a helpful assistant that translates complex legal terms into plain and understandable language."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
legal_text_1 = "Notwithstanding any provision to the contrary herein, the indemnitor agrees to indemnify, defend, and hold harmless the indemnitee from and against any and all claims, liabilities, damages, or expenses (including, without limitation, reasonable attorney’s fees) arising out of or related to the indemnitor’s acts or omissions, except to the extent that such claims, liabilities, damages, or expenses result from the gross negligence or willful misconduct of the indemnitee."
example_input_1 = HumanMessagePromptTemplate.from_template(legal_text_1)
plain_text_1 = "One party agrees to cover any costs, claims, or damages that happen because of their actions, including legal fees. However, they do not have to pay if the other party was extremely careless or acted intentionally wrong."
example_output_1 = AIMessagePromptTemplate.from_template(plain_text_1)
legal_text_2 = "This agreement shall be binding upon and inure to the benefit of the parties hereto and their respective heirs, executors, administrators, successors, and assigns, and shall not be assignable by either party without the prior written consent of the other, except that either party may assign its rights and obligations hereunder in connection with a merger, consolidation, or sale of substantially all of its assets."
example_input_2 = HumanMessagePromptTemplate.from_template(legal_text_2)
plain_text_2 = "This agreement applies to both parties and their future representatives, such as heirs or business successors. Neither party can transfer their rights under this agreement to someone else unless they get written permission. However, if one party merges with another company or sells most of its assets, they can transfer their rights without permission."
example_output_2 = AIMessagePromptTemplate.from_template(plain_text_2)
human_template = "{legal_text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages(
    [system_message_prompt, example_input_1, example_output_1, example_input_2, example_output_2, human_message_prompt]
)
some_example_text = "Any waiver of any term or condition of this agreement shall not be deemed a continuing waiver of such term or condition, nor shall it be considered a waiver of any other term or condition hereof. No failure or delay by either party in exercising any right, power, or privilege under this agreement shall operate as a waiver thereof, nor shall any single or partial exercise preclude any other or further exercise thereof or the exercise of any other right, power, or privilege."
request = chat_prompt.format_prompt(legal_text=some_example_text).to_messages()
result = llm_ollama.invoke(request)
print(result)
content="If one side makes an exception to any part of this agreement, it doesn't mean they're giving up on that part forever or that they'll ignore other parts of the agreement. Also, if one side doesn't act on time about their rights, powers, or privileges under the agreement, it doesn't mean they won't be able to do so later or that they'll give up any other rights." additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 90, 'prompt_tokens': 479, 'total_tokens': 569, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M', 'system_fingerprint': 'fp_ollama', 'id': 'chatcmpl-464', 'finish_reason': 'stop', 'logprobs': None} id='lc_run--019b66a9-ee4f-7900-836e-ef947f81c371-0' usage_metadata={'input_tokens': 479, 'output_tokens': 90, 'total_tokens': 569, 'input_token_details': {}, 'output_token_details': {}}
Parsing output¶
Large language models (LLMs) typically generate free-form text, which is great for human conversation — but not ideal when we want to extract specific information or automate downstream tasks.
The Problem
Imagine asking an LLM:
"Summarize this contract and give me the parties involved, the start date, and any penalties."
If the model responds with a long paragraph, it becomes difficult to:
- Reliably extract the pieces you need
- Validate whether the answer is complete
- Feed the output into another system
The Solution: Structured Output (e.g. JSON)
By instructing the LLM to return data in a structured format like JSON, we can:
- Parse the output automatically, although this does not always work
- Validate that required fields are present
- Integrate with other tools and code seamlessly
Structured output turns the LLM into a more reliable component of your application.
Parsing with tools like Pydantic ensures your data is clean, complete, and ready for automation.
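To see the contrast, here is a minimal sketch of the naive approach: ask for JSON in the prompt and hope json.loads succeeds. The except branch is exactly the failure mode the Pydantic-based parsing below addresses (actual model output will vary):
import json

raw = llm_ollama.invoke(
    "Return only a JSON object with keys 'parties' and 'start_date' for: "
    "Acme Corp and Beta LLC sign an agreement starting January 1, 2025."
).content

try:
    data = json.loads(raw)
    print(data.get("parties"), data.get("start_date"))
except json.JSONDecodeError:
    print("Model did not return clean JSON:", raw)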
Define format
Let's first see if we can get the output in the form of a JSON object by adding that request to the system prompt:
template = "You are a helpful assistant that translates complex legal terms into plain and understandable language. Respond only with a JSON object containing a single key 'translation' and its corresponding value."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
chat_prompt = ChatPromptTemplate.from_messages(
    [system_message_prompt, example_input_1, example_output_1, example_input_2, example_output_2, human_message_prompt]
)
some_example_text = "Any waiver of any term or condition of this agreement shall not be deemed a continuing waiver of such term or condition, nor shall it be considered a waiver of any other term or condition hereof. No failure or delay by either party in exercising any right, power, or privilege under this agreement shall operate as a waiver thereof, nor shall any single or partial exercise preclude any other or further exercise thereof or the exercise of any other right, power, or privilege."
request = chat_prompt.format_prompt(legal_text=some_example_text).to_messages()
result = llm_ollama.invoke(request)
result
AIMessage(content="A waiver of any part of this agreement doesn't mean that the same thing can be waived later or that all other parts are waived too. If one party doesn't use their rights, it doesn't mean they give up those rights permanently or can't use them in the future.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 65, 'prompt_tokens': 499, 'total_tokens': 564, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M', 'system_fingerprint': 'fp_ollama', 'id': 'chatcmpl-87', 'finish_reason': 'stop', 'logprobs': None}, id='lc_run--019b66aa-155d-7603-b118-7600620aaa82-0', usage_metadata={'input_tokens': 499, 'output_tokens': 65, 'total_tokens': 564, 'input_token_details': {}, 'output_token_details': {}})

That clearly didn't do the trick.
Pydantic¶
Pydantic is a Python library for defining data models with validation. With LangChain, it allows you to:
- Define the structure you expect from the model
- Automatically parse the raw LLM output
- Catch errors if fields are missing or malformed
Example

1. Define a Pydantic model
from pydantic import BaseModel
from typing import List
class ClauseSummary(BaseModel):
    parties: List[str]
    start_date: str
    penalty_clause: str
This defines the structure we want the LLM to return as a JSON object:
- A list of parties
- A start_date
- A penalty_clause string
2. Set up a parser using LangChain
from langchain_core.output_parsers import PydanticOutputParser
parser = PydanticOutputParser(pydantic_object=ClauseSummary)
This parser will take a raw string (from the LLM) and try to convert it into a ClauseSummary object.
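You can sanity-check the parser without calling the LLM at all by handing it a hand-written JSON string (the values below are made up):
# Hypothetical, well-formed input: the parser returns a validated ClauseSummary
demo = parser.parse('{"parties": ["Acme Corp", "Beta LLC"], "start_date": "2025-01-01", "penalty_clause": "EUR 5,000 on breach"}')
print(demo.parties)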
3. Include the schema in the system prompt
format_instructions = parser.get_format_instructions()
prompt = PromptTemplate(
    input_variables=["clause", "format_instructions"],
    template="""
Extract the following fields from the contract clause below and return them in **valid JSON format ONLY**, with no extra text or explanation.

Clause:
{clause}

{format_instructions}
"""
)
The format_instructions string tells the LLM exactly what JSON structure to return, based on your Pydantic model.
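You can print the instructions to see exactly what gets injected into the prompt:
print(parser.get_format_instructions())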
4. Run the LLM and parse the output
# Format the prompt with a clause first, then invoke the chat model
response = llm_ollama.invoke(prompt.format(clause=clause_text, format_instructions=format_instructions))

try:
    parsed = parser.parse(response.content)
    print(parsed.model_dump())
except Exception as e:
    print("Could not parse output.")
    print("Raw response:", response.content)
    print(e)
If the model returns a correctly structured JSON string, you now get a real Python object with attributes you can use:
parsed.parties
parsed.start_date
parsed.penalty_clause
With this model, you can ensure the LLM responds in a way that fits your expected format — or fail gracefully when it doesn't.
Let's try out this example:
# Define output structure
class ClauseSummary(BaseModel):
    parties: List[str]
    start_date: str
    penalty_clause: str
# Set up parser
parser = PydanticOutputParser(pydantic_object=ClauseSummary)
format_instructions = parser.get_format_instructions()
# Create prompt with correct input variables
prompt = PromptTemplate(
    input_variables=["clause", "format_instructions"],
    template="""
Extract the following fields from the contract clause below and return them in **valid JSON format ONLY**, with no extra text or explanation.

Clause:
{clause}

{format_instructions}
"""
)
# Clause to parse
clause_text = (
"The agreement between Acme Corp and Beta LLC begins on January 1, 2025. "
"If either party breaks the agreement, a €5,000 penalty applies."
)
# Format the full prompt
full_prompt = prompt.format(clause=clause_text, format_instructions=format_instructions)
# Run the model
response = llm_ollama.invoke(full_prompt)
# Parse the response
try:
    parsed = parser.parse(response.content)
    print(parsed.model_dump())
except Exception as e:
    print("Could not parse output.")
    print("Raw response:", response)
    print(e)
{'parties': ['Acme Corp', 'Beta LLC'], 'start_date': 'January 1, 2025', 'penalty_clause': '€5,000'}
print(parsed.parties)
print(parsed.start_date)
print(parsed.penalty_clause)
['Acme Corp', 'Beta LLC']
January 1, 2025
€5,000
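Pydantic also fails loudly when required fields are missing, which is one of the benefits listed above. A quick sketch of the failure path, feeding the parser deliberately incomplete JSON:
from langchain_core.exceptions import OutputParserException

try:
    parser.parse('{"parties": ["Acme Corp"]}')  # start_date and penalty_clause missing
except OutputParserException as e:
    print("Validation failed as expected:", e)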
Let's try that on the earlier example that simplified legal clauses and asked for JSON output.
# Define output schema with Pydantic
class LegalSimplification(BaseModel):
    translation: str
parser = PydanticOutputParser(pydantic_object=LegalSimplification)
# Define the system prompt
format_instructions = parser.get_format_instructions()
system_message = SystemMessage(content=f"""You are a helpful assistant that translates complex legal terms into plain and understandable language.
Respond only in this format: {format_instructions}
Do not ask for clarification. Always use the given legal input.""")
# Define few-shot examples
examples = [
    {
        "input": "Notwithstanding any provision to the contrary herein, the indemnitor agrees to indemnify, defend, and hold harmless the indemnitee...",
        "output": "One party agrees to cover any costs, claims, or damages that happen because of their actions..."
    },
    {
        "input": "This agreement shall be binding upon and inure to the benefit of the parties...",
        "output": "This agreement applies to both parties and their future representatives..."
    }
]

few_shot_messages = []
for ex in examples:
    few_shot_messages.append(HumanMessage(content=ex["input"]))
    few_shot_messages.append(AIMessage(content=f'{{"translation": "{ex["output"]}"}}'))
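One robustness note: building the example answers with an f-string breaks as soon as an output contains a double quote. json.dumps handles the escaping for you; a safer variant of the loop above:
few_shot_messages = []
for ex in examples:
    few_shot_messages.append(HumanMessage(content=ex["input"]))
    # json.dumps escapes any quotes or backslashes inside the example output
    few_shot_messages.append(AIMessage(content=json.dumps({"translation": ex["output"]})))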
# Define the legal input text
legal_text = (
"Any waiver of any term or condition of this agreement shall not be deemed a continuing waiver of such term "
"or condition, nor shall it be considered a waiver of any other term or condition hereof. No failure or delay "
"by either party in exercising any right, power, or privilege under this agreement shall operate as a waiver "
"thereof, nor shall any single or partial exercise preclude any other or further exercise thereof or the "
"exercise of any other right, power, or privilege."
)
user_message = HumanMessage(content=legal_text)
# Build full message list
messages = [system_message] + few_shot_messages + [user_message]
# Sanity check the prompt
print("\n\n===== Prompt Sent to Model =====")
for m in messages:
    print(f"{m.type.upper()}: {m.content}\n")
===== Prompt Sent to Model =====
SYSTEM: You are a helpful assistant that translates complex legal terms into plain and understandable language.
Respond only in this format: The output should be formatted as a JSON instance that conforms to the JSON schema below.
As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
Here is the output schema:
```
{"properties": {"translation": {"title": "Translation", "type": "string"}}, "required": ["translation"]}
```
Do not ask for clarification. Always use the given legal input.
HUMAN: Notwithstanding any provision to the contrary herein, the indemnitor agrees to indemnify, defend, and hold harmless the indemnitee...
AI: {"translation": "One party agrees to cover any costs, claims, or damages that happen because of their actions..."}
HUMAN: This agreement shall be binding upon and inure to the benefit of the parties...
AI: {"translation": "This agreement applies to both parties and their future representatives..."}
HUMAN: Any waiver of any term or condition of this agreement shall not be deemed a continuing waiver of such term or condition, nor shall it be considered a waiver of any other term or condition hereof. No failure or delay by either party in exercising any right, power, or privilege under this agreement shall operate as a waiver thereof, nor shall any single or partial exercise preclude any other or further exercise thereof or the exercise of any other right, power, or privilege.
# Create HF text-generation pipeline with lower temperature for parsing accuracy
# (temperature=0.6 here vs 0.7 in initial setup for more deterministic JSON output)
hf_pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    return_full_text=False,
    eos_token_id=tokenizer.eos_token_id,
    skip_special_tokens=True,
)
Device set to use cuda:0
# Wrap in LangChain-compatible Chat LLM
wrapped_llm = HuggingFacePipeline(pipeline=hf_pipe)
llm_chat = ChatHuggingFace(llm=wrapped_llm)
# Generate model output
raw_output = llm_chat.invoke(messages)
print(raw_output.content)
{"translation": "Changing or ignoring any part of this agreement doesn't mean we ignore other parts or future changes. Delaying using any rights, powers, or privileges doesn't mean we can't use them later or other rights."}
# Extract valid JSON from output
def extract_first_json(text):
    match = re.search(r'\{.*?\}', text, re.DOTALL)
    return match.group(0) if match else text.strip()

try:
    output_text = raw_output.content if isinstance(raw_output, AIMessage) else raw_output
    clean_output = extract_first_json(output_text)
    result = parser.parse(clean_output)

    print("\nSimplified translation:")
    print(result.translation)

    # Combine original and simplified
    entry = {
        "legal_text": legal_text,
        "translation": result.translation
    }

    # Load existing data if file exists
    data = []
    output_file = "simplified_output.json"
    if os.path.exists(output_file):
        with open(output_file, "r") as f:
            try:
                data = json.load(f)
                if not isinstance(data, list):
                    print("Warning: existing file is not a list. Overwriting.")
                    data = []
            except json.JSONDecodeError:
                data = []

    # Append new entry
    data.append(entry)

    # Write back to file
    with open(output_file, "w") as f:
        json.dump(data, f, indent=2)

    print(f"\nAppended to {output_file}")

except Exception as e:
    print("\nCould not parse output.")
    print("Raw output:", raw_output)
    print(e)
Simplified translation:
Changing or ignoring any part of this agreement doesn't mean we ignore other parts or future changes. Delaying using any rights, powers, or privileges doesn't mean we can't use them later or other rights.

Appended to simplified_output.json
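As an alternative to the regex extraction above, LangChain also offers an OutputFixingParser, which sends malformed output back to an LLM with a repair instruction before re-parsing. A sketch, assuming the full langchain package (not just langchain_core) is installed in this environment:
from langchain.output_parsers import OutputFixingParser

# Wraps our Pydantic parser; on a parse failure it asks llm_ollama to fix the text
fixing_parser = OutputFixingParser.from_llm(parser=parser, llm=llm_ollama)
fixed = fixing_parser.parse(raw_output.content)
print(fixed.translation)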
# === Unload Ollama Model & Shutdown Kernel ===
# Unloads the model from GPU memory before shutting down
try:
    import ollama
    print(f"Unloading Ollama model: {OLLAMA_LLM_MODEL}")
    ollama.generate(model=OLLAMA_LLM_MODEL, prompt="", keep_alive=0)
    print("Model unloaded from GPU memory")
except Exception as e:
    print(f"Model unload skipped: {e}")

# Shut down the kernel to fully release resources
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(restart=False)
Unloading Ollama model: hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M
Model unloaded from GPU memory
{'status': 'ok', 'restart': False}