Do LLMs Dream of Prompted Sheep?
A Basic Example
The Hello World version of this is a basic text summarizer.
We will use Phi-2 as the example. Phi-2 is a small language model from Microsoft, hosted on Hugging Face.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load model and tokenizer
model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
model = model.to("cpu") # MPS is optional if on Apple Silicon
# Define summarizer
def summarize(text):
    prompt = f"Summarize this:\n{text}\n\nSummary:"
    inputs = tokenizer(prompt, return_tensors="pt").to("cpu")
    outputs = model.generate(**inputs, max_new_tokens=100)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
# Example Text
input_text = """
The Eiffel Tower is one of the most recognizable landmarks in the world. Built for the 1889 World's Fair in Paris,
it stands at 324 meters tall and was originally meant to be dismantled after 20 years. However, it became a symbol
of French innovation and remains a popular tourist attraction today.
"""
print(summarize(input_text))
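One thing worth knowing about this code: for decoder-only models like Phi-2, generate() returns the prompt tokens followed by the newly generated tokens, so decoding outputs[0] gives you the prompt back along with the summary. Here is a hedged variant that slices the prompt off before decoding; it assumes the same tokenizer and model loaded above.

def summarize_only(text):
    prompt = f"Summarize this:\n{text}\n\nSummary:"
    inputs = tokenizer(prompt, return_tensors="pt").to("cpu")
    outputs = model.generate(**inputs, max_new_tokens=100)
    # generate() returns prompt tokens followed by new tokens,
    # so skip past the prompt before decoding
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)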
0. Import Libraries
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
- transformers: an open-source Python library from Hugging Face. It lets you easily download, use, and train AI models such as Phi, Gemma 2B, and GPT-2.
- transformers.AutoTokenizer: converts human-readable text into tokens (numbers) that a machine can understand, and converts the machine's output back into readable text (see the short sketch after this list).
- transformers.AutoModelForCausalLM: loads a causal language model. This will be the core of the system that generates tokens/words based on our input.
“…two types of language modeling, causal and masked. Causal language models are frequently used for text generation, example being creative applications like choosing your own text adventure.”
https://huggingface.co/docs/transformers/en//tasks/language_modeling
- As opposed to masked:
“Masked language models (MLM) are a type of large language model (LLM) used to help predict missing words from text in natural language processing (NLP) tasks. Masked language modeling aids many tasks—from sentiment analysis to text generation—by training a model to understand the contextual relationship between words.”
https://www.ibm.com/think/topics/masked-language-model
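To make the tokenizer's job concrete, here is a minimal round-trip sketch. It uses the same microsoft/phi-2 tokenizer as the rest of this post; the exact token IDs you see will vary by model.

from transformers import AutoTokenizer
# Load the same tokenizer used in the summarizer above
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
# Text -> token IDs (the numbers the model actually sees)
ids = tokenizer.encode("The Eiffel Tower is in Paris.")
print(ids)
# Token IDs -> text (what decode() does to the model's output)
print(tokenizer.decode(ids))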
A Note on Torch
- torch: PyTorch is an open-source machine learning framework. You'll need it to run models locally in memory, since it provides the deep learning infrastructure under the hood. What it actually does is handle the tensor operations that power models like Phi-2. Keep in mind that tensor operations are the (gasp) math underneath it all, i.e. the matrix arithmetic that eventually generates text (see the matrix-multiply sketch after the snippet below).
import torch
# Create two tensors
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
# Add them
c = a + b # tensor([5, 7, 9])
print(c)
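Since the math mentioned above is really matrix multiplication, here is an equally minimal sketch of that. The weight matrix and input vector are made-up toy values; inside Phi-2 the same operation runs at a vastly larger scale.

import torch
# A toy "weight matrix" and input vector
W = torch.randn(3, 3)
x = torch.randn(3)
# Matrix multiply: the operation that dominates transformer inference
y = W @ x
print(y)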
You may encounter cases where you need to run PyTorch on a GPU instead of just the CPU, especially with larger models or when you need faster performance.
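Here is a minimal sketch of picking the best available device at runtime. The checks are standard PyTorch calls; "cuda" covers NVIDIA GPUs and "mps" covers Apple Silicon.

import torch
# Pick the best available device
if torch.cuda.is_available():            # NVIDIA GPU
    device = "cuda"
elif torch.backends.mps.is_available():  # Apple Silicon
    device = "mps"
else:
    device = "cpu"
# Tensors (and models) can then be moved to that device
t = torch.tensor([1.0, 2.0, 3.0]).to(device)
print(t.device)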
1. Load Up Model
This is the loading code ChatGPT produced when I asked it for a basic text summarizer.
model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
model = model.to("cpu") # MPS is optional if on Apple Silicon
model_id is pretty self-explanatory, although you should note that you can pick and choose different models on Microsoft's Hugging Face page: https://huggingface.co/microsoft
AutoTokenizer.from_pretrained(model_id): initializes a tokenizer for the model, which converts input text into tokens (numbers) that the model can process. AutoTokenizer automatically loads the correct tokenizer for a given model.
AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32): loads the language model matching model_id from Hugging Face's model hub. torch_dtype=torch.float32 specifies the data type of the model weights, which determines the precision used for calculations (32-bit floating point in this case). It lets you load the model without manually configuring the model architecture.
model.to("cpu") moves the model to the specified hardware. In this case the device is "cpu", meaning the model will run on the central processing unit of your machine. If you were using a machine with a GPU (or Apple Silicon with MPS), you could move the model to those devices for faster processing (see the sketch below).
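For contrast, here is a hedged sketch of what the same load might look like in half precision on a CUDA GPU. It assumes an NVIDIA GPU is available; float16 roughly halves the memory the weights take up, at some cost in numerical precision.

import torch
from transformers import AutoModelForCausalLM
# Same load as above, but with 16-bit weights on a CUDA GPU
# (assumes a CUDA-capable machine)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float16,
)
model = model.to("cuda")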