Guidance: a Pythonic way to steer and constrain language model output

GitHub Repo | July 5, 2026 at 02:43 AM

Project Description

Guidance: Steering Language Models Without the Black Magic

If you've spent any time working with GPT or similar models, you know the pain: you ask for JSON, you get a rambling paragraph. You ask for a specific format, you get creative interpretations. Prompt engineering helps, but it's more art than science.

Guidance is a Python library that changes this. Instead of wrestling with prompts and hoping for the best, you can now program the output. It's like giving your language model a GPS instead of shouting directions from the passenger seat.

What It Does

Guidance lets you define output structures using Pythonic syntax. You can force the model to output specific fields, follow templates, and even pause generation to insert your own logic. Think of it as templates for language models, but with full control.

Example from their repo:

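from guidance import models, gen

# Load a local model through Hugging Face Transformers
# (requires the transformers and torch packages)
model = models.Transformers("gpt2")

# Appending to a model returns a new model object that holds the prompt so far.
# gen() generates up to 50 tokens and stores them under the name "summary".
result = model + "Summarize the following: " + "Quick brown fox. " + gen("summary", max_tokens=50)

# Named captures are read back by key
print(result["summary"])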

But it gets better. You can combine generation with selection, repetition, and even call Python functions mid-generation.
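
As a rough sketch of that mix, the snippet below classifies a support ticket with select, branches in ordinary Python, and then generates a short reply with gen. The names LABELS, route_ticket, and triage are made up for illustration, and lm is assumed to be a locally loaded guidance model such as models.Transformers("gpt2"); treat it as one possible shape, not the library's recommended pattern.

```python
# Hypothetical label set and routing table, for illustration only.
LABELS = ["bug", "billing", "other"]


def route_ticket(label: str) -> str:
    """Plain Python logic that runs between two generation steps."""
    queues = {"bug": "engineering", "billing": "finance"}
    return queues.get(label, "support")


def triage(lm, ticket: str):
    """Classify a ticket, branch in Python, then generate a short reply.

    `lm` is assumed to be a local guidance model object.
    """
    # Imported lazily so route_ticket can be used without guidance installed.
    from guidance import gen, select

    lm += f"Ticket: {ticket}\nCategory: "
    lm += select(LABELS, name="label")      # model must pick one of LABELS
    queue = route_ticket(lm["label"])       # ordinary Python, mid-generation
    lm += f"\nRouted to: {queue}\nReply: "
    lm += gen("reply", max_tokens=30, stop="\n")
    return lm, queue
```

Calling triage(models.Transformers("gpt2"), "I was charged twice") would return the final model state plus the chosen queue. The routing step is just a function call, so it could equally hit a database or an API before generation resumes.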

Why It's Cool

Most libraries treat the model like a black box you chat with. Guidance treats it like a function you call with constraints. Here's what stands out:

Guaranteed structure when you need it: Want a JSON object with exactly 3 keys? Specify it once instead of prompting, hoping, and retrying. The structure is fixed; the generated values can still vary from run to run.
Token steering with regex: You can say "output only digits" or "must be one of these options." No more re-prompting because the model decided to be poetic.
Hybrid generation: Generate text, then run Python logic, then generate more. You can conditionally branch, loop, or hit an API mid-stream.
Model agnostic: Works with OpenAI, Transformers, LlamaCPP, and more. Switch models without rewriting your generation logic. Note that full token-level constraints generally need a backend that exposes token probabilities, so hosted APIs may support only a subset of features.
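
To make the "JSON object with exactly 3 keys" point concrete, here is a minimal sketch of the idea: the program writes the keys and punctuation itself, and the model is only asked for the values. FIELDS, build_record, and guidance_filler are hypothetical names, and the guidance-backed filler assumes a local model; this is an illustration of the approach, not how Guidance implements JSON output.

```python
import json
import re

# Hypothetical field spec for illustration: key -> regex its value must match.
FIELDS = {"name": r"[A-Za-z ]{1,20}", "age": r"[0-9]{1,3}", "rating": r"[1-5]"}


def build_record(fill_value, fields=FIELDS):
    """Write the JSON skeleton in code; ask `fill_value` only for the values."""
    pairs = []
    for key, pattern in fields.items():
        value = fill_value(key, pattern)
        # Safety net for fillers that do not enforce the pattern themselves.
        if not re.fullmatch(pattern, value):
            raise ValueError(f"{key!r} value {value!r} does not match {pattern!r}")
        pairs.append(f"{json.dumps(key)}: {json.dumps(value)}")
    return "{" + ", ".join(pairs) + "}"


def guidance_filler(lm):
    """Build a `fill_value` callback on top of a guidance model (assumed local)."""
    # Lazy import: build_record itself needs only the standard library.
    from guidance import gen

    def fill(key, pattern):
        state = lm + f"{key}: " + gen(key, regex=pattern)
        return state[key]

    return fill
```

With a local model loaded, build_record(guidance_filler(model)) would return a JSON string that always has the same three keys, whatever the model chooses to put in them.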

The killer feature is token-level constraint. Instead of generating text, checking whether it is valid, and retrying when it is not, Guidance masks out invalid tokens at each decoding step, so the model can only pick from tokens that keep the output valid. This is huge for reliability.
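
The masking idea can be shown with a toy decoder that uses nothing but the standard library. Everything here is a stand-in: VOCAB is a six-token vocabulary, fake_logits pretends to be a model that prefers words over digits, and the constraint is "one to three digits". It is a sketch of the general technique under those assumptions, not Guidance's actual implementation.

```python
import math

# Toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["7", "42", "cat", " is", "!", "<eos>"]


def fake_logits(prefix: str) -> list:
    """Stand-in for a model: always prefers words over digits."""
    return [0.1, 0.2, 3.0, 2.5, 1.0, 0.5]


def is_valid_prefix(text: str) -> bool:
    """Constraint: the output is 1 to 3 digits, so any shorter digit run is a valid prefix."""
    return text == "" or (text.isdigit() and len(text) <= 3)


def unconstrained_first_token() -> str:
    """What greedy decoding picks when nothing is masked."""
    logits = fake_logits("")
    return VOCAB[logits.index(max(logits))]


def constrained_greedy(max_steps: int = 5) -> str:
    """Greedy decoding where invalid tokens are masked to -inf before picking."""
    out = ""
    for _ in range(max_steps):
        masked = []
        for token, score in zip(VOCAB, fake_logits(out)):
            if token == "<eos>":
                ok = out != ""  # stopping is allowed once at least one digit exists
            else:
                ok = is_valid_prefix(out + token)
            masked.append(score if ok else -math.inf)
        best = VOCAB[masked.index(max(masked))]
        if best == "<eos>":
            break
        out += best
    return out


print(unconstrained_first_token())  # prints "cat"
print(constrained_greedy())         # prints "42"
```

The fake model never changes its mind about preferring "cat", yet the constrained loop cannot emit it, because that token is removed from consideration before the choice is made. No retry is ever needed.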

How to Try It

The simplest way:

pip install guidance

Then open a Python session:

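from guidance import models, gen

# Load a local model (requires the transformers and torch packages).
# Hosted backends such as models.OpenAI(...) also exist, but they need an
# API key and may not support regex constraints.
model = models.Transformers("gpt2")

# Constrain the answer to a single digit from 1 to 5
result = model + "Answer with 1-5: " + gen("rating", regex="[1-5]")
print(result["rating"])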

You can also jump to their Quick Start guide for more examples. They have a Colab notebook linked in the repo if you want to test without installing anything.

Final Thoughts

Guidance is one of those tools that makes you wonder why nobody did it sooner. It bridges the gap between prompt engineering and actual software engineering. If you're building anything more complex than a chatbot (think data extraction, form filling, or structured code generation), this will save you hours of debugging "Why did the model add that extra sentence?"

It's not perfect: token-level constraints can slow down generation, and the syntax takes a minute to get used to. But for reliability, it's a game changer.

Give it a spin. Your future self (and your production code) will thank you.

Follow us at @githubprojects

Contributors

@githubprojects