Prompts built like
software, not strings.
Typed blocks. Priority-based assembly. Reasoning strategies. Version control. Testing. Debugging. Token-precise context control. The engineering rigor your prompts deserve.
from promptise.prompts import prompt
from promptise.prompts import PromptBlocks as PB
from promptise.prompts import chain_of_thought, self_critique
@prompt(model="openai:gpt-5-mini",
        strategy=chain_of_thought + self_critique)
async def analyze(text: str) -> Analysis:
    """Analyze with composed reasoning strategies."""
    return PB.identity("Senior analyst") \
        + PB.rules(["Cite sources", "No PII"]) \
        + PB.output_format(Analysis) \
        + PB.section("Input", text)

What is Prompt Engineering in Promptise?
Prompts are code.
Not strings you paste into a chat box.
In most frameworks, your system prompt is a raw string — untestable, unversioned, invisible at runtime. Promptise treats prompts as software components: typed blocks with priority-based assembly, reasoning strategies you compose like functions, and a debugger that traces every decision.
Without Promptise
One giant string. No structure. No tests. When the agent makes a bad decision, you stare at 200 lines and guess which part caused it. Version control means copy-pasting into a new file.
With PromptBlocks
Snap typed blocks together — Identity, Rules, Examples, OutputFormat. Each has a priority. When the token budget is tight, the lowest-priority blocks drop first. Test each block independently with pytest.
What you get
8 block types, conversation flows, 5 composable strategies, 4 perspectives, 11 context providers, 5 guard types, version registry with rollback, PromptInspector debugger, and a full pytest testing framework.
Prompt = blocks + strategy + guards + context + version + tests
"In most frameworks, prompts are strings — hardcoded, untestable, invisible at runtime, impossible to debug. When your agent makes a bad decision, you stare at a 200-line string and guess which part caused it."
PROMPT ENGINEERING ARCHITECTURE
Eight capabilities that make prompts software.
Prompts have types, priorities, versions, tests, and debugging tools. They compose from independent blocks. They adapt at runtime. They are inspectable, traceable, and reproducible.
Typed Blocks
Eight block types with priorities. Identity, rules, output format, context slots, sections, examples, conditionals, composites.
Conversation Flows
Phases with active blocks and lifecycle hooks. Your agent's voice changes across greeting, diagnosis, solution, follow-up.
Reasoning Strategies
Chain of thought, structured reasoning, self-critique, plan and execute, decompose. Compose with + operator.
Context Providers
Eleven built-in providers inject user, tools, memories, tasks, conversation, errors, output schema — before every LLM call.
Output Guards
Content filters, length guards, schema validation, custom validators. Enforce output quality before and after generation.
Prompt Debugging
Trace the assembly pipeline. See which blocks were included or excluded, and why. Token counts per block. Context provider timing.
Version Control
SemVer registry, YAML prompt files, rollback support, duplicate detection. Prompts diffable in PRs.
Context Engine
Token-precise context control. Priority layers, exact token counting, graceful trimming. Never overflow.
Prompts that assemble themselves from typed blocks.
System prompts are built from eight block types, not concatenated strings. Each block has a priority level. When the context window is tight, the assembler drops lowest-priority blocks first. Your agent degrades gracefully — dropping supplementary content while preserving its core identity and constraints.
Identity
Who the agent is — role, expertise, personality. Always highest priority. Always the last thing dropped.
Rules
Hard constraints — never share PII, always cite sources, respond in JSON. Non-negotiable.
OutputFormat
Expected response structure. JSON schema, markdown format, specific fields.
ContextSlot
Dynamic data at runtime — current user info, retrieved data, tool results.
Section
Domain-specific instructions. Custom content for anything that doesn't fit other types.
Examples
Few-shot demonstrations. First to drop when context is tight.
Conditional
Blocks that appear or disappear based on runtime predicates.
Composite
Group related content into reusable block collections.
Drop order: examples go first, then sections, then the output format. Identity and rules are the last to go. Your agent never truncates mid-sentence.
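The drop mechanics can be modelled in a few lines of plain Python. This is a toy sketch of the idea, not the Promptise API: blocks carry a priority and a pre-counted token cost, and when the budget is exceeded the lowest-priority blocks are dropped whole.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    priority: int  # higher = kept longer
    tokens: int    # pre-counted token cost
    text: str

def assemble(blocks: list[Block], budget: int) -> list[Block]:
    """Keep the highest-priority blocks that fit, in authoring order."""
    kept, used = [], 0
    for block in sorted(blocks, key=lambda b: -b.priority):
        if used + block.tokens <= budget:
            kept.append(block)
            used += block.tokens
    # Re-emit in the original authoring order
    kept.sort(key=lambda b: blocks.index(b))
    return kept

prompt_blocks = [
    Block("identity", 10, 120, "You are a senior analyst."),
    Block("rules", 9, 90, "Cite sources. No PII."),
    Block("examples", 4, 900, "Example 1 ..."),
]
kept = assemble(prompt_blocks, budget=300)
print([b.name for b in kept])  # examples dropped first
```

Note that a dropped block disappears entirely; nothing is ever cut mid-sentence.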
Conversations that change voice across phases.
A customer support conversation has phases — greeting, information gathering, diagnosis, solution delivery, follow-up. Each phase needs different context, different rules, different tone. A static prompt cannot do this.
A conversation flow defines phases with active blocks and lifecycle hooks. Phase transitions happen automatically based on the conversation state. Your agent's voice in the opening message is different from its voice during analysis — not because you wrote two prompts, but because the flow activates different blocks at different times.
Greeting
Information Gathering
Diagnosis
Solution
Follow-up
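The idea can be sketched as a small phase table. This is a toy model, not the real ConversationFlow API: each phase names its active blocks, and a transition rule advances the phase from conversation state.

```python
# Each phase activates a different set of block names (illustrative).
PHASES = {
    "greeting": ["identity", "tone_warm"],
    "information_gathering": ["identity", "rules", "question_checklist"],
    "diagnosis": ["identity", "rules", "domain_knowledge"],
    "solution": ["identity", "rules", "output_format"],
    "follow_up": ["identity", "tone_warm", "satisfaction_check"],
}
ORDER = list(PHASES)

def next_phase(current: str, state: dict) -> str:
    """Advance when the current phase's goal is met (toy predicate)."""
    if state.get(f"{current}_done") and current != ORDER[-1]:
        return ORDER[ORDER.index(current) + 1]
    return current

phase = next_phase("greeting", {"greeting_done": True})
print(phase, PHASES[phase])
```

Swapping the active block set per phase is what changes the agent's voice without rewriting the whole prompt.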
Reasoning strategies you compose like building blocks.
Five reasoning strategies compose with the + operator. Four perspectives (Analyst, Critic, Advisor, Creative) are orthogonal — any perspective pairs with any strategy. Five chaining operators (chain, parallel, branch, retry, fallback) connect prompts into pipelines.
Chain of Thought
Step-by-step reasoning before the final answer
Structured
Numbered steps with explicit premises and conclusion
Self-Critique
Generate an answer, critique it, then revise
Plan & Execute
Design the approach first, then execute
Decompose
Break complex questions into subquestions
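One way such + composition could be implemented is an __add__ override that concatenates reasoning frames. This is a hypothetical sketch, not Promptise's actual Strategy type:

```python
class Strategy:
    def __init__(self, frame: str):
        self.frames = [frame]

    def __add__(self, other: "Strategy") -> "Strategy":
        # Composition just concatenates the two frame lists, in order
        combined = Strategy.__new__(Strategy)
        combined.frames = self.frames + other.frames
        return combined

    def apply(self, prompt: str) -> str:
        # Each frame appends its reasoning instruction to the prompt
        for frame in self.frames:
            prompt = f"{prompt}\n\n{frame}"
        return prompt

chain_of_thought = Strategy("Think step by step before answering.")
self_critique = Strategy("Critique your answer, then revise it.")

composed = chain_of_thought + self_critique
print(composed.apply("Analyze Q3 revenue."))
```

Because + just accumulates frames, any strategy pairs with any other, and order matters: reasoning first, critique second.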
# Chain of thought + self-critique
@prompt(strategy=chain_of_thought + self_critique)
async def analyze(data: str) -> str:
    # Reasons step-by-step,
    # THEN critiques its reasoning,
    # before producing final output
    ...

# Decompose + plan and execute
@prompt(strategy=decompose + plan_and_execute)
async def solve(problem: str) -> Solution:
    # Breaks problem into parts,
    # plans execution order for each,
    # solves systematically
    ...

Context that is always current, always relevant.
Before every LLM call, context providers inject up-to-date information automatically. All providers are async-aware. All inject before every call. Build a custom provider with a single function — anything your agent needs to know, injected automatically.
from promptise.context import ContextProvider
class WeatherContext(ContextProvider):
    """Inject current weather for location-aware agents."""
    async def get_context(self, state: AgentState) -> str:
        location = state.user.location
        weather = await self.weather_api.get(location)
        return f"Current weather: {weather.summary}"

# Register globally or per-agent
agent.add_context_provider(WeatherContext())

Guards that enforce output quality.
Five guard types enforce quality before and after generation. ContentFilterGuard blocks/requires words. LengthGuard enforces size limits. SchemaStrictGuard validates JSON schema with automatic retry. InputValidatorGuard runs custom checks before the LLM call. OutputValidatorGuard runs custom checks after.
ContentFilterGuard
Block specific words (competitor names), require specific phrases (disclaimers). Applied to both input and output.
LengthGuard
Enforce min/max character limits. No single-word answers, no unbounded essays.
SchemaStrictGuard
Output must conform to JSON schema. Auto-retry up to N times until it validates. Structured output guaranteed.
InputValidatorGuard
Custom callable on input BEFORE the LLM call. Reject bad queries early.
OutputValidatorGuard
Custom callable on output AFTER the LLM call. Business rules, compliance, domain checks.
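The auto-retry behaviour of a schema guard can be modelled like this. It is a toy loop with a fake model, not the SchemaStrictGuard implementation: call the model, validate the output, and re-ask up to N times until it validates.

```python
import json

def generate_with_retry(call_model, validate, retries: int = 3):
    """Call the model until the output validates, up to `retries` times."""
    last_error = None
    for attempt in range(1, retries + 1):
        output = call_model(attempt)
        try:
            validate(output)
            return output, attempt
        except ValueError as exc:  # JSONDecodeError is a ValueError
            last_error = exc
    raise RuntimeError(f"validation failed after {retries} attempts: {last_error}")

def validate_tax_response(output: str) -> None:
    data = json.loads(output)
    if "tax_rate" not in data:
        raise ValueError("missing required field: tax_rate")

# Fake model: fails on the first attempt, succeeds on the second
fake_llm = lambda attempt: "not json" if attempt == 1 else '{"tax_rate": 0.21}'

output, attempts = generate_with_retry(fake_llm, validate_tax_response)
print(attempts)  # 2
```

A real guard would also feed the validation error back into the retry prompt so the model can correct itself.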
@prompt(
    guards=[
        ContentFilter(block=["competitor_name"], require=["disclaimer"]),
        LengthGuard(min_chars=100, max_chars=2000),
        SchemaValidator(schema=ResponseSchema, retries=3),
        CustomValidator(check_compliance, stage="post"),
    ]
)
async def respond(query: str) -> Response:
    ...

Debug any prompt decision in seconds.
When your agent makes an unexpected decision, the prompt inspector traces the entire assembly pipeline step-by-step. This is prompt debugging, not prompt guessing.
Assembled prompt (2,847 tokens):
├─ Identity            [P10] ✓ 127 tokens
├─ Rules               [P9]  ✓ 89 tokens
├─ OutputFormat        [P8]  ✓ 156 tokens
├─ ContextSlot:user    [P7]  ✓ 234 tokens
│  └─ UserContext: 12ms
├─ ContextSlot:memory  [P7]  ✓ 891 tokens
│  └─ MemoryContext: 45ms (3 results)
├─ Section:task        [P6]  ✓ 312 tokens
├─ Conditional:premium [P5]  ✗ predicate=false
└─ Examples            [P4]  ✗ budget exceeded

Strategy: chain_of_thought + self_critique
└─ Added 247 tokens for reasoning frame

Guards:
├─ ContentFilter: passed
├─ LengthGuard: passed (1,247 chars)
└─ SchemaValidator: passed (attempt 1/3)
Version control and test your prompts.
Semantic versioning with a global registry. Tag prompt versions with SemVer. Retrieve by name and version. Rollback when new versions underperform. Define prompts in .prompt YAML files — portable, diffable, reviewable in pull requests.
# prompts/analyst.prompt.yaml
version: "2.1.0"
identity: "Senior data analyst with 10 years experience"
rules:
  - "Always cite data sources"
  - "Quantify claims with numbers"
  - "Never speculate without evidence"
output_format: "AnalysisReport"
strategy: "chain_of_thought + self_critique"
guards:
  - type: "schema"
    retries: 3

Test prompts with real assertions.
Mock the LLM for fast, deterministic tests. Inject fake context data. Assert output conforms to schema. Assert required phrases present. Assert latency within bounds. Promptise testing utilities integrate with pytest. Prompts are no longer the untested part of your application.
@pytest.mark.asyncio
async def test_analyst_prompt():
    with mock_llm(returns=MOCK_ANALYSIS):
        result = await analyze("Q3 revenue data")
    assert_schema(result, AnalysisReport)
    assert_contains(result, "source:")
    assert_length(result, min=500, max=2000)
    assert_latency(result, max_ms=100)
    assert_guards_passed(result)

Token-precise context control.
The Context Engine manages exactly what the LLM sees — with exact token counting, not estimation. Register context layers by priority. Set a token budget. The engine assembles all layers, counts tokens precisely using tiktoken or a configurable tokenizer, and trims lowest-priority content first.
Conversation history is trimmed by removing oldest user/assistant pairs — never splitting a pair mid-conversation. Other layers use binary search with the actual tokenizer for exact cutoff points. Required layers are never touched. Your agent never overflows. It never crashes from context too long.
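The pair-wise trimming rule can be sketched on its own, with a stand-in token counter in place of a real tokenizer (a toy model, not the engine's implementation):

```python
def count_tokens(text: str) -> int:
    return len(text.split())  # stand-in for a real tokenizer like tiktoken

def trim_history(pairs: list[tuple[str, str]], budget: int) -> list[tuple[str, str]]:
    """Drop the oldest (user, assistant) pairs until the history fits.

    A pair is always dropped whole — never split mid-conversation.
    """
    def total(ps):
        return sum(count_tokens(u) + count_tokens(a) for u, a in ps)

    while pairs and total(pairs) > budget:
        pairs = pairs[1:]  # oldest pair goes first
    return pairs

history = [
    ("hi there", "hello, how can I help"),
    ("my invoice is wrong", "which invoice number"),
    ("INV-42", "looking it up now"),
]
trimmed = trim_history(history, budget=12)
print(trimmed)
```

Dropping whole pairs keeps every remaining exchange coherent: the model never sees a user turn without its reply, or vice versa.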
from promptise.context import ContextEngine
engine = ContextEngine(
    token_budget=8192,
    response_reserve=1024,  # Leave room for output
    tokenizer="tiktoken:gpt-4"
)
# Register layers by priority (higher = keep longer)
engine.add_layer("identity", priority=10, required=True)
engine.add_layer("rules", priority=9, required=True)
engine.add_layer("current_task", priority=8)
engine.add_layer("user_context", priority=7)
engine.add_layer("budget_remaining", priority=5)
engine.add_layer("conversation_history", priority=1)
# Assemble with automatic trimming
context = await engine.assemble(state)
# Result: 7,168 tokens, oldest messages trimmed

Decision guide
When to use
each level of the system.
You don't have to adopt everything at once. Start with raw strings. Add blocks when your prompt gets complex. Add flows when conversations have phases. Add guards when you ship to production.
Raw string
0 lines. When: Simple one-off scripts, prototypes
A plain string passed to build_agent(instructions="...")
Typed blocks
5-10 lines. When: Prompt has 3+ concerns (identity, rules, format)
Identity, Rules, OutputFormat blocks with priorities. When token budget is tight, lowest priority drops first — no crash, graceful degradation.
ConversationFlow
15-25 lines. When: Chat has distinct phases (greeting → diagnosis → solution)
Each phase activates different blocks. The agent's personality, rules, and examples change as the conversation progresses.
Guards + Testing
10-20 lines. When: Production deployment, compliance requirements
ContentFilter blocks bad words, LengthGuard caps output, SchemaStrictGuard validates JSON. mock_llm() tests the full pipeline deterministically.
Full stack
50+ lines. When: Enterprise, regulated industries, high-stakes
Blocks + Flow + Strategies + Guards + Context Providers + Inspector + Version Control + YAML loader. Every prompt is versioned, tested, debuggable, and auditable.
Debugging story
Your agent made a
bad decision. Why?
Without PromptBlocks, you stare at a 200-line string and guess what went wrong. With PromptInspector, you see exactly which blocks were included, which were dropped (and why), what context providers returned, and which guards passed or failed.
Agent gave wrong financial advice
Customer reported incorrect tax calculation
Open PromptInspector trace
See: ContextProvider "tax_rules" timed out (3.2s > 2s budget)
Rules block dropped due to token budget
Priority 9 block dropped because conversation history (priority 1) consumed too many tokens
Root cause: wrong priority assignment
Fix: set tax_rules to priority 10 (required=True). Caught in 30 seconds instead of 3 hours.
PromptInspector output
Block Assembly Trace:
─────────────────────────
✓ Identity "financial_advisor"
priority=10 tokens=45 INCLUDED
✓ OutputFormat "json_response"
priority=8 tokens=30 INCLUDED
✗ Rules "tax_rules"
priority=9 tokens=120 DROPPED
reason: context_provider timeout (3.2s)
✓ Examples "tax_examples"
priority=4 tokens=85 INCLUDED
✗ ContextSlot "conversation_history"
priority=1 tokens=2400 TRIMMED→800
Context Providers:
✓ UserContextProvider 12ms ok
✓ TaskContextProvider 3ms ok
✗ TaxRulesProvider 3200ms TIMEOUT
Guards:
✓ ContentFilterGuard passed
✗ SchemaStrictGuard FAILED
missing: "tax_rate" field
Total: 960 tokens / 8192 budget
Time: 3.4s (3.2s in TaxRulesProvider)

Comparison
What you get that
raw strings don't.
| Capability | Promptise | Raw strings | LangChain |
|---|---|---|---|
| Priority-based token budgeting | ✓ | — | — |
| Graceful block degradation | ✓ | — | — |
| Conversation phase management | ✓ | — | — |
| Composable strategies (CoT + Critique) | ✓ | — | — |
| 11 auto-injecting context providers | ✓ | — | — |
| Output guards with auto-retry | ✓ | — | — |
| Assembly trace / debugger | ✓ | — | — |
| Semantic version control | ✓ | — | — |
| YAML prompt loader | ✓ | — | — |
| pytest integration | ✓ | — | — |
| Template variables | ✓ | ✓ | ✓ |
| Chat message formatting | ✓ | ✓ | ✓ |