Prompts built like
software, not strings.
Typed blocks. Priority-based assembly. Reasoning strategies. Version control. Testing. Debugging. Token-precise context control. The engineering rigor your prompts deserve.
from promptise.prompts import prompt
from promptise.prompts import PromptBlocks as PB
from promptise.prompts import chain_of_thought, self_critique
@prompt(model="openai:gpt-5-mini",
        strategy=chain_of_thought + self_critique)
async def analyze(text: str) -> Analysis:
    """Analyze with composed reasoning strategies."""
    return PB.identity("Senior analyst") \
        + PB.rules(["Cite sources", "No PII"]) \
        + PB.output_format(Analysis) \
        + PB.section("Input", text)

What is Prompt Engineering in Promptise?
Prompts are code.
Not strings you paste into a chat box.
In most frameworks, your system prompt is a raw string — untestable, unversioned, invisible at runtime. Promptise treats prompts as software components: typed blocks with priority-based assembly, reasoning strategies you compose like functions, and a debugger that traces every decision.
Without Promptise
One giant string. No structure. No tests. When the agent makes a bad decision, you stare at 200 lines and guess which part caused it. Version control means copy-pasting into a new file.
With PromptBlocks
Snap typed blocks together — Identity, Rules, Examples, OutputFormat. Each has a priority. When the token budget is tight, the lowest-priority blocks drop first. Test each block independently with pytest.
What you get
8 block types, conversation flows, 5 composable strategies, 4 perspectives, 11 context providers, 5 guard types, version registry with rollback, PromptInspector debugger, and a full pytest testing framework.
Prompt = blocks + strategy + guards + context + version + tests
"In most frameworks, prompts are strings — hardcoded, untestable, invisible at runtime, impossible to debug. When your agent makes a bad decision, you stare at a 200-line string and guess which part caused it."
PROMPT ENGINEERING ARCHITECTURE
Eight capabilities that make prompts software.
Prompts have types, priorities, versions, tests, and debugging tools. They compose from independent blocks. They adapt at runtime. They are inspectable, traceable, and reproducible.
Typed Blocks
Eight block types with priorities. Identity, rules, output format, context slots, sections, examples, conditionals, composites.
Conversation Flows
Phases with active blocks and lifecycle hooks. Your agent's voice changes across greeting, diagnosis, solution, follow-up.
Reasoning Strategies
Chain of thought, structured reasoning, self-critique, plan and execute, decompose. Compose with + operator.
Context Providers
Eleven built-in providers inject user, tools, memories, tasks, conversation, errors, output schema — before every LLM call.
Output Guards
Content filters, length guards, schema validation, custom validators. Enforce output quality before and after generation.
Prompt Debugging
Trace the assembly pipeline. See which blocks were included or excluded, and why. Token counts per block. Context provider timing.
Version Control
SemVer registry, YAML prompt files, rollback support, duplicate detection. Prompts diffable in PRs.
Context Engine
Token-precise context control. Priority layers, exact token counting, graceful trimming. Never overflow.
Prompts that assemble themselves from typed blocks.
System prompts are built from eight block types, not concatenated strings. Each block has a priority level. When the context window is tight, the assembler drops lowest-priority blocks first. Your agent degrades gracefully — dropping supplementary content while preserving its core identity and constraints.
Identity
Who the agent is — role, expertise, personality. Always highest priority. Always the last thing dropped.
Rules
Hard constraints — never share PII, always cite sources, respond in JSON. Non-negotiable.
OutputFormat
Expected response structure. JSON schema, markdown format, specific fields.
ContextSlot
Dynamic data at runtime — current user info, retrieved data, tool results.
Section
Domain-specific instructions. Custom content for anything that doesn't fit other types.
Examples
Few-shot demonstrations. First to drop when context is tight.
Conditional
Blocks that appear or disappear based on runtime predicates.
Composite
Group related content into reusable block collections.
Drop order: examples go first, then sections, then the output format. Identity and rules are the last to go. Your agent never truncates mid-sentence.
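The drop mechanics can be modelled in a few lines of plain Python. This is a toy sketch of the idea, not the Promptise API: blocks carry a priority and a pre-counted token cost, and when the budget is exceeded the lowest-priority blocks are dropped whole.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    priority: int  # higher = kept longer
    tokens: int    # pre-counted token cost
    text: str

def assemble(blocks: list[Block], budget: int) -> list[Block]:
    """Keep the highest-priority blocks that fit, in authoring order."""
    kept, used = [], 0
    for block in sorted(blocks, key=lambda b: -b.priority):
        if used + block.tokens <= budget:
            kept.append(block)
            used += block.tokens
    # Re-emit in the original authoring order
    kept.sort(key=lambda b: blocks.index(b))
    return kept

prompt_blocks = [
    Block("identity", 10, 120, "You are a senior analyst."),
    Block("rules", 9, 90, "Cite sources. No PII."),
    Block("examples", 4, 900, "Example 1 ..."),
]
kept = assemble(prompt_blocks, budget=300)
print([b.name for b in kept])  # examples dropped first
```

Note that a dropped block disappears entirely; nothing is ever cut mid-sentence.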
Conversations that change voice across phases.
A customer support conversation has phases — greeting, information gathering, diagnosis, solution delivery, follow-up. Each phase needs different context, different rules, different tone. A static prompt cannot do this.
A conversation flow defines phases with active blocks and lifecycle hooks. Phase transitions happen automatically based on the conversation state. Your agent's voice in the opening message is different from its voice during analysis — not because you wrote two prompts, but because the flow activates different blocks at different times.
Greeting
Information Gathering
Diagnosis
Solution
Follow-up
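The idea can be sketched as a small phase table. This is a toy model, not the real ConversationFlow API: each phase names its active blocks, and a transition rule advances the phase from conversation state.

```python
# Each phase activates a different set of block names (illustrative).
PHASES = {
    "greeting": ["identity", "tone_warm"],
    "information_gathering": ["identity", "rules", "question_checklist"],
    "diagnosis": ["identity", "rules", "domain_knowledge"],
    "solution": ["identity", "rules", "output_format"],
    "follow_up": ["identity", "tone_warm", "satisfaction_check"],
}
ORDER = list(PHASES)

def next_phase(current: str, state: dict) -> str:
    """Advance when the current phase's goal is met (toy predicate)."""
    if state.get(f"{current}_done") and current != ORDER[-1]:
        return ORDER[ORDER.index(current) + 1]
    return current

phase = next_phase("greeting", {"greeting_done": True})
print(phase, PHASES[phase])
```

Swapping the active block set per phase is what changes the agent's voice without rewriting the whole prompt.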
Reasoning strategies you compose like building blocks.
Five reasoning strategies compose with the + operator. Four perspectives (Analyst, Critic, Advisor, Creative) are orthogonal — any perspective pairs with any strategy. Five chaining operators (chain, parallel, branch, retry, fallback) connect prompts into pipelines.
Chain of Thought
Step-by-step reasoning before the final answer
Structured
Numbered steps with explicit premises and conclusion
Self-Critique
Generate an answer, critique it, then revise
Plan & Execute
Design the approach first, then execute
Decompose
Break complex questions into subquestions
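One way such + composition could be implemented is an __add__ override that concatenates reasoning frames. This is a hypothetical sketch, not Promptise's actual Strategy type:

```python
class Strategy:
    def __init__(self, frame: str):
        self.frames = [frame]

    def __add__(self, other: "Strategy") -> "Strategy":
        # Composition just concatenates the two frame lists, in order
        combined = Strategy.__new__(Strategy)
        combined.frames = self.frames + other.frames
        return combined

    def apply(self, prompt: str) -> str:
        # Each frame appends its reasoning instruction to the prompt
        for frame in self.frames:
            prompt = f"{prompt}\n\n{frame}"
        return prompt

chain_of_thought = Strategy("Think step by step before answering.")
self_critique = Strategy("Critique your answer, then revise it.")

composed = chain_of_thought + self_critique
print(composed.apply("Analyze Q3 revenue."))
```

Because + just accumulates frames, any strategy pairs with any other, and order matters: reasoning first, critique second.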
# Chain of thought + self-critique
@prompt(strategy=chain_of_thought + self_critique)
async def analyze(data: str) -> str:
    # Reasons step-by-step,
    # THEN critiques its reasoning,
    # before producing final output
    ...

# Decompose + plan and execute
@prompt(strategy=decompose + plan_and_execute)
async def solve(problem: str) -> Solution:
    # Breaks problem into parts,
    # plans execution order for each,
    # solves systematically
    ...

Context that is always current, always relevant.
Before every LLM call, context providers inject up-to-date information automatically. All providers are async-aware. All inject before every call. Build a custom provider with a single function — anything your agent needs to know, injected automatically.
from promptise.context import ContextProvider
class WeatherContext(ContextProvider):
    """Inject current weather for location-aware agents."""
    async def get_context(self, state: AgentState) -> str:
        location = state.user.location
        weather = await self.weather_api.get(location)
        return f"Current weather: {weather.summary}"

# Register globally or per-agent
agent.add_context_provider(WeatherContext())

Guards that enforce output quality.
Five guard types enforce quality before and after generation. ContentFilterGuard blocks/requires words. LengthGuard enforces size limits. SchemaStrictGuard validates JSON schema with automatic retry. InputValidatorGuard runs custom checks before the LLM call. OutputValidatorGuard runs custom checks after.
ContentFilterGuard
Block specific words (competitor names), require specific phrases (disclaimers). Applied to both input and output.
LengthGuard
Enforce min/max character limits. No single-word answers, no unbounded essays.
SchemaStrictGuard
Output must conform to JSON schema. Auto-retry up to N times until it validates. Structured output guaranteed.
InputValidatorGuard
Custom callable on input BEFORE the LLM call. Reject bad queries early.
OutputValidatorGuard
Custom callable on output AFTER the LLM call. Business rules, compliance, domain checks.
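The auto-retry behaviour of a schema guard can be modelled like this. It is a toy loop with a fake model, not the SchemaStrictGuard implementation: call the model, validate the output, and re-ask up to N times until it validates.

```python
import json

def generate_with_retry(call_model, validate, retries: int = 3):
    """Call the model until the output validates, up to `retries` times."""
    last_error = None
    for attempt in range(1, retries + 1):
        output = call_model(attempt)
        try:
            validate(output)
            return output, attempt
        except ValueError as exc:  # JSONDecodeError is a ValueError
            last_error = exc
    raise RuntimeError(f"validation failed after {retries} attempts: {last_error}")

def validate_tax_response(output: str) -> None:
    data = json.loads(output)
    if "tax_rate" not in data:
        raise ValueError("missing required field: tax_rate")

# Fake model: fails on the first attempt, succeeds on the second
fake_llm = lambda attempt: "not json" if attempt == 1 else '{"tax_rate": 0.21}'

output, attempts = generate_with_retry(fake_llm, validate_tax_response)
print(attempts)  # 2
```

A real guard would also feed the validation error back into the retry prompt so the model can correct itself.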
@prompt(
    guards=[
        ContentFilter(block=["competitor_name"], require=["disclaimer"]),
        LengthGuard(min_chars=100, max_chars=2000),
        SchemaValidator(schema=ResponseSchema, retries=3),
        CustomValidator(check_compliance, stage="post"),
    ]
)
async def respond(query: str) -> Response:
    ...

Debug any prompt decision in seconds.
When your agent makes an unexpected decision, the prompt inspector traces the entire assembly pipeline step-by-step. This is prompt debugging, not prompt guessing.
Assembled prompt (2,847 tokens):
├─ Identity            [P10] ✓ 127 tokens
├─ Rules               [P9]  ✓ 89 tokens
├─ OutputFormat        [P8]  ✓ 156 tokens
├─ ContextSlot:user    [P7]  ✓ 234 tokens
│  └─ UserContext: 12ms
├─ ContextSlot:memory  [P7]  ✓ 891 tokens
│  └─ MemoryContext: 45ms (3 results)
├─ Section:task        [P6]  ✓ 312 tokens
├─ Conditional:premium [P5]  ✗ predicate=false
└─ Examples            [P4]  ✗ budget exceeded

Strategy: chain_of_thought + self_critique
└─ Added 247 tokens for reasoning frame

Guards:
├─ ContentFilter: passed
├─ LengthGuard: passed (1,247 chars)
└─ SchemaValidator: passed (attempt 1/3)
Version control and test your prompts.
Semantic versioning with a global registry. Tag prompt versions with SemVer. Retrieve by name and version. Rollback when new versions underperform. Define prompts in .prompt YAML files — portable, diffable, reviewable in pull requests.
# prompts/analyst.prompt.yaml
version: "2.1.0"
identity: "Senior data analyst with 10 years experience"
rules:
  - "Always cite data sources"
  - "Quantify claims with numbers"
  - "Never speculate without evidence"
output_format: "AnalysisReport"
strategy: "chain_of_thought + self_critique"
guards:
  - type: "schema"
    retries: 3

Test prompts with real assertions.
Mock the LLM for fast, deterministic tests. Inject fake context data. Assert output conforms to schema. Assert required phrases present. Assert latency within bounds. Promptise testing utilities integrate with pytest. Prompts are no longer the untested part of your application.
@pytest.mark.asyncio
async def test_analyst_prompt():
    with mock_llm(returns=MOCK_ANALYSIS):
        result = await analyze("Q3 revenue data")
    assert_schema(result, AnalysisReport)
    assert_contains(result, "source:")
    assert_length(result, min=500, max=2000)
    assert_latency(result, max_ms=100)
    assert_guards_passed(result)

Token-precise context control.
The Context Engine manages exactly what the LLM sees — with exact token counting, not estimation. Register context layers by priority. Set a token budget. The engine assembles all layers, counts tokens precisely using tiktoken or a configurable tokenizer, and trims lowest-priority content first.
Conversation history is trimmed by removing oldest user/assistant pairs — never splitting a pair mid-conversation. Other layers use binary search with the actual tokenizer for exact cutoff points. Required layers are never touched. Your agent never overflows. It never crashes from context too long.
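The pair-wise trimming rule can be sketched on its own, with a stand-in token counter in place of a real tokenizer (a toy model, not the engine's implementation):

```python
def count_tokens(text: str) -> int:
    return len(text.split())  # stand-in for a real tokenizer like tiktoken

def trim_history(pairs: list[tuple[str, str]], budget: int) -> list[tuple[str, str]]:
    """Drop the oldest (user, assistant) pairs until the history fits.

    A pair is always dropped whole — never split mid-conversation.
    """
    def total(ps):
        return sum(count_tokens(u) + count_tokens(a) for u, a in ps)

    while pairs and total(pairs) > budget:
        pairs = pairs[1:]  # oldest pair goes first
    return pairs

history = [
    ("hi there", "hello, how can I help"),
    ("my invoice is wrong", "which invoice number"),
    ("INV-42", "looking it up now"),
]
trimmed = trim_history(history, budget=12)
print(trimmed)
```

Dropping whole pairs keeps every remaining exchange coherent: the model never sees a user turn without its reply, or vice versa.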
from promptise.context import ContextEngine
engine = ContextEngine(
    token_budget=8192,
    response_reserve=1024,  # Leave room for output
    tokenizer="tiktoken:gpt-4"
)
# Register layers by priority (higher = keep longer)
engine.add_layer("identity", priority=10, required=True)
engine.add_layer("rules", priority=9, required=True)
engine.add_layer("current_task", priority=8)
engine.add_layer("user_context", priority=7)
engine.add_layer("budget_remaining", priority=5)
engine.add_layer("conversation_history", priority=1)
# Assemble with automatic trimming
context = await engine.assemble(state)
# Result: 7,168 tokens, oldest messages trimmed

Decision guide
When to use
each level of the system.
You don't have to adopt everything at once. Start with raw strings. Add blocks when your prompt gets complex. Add flows when conversations have phases. Add guards when you ship to production.
Raw string
0 lines. When: Simple one-off scripts, prototypes
A plain string passed to build_agent(instructions="...")
Typed blocks
5-10 lines. When: Prompt has 3+ concerns (identity, rules, format)
Identity, Rules, OutputFormat blocks with priorities. When token budget is tight, lowest priority drops first — no crash, graceful degradation.
ConversationFlow
15-25 lines. When: Chat has distinct phases (greeting → diagnosis → solution)
Each phase activates different blocks. The agent's personality, rules, and examples change as the conversation progresses.
Guards + Testing
10-20 lines. When: Production deployment, compliance requirements
ContentFilter blocks bad words, LengthGuard caps output, SchemaStrictGuard validates JSON. mock_llm() tests the full pipeline deterministically.
Full stack
50+ lines. When: Enterprise, regulated industries, high-stakes
Blocks + Flow + Strategies + Guards + Context Providers + Inspector + Version Control + YAML loader. Every prompt is versioned, tested, debuggable, and auditable.
Debugging story
Your agent made a
bad decision. Why?
Without PromptBlocks, you stare at a 200-line string and guess what went wrong. With PromptInspector, you see exactly which blocks were included, which were dropped (and why), what context providers returned, and which guards passed or failed.
Agent gave wrong financial advice
Customer reported incorrect tax calculation
Open PromptInspector trace
See: ContextProvider "tax_rules" timed out (3.2s > 2s budget)
Rules block dropped due to token budget
Priority 9 block dropped because conversation history (priority 1) consumed too many tokens
Root cause: wrong priority assignment
Fix: set tax_rules to priority 10 (required=True). Caught in 30 seconds instead of 3 hours.
PromptInspector output
Block Assembly Trace:
─────────────────────────
✓ Identity "financial_advisor"
priority=10 tokens=45 INCLUDED
✓ OutputFormat "json_response"
priority=8 tokens=30 INCLUDED
✗ Rules "tax_rules"
priority=9 tokens=120 DROPPED
reason: context_provider timeout (3.2s)
✓ Examples "tax_examples"
priority=4 tokens=85 INCLUDED
✗ ContextSlot "conversation_history"
priority=1 tokens=2400 TRIMMED→800
Context Providers:
✓ UserContextProvider 12ms ok
✓ TaskContextProvider 3ms ok
✗ TaxRulesProvider 3200ms TIMEOUT
Guards:
✓ ContentFilterGuard passed
✗ SchemaStrictGuard FAILED
missing: "tax_rate" field
Total: 960 tokens / 8192 budget
Time: 3.4s (3.2s in TaxRulesProvider)

Comparison
What you get that
raw strings don't.
| Capability | Promptise | Raw strings | LangChain |
|---|---|---|---|
| Priority-based token budgeting | ✓ | — | — |
| Graceful block degradation | ✓ | — | — |
| Conversation phase management | ✓ | — | — |
| Composable strategies (CoT + Critique) | ✓ | — | — |
| 11 auto-injecting context providers | ✓ | — | — |
| Output guards with auto-retry | ✓ | — | — |
| Assembly trace / debugger | ✓ | — | — |
| Semantic version control | ✓ | — | — |
| YAML prompt loader | ✓ | — | — |
| pytest integration | ✓ | — | — |
| Template variables | ✓ | ✓ | ✓ |
| Chat message formatting | ✓ | ✓ | ✓ |