Stable v1.0 Last Updated: 1/5/2026

Agentic Orchestration Patterns

Designing the operating system for AI: Durable execution, memory kernels, and cognitive architectures.

Summary

  • What it is: A blueprint for building reliable, long-running autonomous agents that survive infrastructure failures.
  • Why now: As agents move from “chatbots” to “workers,” they need state persistence, retry logic, and structured memory.
  • Who it’s for: Platform engineers building the “Agent Cloud” (e.g., Strategos).

The Core Problem: Stateless vs. Stateful

Traditional ML serving (Hyperion) is stateless: Input -> Model -> Output. Agentic workloads are stateful loops: Observe -> Plan -> Act -> Observe.

The Challenge:

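  1. Timeouts: Agent tasks (e.g., “Research X”) can take minutes or hours, longer than a single HTTP request can reliably stay open.
  2. Failure: If a pod crashes mid-thought, the agent’s in-memory context is lost.
  3. Context Window: History grows without bound, but the model’s context window is finite. An unpruned prompt eventually exceeds it.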

Pattern 1: Durable Execution (The Kernel)

Instead of storing “current state” in a mutable database row, we use Event Sourcing: state is derived from an append-only log of events.

The Event Log

We persist every significant step as an immutable event:

  • WorkflowStarted
  • ActivityScheduled (Tool Call)
  • ActivityCompleted (Tool Result)
  • TimerFired
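
As a rough illustration of the event log, here is a minimal sketch of an append-only store on SQLite. The table layout, the `Event` and `EventLog` names, and the JSON payload column are assumptions made for this example, not the Strategos schema.

```python
import json
import sqlite3
from dataclasses import dataclass

# Hypothetical schema: one append-only table, ordered per workflow by seq.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    workflow_id TEXT    NOT NULL,
    seq         INTEGER NOT NULL,
    kind        TEXT    NOT NULL,
    payload     TEXT    NOT NULL,
    PRIMARY KEY (workflow_id, seq)
)
"""


@dataclass(frozen=True)
class Event:
    workflow_id: str
    seq: int
    kind: str      # e.g. "WorkflowStarted", "ActivityScheduled"
    payload: dict


class EventLog:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(SCHEMA)

    def append(self, workflow_id: str, kind: str, payload: dict) -> Event:
        # Events are only ever inserted, never updated or deleted.
        with self.conn:  # commits on success, rolls back on error
            row = self.conn.execute(
                "SELECT COALESCE(MAX(seq), 0) FROM events WHERE workflow_id = ?",
                (workflow_id,),
            ).fetchone()
            seq = row[0] + 1
            self.conn.execute(
                "INSERT INTO events VALUES (?, ?, ?, ?)",
                (workflow_id, seq, kind, json.dumps(payload)),
            )
        return Event(workflow_id, seq, kind, payload)

    def history(self, workflow_id: str) -> list[Event]:
        rows = self.conn.execute(
            "SELECT workflow_id, seq, kind, payload FROM events "
            "WHERE workflow_id = ? ORDER BY seq",
            (workflow_id,),
        ).fetchall()
        return [Event(w, s, k, json.loads(p)) for w, s, k, p in rows]
```

The primary key on `(workflow_id, seq)` means two writers cannot both claim the same position in a workflow's history; one of the inserts fails instead.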

The Replay Mechanism

When a worker crashes and restarts:

  1. Load the full Event History.
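  2. Re-run the workflow code from the beginning. The code must be deterministic so that it makes the same decisions as the original run.
  3. For each side effect (tool call) already marked Completed in the log, return the recorded result instead of executing it again.
  4. Continue with live execution from the first step that has no recorded result, which is exactly where the worker died.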

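Result: Agents survive worker crashes and restarts, and completed tool calls are never re-run on replay. A tool that was still running when the worker died has no Completed event and may run again, so effectively-once execution for critical tools also requires the tools to be idempotent (for example, by passing an idempotency key with each call).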
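
The replay steps above can be sketched as follows. `ReplayContext`, `call_tool`, and `research_workflow` are hypothetical names for this illustration, and the history is a plain list of dicts rather than a real event store.

```python
from typing import Callable


class ReplayContext:
    """Returns recorded tool results during replay, then executes live."""

    def __init__(self, history: list[dict]) -> None:
        self.history = history
        # Only completed activities carry a result that can be replayed.
        self._completed = [e for e in history if e["kind"] == "ActivityCompleted"]
        self._cursor = 0

    def call_tool(self, name: str, fn: Callable[[], object]) -> object:
        if self._cursor < len(self._completed):
            recorded = self._completed[self._cursor]
            if recorded["tool"] != name:
                # The code took a different path than the recorded run.
                raise RuntimeError("non-deterministic workflow: history mismatch")
            self._cursor += 1
            return recorded["result"]  # replay: skip the side effect
        result = fn()  # live: execute and record
        event = {"kind": "ActivityCompleted", "tool": name, "result": result}
        self.history.append(event)
        self._completed.append(event)
        self._cursor += 1
        return result


def research_workflow(ctx: ReplayContext, tools: dict) -> object:
    # Deterministic workflow code: the same tool calls in the same order every run.
    hits = ctx.call_tool("search", tools["search"])
    return ctx.call_tool("summarize", lambda: tools["summarize"](hits))
```

If the history already holds a completed `search`, rerunning `research_workflow` reuses that result and only executes `summarize`. A real engine would persist each event before acknowledging the step, which this in-memory sketch does not attempt.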


Pattern 2: The Memory Hierarchy (The MMU)

Just as an OS manages RAM, an Agent OS must manage Context.

Tier        Analogy     Description                                           Latency
Working     L1 Cache    The immediate prompt context. Expensive ($), fast.    ms
Episodic    RAM / Swap  Vector DB (RAG) for recent interactions.              ~100 ms
Structured  Disk        SQL/Graph DB for permanent facts (“User is Admin”).   ~10 ms

Context Paging: The orchestrator automatically “pages out” old turns from Working Memory to Episodic Memory, and “pages in” relevant facts based on the current Goal.
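
A minimal sketch of context paging, under two loud assumptions: a whitespace word count stands in for a real tokenizer, and word overlap with the goal stands in for embedding similarity against a vector DB. `ContextPager` and its method names are hypothetical.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count approximates a real tokenizer.
    return len(text.split())


class ContextPager:
    def __init__(self, budget: int) -> None:
        self.budget = budget                  # max tokens kept in Working Memory
        self.working: deque[str] = deque()    # turns currently in the prompt
        self.episodic: list[str] = []         # stand-in for a vector DB

    def add_turn(self, turn: str) -> None:
        self.working.append(turn)
        # Page out the oldest turns until the prompt fits the budget again,
        # always keeping at least the newest turn.
        while (
            sum(count_tokens(t) for t in self.working) > self.budget
            and len(self.working) > 1
        ):
            self.episodic.append(self.working.popleft())

    def page_in(self, goal: str, k: int = 2) -> list[str]:
        # Assumption: shared words with the goal approximate semantic relevance.
        goal_words = set(goal.lower().split())
        scored = [
            (len(goal_words & set(t.lower().split())), t) for t in self.episodic
        ]
        relevant = [s for s in scored if s[0] > 0]
        relevant.sort(key=lambda s: s[0], reverse=True)
        return [t for _, t in relevant[:k]]
```

The orchestrator would call `add_turn` after every exchange and prepend the output of `page_in(current_goal)` when assembling the next prompt.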


Pattern 3: Cognitive Architectures

The “Brain” logic should be pluggable.

ReAct (Reason + Act)

Interleaved thinking and doing: Thought -> Action -> Observation -> Thought... Good for dynamic environments.
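
The ReAct loop can be sketched in a few lines. Here the "LLM" is any callable that maps the transcript so far to a decision; the `Policy` type, the `("act" | "finish", ...)` tuple protocol, and the `react` function are assumptions for this example only.

```python
from typing import Callable

# Assumption: the policy returns ("act", tool_name, tool_input)
# or ("finish", answer, "").
Policy = Callable[[list[str]], tuple[str, str, str]]
Tool = Callable[[str], str]


def react(goal: str, policy: Policy, tools: dict[str, Tool], max_steps: int = 5) -> str:
    transcript = [f"Goal: {goal}"]
    for _ in range(max_steps):
        kind, name, arg = policy(transcript)      # Thought -> Action
        if kind == "finish":
            return name                           # name carries the final answer
        observation = tools[name](arg)            # Action -> Observation
        transcript.append(f"Action: {name}({arg})")
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("step limit reached without an answer")
```

Because each decision sees the latest observation, the agent can change course mid-task, which is what makes this style suit dynamic environments.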

Plan-and-Solve

Generate a full Dependency Graph (DAG) of tasks first, then execute. Good for complex, deterministic goals.
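
A minimal sketch of the execute phase, assuming the planner has already produced a DAG as a mapping from each task to the tasks it depends on. `execute_plan` and the `run` callback are hypothetical; `graphlib.TopologicalSorter` is the Python standard library's topological sort.

```python
from graphlib import TopologicalSorter
from typing import Callable


def execute_plan(
    plan: dict[str, set[str]],
    run: Callable[[str, dict[str, str]], str],
) -> dict[str, str]:
    """plan maps task -> tasks it depends on; run(task, dep_results) -> result."""
    results: dict[str, str] = {}
    # static_order() yields every task after all of its dependencies
    # and raises graphlib.CycleError if the plan is not a DAG.
    for task in TopologicalSorter(plan).static_order():
        deps = {d: results[d] for d in plan.get(task, ())}
        results[task] = run(task, deps)
    return results
```

For example, a plan of `{"report": {"summarize"}, "summarize": {"search"}, "search": set()}` runs `search`, then `summarize`, then `report`. Independent branches could be run in parallel, which this sequential sketch does not attempt.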

Reflection

A secondary loop where the agent critiques its own output before finalizing it. Increases quality at the cost of latency.
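
A minimal sketch of the reflection loop, assuming the drafting and critiquing steps are supplied as callables (in practice, two LLM prompts). `reflect`, `draft_fn`, and `critique_fn` are hypothetical names for this example.

```python
from typing import Callable, Optional


def reflect(
    goal: str,
    draft_fn: Callable[[str, Optional[str]], str],
    critique_fn: Callable[[str], Optional[str]],
    max_rounds: int = 2,
) -> tuple[str, int]:
    """Returns the final draft and the number of revision rounds used.

    draft_fn(goal, feedback) produces a draft; critique_fn(draft) returns
    None when the draft is acceptable, otherwise a feedback string.
    """
    draft = draft_fn(goal, None)
    rounds = 0
    while rounds < max_rounds:
        feedback = critique_fn(draft)
        if feedback is None:
            break                      # critic is satisfied
        draft = draft_fn(goal, feedback)
        rounds += 1
    return draft, rounds
```

The `max_rounds` cap makes the latency cost explicit: each extra round adds one critique call and one redraft call.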


Architecture Reference

graph TD
    Client -->|Goal| Gateway
    Gateway -->|Start Workflow| Orchestrator[Strategos: Durable Engine]
    
    subgraph "The Agent Loop"
        Orchestrator <-->|Replay/Persist| EventLog[(SQLite/Postgres)]
        Orchestrator -->|Context| Memory[Memory Kernel]
        Orchestrator -->|Prompt| LLM[Hyperion: Inference]
        Orchestrator -->|Validate| Guardian[Safety Layer]
        Guardian -->|Execute| Tools[Tool Registry]
    end

Risks & Mitigations

  • Infinite Loops: Agents getting stuck repeating “I need to check status”.
    • Fix: Step limits and semantic loop detection (embedding similarity of last N thoughts).
  • Context Pollution: Retrieving irrelevant memories confuses the model.
    • Fix: Strict relevance thresholds and query rewriting.
  • Cost Runaway: Long-running or looping agents consume tokens without bound.
    • Fix: Token quotas per workflow and per tenant.
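
The step limit, semantic loop detection, and per-workflow token quota can be combined into one guard, sketched below. Bag-of-words counts stand in for a real embedding model, and `WorkflowGuard`, `BudgetExceeded`, the window size, and the 0.9 threshold are assumptions for illustration. A per-tenant quota would aggregate the same counter across workflows.

```python
import math
from collections import Counter, deque


def embed(text: str) -> Counter:
    # Assumption: bag-of-words counts stand in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class BudgetExceeded(RuntimeError):
    pass


class WorkflowGuard:
    def __init__(self, max_steps: int, token_quota: int,
                 window: int = 3, threshold: float = 0.9) -> None:
        self.max_steps = max_steps
        self.token_quota = token_quota
        self.threshold = threshold
        self.recent: deque[Counter] = deque(maxlen=window)  # last N thoughts
        self.steps = 0
        self.tokens = 0

    def check(self, thought: str, tokens_used: int) -> None:
        """Call once per agent step; raises BudgetExceeded to stop the workflow."""
        self.steps += 1
        self.tokens += tokens_used
        if self.steps > self.max_steps:
            raise BudgetExceeded("step limit")
        if self.tokens > self.token_quota:
            raise BudgetExceeded("token quota")
        vec = embed(thought)
        # Loop: the window is full and every recent thought is near-identical.
        if len(self.recent) == self.recent.maxlen and all(
            cosine(vec, r) >= self.threshold for r in self.recent
        ):
            raise BudgetExceeded("semantic loop")
        self.recent.append(vec)
```

With `window=2`, the third consecutive “I need to check status” thought trips the loop detector, well before a generous step limit would.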

Related Systems

  • Strategos: The reference implementation of this orchestration pattern.
  • Hyperion: The inference engine powering the LLM calls.