PolicyCache Recipe

New to Agent Memory? Start with the Quickstart Guide for a progressive adoption path.

This reference recipe composes the shipped Popoto memory primitives into an RL-style action-selection cache (WriteFilterMixin is deliberately left out; see Design Decisions). Agents accumulate state-action-outcome events; a crystallization handler detects repeated successful patterns and creates PolicyEntry records. Agents then query those policies for action selection, and observed outcomes update Q-values via temporal difference learning.

Quick Start

from popoto.recipes.policy_cache import (
    PolicyEntry,
    compute_fingerprint,
    update_q_value,
    initialize_q_value,
    crystallization_handler,
    temporal_discovery_handler,
)

# Create a policy entry
fp = compute_fingerprint({"task": "deploy", "env": "staging"})
policy = PolicyEntry(
    agent_id="agent-1",
    state_fingerprint=fp,
    state_features={"task": "deploy", "env": "staging"},
    action_type="run_playbook",
    action_spec={"playbook": "deploy.yml"},
)
policy.save()

# Set initial Q-value (overrides DecayingSortedField timestamp)
initialize_q_value(policy, initial_q=0.5)

# Update Q-value after observing a reward
td_error = update_q_value(policy, reward=1.0)

Architecture

PolicyEntry composes these primitives:

Primitive               Role in PolicyCache
AutoKeyField            Unique entry ID
KeyField                Agent partitioning, state fingerprinting, action type
DecayingSortedField     Q-value storage with temporal decay
ConfidenceField         Bayesian confidence from outcome history
CoOccurrenceField       Weighted graph between related policies
ExistenceFilter         Bloom filter for fast state lookup
EventStreamMixin        Mutation log via Redis Streams
AccessTrackerMixin      Read pattern tracking
PredictionLedgerMixin   Outcome prediction and resolution

Crystallization

The crystallization_handler is an async function designed for use with StreamConsumer. It:

  1. Groups incoming events by (state_fingerprint, action_type)
  2. Counts successes and failures
  3. Computes Wilson CI lower bound for conservative success rate estimation
  4. Creates a PolicyEntry when evidence exceeds thresholds:
       • Minimum events: MIN_EVENTS_FOR_CRYSTALLIZATION (default: 3)
       • Wilson CI lower bound > WILSON_CI_THRESHOLD (default: 0.6)
  5. Uses ExistenceFilter (Bloom filter) to skip likely-duplicate entries

from popoto.recipes.policy_cache import crystallization_handler
from popoto.streams import StreamConsumer

consumer = StreamConsumer(
    stream_key="stream:policy_mutations",
    group_name="crystallizer",
    consumer_name="worker-1",
    handler=crystallization_handler,
)
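
The grouping and thresholding steps listed above can be sketched in plain Python. This is an illustration, not the recipe's implementation: the event dict keys ("state_fingerprint", "action_type", "success") are hypothetical, and the z-score of 1.96 (a 95% two-sided interval) is an assumption, since the confidence level used by crystallization_handler is not stated here. The two thresholds are the documented defaults.

```python
import math
from collections import defaultdict

MIN_EVENTS_FOR_CRYSTALLIZATION = 3  # documented default
WILSON_CI_THRESHOLD = 0.6           # documented default
Z_SCORE = 1.96                      # assumption: 95% two-sided interval


def wilson_lower_bound(successes, total, z=Z_SCORE):
    """Lower bound of the Wilson score interval for a success proportion."""
    if total == 0:
        return 0.0
    p_hat = successes / total
    z2 = z * z
    centre = p_hat + z2 / (2 * total)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


def crystallization_candidates(events):
    """Group events by (state_fingerprint, action_type) and apply both thresholds.

    `events` is an iterable of dicts with hypothetical keys
    "state_fingerprint", "action_type", and a boolean "success".
    """
    tallies = defaultdict(lambda: [0, 0])  # key -> [successes, total]
    for event in events:
        key = (event["state_fingerprint"], event["action_type"])
        if event["success"]:
            tallies[key][0] += 1
        tallies[key][1] += 1

    candidates = []
    for key, (successes, total) in tallies.items():
        if total < MIN_EVENTS_FOR_CRYSTALLIZATION:
            continue  # not enough evidence yet
        if wilson_lower_bound(successes, total) > WILSON_CI_THRESHOLD:
            candidates.append(key)
    return candidates
```

Under the z = 1.96 assumption, three successes out of three give a lower bound of roughly 0.44, so a pair would need at least six consecutive successes to clear the 0.6 threshold. If the recipe uses a smaller z, fewer events suffice.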

Temporal Discovery

The temporal_discovery_handler identifies cyclical patterns in event timestamps:

  • Day of week (7 buckets) — weekly patterns
  • Week of month (4 buckets) — monthly patterns
  • Month of year (12 buckets) — yearly patterns

For each bucketing, the handler runs a chi-squared test of the observed bucket counts against a uniform distribution. Significant clusters (p < 0.05) are returned as (period, amplitude, phase) tuples suitable for CyclicDecayField.
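
As a rough illustration of the uniformity test, the sketch below buckets Unix timestamps by day of week and compares the chi-squared statistic with the standard p = 0.05 critical value for the matching degrees of freedom (buckets minus one). It is a standard-library simplification that checks against a critical value instead of computing a p-value, and the function names are hypothetical; the handler's own bucketing and tuple construction may differ.

```python
from collections import Counter
from datetime import datetime, timezone

# Standard chi-squared critical values at p = 0.05, keyed by bucket count
# (degrees of freedom = buckets - 1): 7 -> df 6, 4 -> df 3, 12 -> df 11.
CRITICAL_VALUE_P05 = {7: 12.592, 4: 7.815, 12: 19.675}


def day_of_week_counts(timestamps):
    """Count events per weekday (Monday = 0) from Unix timestamps, in UTC."""
    tally = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).weekday() for ts in timestamps
    )
    return [tally.get(day, 0) for day in range(7)]


def chi_squared_uniform(counts):
    """Chi-squared statistic of observed counts against a uniform expectation."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def is_significant(counts):
    """True when the counts deviate from uniform at the p < 0.05 level."""
    if sum(counts) == 0:
        return False  # no events, nothing to test
    return chi_squared_uniform(counts) > CRITICAL_VALUE_P05[len(counts)]
```

A distribution such as 20 events on Monday and 2 on every other day is flagged as significant, while perfectly even counts yield a statistic of zero.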

Q-Value Updates

The update_q_value function performs atomic TD(0) updates via a Lua script:

Q(s,a) <- Q(s,a) + alpha * [reward + gamma * max_Q(s',a') - Q(s,a)]

  • alpha (learning rate): how strongly new information overrides the old estimate (default: 0.1)
  • gamma (discount factor): weight given to future rewards (default: 0.95)
  • Returns the TD error, the bracketed term above (positive = outcome better than expected)
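
The arithmetic of a single update can be worked through in pure Python. This sketch only mirrors the formula above; it is not atomic and does not touch Redis, and the td_update name and its max_next_q parameter are hypothetical (how update_q_value obtains the next-state value is not described here).

```python
TD_ALPHA = 0.1   # documented default learning rate
TD_GAMMA = 0.95  # documented default discount factor


def td_update(q, reward, max_next_q=0.0, alpha=TD_ALPHA, gamma=TD_GAMMA):
    """Apply one TD(0) step and return (new_q, td_error)."""
    td_error = reward + gamma * max_next_q - q
    return q + alpha * td_error, td_error


# Starting from Q = 0.5 with reward 1.0 and no future value:
# td_error = 1.0 + 0.95 * 0.0 - 0.5 = 0.5, new Q = 0.5 + 0.1 * 0.5 = 0.55
new_q, td_error = td_update(q=0.5, reward=1.0)
```

With the default alpha, each observation moves the stored Q-value only a tenth of the way toward the new target, so a single outlier reward has limited effect.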

Tuning Constants

All numeric constants have been validated via parameter sweep (tuning guide):

Constant                         Default   Purpose
MIN_EVENTS_FOR_CRYSTALLIZATION   3         Minimum events before crystallization
WILSON_CI_THRESHOLD              0.6       Required Wilson CI lower bound
TD_ALPHA                         0.1       Q-value learning rate
TD_GAMMA                         0.95      Q-value discount factor
CHI_SQUARED_P_THRESHOLD          0.05      Temporal pattern significance
INITIAL_CYCLE_AMPLITUDE          0.5       Starting amplitude for discovered cycles

Design Decisions

  • WriteFilterMixin excluded: The crystallization handler IS the write gate. Dual gating makes debugging harder.
  • Recipe, not core: Lives in popoto.recipes to demonstrate composition without coupling to the ORM core.
  • Bloom filter false positives: ~1% of legitimate crystallizations may be skipped due to ExistenceFilter's error rate. Acceptable for reference use; production systems needing zero misses should add a secondary check.
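
One way to add the secondary check mentioned in the last point is to treat a Bloom filter hit as "maybe present" and confirm it with an exact lookup before skipping. The sketch below is generic: bloom_might_contain and exact_exists are hypothetical callables supplied by the caller, not Popoto APIs. In practice the first would wrap the ExistenceFilter and the second an authoritative query against stored PolicyEntry records.

```python
def should_skip_crystallization(key, bloom_might_contain, exact_exists):
    """Return True only when the entry is confirmed to exist already.

    bloom_might_contain(key): fast probabilistic check (hypothetical callable).
    exact_exists(key): slower authoritative check (hypothetical callable).
    """
    if not bloom_might_contain(key):
        # Bloom filters have no false negatives, so the entry is definitely new.
        return False
    # A hit may be a false positive; confirm before skipping.
    return exact_exists(key)
```

The exact lookup runs only on Bloom filter hits, so the fast path for new keys is unchanged and the extra cost is paid only for likely duplicates.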