PolicyCache Recipe

New to Agent Memory? Start with the Quickstart Guide for a progressive adoption path.

A reference recipe composing all shipped Popoto memory primitives into an RL-style action selection cache. Agents accumulate state-action-outcome events; a crystallization handler detects repeated successful patterns and creates PolicyEntry records. Agents query policies for action selection, and outcomes update Q-values via temporal difference learning.

Quick Start

from popoto.recipes.policy_cache import (
    PolicyEntry,
    compute_fingerprint,
    update_q_value,
    initialize_q_value,
    crystallization_handler,
    temporal_discovery_handler,
)

# Create a policy entry
fp = compute_fingerprint({"task": "deploy", "env": "staging"})
policy = PolicyEntry(
    agent_id="agent-1",
    state_fingerprint=fp,
    state_features={"task": "deploy", "env": "staging"},
    action_type="run_playbook",
    action_spec={"playbook": "deploy.yml"},
)
policy.save()

# Set initial Q-value (overrides DecayingSortedField timestamp)
initialize_q_value(policy, initial_q=0.5)

# Update Q-value after observing a reward
td_error = update_q_value(policy, reward=1.0)

Architecture

PolicyEntry composes these primitives:

| Primitive | Role in PolicyCache |
| --- | --- |
| `AutoKeyField` | Unique entry ID |
| `KeyField` | Agent partitioning, state fingerprinting, action type |
| `DecayingSortedField` | Q-value storage with temporal decay |
| `ConfidenceField` | Capped-evidence confidence from outcome history |
| `CoOccurrenceField` | Weighted graph between related policies |
| `ExistenceFilter` | Bloom filter for fast state lookup |
| `EventStreamMixin` | Mutation log via Redis Streams |
| `AccessTrackerMixin` | Read pattern tracking |
| `PredictionLedgerMixin` | Outcome prediction and resolution |

Crystallization

The crystallization_handler is an async function designed for use with StreamConsumer. It:

1. Groups incoming events by (state_fingerprint, action_type).
2. Counts successes and failures for each group.
3. Computes the Wilson confidence interval (CI) lower bound as a conservative estimate of the success rate.
4. Creates a PolicyEntry when the evidence meets both thresholds:
   - Minimum events: MIN_EVENTS_FOR_CRYSTALLIZATION (default: 3)
   - Wilson CI lower bound > WILSON_CI_THRESHOLD (default: 0.6)
5. Uses ExistenceFilter (Bloom filter) to skip entries that likely already exist.
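
The grouping and threshold steps above can be sketched in plain Python. This is an illustrative sketch, not the recipe's actual handler: the event dictionary layout (including the `success` key), the helper names, and the choice of z = 1.96 (roughly 95% confidence) are all assumptions made for the example.

```python
import math
from collections import defaultdict

MIN_EVENTS = 3          # mirrors the documented MIN_EVENTS_FOR_CRYSTALLIZATION default
WILSON_THRESHOLD = 0.6  # mirrors the documented WILSON_CI_THRESHOLD default


def wilson_lower_bound(successes: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval (z is an assumed value)."""
    if total == 0:
        return 0.0
    p_hat = successes / total
    denom = 1 + z * z / total
    centre = p_hat + z * z / (2 * total)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


def crystallization_candidates(events):
    """Group events by (state_fingerprint, action_type) and apply both thresholds."""
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, total]
    for event in events:
        key = (event["state_fingerprint"], event["action_type"])
        counts[key][0] += 1 if event["success"] else 0  # "success" is a hypothetical field
        counts[key][1] += 1
    candidates = {}
    for key, (successes, total) in counts.items():
        bound = wilson_lower_bound(successes, total)
        if total >= MIN_EVENTS and bound > WILSON_THRESHOLD:
            candidates[key] = bound
    return candidates
```

Under the assumed z = 1.96, an unbroken run of n successes has a lower bound of n / (n + z²), so it takes six consecutive successes to clear 0.6. If the recipe uses a different z, the effective minimum changes accordingly.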

from popoto.streams import StreamConsumer

consumer = StreamConsumer(
    stream_key="stream:policy_mutations",
    group_name="crystallizer",
    consumer_name="worker-1",
    handler=crystallization_handler,
)

Temporal Discovery

The temporal_discovery_handler identifies cyclical patterns in event timestamps:

- Day of week (7 buckets): weekly patterns
- Week of month (4 buckets): monthly patterns
- Month of year (12 buckets): yearly patterns

For each bucketing, the handler runs a chi-squared test of the observed bucket counts against a uniform distribution. Significant clusters (p < 0.05) are returned as (period, amplitude, phase) tuples suitable for CyclicDecayField.
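
As a rough illustration of the test described above, the following sketch buckets Unix timestamps by day of week and compares the chi-squared statistic with the standard critical value at p = 0.05. The function names are hypothetical, timestamps are assumed to be UTC epoch seconds, and the conversion of a significant result into a (period, amplitude, phase) tuple is left out because the source does not describe it.

```python
from datetime import datetime, timezone

# Chi-squared critical values at p = 0.05, keyed by bucket count
# (degrees of freedom = buckets - 1).
CRITICAL_05 = {7: 12.592, 4: 7.815, 12: 19.675}


def day_of_week_counts(timestamps):
    """Count events per weekday (Monday = 0) from UTC epoch seconds."""
    counts = [0] * 7
    for ts in timestamps:
        counts[datetime.fromtimestamp(ts, tz=timezone.utc).weekday()] += 1
    return counts


def chi_squared_uniform(counts):
    """Chi-squared statistic of observed counts against a uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def is_significant(counts):
    """True when the counts deviate from uniform at the p < 0.05 level."""
    if sum(counts) == 0:
        return False
    return chi_squared_uniform(counts) > CRITICAL_05[len(counts)]
```

For example, `is_significant([20, 2, 2, 2, 2, 2, 2])` flags a strong Monday cluster, while evenly spread counts such as `[5] * 7` do not pass.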

Q-Value Updates

The update_q_value function performs atomic TD(0) updates via a Lua script:

Q(s,a) <- Q(s,a) + alpha * [reward + gamma * max_Q(s',a') - Q(s,a)]

- alpha (learning rate): how much new information overrides old (default: 0.1)
- gamma (discount factor): importance of future rewards (default: 0.95)
- Returns the TD error (a positive value means the outcome was better than expected)
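
The arithmetic of the update rule can be checked with a small pure-Python sketch. It only mirrors the formula above; it is not the Lua script, and the `td_update` name and the `next_max_q=0.0` default (no known successor state) are assumptions for illustration.

```python
def td_update(q, reward, next_max_q=0.0, alpha=0.1, gamma=0.95):
    """Apply one TD(0) step and return (new_q, td_error)."""
    td_error = reward + gamma * next_max_q - q
    new_q = q + alpha * td_error
    return new_q, td_error


# Starting from Q = 0.5 and observing reward 1.0 with no successor value:
new_q, td_error = td_update(0.5, reward=1.0)
print(round(new_q, 4), round(td_error, 4))  # 0.55 0.5
```

The positive TD error of 0.5 indicates the outcome beat the current estimate, and with alpha = 0.1 the Q-value moves one tenth of the way toward the target.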

Tuning Constants

All numeric constants were validated via a parameter sweep (see the tuning guide):

| Constant | Default | Purpose |
| --- | --- | --- |
| `MIN_EVENTS_FOR_CRYSTALLIZATION` | 3 | Minimum events before crystallization |
| `WILSON_CI_THRESHOLD` | 0.6 | Required Wilson CI lower bound |
| `TD_ALPHA` | 0.1 | Q-value learning rate |
| `TD_GAMMA` | 0.95 | Q-value discount factor |
| `CHI_SQUARED_P_THRESHOLD` | 0.05 | Temporal pattern significance |
| `INITIAL_CYCLE_AMPLITUDE` | 0.5 | Starting amplitude for discovered cycles |

Design Decisions

WriteFilterMixin excluded: The crystallization handler IS the write gate. Dual gating makes debugging harder.
Recipe, not core: Lives in popoto.recipes to demonstrate composition without coupling to the ORM core.
Bloom filter false positives: ~1% of legitimate crystallizations may be skipped due to ExistenceFilter's error rate. Acceptable for reference use; production systems needing zero misses should add a secondary check.
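
One way to add such a secondary check is to confirm every Bloom filter hit against an exact store before skipping. The sketch below uses a toy in-memory Bloom filter and a plain Python set as a stand-in for the exact lookup; `ToyBloom`, `should_crystallize`, and `exact_store` are hypothetical names, not part of Popoto's ExistenceFilter API.

```python
import hashlib


class ToyBloom:
    """Minimal in-memory Bloom filter used only to illustrate the pattern."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # integer used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def should_crystallize(key, bloom, exact_store):
    """Skip only when the exact store confirms the entry already exists."""
    if not bloom.might_contain(key):
        return True  # Bloom filters have no false negatives: the key is new
    return key not in exact_store  # possible false positive: confirm exactly
```

The exact lookup runs only on Bloom hits, so the common "definitely new" path stays cheap while false positives no longer cause missed crystallizations.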