PolicyCache Recipe
New to Agent Memory? Start with the Quickstart Guide for a progressive adoption path.
A reference recipe composing all shipped Popoto memory primitives into an RL-style action selection cache. Agents accumulate state-action-outcome events; a crystallization handler detects repeated successful patterns and creates PolicyEntry records. Agents query policies for action selection, and outcomes update Q-values via temporal difference learning.
Quick Start
from popoto.recipes.policy_cache import (
    PolicyEntry,
    compute_fingerprint,
    update_q_value,
    initialize_q_value,
    crystallization_handler,
    temporal_discovery_handler,
)

# Create a policy entry
fp = compute_fingerprint({"task": "deploy", "env": "staging"})
policy = PolicyEntry(
    agent_id="agent-1",
    state_fingerprint=fp,
    state_features={"task": "deploy", "env": "staging"},
    action_type="run_playbook",
    action_spec={"playbook": "deploy.yml"},
)
policy.save()

# Set initial Q-value (overrides DecayingSortedField timestamp)
initialize_q_value(policy, initial_q=0.5)

# Update Q-value after observing a reward
td_error = update_q_value(policy, reward=1.0)
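
The Quick Start does not show how compute_fingerprint derives its value. As a minimal sketch, assuming a fingerprint only needs to be deterministic and independent of key order, one common approach is to hash a canonical JSON serialization. The name sketch_fingerprint is hypothetical and this is not the library's implementation.

```python
import hashlib
import json


def sketch_fingerprint(features: dict) -> str:
    """Hypothetical stand-in for compute_fingerprint (assumption, not Popoto code)."""
    # Sorting keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


fp_a = sketch_fingerprint({"task": "deploy", "env": "staging"})
fp_b = sketch_fingerprint({"env": "staging", "task": "deploy"})
fp_c = sketch_fingerprint({"task": "deploy", "env": "production"})
```

Here fp_a and fp_b are identical because only key order differs, while fp_c differs because a feature value changed.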
Architecture
PolicyEntry composes these primitives:
| Primitive | Role in PolicyCache |
|---|---|
| AutoKeyField | Unique entry ID |
| KeyField | Agent partitioning, state fingerprinting, action type |
| DecayingSortedField | Q-value storage with temporal decay |
| ConfidenceField | Bayesian confidence from outcome history |
| CoOccurrenceField | Weighted graph between related policies |
| ExistenceFilter | Bloom filter for fast state lookup |
| EventStreamMixin | Mutation log via Redis Streams |
| AccessTrackerMixin | Read pattern tracking |
| PredictionLedgerMixin | Outcome prediction and resolution |
Crystallization
The crystallization_handler is an async function designed for use with StreamConsumer. It:
- Groups incoming events by (state_fingerprint, action_type)
- Counts successes and failures
- Computes Wilson CI lower bound for conservative success rate estimation
- Creates a PolicyEntry when evidence exceeds thresholds:
  - Minimum events: MIN_EVENTS_FOR_CRYSTALLIZATION (default: 3)
  - Wilson CI lower bound > WILSON_CI_THRESHOLD (default: 0.6)
- Uses ExistenceFilter (Bloom filter) to skip likely-duplicate entries
from popoto.streams import StreamConsumer

consumer = StreamConsumer(
    stream_key="stream:policy_mutations",
    group_name="crystallizer",
    consumer_name="worker-1",
    handler=crystallization_handler,
)
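
The Wilson CI lower bound used in the threshold check can be sketched with the standard Wilson score interval formula. This is an illustration only: the function name wilson_lower_bound and the choice of z = 1.96 (a 95% interval) are assumptions, and the recipe's actual z value is not stated here.

```python
import math


def wilson_lower_bound(successes: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a success proportion.

    z = 1.96 is an assumed default (95% interval), not a documented Popoto value.
    """
    if total == 0:
        return 0.0  # no evidence yet, so be maximally conservative
    p_hat = successes / total
    denominator = 1 + z * z / total
    centre = p_hat + z * z / (2 * total)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total))
    return (centre - margin) / denominator


bound_3_of_3 = wilson_lower_bound(3, 3)
bound_6_of_6 = wilson_lower_bound(6, 6)
```

Under the z = 1.96 assumption, three successes out of three give a lower bound of about 0.44, and six out of six are needed before the bound exceeds 0.6. This is why the bound is a conservative estimate: a perfect record on a small sample is still treated as weak evidence.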
Temporal Discovery
The temporal_discovery_handler identifies cyclical patterns in event timestamps:
- Day of week (7 buckets) — weekly patterns
- Week of month (4 buckets) — monthly patterns
- Month of year (12 buckets) — yearly patterns
The handler tests the bucket counts against a uniform distribution using a chi-squared test. Significant clusters (p < 0.05) are returned as (period, amplitude, phase) tuples suitable for CyclicDecayField.
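
The day-of-week test can be illustrated with the standard library alone. This sketch compares the chi-squared statistic against tabulated critical values at p = 0.05 instead of computing a p-value, and the helper names (chi_squared_uniform, weekday_counts) are hypothetical; how the handler itself computes significance is not shown in this recipe.

```python
from collections import Counter
from datetime import datetime, timedelta

# Tabulated chi-squared critical values at p = 0.05, keyed by bucket count
# (degrees of freedom = buckets - 1).
CRITICAL_0_05 = {7: 12.592, 4: 7.815, 12: 19.675}


def chi_squared_uniform(counts):
    """Chi-squared statistic of observed counts against a uniform expectation."""
    total = sum(counts)
    if total == 0:
        return 0.0
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def weekday_counts(timestamps):
    """Bucket timestamps by day of week (Monday = 0 ... Sunday = 6)."""
    tally = Counter(ts.weekday() for ts in timestamps)
    return [tally.get(day, 0) for day in range(7)]


monday = datetime(2024, 1, 1)  # 2024-01-01 was a Monday
events = [monday + timedelta(weeks=i) for i in range(20)]  # 20 Mondays
events += [monday + timedelta(days=d) for d in range(7)]   # one event per weekday

counts = weekday_counts(events)
statistic = chi_squared_uniform(counts)
is_weekly_pattern = statistic > CRITICAL_0_05[7]
```

With 21 of the 27 events falling on a Monday, the statistic is far above the critical value, so this sample would be flagged as a weekly pattern.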
Q-Value Updates
The update_q_value function performs atomic TD(0) updates via a Lua script:
- alpha (learning rate): How much new information overrides old (default: 0.1)
- gamma (discount factor): Importance of future rewards (default: 0.95)
- Returns TD error (positive = better than expected)
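
The arithmetic of a TD(0) update can be sketched in plain Python. This is not the library's Lua script and it is not atomic; the function name td0_update and the next_q parameter are assumptions, since the Quick Start call passes only a reward.

```python
def td0_update(q, reward, next_q=0.0, alpha=0.1, gamma=0.95):
    """One TD(0) step: Q <- Q + alpha * (reward + gamma * next_q - Q).

    next_q is an assumed estimate of the value of the following state;
    it defaults to 0.0, which treats the step as terminal.
    """
    td_error = reward + gamma * next_q - q
    new_q = q + alpha * td_error
    return new_q, td_error


# Starting from the Quick Start's initial Q-value of 0.5 and a reward of 1.0.
new_q, td_error = td0_update(q=0.5, reward=1.0)
```

Here the TD error is 0.5 (the reward was better than expected) and the Q-value moves a tenth of the way toward the target, from 0.5 to 0.55.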
Tuning Constants
All numeric constants have been validated via a parameter sweep (see the tuning guide):
| Constant | Default | Purpose |
|---|---|---|
| MIN_EVENTS_FOR_CRYSTALLIZATION | 3 | Minimum events before crystallization |
| WILSON_CI_THRESHOLD | 0.6 | Required Wilson CI lower bound |
| TD_ALPHA | 0.1 | Q-value learning rate |
| TD_GAMMA | 0.95 | Q-value discount factor |
| CHI_SQUARED_P_THRESHOLD | 0.05 | Temporal pattern significance |
| INITIAL_CYCLE_AMPLITUDE | 0.5 | Starting amplitude for discovered cycles |
Design Decisions
- WriteFilterMixin excluded: The crystallization handler IS the write gate. Dual gating makes debugging harder.
- Recipe, not core: Lives in popoto.recipes to demonstrate composition without coupling to the ORM core.
- Bloom filter false positives: ~1% of legitimate crystallizations may be skipped due to ExistenceFilter's error rate. Acceptable for reference use; production systems needing zero misses should add a secondary check.
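
One possible shape for such a secondary check is sketched below. A Bloom filter can report false positives but never false negatives, so a "not present" answer can be trusted and only a "maybe present" answer needs confirming against an exact lookup. The function should_crystallize and both callables are hypothetical stand-ins, not part of Popoto's API.

```python
def should_crystallize(key, might_exist, exists_exactly):
    """Decide whether to create a new policy for key.

    might_exist:    hypothetical Bloom-filter probe (may return false positives)
    exists_exactly: hypothetical exact lookup (slower, never wrong)
    """
    if not might_exist(key):
        # Bloom filters have no false negatives, so the key is definitely new.
        return True
    # "Maybe present": confirm with the exact lookup before skipping.
    return not exists_exactly(key)


stored = {"fp-1:run_playbook"}


def always_maybe(key):
    """Worst-case filter that reports every key as possibly present."""
    return True


new_key_allowed = should_crystallize("fp-2:run_playbook", always_maybe, stored.__contains__)
duplicate_allowed = should_crystallize("fp-1:run_playbook", always_maybe, stored.__contains__)
```

Even with a filter that answers "maybe" for every key, the new key is still crystallized and the true duplicate is still skipped, at the cost of one exact lookup per positive probe.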