popoto.recipes.policy_cache¶
PolicyCache — Learned action selection from crystallized patterns.
A reference recipe composing all shipped Popoto memory primitives into an RL-style action selection cache. Agents accumulate state-action-outcome events; a StreamConsumer crystallization handler detects repeated successful patterns and creates PolicyEntry records. Agents query policies via CompositeScoreQuery for action selection. Outcomes update Q-values via temporal difference learning.
Components
- PolicyEntry: Model composing DecayingSortedField, ConfidenceField, CoOccurrenceField, ExistenceFilter, EventStreamMixin, AccessTrackerMixin, and PredictionLedgerMixin.
- update_q_value(): Atomic Q-value TD update via Lua script.
- compute_fingerprint(): Stable state fingerprint from feature dicts.
- wilson_ci_lower(): Wilson score confidence interval lower bound.
- chi_squared_uniform(): Chi-squared test against uniform distribution.
- crystallization_handler(): StreamConsumer handler for pattern detection.
- temporal_discovery_handler(): StreamConsumer handler for cycle discovery.
Dependencies
All 12 shipped Popoto primitives (Steps 1-10 of the memory roadmap). No external dependencies beyond Popoto itself.
See Also
docs/guides/policy-cache-recipe.md — full guide with architecture, tuning constants, and design decisions.
Example
```python
from popoto.recipes.policy_cache import (
    PolicyEntry,
    compute_fingerprint,
    update_q_value,
    crystallization_handler,
)
from popoto.streams import StreamConsumer
```
Create a policy manually¶
```python
policy = PolicyEntry(
    agent_id="agent-1",
    state_fingerprint=compute_fingerprint({"task": "deploy", "env": "staging"}),
    state_features={"task": "deploy", "env": "staging"},
    action_type="run_playbook",
    action_spec={"playbook": "deploy.yml"},
)
policy.save()
```
Or let crystallization detect patterns automatically¶
```python
consumer = StreamConsumer(
    stream_key="stream:policy_mutations",
    group_name="crystallizer",
    consumer_name="worker-1",
    handler=crystallization_handler,
)
```
MIN_EVENTS_FOR_CRYSTALLIZATION = Defaults.MIN_EVENTS_FOR_CRYSTALLIZATION
module-attribute
¶
Minimum events with same (state_fingerprint, action_type) before considering crystallization. Can be set as low as 1 for eager mode in high-confidence environments. Optimal range: [1, 10]. Insensitive to retrieval quality in this range.
WILSON_CI_THRESHOLD = Defaults.WILSON_CI_THRESHOLD
module-attribute
¶
Wilson confidence interval lower bound that must be exceeded for crystallization to trigger. Higher values require stronger evidence. Optimal range: [0.3, 0.8]. Insensitive within this range.
TD_ALPHA = Defaults.TD_ALPHA
module-attribute
¶
Q-value learning rate for temporal difference updates. Controls how much new reward information overrides the existing Q-value estimate. Optimal range: [0.01, 0.5]. Insensitive to retrieval quality.
TD_GAMMA = Defaults.TD_GAMMA
module-attribute
¶
Q-value discount factor for temporal difference updates. Controls the importance of future expected rewards relative to immediate reward. Optimal range: [0.8, 0.99). Insensitive to retrieval quality.
CHI_SQUARED_P_THRESHOLD = Defaults.CHI_SQUARED_P_THRESHOLD
module-attribute
¶
p-value threshold for temporal pattern discovery. A cluster is recorded as a cyclical pattern only when its p-value falls below this threshold, i.e. its deviation from a uniform distribution is statistically significant.
INITIAL_CYCLE_AMPLITUDE = Defaults.INITIAL_CYCLE_AMPLITUDE
module-attribute
¶
Initial amplitude for discovered temporal cycles. Cycles strengthen or weaken over time via CyclicDecayField entrainment.
CHI_SQUARED_CRITICAL_VALUES = {2: 5.991, 3: 7.815, 6: 12.592, 11: 19.675, 23: 35.172}
module-attribute
¶
Chi-squared critical values at p=0.05 for common degrees of freedom. If the test statistic exceeds the critical value, the null hypothesis (uniform distribution) is rejected — the pattern is significant.
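As a sketch of how this table is meant to be consulted (the `is_significant` helper is hypothetical, not part of the module), note that the keys are degrees of freedom, which is one less than the number of buckets:

```python
# Chi-squared critical values at p=0.05, keyed by degrees of freedom
# (df = num_buckets - 1). Values copied from the table above.
CHI_SQUARED_CRITICAL_VALUES = {2: 5.991, 3: 7.815, 6: 12.592, 11: 19.675, 23: 35.172}

def is_significant(statistic: float, num_buckets: int) -> bool:
    """True if the statistic rejects the uniform null hypothesis at p=0.05."""
    critical = CHI_SQUARED_CRITICAL_VALUES.get(num_buckets - 1)
    if critical is None:
        raise KeyError(f"no critical value tabled for df={num_buckets - 1}")
    return statistic > critical
```

For example, 7 day-of-week buckets use df=6, so a statistic above 12.592 counts as a significant pattern.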
PolicyEntry
¶
Bases: EventStreamMixin, AccessTrackerMixin, PredictionLedgerMixin, Model
Reference model for learned action selection policies.
Composes all shipped Popoto memory primitives into a single model that stores state -> action -> expected_value triples. Agents query policies by state_fingerprint and select actions based on expected_value (Q-value) weighted by confidence and co-occurrence.
Fields
- entry_id: Auto-generated unique key.
- agent_id: Partition key scoping policies per agent.
- state_fingerprint: SHA-256 hash of state features for fast lookup.
- state_features: Original state feature dict (JSON-serializable).
- action_type: Category of the action (e.g., "run_playbook").
- action_spec: Full action specification dict (JSON-serializable).
- expected_value: Q-value with temporal decay, partitioned by agent.
- confidence: Bayesian confidence growing with successful outcomes.
- related_policies: Weighted co-occurrence graph between policies.
- bloom: Bloom filter for fast state_fingerprint pre-checks.
Mixins
- EventStreamMixin: Logs all mutations to Redis Streams.
- AccessTrackerMixin: Tracks read patterns for proactive surfacing.
- PredictionLedgerMixin: Records and resolves outcome predictions.
Note
WriteFilterMixin is intentionally excluded — the crystallization handler IS the write gate (Wilson CI > threshold). Having gating logic in two places makes debugging harder.
Source code in src/popoto/recipes/policy_cache.py
compute_fingerprint(features, include_fields=None, include_timestamp=False)
¶
Generate a stable fingerprint from state features.
Creates a SHA-256 hash (truncated to 16 hex chars) from a dict of state features. The hash is deterministic for the same input, making it suitable for grouping events by state.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `features` | `dict` | Dict of state features to fingerprint. Values must be JSON-serializable. | *required* |
| `include_fields` | `list` | Optional list of field names to include. If None, all fields are included. Useful for per-model customization of which features define "same state." | `None` |
| `include_timestamp` | `bool` | If True, includes the current hour-bucket timestamp for time-unique fingerprints. The bucket is the current hour (Unix timestamp truncated to 3600 s). | `False` |
Returns:

| Name | Type | Description |
|---|---|---|
| `str` | `str` | SHA-256 hash truncated to 16 hex chars. |
Examples:
```python
>>> compute_fingerprint({"task": "deploy", "env": "staging"},
...                     include_fields=["task"])
'f8e7d6c5b4a39281'  # differs: only 'task' is hashed
>>> compute_fingerprint({"task": "deploy"}, include_timestamp=True)
'1234567890abcdef'  # changes each hour
```
Source code in src/popoto/recipes/policy_cache.py
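For intuition, the hashing can be sketched as below. This is an illustrative stand-in, not the shipped code; in particular, sorted-key JSON canonicalization and the `_bucket` field name are assumptions:

```python
import hashlib
import json
import time

def fingerprint_sketch(features, include_fields=None, include_timestamp=False):
    """Illustrative re-implementation of compute_fingerprint (assumptions noted above)."""
    if include_fields is not None:
        features = {k: v for k, v in features.items() if k in include_fields}
    payload = dict(features)
    if include_timestamp:
        # Hour bucket: Unix time truncated to a 3600-second boundary
        payload["_bucket"] = int(time.time()) // 3600 * 3600
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]
```

Sorting keys before hashing is what makes the fingerprint independent of dict insertion order, so two agents observing the same state always group into the same bucket.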
wilson_ci_lower(successes, total, z=1.96)
¶
Wilson score confidence interval lower bound.
Computes the lower bound of the Wilson score interval, which gives a conservative estimate of the true success rate. Unlike naive success/total, this handles small sample sizes correctly.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `successes` | `int` | Number of successful outcomes. | *required* |
| `total` | `int` | Total number of outcomes. | *required* |
| `z` | `float` | Z-score for confidence level; 1.96 = 95% CI. | `1.96` |
Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | Lower bound of the Wilson CI (0.0 to 1.0). |
Source code in src/popoto/recipes/policy_cache.py
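The computation behind this is the standard Wilson score formula; the sketch below mirrors it but is not necessarily the library's exact code:

```python
import math

def wilson_lower(successes: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval; 0.0 when there is no data."""
    if total == 0:
        return 0.0
    p_hat = successes / total
    denom = 1 + z * z / total
    centre = p_hat + z * z / (2 * total)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total))
    return (centre - margin) / denom
```

Note how 8 successes out of 10 yields a lower bound near 0.49 rather than the naive 0.8, which is exactly why small samples do not crystallize prematurely.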
chi_squared_uniform(observed, expected_per_bucket)
¶
Chi-squared statistic against uniform distribution.
Tests whether observed counts across buckets differ significantly from a uniform distribution. Compare the returned statistic against CHI_SQUARED_CRITICAL_VALUES for the appropriate degrees of freedom.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `observed` | `list` | List of observed counts per bucket. | *required* |
| `expected_per_bucket` | `float` | Expected count per bucket under a uniform distribution (total_events / num_buckets). | *required* |
Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | Chi-squared test statistic. Higher values indicate stronger deviation from uniform. |
Source code in src/popoto/recipes/policy_cache.py
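The statistic itself is a one-liner (Pearson's chi-squared against a flat expectation); this sketch shows the arithmetic, assuming a zero return for a degenerate expectation:

```python
def chi_squared_stat(observed, expected_per_bucket):
    """Pearson chi-squared statistic against a uniform expectation."""
    if expected_per_bucket <= 0:
        return 0.0
    return sum((o - expected_per_bucket) ** 2 / expected_per_bucket for o in observed)
```

Perfectly uniform counts give 0.0; the more lopsided the buckets, the larger the statistic relative to the CHI_SQUARED_CRITICAL_VALUES table.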
update_q_value(instance, reward, max_future_q=0.0, alpha=TD_ALPHA, gamma=TD_GAMMA)
¶
Update a PolicyEntry's Q-value via temporal difference learning.
Atomically updates the expected_value (Q-value) in the sorted set using the TD(0) update rule:
Q(s,a) ← Q(s,a) + α [r + γ max Q(s',a') - Q(s,a)]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `instance` | | A saved PolicyEntry instance. | *required* |
| `reward` | `float` | Observed reward signal. | *required* |
| `max_future_q` | `float` | Maximum Q-value for the next state's best action. Default 0.0 (terminal; no future state). | `0.0` |
| `alpha` | `float` | Learning rate. Default TD_ALPHA (0.1). | `TD_ALPHA` |
| `gamma` | `float` | Discount factor. Default TD_GAMMA (0.95). | `TD_GAMMA` |
Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The TD error (positive = better than expected, negative = worse than expected). |
Raises:

| Type | Description |
|---|---|
| `ValueError` | If the instance has no redis_key (unsaved). |
Source code in src/popoto/recipes/policy_cache.py
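Setting the atomic Lua script aside, the arithmetic of the update rule can be sketched as a pure function (`td_update` is a hypothetical name used only for illustration):

```python
def td_update(q, reward, max_future_q=0.0, alpha=0.1, gamma=0.95):
    """TD(0): Q <- Q + alpha * (r + gamma * max_future_q - Q).

    Returns (new_q, td_error); td_error > 0 means the outcome beat the estimate.
    """
    td_error = reward + gamma * max_future_q - q
    return q + alpha * td_error, td_error
```

With alpha=0.1, a reward of 1.0 against a Q-value of 0.0 moves the estimate to 0.1, so repeated successes converge gradually rather than overwriting the estimate.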
initialize_q_value(instance, initial_q=0.0)
¶
Set the initial Q-value for a PolicyEntry in the sorted set.
After save(), DecayingSortedField stores the current timestamp as score. This function overrides that with the desired initial Q-value.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `instance` | | A saved PolicyEntry instance. | *required* |
| `initial_q` | `float` | The initial Q-value to set. | `0.0` |
|
Raises:

| Type | Description |
|---|---|
| `ValueError` | If the instance is unsaved. |
Source code in src/popoto/recipes/policy_cache.py
crystallization_handler(entries)
async
¶
StreamConsumer handler that detects repeated patterns and crystallizes PolicyEntry records.
Groups events by (state_fingerprint, action_type), counts successes and failures, and creates a PolicyEntry when evidence threshold is met (min events AND Wilson CI lower bound > threshold).
Expected entry fields (all strings, per the Redis Streams spec):

- state_fingerprint: Hash identifying the state.
- action_type: Category of the action taken.
- outcome: "success" or "failure".
- state_features: JSON string of original state features (optional).
- action_spec: JSON string of action specification (optional).
- agent_id: Agent identifier (optional, defaults to "default").
Entries missing state_fingerprint or action_type are skipped with a warning.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `entries` | | List of (entry_id, fields_dict) tuples from StreamConsumer. | *required* |
Source code in src/popoto/recipes/policy_cache.py
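The group-count-gate logic described above can be sketched as follows. This is a simplified synchronous illustration (`ready_to_crystallize` is a hypothetical name; the real handler is async and also creates PolicyEntry records):

```python
import math
from collections import defaultdict

def wilson_lower(successes, total, z=1.96):
    """Standard Wilson score lower bound, used here as the evidence gate."""
    if total == 0:
        return 0.0
    p = successes / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom

def ready_to_crystallize(events, min_events=5, ci_threshold=0.5):
    """Return (state_fingerprint, action_type) keys meeting the evidence bar."""
    groups = defaultdict(lambda: {"success": 0, "failure": 0})
    for fields in events:
        key = (fields.get("state_fingerprint"), fields.get("action_type"))
        if None in key:
            continue  # the real handler skips these with a warning
        outcome = "success" if fields.get("outcome") == "success" else "failure"
        groups[key][outcome] += 1
    ready = []
    for key, c in groups.items():
        total = c["success"] + c["failure"]
        if total >= min_events and wilson_lower(c["success"], total) > ci_threshold:
            ready.append(key)
    return ready
```

Both conditions must hold: enough events AND a Wilson lower bound above the threshold, so a lucky 2-for-2 streak never crystallizes.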
temporal_discovery_handler(entries)
async
¶
StreamConsumer handler that discovers cyclical patterns from event timestamps.
Buckets event timestamps by day-of-week (7 buckets), week-of-month (4 buckets), and month-of-year (12 buckets). Performs chi-squared test against uniform distribution. Significant clusters (p < 0.05) are logged as discovered temporal patterns.
This handler identifies WHEN events tend to occur, which can be used to add cycles to CyclicDecayField instances in application code.
Expected entry fields
- ts: Unix timestamp string (from EventStreamMixin).
- state_fingerprint: Hash identifying the state (optional).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `entries` | | List of (entry_id, fields_dict) tuples from StreamConsumer. | *required* |
Returns:

| Name | Type | Description |
|---|---|---|
| `list` | | Discovered cycles as (period, amplitude, phase) tuples. Empty list if no significant patterns found. |
Source code in src/popoto/recipes/policy_cache.py
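The day-of-week branch of this bucketing can be sketched as below. Both function names are hypothetical, and UTC bucketing is an assumption; the other period lengths (week-of-month, month-of-year) follow the same pattern with different bucket counts:

```python
from datetime import datetime, timezone

def day_of_week_counts(timestamps):
    """Bucket Unix timestamps into 7 day-of-week bins (Monday = 0)."""
    counts = [0] * 7
    for ts in timestamps:
        counts[datetime.fromtimestamp(ts, tz=timezone.utc).weekday()] += 1
    return counts

def deviates_from_uniform(counts, critical=12.592):
    """Chi-squared check at p=0.05 for df=6 (i.e. 7 buckets)."""
    expected = sum(counts) / len(counts)
    if expected == 0:
        return False
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat > critical
```

Events that always land on the same weekday produce a large statistic and would be recorded as a weekly cycle; evenly spread events produce a statistic near zero and are ignored.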