popoto.recipes.context_assembler
ContextAssembler — Retrieval-to-injection bridge with token budgets.
A capstone recipe composing all shipped Popoto memory primitives into a
single assemble() call. Orchestrates pull-path (query-driven) and
push-path (proactive surfacing) retrieval, applies token budgets, and
formats output for LLM context injection.
Metacognitive extensions (opt-in, off-by-default):
- RetrievalQuality dataclass: surfaces average confidence, score spread, feeling-of-knowing (FOK), and staleness for the retrieval.
- ContextAssembler.assess(query_cues): pre-retrieval FOK probe.
- ContextAssembler.assemble(..., assess_quality=True): attaches a RetrievalQuality to AssemblyResult.metadata["quality"] without changing the default behavior.
Pipeline
- Pull path: ExistenceFilter pre-check → CompositeScoreQuery → CoOccurrence propagation
- Push path: CyclicDecayField temporal scan above surfacing threshold
- Merge: deduplicate, re-rank, budget-select, post-effects, format
Synergy with Popoto Primitives
| Primitive | Role in ContextAssembler |
|---|---|
| DecayingSortedField | Score index for CompositeScoreQuery |
| CyclicDecayField | Push-path proactive surfacing |
| ConfidenceField | Score index + competitive suppression |
| CoOccurrenceField | Pull-path candidate expansion |
| ExistenceFilter | Pull-path pre-check (skip if absent) |
| AccessTrackerMixin | on_read post-effect tracking |
| ObservationProtocol | on_read / on_surfaced dispatch |
| RecallProposal | Created for push-path records |
| WriteFilterMixin | Priority score in composite |
| EventStreamMixin | Mutation logging (via model save) |
| PredictionLedgerMixin | Outcome tracking (via model save) |
| CompositeScoreQuery | Multi-factor ranked retrieval |
Dependencies
All 12 shipped Popoto primitives (Steps 1-12 of the memory roadmap). No external dependencies beyond Popoto itself.
Example
```python
from popoto.recipes.context_assembler import ContextAssembler

# Memory is your own Popoto Model class, defined elsewhere.
assembler = ContextAssembler(
    model_class=Memory,
    score_weights={"relevance": 0.6, "confidence": 0.3},
    max_items=10,
    max_tokens=4000,
)
result = assembler.assemble(
    query_cues={"topic": "deployment"},
    agent_id="agent-1",
)

# result.records:   selected instances
# result.proactive: push-path subset
# result.formatted: LLM-ready string
# result.metadata:  scores, timing, token counts
```
COMPETITIVE_SUPPRESSION_SIGNAL = Defaults.COMPETITIVE_SUPPRESSION_SIGNAL
module-attribute
Signal strength for competitive suppression of non-selected pull-path candidates, applied via ConfidenceField.update_confidence(). Values below 0.5 act as contradiction signals, mildly reducing the future ranking of those candidates. Optimal range: [0.1, 0.7]. Retrieval quality is insensitive to this value.
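To show the direction of the effect, here is a minimal sketch of competitive suppression. It assumes, hypothetically, that an update nudges confidence by `rate * (signal - 0.5)` and clamps to [0, 1]; the real ConfidenceField.update_confidence() rule is not documented here, and `Candidate`, `toy_update_confidence`, `suppress_non_selected`, and `rate` are illustrative names only.

```python
from dataclasses import dataclass

# Hypothetical value inside the documented optimal range [0.1, 0.7].
SUPPRESSION_SIGNAL = 0.3


@dataclass
class Candidate:
    key: str
    confidence: float


def toy_update_confidence(confidence: float, signal: float, rate: float = 0.2) -> float:
    """Hypothetical stand-in for ConfidenceField.update_confidence().

    Signals below 0.5 lower confidence (contradiction); signals above 0.5 raise it.
    The result is clamped to [0, 1].
    """
    updated = confidence + rate * (signal - 0.5)
    return min(1.0, max(0.0, updated))


def suppress_non_selected(candidates, selected_keys, signal=SUPPRESSION_SIGNAL):
    """Apply the suppression signal to every pull-path candidate that was not selected."""
    for cand in candidates:
        if cand.key not in selected_keys:
            cand.confidence = toy_update_confidence(cand.confidence, signal)
    return candidates
```

Because the signal sits below 0.5, a candidate loses a little confidence each time it is passed over, which is one way to get the "mildly reducing future ranking" behavior described above.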
DEFAULT_SURFACING_THRESHOLD = Defaults.DEFAULT_SURFACING_THRESHOLD
module-attribute
Minimum score for push-path records to be surfaced. Records from the CyclicDecayField scan that score below this threshold are filtered out. Optimal range: [0.1, 0.9]. Retrieval quality is insensitive to this value.
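A minimal sketch of the push-path cut, assuming the CyclicDecayField scan yields `(key, score)` pairs. The names `surface` and `scan_results` are hypothetical, and keeping a score exactly equal to the threshold is an assumption; 0.5 is the surfacing_threshold default stated in the ContextAssembler parameters.

```python
DEFAULT_SURFACING_THRESHOLD = 0.5  # default stated for ContextAssembler


def surface(scan_results, threshold=DEFAULT_SURFACING_THRESHOLD):
    """Keep push-path (key, score) pairs at or above the threshold, best first.

    Whether a score exactly equal to the threshold survives is an assumption.
    """
    kept = [(key, score) for key, score in scan_results if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```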
DEFAULT_MAX_ITEMS = 10
module-attribute
Default maximum number of records returned by assemble().
DEFAULT_PROPAGATION_DEPTH = 2
module-attribute
Default BFS depth for CoOccurrence propagation.
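A depth-limited breadth-first expansion is the core of this step. The sketch below assumes, hypothetically, that co-occurrence links are available as a plain adjacency dict; it does not use CoOccurrenceField's actual storage or API.

```python
from collections import deque


def propagate(seeds, graph, depth=2):
    """Breadth-first expansion over a co-occurrence graph.

    graph: dict mapping a key to an iterable of co-occurring keys (hypothetical shape).
    Returns {key: hop_distance} for every key reachable within `depth` hops.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] >= depth:
            continue  # do not expand past the depth limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance
```

With the default depth of 2, a seed pulls in its direct co-occurrences and their co-occurrences, but nothing further.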
AssemblyResult
dataclass
Return type for ContextAssembler.assemble().
Attributes:
| Name | Type | Description |
|---|---|---|
| records | list | All selected instances (pull + push, deduplicated). |
| proactive | list | Push-path subset of records (proactively surfaced). |
| formatted | str | LLM-ready formatted string (JSON, XML, or natural). |
| metadata | dict | Dict with scores, token_count, timing_ms, pull_count, push_count. |
Source code in src/popoto/recipes/context_assembler.py
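The merge stage (deduplicate, re-rank, cap) that yields records and proactive can be sketched as below. This is an illustration, not the library's code: it assumes each path produces `(key, score)` pairs, keeps the higher score when a key appears on both paths, and counts a record as proactive whenever it came from the push path. `AssemblySketch` and `merge_paths` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class AssemblySketch:
    """Hypothetical stand-in mirroring AssemblyResult's documented fields."""
    records: list
    proactive: list
    formatted: str = ""
    metadata: dict = field(default_factory=dict)


def merge_paths(pull, push, max_items=10):
    """Deduplicate pull/push (key, score) pairs, re-rank by score, and cap at max_items."""
    best = {}
    for key, score in pull + push:
        if key not in best or score > best[key]:
            best[key] = score  # keep the higher score for duplicate keys
    ranked = sorted(best, key=best.get, reverse=True)[:max_items]
    push_keys = {key for key, _ in push}
    return AssemblySketch(
        records=ranked,
        proactive=[key for key in ranked if key in push_keys],  # proactive is a subset of records
        metadata={"scores": {key: best[key] for key in ranked}},
    )
```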
RetrievalQuality
dataclass
Metacognitive signal describing retrieval trustworthiness.
Surfaces four machine-readable metrics about a retrieval so an agent can decide whether to trust its context, retry with different cues, widen scope, or caveat its downstream answer. This is a purely mechanical signal — no LLM self-reporting — following the research finding that GPT-4's self-reported confidence reflects output structure rather than internal uncertainty.
Attributes:
| Name | Type | Description |
|---|---|---|
| avg_confidence | float | Mean of |
| score_spread | float | Coefficient of variation (stddev / mean) of the per-record composite scores. High spread means one or two records dominate; low spread means results are roughly equivalent. Falls back to |
| fok_score | float | Feeling-of-knowing: 0.4 * cue_familiarity + 0.4 * partial_retrieval_count + 0.2 * subthreshold_activation, averaged across query cues. |
| staleness_ratio | float | Fraction of selected records with a DecayingSortedField score below the field's decay threshold. |
| score_distribution | list | Optional full list of per-record composite scores for histogram analysis; empty when unavailable. |
| per_cue_fok | dict | Optional dict mapping cue value -> dict with the three FOK components for that cue (for debugging). |
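Three of these metrics follow directly from the definitions in the table. The sketch below is hypothetical in its details: it uses the population standard deviation, returns 0.0 for empty input or a zero mean, and expects each FOK component already scaled to [0, 1] (how partial_retrieval_count is normalized is not stated here).

```python
from statistics import mean, pstdev


def score_spread(scores):
    """Coefficient of variation (stddev / mean) of per-record composite scores."""
    if not scores:
        return 0.0
    avg = mean(scores)
    return pstdev(scores) / avg if avg else 0.0  # zero-mean fallback is an assumption


def fok_score(per_cue_components):
    """Weighted FOK per cue, averaged across cues.

    per_cue_components: list of (cue_familiarity, partial_retrieval, subthreshold_activation).
    """
    if not per_cue_components:
        return 0.0
    return mean(0.4 * fam + 0.4 * partial + 0.2 * sub
                for fam, partial, sub in per_cue_components)


def staleness_ratio(decay_scores, decay_threshold):
    """Fraction of selected records whose decay score is below the threshold."""
    if not decay_scores:
        return 0.0
    return sum(score < decay_threshold for score in decay_scores) / len(decay_scores)
```

For example, two cues where one is familiar but yields nothing (0.4) and one is unfamiliar but has subthreshold activation (0.2) average to a FOK of 0.3.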
Example
```python
def answer(assembler):
    quality = assembler.assess({"topic": "deploy"})
    if quality.fok_score < 0.3:
        # Skip the expensive retrieval; we don't know this domain
        return None

    result = assembler.assemble({"topic": "deploy"}, assess_quality=True)
    if result.metadata["quality"].avg_confidence < 0.4:
        # Caveat the downstream response
        ...
    return result
```
from_records(records, query_cues=None, score_weights=None, max_items=DEFAULT_MAX_ITEMS, surfacing_threshold=DEFAULT_SURFACING_THRESHOLD)
classmethod
Build a RetrievalQuality over an already-retrieved list of records.
Intended for custom retrieval pipelines (BM25, RRF, hybrid) that want the metacognitive layer without adopting ContextAssembler. All model capabilities (ConfidenceField, ExistenceFilter, DecayingSortedField) are introspected from records[0]._meta.fields. Heterogeneous record lists are rejected with TypeError: score weights and capability field names are per-model-class, so a mixed list would silently produce incorrect FOK, score_spread, and staleness_ratio values.
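The rejection rule amounts to an exact-type check against the first record. A minimal sketch, assuming "single concrete class" means exact type equality (so a subclass instance would also be rejected); `require_single_class` is a hypothetical helper, not part of Popoto.

```python
def require_single_class(records):
    """Return the shared concrete class of `records`, or None when the list is empty.

    Raises TypeError on a heterogeneous list, mirroring the documented behavior.
    """
    if not records:
        return None
    model_class = type(records[0])  # capabilities would be introspected from this class
    for record in records[1:]:
        if type(record) is not model_class:
            raise TypeError(
                f"expected only {model_class.__name__} records, got {type(record).__name__}"
            )
    return model_class
```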
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| records | | List of Popoto Model instances, all of a single concrete class. When empty, a zero-valued RetrievalQuality is returned. | required |
| query_cues | | Optional dict of query cues, same shape as | None |
| score_weights | | Optional dict mapping sorted-field names to weights. Used for | None |
| max_items | | Denominator for | DEFAULT_MAX_ITEMS |
| surfacing_threshold | | Threshold for subthreshold_activation and staleness_ratio. Default matches | DEFAULT_SURFACING_THRESHOLD |
Returns:
| Type | Description |
|---|---|
| RetrievalQuality | A RetrievalQuality … the assembler path exactly; see the class docstring. |
Raises:
| Type | Description |
|---|---|
| TypeError | If records contains instances of more than one model class. |
Example
```python
from popoto import RetrievalQuality


def check(query):
    records = my_bm25_pipeline(query)  # custom retrieval (your own function)
    quality = RetrievalQuality.from_records(
        records,
        query_cues={"topic": query},
        score_weights={"relevance": 1.0},
    )
    if quality.fok_score < 0.3:
        return "low confidence retrieval"
```
ContextAssembler
Orchestrates memory retrieval into a single assemble() call.
Combines pull-path (query-driven via CompositeScoreQuery) and push-path (proactive via CyclicDecayField) retrieval, applies token budgets, and formats output for LLM context injection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_class | | Popoto Model class to query. | required |
| score_weights | | Dict mapping field names to weights for CompositeScoreQuery (e.g., {"relevance": 0.6, "confidence": 0.3}). | required |
| max_items | | Maximum records to return. Default 10. | DEFAULT_MAX_ITEMS |
| max_tokens | | Optional soft token budget. Records are dropped to fit. | None |
| surfacing_threshold | | Minimum score for push-path records. Default 0.5. | DEFAULT_SURFACING_THRESHOLD |
| propagation_depth | | BFS depth for CoOccurrence. Default 2. | DEFAULT_PROPAGATION_DEPTH |
| output_format | | "structured" (JSON), "xml", or "natural". Default "structured". | 'structured' |
| token_counter | | Optional callable(record) -> int. Default: len(str(r)) // 4. | None |
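One way to read the soft token budget together with the default counter is a greedy pass over the ranked records. This is a hedged sketch: skipping an over-budget record and continuing to scan (rather than stopping at the first overflow) is an assumption, and `budget_select` is a hypothetical name.

```python
def default_token_counter(record):
    """Documented default estimate: roughly four characters per token."""
    return len(str(record)) // 4


def budget_select(ranked, max_items=10, max_tokens=None, token_counter=default_token_counter):
    """Greedily take records in rank order under the item cap and the (soft) token budget.

    Returns (selected_records, tokens_used).
    """
    selected, used = [], 0
    for record in ranked:
        if len(selected) >= max_items:
            break
        cost = token_counter(record)
        if max_tokens is not None and used + cost > max_tokens:
            continue  # assumption: skip the overflowing record and keep scanning
        selected.append(record)
        used += cost
    return selected, used
```

Skipping rather than stopping lets a short, lower-ranked record use leftover budget; a stricter reading of "dropped to fit" would stop at the first overflow instead.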
assemble(query_cues=None, agent_id=None, partition_filters=None, assess_quality=False)
Execute the full retrieval pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query_cues | | Optional dict of query cues (e.g., {"topic": "deploy"}). If None, the pull path is skipped. | None |
| agent_id | | Optional agent ID for partition filtering. Added to partition_filters as {"agent_id": agent_id}. | None |
| partition_filters | | Optional dict of partition key-value pairs used to filter queries. | None |
| assess_quality | | When True, compute a RetrievalQuality and attach it to AssemblyResult.metadata["quality"]. | False |
Returns:
| Type | Description |
|---|---|
|
AssemblyResult with records, proactive, formatted, and metadata. |
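The argument handling described above (agent_id folded into the partition filters, pull path skipped without cues) can be sketched as a small planning step. `plan_assembly` is a hypothetical helper; letting agent_id override an existing "agent_id" key is an assumption.

```python
def plan_assembly(query_cues=None, agent_id=None, partition_filters=None):
    """Return (effective_filters, run_pull_path) for one assemble() call."""
    filters = dict(partition_filters or {})  # copy so the caller's dict is not mutated
    if agent_id is not None:
        # Documented: agent_id is added as {"agent_id": agent_id}.
        # Overriding an existing "agent_id" key is an assumption.
        filters["agent_id"] = agent_id
    run_pull_path = query_cues is not None  # documented: no cues means no pull path
    return filters, run_pull_path
```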
assess(query_cues=None, partition_filters=None, probe_limit=None)
Probe retrieval quality without running the full pipeline.
Runs a cheap pre-retrieval check: ExistenceFilter lookups for cue_familiarity + a single low-limit composite_score probe to gather pull candidates for FOK computation. Does NOT run CoOccurrence propagation, does NOT run the push path, does NOT apply post-effects.
Intended use: call assess() before assemble() to decide
whether the full retrieval is worth the round-trip cost. When
assess().fok_score < some_threshold, the agent can skip the
retrieval entirely and widen the cue or caveat its answer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query_cues | | Optional dict of query cues. When empty, all metrics default to 0.0 and a warning is logged. | None |
| partition_filters | | Optional dict of partition filters. Same semantics as in assemble(). | None |
| probe_limit | | Optional cap on the number of candidates fetched for the probe. Defaults to | None |
Returns:
| Type | Description |
|---|---|
|
RetrievalQuality. |
|
|
reflects what's available for retrieval, not what was |
|
|
actually retrieved. |
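assess() derives cue_familiarity from ExistenceFilter lookups. Assuming, for illustration only, that ExistenceFilter behaves like a Bloom filter (no false negatives, occasional false positives), the familiarity probe can be sketched as follows. `ToyBloom` and `cue_familiarity` are hypothetical and do not reflect ExistenceFilter's actual API.

```python
import hashlib


class ToyBloom:
    """Tiny Bloom filter: k hash positions per item in a fixed-size bit array."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


def cue_familiarity(bloom, query_cues):
    """Fraction of cue values the filter reports as possibly present (0.0 for no cues)."""
    values = list(query_cues.values())
    if not values:
        return 0.0
    return sum(bloom.might_contain(v) for v in values) / len(values)
```

A filter like this answers "have we ever stored anything about this cue?" without touching the records themselves, which is why it suits a cheap pre-retrieval probe.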
format_structured(records)
format_xml(records)
Format records as XML tags.
format_natural(records)
Format records as natural language summary.
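The three output_format options can be sketched over plain dicts. The JSON layout, XML tag names, and sentence template below are hypothetical (the library's actual output shape is not specified here), and using field names directly as XML tags assumes they are valid XML names.

```python
import json
from xml.sax.saxutils import escape


def to_json(records):
    """'structured' sketch: a JSON array of records (records assumed to be plain dicts)."""
    return json.dumps(records, sort_keys=True)


def to_xml(records):
    """'xml' sketch: one <memory> element per record, each field as an escaped child tag."""
    lines = ["<memories>"]
    for rec in records:
        lines.append("  <memory>")
        for key, value in sorted(rec.items()):
            lines.append(f"    <{key}>{escape(str(value))}</{key}>")
        lines.append("  </memory>")
    lines.append("</memories>")
    return "\n".join(lines)


def to_natural(records):
    """'natural' sketch: one bullet-style line per record."""
    return "\n".join(
        "- " + "; ".join(f"{k}: {v}" for k, v in sorted(rec.items())) for rec in records
    )
```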