popoto.utils.list_search¶
popoto.utils.list_search
¶
Sequence Gap Detection Utilities.
This module provides efficient algorithms for identifying missing elements in integer sequences. These utilities support data integrity verification in Redis-backed systems where sequential IDs or timestamps may have gaps due to deletions, failed transactions, or data migration issues.
Design Philosophy:¶
In Redis ORM systems, gaps in sequences often indicate important data events: - Deleted records that left holes in auto-incrementing IDs - Failed pub/sub messages in time-series data streams - Incomplete migrations or synchronization issues
Rather than scanning the entire keyspace to detect these issues, this module provides memory-efficient generators that identify gaps by examining only adjacent pairs in a sorted sequence.
The implementation uses a sliding window approach adapted from a well-known Stack Overflow solution, optimized for: 1. Memory efficiency: Generators yield results lazily without materializing the entire sequence 2. Generality: Works with any iterable of comparable elements 3. Simplicity: Clear, maintainable code over micro-optimizations
Use Cases in Popoto:¶
- AutoKeyField validation: Verify that auto-generated sequential IDs have no unexpected gaps after bulk operations
- Time-series analysis: Detect missing data points in financial indicators (e.g., OHLCV bars with timestamp gaps)
- Pub/sub message ordering: Identify dropped messages by checking sequence numbers in subscriber streams
Example
from popoto.utils.list_search import missing_elements timestamps = [1, 2, 3, 7, 8, 10] # Missing 4, 5, 6, 9 missing_elements(timestamps) [4, 5, 6, 9]
See Also:¶
popoto.fields.auto_field_mixin.AutoFieldMixin- Auto-generated unique keyspopoto.finance.models.ohlcv- Time-series data with potential gaps
References:¶
https://stackoverflow.com/questions/16974047/efficient-way-to-find-missing-elements-in-an-integer-sequence/16974075#16974075
window(seq, n=2)
¶
Generate a sliding window over an iterable sequence.
This function creates overlapping tuples of consecutive elements, enabling pairwise comparisons without loading the entire sequence into memory. It forms the foundation for gap detection by allowing examination of adjacent elements.
The sliding window pattern is fundamental to many sequence analysis tasks in time-series and ordered data processing. By yielding tuples lazily, this implementation handles arbitrarily large sequences with constant memory overhead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
seq
|
Any iterable of elements. For gap detection, this should be a sorted sequence of integers or comparable values. |
required | |
n
|
Window width (default 2). For gap detection, pairs (n=2) are sufficient. Larger windows support more complex pattern matching. |
2
|
Yields:
| Name | Type | Description |
|---|---|---|
tuple |
Consecutive n-element tuples from the sequence. For seq=[1,2,3,4] with n=2: (1,2), (2,3), (3,4) |
Example
list(window([10, 20, 30, 40], n=2)) [(10, 20), (20, 30), (30, 40)]
list(window([1, 2, 3, 4, 5], n=3)) [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
Note
The sequence must have at least n elements to yield any results. An empty or too-short sequence produces no output.
Source code in src/popoto/utils/list_search.py
missing_elements(L)
¶
Find all missing integers in a sorted sequence.
Given a sorted list of integers, this function identifies all values that would appear in a continuous sequence but are absent. This is essential for detecting data gaps in systems that rely on sequential identifiers or timestamps.
The algorithm works by examining adjacent pairs via the sliding window function. When the gap between neighbors exceeds 1, all intermediate values are collected as missing elements. This approach is O(n) in time complexity and processes the list in a single pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
L
|
A sorted list of integers representing the existing sequence. The list must be sorted in ascending order for correct results. Duplicates are tolerated but may indicate other data issues. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
list |
All integer values missing from the continuous sequence defined by the minimum and maximum values in L. Returns an empty list if no gaps exist. |
Example
missing_elements([1, 2, 3, 4, 5]) []
missing_elements([1, 5]) [2, 3, 4]
missing_elements([10, 11, 15, 20]) [12, 13, 14, 16, 17, 18, 19]
Warning
For very large gaps (e.g., [1, 1000000]), the returned list will contain all intermediate values. Consider the memory implications when working with sequences that may have large discontinuities.
See Also
window: The underlying sliding window generator used for pair iteration.