Skip to content

popoto.utils.list_search

Sequence Gap Detection Utilities.

This module provides efficient algorithms for identifying missing elements in integer sequences. These utilities support data integrity verification in Redis-backed systems where sequential IDs or timestamps may have gaps due to deletions, failed transactions, or data migration issues.

Design Philosophy:

In Redis ORM systems, gaps in sequences often indicate important data events: - Deleted records that left holes in auto-incrementing IDs - Failed pub/sub messages in time-series data streams - Incomplete migrations or synchronization issues

Rather than scanning the entire keyspace to detect these issues, this module provides memory-efficient generators that identify gaps by examining only adjacent pairs in a sorted sequence.

The implementation uses a sliding window approach adapted from a well-known Stack Overflow solution, optimized for: 1. Memory efficiency: Generators yield results lazily without materializing the entire sequence 2. Generality: Works with any iterable of comparable elements 3. Simplicity: Clear, maintainable code over micro-optimizations

Use Cases in Popoto:

  • AutoKeyField validation: Verify that auto-generated sequential IDs have no unexpected gaps after bulk operations
  • Time-series analysis: Detect missing data points in financial indicators (e.g., OHLCV bars with timestamp gaps)
  • Pub/sub message ordering: Identify dropped messages by checking sequence numbers in subscriber streams
Example

from popoto.utils.list_search import missing_elements timestamps = [1, 2, 3, 7, 8, 10] # Missing 4, 5, 6, 9 missing_elements(timestamps) [4, 5, 6, 9]

See Also:

  • popoto.fields.auto_field_mixin.AutoFieldMixin - Auto-generated unique keys
  • popoto.finance.models.ohlcv - Time-series data with potential gaps

References:

https://stackoverflow.com/questions/16974047/efficient-way-to-find-missing-elements-in-an-integer-sequence/16974075#16974075

window(seq, n=2)

Generate a sliding window over an iterable sequence.

This function creates overlapping tuples of consecutive elements, enabling pairwise comparisons without loading the entire sequence into memory. It forms the foundation for gap detection by allowing examination of adjacent elements.

The sliding window pattern is fundamental to many sequence analysis tasks in time-series and ordered data processing. By yielding tuples lazily, this implementation handles arbitrarily large sequences with constant memory overhead.

Parameters:

Name Type Description Default
seq

Any iterable of elements. For gap detection, this should be a sorted sequence of integers or comparable values.

required
n

Window width (default 2). For gap detection, pairs (n=2) are sufficient. Larger windows support more complex pattern matching.

2

Yields:

Name Type Description
tuple

Consecutive n-element tuples from the sequence. For seq=[1,2,3,4] with n=2: (1,2), (2,3), (3,4)

Example

list(window([10, 20, 30, 40], n=2)) [(10, 20), (20, 30), (30, 40)]

list(window([1, 2, 3, 4, 5], n=3)) [(1, 2, 3), (2, 3, 4), (3, 4, 5)]

Note

The sequence must have at least n elements to yield any results. An empty or too-short sequence produces no output.

Source code in src/popoto/utils/list_search.py
def window(seq, n=2):
    """
    Generate a sliding window over an iterable sequence.

    This function creates overlapping tuples of consecutive elements, enabling
    pairwise comparisons without loading the entire sequence into memory. It
    forms the foundation for gap detection by allowing examination of adjacent
    elements.

    The sliding window pattern is fundamental to many sequence analysis tasks
    in time-series and ordered data processing. By yielding tuples lazily, this
    implementation handles arbitrarily large sequences with constant memory
    overhead.

    Args:
        seq: Any iterable of elements. For gap detection, this should be a
            sorted sequence of integers or comparable values.
        n: Window width (default 2). For gap detection, pairs (n=2) are
            sufficient. Larger windows support more complex pattern matching.

    Yields:
        tuple: Consecutive n-element tuples from the sequence.
            For seq=[1,2,3,4] with n=2: (1,2), (2,3), (3,4)

    Example:
        >>> list(window([10, 20, 30, 40], n=2))
        [(10, 20), (20, 30), (30, 40)]

        >>> list(window([1, 2, 3, 4, 5], n=3))
        [(1, 2, 3), (2, 3, 4), (3, 4, 5)]

    Note:
        The sequence must have at least n elements to yield any results.
        An empty or too-short sequence produces no output.
    """
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

missing_elements(L)

Find all missing integers in a sorted sequence.

Given a sorted list of integers, this function identifies all values that would appear in a continuous sequence but are absent. This is essential for detecting data gaps in systems that rely on sequential identifiers or timestamps.

The algorithm works by examining adjacent pairs via the sliding window function. When the gap between neighbors exceeds 1, all intermediate values are collected as missing elements. This approach is O(n) in time complexity and processes the list in a single pass.

Parameters:

Name Type Description Default
L

A sorted list of integers representing the existing sequence. The list must be sorted in ascending order for correct results. Duplicates are tolerated but may indicate other data issues.

required

Returns:

Name Type Description
list

All integer values missing from the continuous sequence defined by the minimum and maximum values in L. Returns an empty list if no gaps exist.

Example

missing_elements([1, 2, 3, 4, 5]) []

missing_elements([1, 5]) [2, 3, 4]

missing_elements([10, 11, 15, 20]) [12, 13, 14, 16, 17, 18, 19]

Warning

For very large gaps (e.g., [1, 1000000]), the returned list will contain all intermediate values. Consider the memory implications when working with sequences that may have large discontinuities.

See Also

window: The underlying sliding window generator used for pair iteration.

Source code in src/popoto/utils/list_search.py
def missing_elements(L):
    """
    Find all missing integers in a sorted sequence.

    Given a sorted list of integers, this function identifies all values that
    would appear in a continuous sequence but are absent. This is essential for
    detecting data gaps in systems that rely on sequential identifiers or
    timestamps.

    The algorithm works by examining adjacent pairs via the sliding window
    function. When the gap between neighbors exceeds 1, all intermediate values
    are collected as missing elements. This approach is O(n) in time complexity
    and processes the list in a single pass.

    Args:
        L: A sorted list of integers representing the existing sequence.
            The list must be sorted in ascending order for correct results.
            Duplicates are tolerated but may indicate other data issues.

    Returns:
        list: All integer values missing from the continuous sequence defined
            by the minimum and maximum values in L. Returns an empty list if
            no gaps exist.

    Example:
        >>> missing_elements([1, 2, 3, 4, 5])
        []

        >>> missing_elements([1, 5])
        [2, 3, 4]

        >>> missing_elements([10, 11, 15, 20])
        [12, 13, 14, 16, 17, 18, 19]

    Warning:
        For very large gaps (e.g., [1, 1000000]), the returned list will
        contain all intermediate values. Consider the memory implications
        when working with sequences that may have large discontinuities.

    See Also:
        window: The underlying sliding window generator used for pair iteration.
    """
    missing = chain.from_iterable(range(x + 1, y) for x, y in window(L) if (y - x) > 1)
    return list(missing)