foundryts.functions.time_series_search

foundryts.functions.time_series_search(predicate, labels=None, start=None, end=None, interval_values=None, before='nearest', internal='default', after='nearest', min_duration=None, max_duration=None)

Returns a function that will search intervals on a time series using the provided predicate.

The function will return time intervals where the predicate is true.

Statistics (see statistics()) are computed over each returned interval. If interval_values is provided, its dsl() formula is applied to the values before the final statistics() are computed.

The specified interpolation strategies are used for filling in missing timestamps. See interpolate() for more details on interpolation and strategies.

The intervals produced by this function are equivalent to events in Quiver. This is particularly useful when a time series exhibits interval-like behavior and analysis requires access to those intervals. The time series can then be split into ranges using time_range(), such that each interval becomes its own time_range() and operations can be applied independently to each time range.

  • Parameters:
    • predicate (str) – The predicate to search for intervals using a dsl() conditional program.
    • labels (Union [str , List [str ] ] , optional) – Aliases for each input time series to refer to them in predicate and interval_values (default is ['a', 'b', …, 'aa', 'ab', …]).
    • start (int | datetime | str , optional) – Timestamp (inclusive) to start evaluating intervals in the time series. For an interval overlapping with the start timestamp, the full interval will be included in the output (default is pandas.Timestamp.min).
    • end (int | datetime | str , optional) – Timestamp (exclusive) to end evaluating intervals in the time series. For an interval overlapping with the end timestamp, the full interval will be included in the output (default is pandas.Timestamp.max).
    • interval_values (str , optional) – dsl() program to transform the values that the interval statistics are computed over. This is required for a non-numeric input time series since statistics cannot be computed over non-numeric data. (default is the first input time series).
    • before (Union [str , List [str ] ] , optional) – Strategy for interpolating points before the first point in the series; may be a list with one strategy per series. Use a valid strategy from interpolate() (default is NEAREST).
    • internal (Union [str , List [str ] ] , optional) – Strategy for interpolating points between existing points; may be a list with one strategy per series. Use a valid strategy from interpolate() (default is LINEAR for numeric time series and PREVIOUS for enum time series).
    • after (Union [str , List [str ] ] , optional) – Strategy for interpolating points after the last point in the series; may be a list with one strategy per series. Use a valid strategy from interpolate() (default is NEAREST).
    • min_duration (int | str | datetime.timedelta , optional) – Minimum duration for which predicate must be true for the time-range to qualify as an interval.
    • max_duration (int | str | datetime.timedelta , optional) – Maximum duration for which predicate must be true for the time-range to qualify as an interval.
  • Returns: A function that returns the statistics over intervals satisfying the predicate for input time series.
  • Return type: (Union[FunctionNode, NodeCollections]) -> SummaryNode
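To make the search semantics concrete, here is a minimal pure-Python sketch of what the function computes conceptually: maximal runs of consecutive points where the predicate holds, optionally filtered by duration. This is an illustration only, not the foundryts implementation; the exclusive-end approximation (one tick past the last matching point) is an assumption for this sketch.

```python
def search_intervals(points, predicate, min_duration=None, max_duration=None):
    """Illustrative sketch (not the foundryts implementation).

    points: list of (timestamp, value) pairs in timestamp order.
    Returns a list of intervals, each a list of matching points.
    """
    intervals, current = [], []
    for ts, value in points:
        if predicate(value):
            current.append((ts, value))
        elif current:
            # Predicate turned false: close the current run.
            intervals.append(current)
            current = []
    if current:
        intervals.append(current)

    def duration(interval):
        # Assumption: approximate the exclusive end as one tick past the
        # last matching point, so duration = end - start.
        return interval[-1][0] + 1 - interval[0][0]

    return [
        iv for iv in intervals
        if (min_duration is None or duration(iv) >= min_duration)
        and (max_duration is None or duration(iv) <= max_duration)
    ]

points = [(0, 1.0), (1, 2.0), (2, 2.0), (3, 3.0), (4, 5.0), (5, 6.0),
          (6, 4.0), (7, 2.0), (8, 6.0), (9, 7.0), (10, 8.0), (11, 10.0),
          (12, 11.0)]
even = search_intervals(points, lambda v: v % 2 == 0)
# Three runs of even values, matching the first example below:
# [[(1, 2.0), (2, 2.0)], [(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)],
#  [(10, 8.0), (11, 10.0)]]
```

With min_duration=3, only the middle run (duration 4) survives, mirroring the min_duration example below.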

Dataframe schema

Column name               Type      Description
count                     int       Number of data points in the interval.
earliest_point.timestamp  datetime  Timestamp of the first data point in the interval.
earliest_point.value      float     Value of the first data point in the interval.
end_timestamp             datetime  Timestamp (exclusive) of the end of the interval.
largest_point.timestamp   datetime  Timestamp of the data point with the largest value in the interval.
largest_point.value       float     Largest value in the interval.
latest_point.timestamp    datetime  Timestamp of the most recent data point in the interval.
latest_point.value        float     Value of the most recent data point in the interval.
mean                      float     Average value of all data points in the interval.
smallest_point.timestamp  datetime  Timestamp of the data point with the smallest value in the interval.
smallest_point.value      float     Smallest value in the interval.
start_timestamp           datetime  Timestamp of the first data point in the interval.
standard_deviation        float     Standard deviation of the data points in the interval.
duration.seconds          int       Duration of the interval in seconds.
duration.subsecond_nanos  int       Duration of the interval in nanoseconds.
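The per-interval statistics in this schema can be sketched in plain Python as follows. This is an illustration of what each column holds, not the foundryts implementation; note that the example outputs below match a population (not sample) standard deviation, and the tie-breaking rule for largest_point/smallest_point among equal values is not specified here, so those comparisons are assumptions.

```python
import math

def interval_stats(points):
    """Illustrative per-interval statistics (not the foundryts implementation).

    points: list of (timestamp, value) pairs inside one interval,
    in timestamp order.
    """
    values = [v for _, v in points]
    mean = sum(values) / len(values)
    # Population standard deviation, which matches the example outputs below.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {
        "count": len(points),
        "earliest_point": points[0],
        "latest_point": points[-1],
        "largest_point": max(points, key=lambda p: p[1]),
        "smallest_point": min(points, key=lambda p: p[1]),
        "mean": mean,
        "standard_deviation": std,
    }

stats = interval_stats([(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)])
# mean 4.5, population standard deviation ~1.658312, matching the second
# interval of the first example below.
```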

Examples

>>> discrete_series = F.points(
...     (0, 1.0),
...     (1, 2.0),
...     (2, 2.0),
...     (3, 3.0),
...     (4, 5.0),
...     (5, 6.0),
...     (6, 4.0),
...     (7, 2.0),
...     (8, 6.0),
...     (9, 7.0),
...     (10, 8.0),
...     (11, 10.0),
...     (12, 11.0),
...     name="discrete",
... )
>>> discrete_series.to_pandas()
                        timestamp  value
0   1970-01-01 00:00:00.000000000    1.0
1   1970-01-01 00:00:00.000000001    2.0
2   1970-01-01 00:00:00.000000002    2.0
3   1970-01-01 00:00:00.000000003    3.0
4   1970-01-01 00:00:00.000000004    5.0
5   1970-01-01 00:00:00.000000005    6.0
6   1970-01-01 00:00:00.000000006    4.0
7   1970-01-01 00:00:00.000000007    2.0
8   1970-01-01 00:00:00.000000008    6.0
9   1970-01-01 00:00:00.000000009    7.0
10  1970-01-01 00:00:00.000000010    8.0
11  1970-01-01 00:00:00.000000011   10.0
12  1970-01-01 00:00:00.000000012   11.0
>>> even_search = F.time_series_search(
...     predicate="discrete % 2 == 0",
...     interval_values="discrete",
...     labels="discrete",
... )(discrete_series)
# 3 Intervals with points:
# Interval 1: [(1, 2.0), (2, 2.0)]
# Interval 2: [(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)]
# Interval 3: [(10, 8.0), (11, 10.0)]
>>> even_search.to_pandas()
   count  duration.seconds  duration.subsecond_nanos  earliest_point.timestamp  earliest_point.value  end_time  largest_point.timestamp  largest_point.value  latest_point.timestamp  latest_point.value  mean  smallest_point.timestamp  smallest_point.value  standard_deviation  start_time
0  2  0  2  1970-01-01 00:00:00.000000001  2.0  1970-01-01 00:00:00.000000003  1970-01-01 00:00:00.000000002  2.0  1970-01-01 00:00:00.000000002  2.0  2.0  1970-01-01 00:00:00.000000002  2.0  0.000000  1970-01-01 00:00:00.000000001
1  4  0  4  1970-01-01 00:00:00.000000005  6.0  1970-01-01 00:00:00.000000009  1970-01-01 00:00:00.000000008  6.0  1970-01-01 00:00:00.000000008  6.0  4.5  1970-01-01 00:00:00.000000007  2.0  1.658312  1970-01-01 00:00:00.000000005
2  2  0  2  1970-01-01 00:00:00.000000010  8.0  1970-01-01 00:00:00.000000012  1970-01-01 00:00:00.000000011  10.0  1970-01-01 00:00:00.000000011  10.0  9.0  1970-01-01 00:00:00.000000010  8.0  1.000000  1970-01-01 00:00:00.000000010
>>> search_formula = F.time_series_search(
...     predicate="discrete % 2 == 0",
...     interval_values="discrete * 2",
...     labels="discrete",
... )(discrete_series)
# 3 Intervals with points (with doubled values):
# Interval 1: [(1, 4.0), (2, 4.0)]
# Interval 2: [(5, 12.0), (6, 8.0), (7, 4.0), (8, 12.0)]
# Interval 3: [(10, 16.0), (11, 20.0)]
>>> search_formula.to_pandas()
   count  duration.seconds  duration.subsecond_nanos  earliest_point.timestamp  earliest_point.value  end_time  largest_point.timestamp  largest_point.value  latest_point.timestamp  latest_point.value  mean  smallest_point.timestamp  smallest_point.value  standard_deviation  start_time
0  2  0  2  1970-01-01 00:00:00.000000001  4.0  1970-01-01 00:00:00.000000003  1970-01-01 00:00:00.000000002  4.0  1970-01-01 00:00:00.000000002  4.0  4.0  1970-01-01 00:00:00.000000002  4.0  0.000000  1970-01-01 00:00:00.000000001
1  4  0  4  1970-01-01 00:00:00.000000005  12.0  1970-01-01 00:00:00.000000009  1970-01-01 00:00:00.000000008  12.0  1970-01-01 00:00:00.000000008  12.0  9.0  1970-01-01 00:00:00.000000007  4.0  3.316625  1970-01-01 00:00:00.000000005
2  2  0  2  1970-01-01 00:00:00.000000010  16.0  1970-01-01 00:00:00.000000012  1970-01-01 00:00:00.000000011  20.0  1970-01-01 00:00:00.000000011  20.0  18.0  1970-01-01 00:00:00.000000010  16.0  2.000000  1970-01-01 00:00:00.000000010
>>> min_duration_search = F.time_series_search(
...     predicate="discrete % 2 == 0",
...     interval_values="discrete",
...     labels="discrete",
...     min_duration="3ns",
... )(discrete_series)
# The first and last intervals are filtered due to duration < 3
# 1 Interval with points:
# [(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)]
>>> min_duration_search.to_pandas()
   count  duration.seconds  duration.subsecond_nanos  earliest_point.timestamp  earliest_point.value  end_time  largest_point.timestamp  largest_point.value  latest_point.timestamp  latest_point.value  mean  smallest_point.timestamp  smallest_point.value  standard_deviation  start_time
0  4  0  4  1970-01-01 00:00:00.000000005  6.0  1970-01-01 00:00:00.000000009  1970-01-01 00:00:00.000000008  6.0  1970-01-01 00:00:00.000000008  6.0  4.5  1970-01-01 00:00:00.000000007  2.0  1.658312  1970-01-01 00:00:00.000000005
>>> max_duration_search = F.time_series_search(
...     predicate="discrete % 2 == 0",
...     interval_values="discrete",
...     labels="discrete",
...     max_duration="3ns",
... )(discrete_series)
# Second interval is filtered due to duration > 3
# 2 Intervals with points:
# Interval 1: [(1, 2.0), (2, 2.0)]
# Interval 2: [(10, 8.0), (11, 10.0)]
>>> max_duration_search.to_pandas()
   count  duration.seconds  duration.subsecond_nanos  earliest_point.timestamp  earliest_point.value  end_time  largest_point.timestamp  largest_point.value  latest_point.timestamp  latest_point.value  mean  smallest_point.timestamp  smallest_point.value  standard_deviation  start_time
0  2  0  2  1970-01-01 00:00:00.000000001  2.0  1970-01-01 00:00:00.000000003  1970-01-01 00:00:00.000000002  2.0  1970-01-01 00:00:00.000000002  2.0  2.0  1970-01-01 00:00:00.000000002  2.0  0.0  1970-01-01 00:00:00.000000001
1  2  0  2  1970-01-01 00:00:00.000000010  8.0  1970-01-01 00:00:00.000000012  1970-01-01 00:00:00.000000011  10.0  1970-01-01 00:00:00.000000011  10.0  9.0  1970-01-01 00:00:00.000000010  8.0  1.0  1970-01-01 00:00:00.000000010
>>> toggle_series = F.points(
...     (0, "OFF"),
...     (1, "ON"),
...     (2, "OFF"),
...     (3, "OFF"),
...     (4, "ON"),
...     (5, "ON"),
...     (6, "ON"),
...     (7, "OFF"),
...     (8, "ON"),
...     (9, "ON"),
...     (10, "OFF"),
...     (11, "OFF"),
...     (12, "ON"),
...     name="toggle",
... )
>>> toggle_series.to_pandas()
                        timestamp value
0   1970-01-01 00:00:00.000000000   OFF
1   1970-01-01 00:00:00.000000001    ON
2   1970-01-01 00:00:00.000000002   OFF
3   1970-01-01 00:00:00.000000003   OFF
4   1970-01-01 00:00:00.000000004    ON
5   1970-01-01 00:00:00.000000005    ON
6   1970-01-01 00:00:00.000000006    ON
7   1970-01-01 00:00:00.000000007   OFF
8   1970-01-01 00:00:00.000000008    ON
9   1970-01-01 00:00:00.000000009    ON
10  1970-01-01 00:00:00.000000010   OFF
11  1970-01-01 00:00:00.000000011   OFF
12  1970-01-01 00:00:00.000000012    ON
>>> cross_series_search = F.time_series_search(
...     predicate='toggle == "ON"',
...     interval_values="discrete",
...     labels=["toggle", "discrete"],
... )([toggle_series, discrete_series])
# 4 Intervals in discrete_series created from intervals in toggle_series where predicate is true:
# Interval 1: [(1, 2.0)]
# Interval 2: [(4, 5.0), (5, 6.0), (6, 4.0)]
# Interval 3: [(8, 6.0), (9, 7.0)]
# Interval 4: [(12, 11.0)]
>>> cross_series_search.to_pandas()
   count  duration.seconds  duration.subsecond_nanos  earliest_point.timestamp  earliest_point.value  end_time  largest_point.timestamp  largest_point.value  latest_point.timestamp  latest_point.value  mean  smallest_point.timestamp  smallest_point.value  standard_deviation  start_time
0  1  0  1  1970-01-01 00:00:00.000000001  2.0  1970-01-01 00:00:00.000000002  1970-01-01 00:00:00.000000001  2.0  1970-01-01 00:00:00.000000001  2.0  2.0  1970-01-01 00:00:00.000000001  2.0  0.000000  1970-01-01 00:00:00.000000001
1  3  0  3  1970-01-01 00:00:00.000000004  5.0  1970-01-01 00:00:00.000000007  1970-01-01 00:00:00.000000005  6.0  1970-01-01 00:00:00.000000006  4.0  5.0  1970-01-01 00:00:00.000000006  4.0  0.816497  1970-01-01 00:00:00.000000004
2  2  0  2  1970-01-01 00:00:00.000000008  6.0  1970-01-01 00:00:00.000000010  1970-01-01 00:00:00.000000009  7.0  1970-01-01 00:00:00.000000009  7.0  6.5  1970-01-01 00:00:00.000000008  6.0  0.500000  1970-01-01 00:00:00.000000008
3  1  0  1  1970-01-01 00:00:00.000000012  11.0  1970-01-01 00:00:00.000000013  1970-01-01 00:00:00.000000012  11.0  1970-01-01 00:00:00.000000012  11.0  11.0  1970-01-01 00:00:00.000000012  11.0  0.000000  1970-01-01 00:00:00.000000012