Returns a function that searches a time series for intervals using the provided predicate. The returned function produces the time intervals where the predicate is true. Each returned interval is evaluated to the associated statistics(). The dsl() formula in interval_values is used for evaluating the final statistics().

The specified interpolation strategies are used for filling in missing timestamps. See interpolate() for more details on interpolation and strategies.

The intervals produced by this function are equivalent to events in Quiver. This is particularly useful when a time series demonstrates intervaled behavior and analysis of the time series requires access to the intervals. Each time series can then be split into ranges using time_range(), such that each interval is a new time_range() and operations can be applied independently on each time range.
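As a rough illustration of the behavior described above, the following plain-Python sketch (not the library's implementation; the helper name `search_intervals` is hypothetical) splits a list of (timestamp, value) points into maximal intervals where a predicate holds, which is conceptually what time_series_search() produces before statistics are computed:

```python
def search_intervals(points, predicate):
    """Return lists of consecutive points for which predicate(value) is True."""
    intervals, current = [], []
    for ts, value in points:
        if predicate(value):
            current.append((ts, value))
        elif current:
            # predicate became false: close the current interval
            intervals.append(current)
            current = []
    if current:
        intervals.append(current)
    return intervals

# Same data as the discrete_series example below
points = [(0, 1.0), (1, 2.0), (2, 2.0), (3, 3.0), (4, 5.0), (5, 6.0),
          (6, 4.0), (7, 2.0), (8, 6.0), (9, 7.0), (10, 8.0), (11, 10.0),
          (12, 11.0)]
intervals = search_intervals(points, lambda v: v % 2 == 0)
# Three intervals of even values, matching the even_search example below
```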
Parameters:

- predicate: dsl() conditional program that determines which intervals are returned.
- labels: Labels for the predicate and interval_values series (default is ['a', 'b', …, 'aa', 'ab', …]).
- start: If an interval contains the start timestamp, the full interval will be included in the output (default is pandas.Timestamp.min).
- end: If an interval contains the end timestamp, the full interval will be included in the output (default is pandas.Timestamp.max).
- interval_values: dsl() program to transform the values that the interval statistics are computed over. This is required for a non-numeric input time series, since statistics cannot be computed over non-numeric data (default is the first input time series).
- Interpolation strategy for the predicate, see interpolate() (default is NEAREST).
- Interpolation strategy for the interval values, see interpolate() (default is LINEAR for numeric and PREVIOUS for enum time series).
- Interpolation strategy for the labels, see interpolate() (default is NEAREST).

Column name | Type | Description |
---|---|---|
count | int | Number of data points in the interval. |
earliest_point.timestamp | datetime | Timestamp of the first data point in the interval. |
earliest_point.value | float | Value of the first data point in the interval. |
end_timestamp | datetime | Timestamp (exclusive) of the end of the interval. |
largest_point.timestamp | datetime | Timestamp of the data point with the largest value in the interval. |
largest_point.value | float | Largest value in the interval. |
latest_point.timestamp | datetime | Timestamp of the most recent data point in the interval. |
latest_point.value | float | Value of the most recent data point in the interval. |
mean | float | Average value of all data points in the interval. |
smallest_point.timestamp | datetime | Timestamp of the data point with the smallest value in the interval. |
smallest_point.value | float | Smallest value in the interval. |
start_timestamp | datetime | Timestamp (inclusive) of the start of the interval. |
standard_deviation | float | Standard deviation of the data points in the interval. |
duration.seconds | int | Duration of the interval in seconds. |
duration.subsecond_nanos | int | Subsecond portion of the interval duration, in nanoseconds. |
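The statistics columns above can be sketched in plain Python for a single interval (an illustrative sketch, not the library's implementation; note the exclusive end timestamp is one nanosecond after the last point, and tie-breaking between equal largest/smallest values may differ from the library):

```python
import statistics

# Second interval from the even_search example below, nanosecond timestamps
interval = [(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)]
values = [v for _, v in interval]

stats = {
    "count": len(interval),
    "earliest_point": interval[0],
    "latest_point": interval[-1],
    "largest_point": max(interval, key=lambda p: p[1]),
    "smallest_point": min(interval, key=lambda p: p[1]),
    "mean": statistics.fmean(values),
    # population standard deviation matches the 1.658312 shown below
    "standard_deviation": statistics.pstdev(values),
    "start_timestamp": interval[0][0],
    "end_timestamp": interval[-1][0] + 1,  # exclusive end
}
```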
>>> discrete_series = F.points(
...     (0, 1.0),
...     (1, 2.0),
...     (2, 2.0),
...     (3, 3.0),
...     (4, 5.0),
...     (5, 6.0),
...     (6, 4.0),
...     (7, 2.0),
...     (8, 6.0),
...     (9, 7.0),
...     (10, 8.0),
...     (11, 10.0),
...     (12, 11.0),
...     name="discrete",
... )
>>> discrete_series.to_pandas()
                        timestamp  value
0   1970-01-01 00:00:00.000000000    1.0
1   1970-01-01 00:00:00.000000001    2.0
2   1970-01-01 00:00:00.000000002    2.0
3   1970-01-01 00:00:00.000000003    3.0
4   1970-01-01 00:00:00.000000004    5.0
5   1970-01-01 00:00:00.000000005    6.0
6   1970-01-01 00:00:00.000000006    4.0
7   1970-01-01 00:00:00.000000007    2.0
8   1970-01-01 00:00:00.000000008    6.0
9   1970-01-01 00:00:00.000000009    7.0
10  1970-01-01 00:00:00.000000010    8.0
11  1970-01-01 00:00:00.000000011   10.0
12  1970-01-01 00:00:00.000000012   11.0
>>> even_search = F.time_series_search(
...     predicate="discrete % 2 == 0",
...     interval_values="discrete",
...     labels="discrete",
... )(discrete_series)
# 3 Intervals with points:
# Interval 1: [(1, 2.0), (2, 2.0)]
# Interval 2: [(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)]
# Interval 3: [(10, 8.0), (11, 10.0)]
>>> even_search.to_pandas()
   count  duration.seconds  duration.subsecond_nanos       earliest_point.timestamp  earliest_point.value                       end_time        largest_point.timestamp  largest_point.value         latest_point.timestamp  latest_point.value  mean       smallest_point.timestamp  smallest_point.value  standard_deviation                     start_time
0      2                 0                         2  1970-01-01 00:00:00.000000001                   2.0  1970-01-01 00:00:00.000000003  1970-01-01 00:00:00.000000002                  2.0  1970-01-01 00:00:00.000000002                 2.0   2.0  1970-01-01 00:00:00.000000002                   2.0            0.000000  1970-01-01 00:00:00.000000001
1      4                 0                         4  1970-01-01 00:00:00.000000005                   6.0  1970-01-01 00:00:00.000000009  1970-01-01 00:00:00.000000008                  6.0  1970-01-01 00:00:00.000000008                 6.0   4.5  1970-01-01 00:00:00.000000007                   2.0            1.658312  1970-01-01 00:00:00.000000005
2      2                 0                         2  1970-01-01 00:00:00.000000010                   8.0  1970-01-01 00:00:00.000000012  1970-01-01 00:00:00.000000011                 10.0  1970-01-01 00:00:00.000000011                10.0   9.0  1970-01-01 00:00:00.000000010                   8.0            1.000000  1970-01-01 00:00:00.000000010
>>> search_formula = F.time_series_search(
...     predicate="discrete % 2 == 0",
...     interval_values="discrete * 2",
...     labels="discrete",
... )(discrete_series)
# 3 Intervals with points (with doubled values):
# Interval 1: [(1, 4.0), (2, 4.0)]
# Interval 2: [(5, 12.0), (6, 8.0), (7, 4.0), (8, 12.0)]
# Interval 3: [(10, 16.0), (11, 20.0)]
>>> search_formula.to_pandas()
   count  duration.seconds  duration.subsecond_nanos       earliest_point.timestamp  earliest_point.value                       end_time        largest_point.timestamp  largest_point.value         latest_point.timestamp  latest_point.value  mean       smallest_point.timestamp  smallest_point.value  standard_deviation                     start_time
0      2                 0                         2  1970-01-01 00:00:00.000000001                   4.0  1970-01-01 00:00:00.000000003  1970-01-01 00:00:00.000000002                  4.0  1970-01-01 00:00:00.000000002                 4.0   4.0  1970-01-01 00:00:00.000000002                   4.0            0.000000  1970-01-01 00:00:00.000000001
1      4                 0                         4  1970-01-01 00:00:00.000000005                  12.0  1970-01-01 00:00:00.000000009  1970-01-01 00:00:00.000000008                 12.0  1970-01-01 00:00:00.000000008                12.0   9.0  1970-01-01 00:00:00.000000007                   4.0            3.316625  1970-01-01 00:00:00.000000005
2      2                 0                         2  1970-01-01 00:00:00.000000010                  16.0  1970-01-01 00:00:00.000000012  1970-01-01 00:00:00.000000011                 20.0  1970-01-01 00:00:00.000000011                20.0  18.0  1970-01-01 00:00:00.000000010                  16.0            2.000000  1970-01-01 00:00:00.000000010
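Conceptually, the interval_values formula is applied to each point's value before any statistic is computed, as in this plain-Python sketch (illustrative only, not the library's implementation):

```python
# Second interval from the example above, with interval_values="discrete * 2"
interval = [(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)]

# Apply the transform first, then compute statistics over transformed values
transformed = [(ts, v * 2) for ts, v in interval]
mean = sum(v for _, v in transformed) / len(transformed)
# mean is 9.0, matching the second row of search_formula above
```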
>>> min_duration_search = F.time_series_search(
...     predicate="discrete % 2 == 0",
...     interval_values="discrete",
...     labels="discrete",
...     min_duration="3ns",
... )(discrete_series)
# The first and last intervals are filtered due to duration < 3
# 1 Interval with points:
# [(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)]
>>> min_duration_search.to_pandas()
   count  duration.seconds  duration.subsecond_nanos       earliest_point.timestamp  earliest_point.value                       end_time        largest_point.timestamp  largest_point.value         latest_point.timestamp  latest_point.value  mean       smallest_point.timestamp  smallest_point.value  standard_deviation                     start_time
0      4                 0                         4  1970-01-01 00:00:00.000000005                   6.0  1970-01-01 00:00:00.000000009  1970-01-01 00:00:00.000000008                  6.0  1970-01-01 00:00:00.000000008                 6.0   4.5  1970-01-01 00:00:00.000000007                   2.0            1.658312  1970-01-01 00:00:00.000000005
>>> max_duration_search = F.time_series_search(
...     predicate="discrete % 2 == 0",
...     interval_values="discrete",
...     labels="discrete",
...     max_duration="3ns",
... )(discrete_series)
# Second interval is filtered due to duration > 3
# 2 Intervals with points:
# Interval 1: [(1, 2.0), (2, 2.0)]
# Interval 2: [(10, 8.0), (11, 10.0)]
>>> max_duration_search.to_pandas()
   count  duration.seconds  duration.subsecond_nanos       earliest_point.timestamp  earliest_point.value                       end_time        largest_point.timestamp  largest_point.value         latest_point.timestamp  latest_point.value  mean       smallest_point.timestamp  smallest_point.value  standard_deviation                     start_time
0      2                 0                         2  1970-01-01 00:00:00.000000001                   2.0  1970-01-01 00:00:00.000000003  1970-01-01 00:00:00.000000002                  2.0  1970-01-01 00:00:00.000000002                 2.0   2.0  1970-01-01 00:00:00.000000002                   2.0                 0.0  1970-01-01 00:00:00.000000001
1      2                 0                         2  1970-01-01 00:00:00.000000010                   8.0  1970-01-01 00:00:00.000000012  1970-01-01 00:00:00.000000011                 10.0  1970-01-01 00:00:00.000000011                10.0   9.0  1970-01-01 00:00:00.000000010                   8.0                 1.0  1970-01-01 00:00:00.000000010
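The duration filters in the two examples above can be sketched in plain Python (illustrative only; the exclusive end timestamp is taken as one nanosecond after the last point, consistent with the end_time values shown):

```python
# The three even-valued intervals from the even_search example
intervals = [
    [(1, 2.0), (2, 2.0)],
    [(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)],
    [(10, 8.0), (11, 10.0)],
]

def duration_ns(interval):
    # exclusive end minus inclusive start, in nanoseconds
    return (interval[-1][0] + 1) - interval[0][0]

# min_duration="3ns" keeps only the middle interval (durations are 2, 4, 2)
at_least_3ns = [iv for iv in intervals if duration_ns(iv) >= 3]
# max_duration="3ns" keeps the first and last intervals
at_most_3ns = [iv for iv in intervals if duration_ns(iv) <= 3]
```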
>>> toggle_series = F.points(
...     (0, "OFF"),
...     (1, "ON"),
...     (2, "OFF"),
...     (3, "OFF"),
...     (4, "ON"),
...     (5, "ON"),
...     (6, "ON"),
...     (7, "OFF"),
...     (8, "ON"),
...     (9, "ON"),
...     (10, "OFF"),
...     (11, "OFF"),
...     (12, "ON"),
...     name="toggle",
... )
>>> toggle_series.to_pandas()
                        timestamp value
0   1970-01-01 00:00:00.000000000   OFF
1   1970-01-01 00:00:00.000000001    ON
2   1970-01-01 00:00:00.000000002   OFF
3   1970-01-01 00:00:00.000000003   OFF
4   1970-01-01 00:00:00.000000004    ON
5   1970-01-01 00:00:00.000000005    ON
6   1970-01-01 00:00:00.000000006    ON
7   1970-01-01 00:00:00.000000007   OFF
8   1970-01-01 00:00:00.000000008    ON
9   1970-01-01 00:00:00.000000009    ON
10  1970-01-01 00:00:00.000000010   OFF
11  1970-01-01 00:00:00.000000011   OFF
12  1970-01-01 00:00:00.000000012    ON
>>> cross_series_search = F.time_series_search(
...     predicate='toggle == "ON"',
...     interval_values="discrete",
...     labels=["toggle", "discrete"],
... )([toggle_series, discrete_series])
# 4 Intervals in discrete_series created from intervals in toggle_series where predicate is true:
# Interval 1: [(1, 2.0)]
# Interval 2: [(4, 5.0), (5, 6.0), (6, 4.0)]
# Interval 3: [(8, 6.0), (9, 7.0)]
# Interval 4: [(12, 11.0)]
>>> cross_series_search.to_pandas()
   count  duration.seconds  duration.subsecond_nanos       earliest_point.timestamp  earliest_point.value                       end_time        largest_point.timestamp  largest_point.value         latest_point.timestamp  latest_point.value  mean       smallest_point.timestamp  smallest_point.value  standard_deviation                     start_time
0      1                 0                         1  1970-01-01 00:00:00.000000001                   2.0  1970-01-01 00:00:00.000000002  1970-01-01 00:00:00.000000001                  2.0  1970-01-01 00:00:00.000000001                 2.0   2.0  1970-01-01 00:00:00.000000001                   2.0            0.000000  1970-01-01 00:00:00.000000001
1      3                 0                         3  1970-01-01 00:00:00.000000004                   5.0  1970-01-01 00:00:00.000000007  1970-01-01 00:00:00.000000005                  6.0  1970-01-01 00:00:00.000000006                 4.0   5.0  1970-01-01 00:00:00.000000006                   4.0            0.816497  1970-01-01 00:00:00.000000004
2      2                 0                         2  1970-01-01 00:00:00.000000008                   6.0  1970-01-01 00:00:00.000000010  1970-01-01 00:00:00.000000009                  7.0  1970-01-01 00:00:00.000000009                 7.0   6.5  1970-01-01 00:00:00.000000008                   6.0            0.500000  1970-01-01 00:00:00.000000008
3      1                 0                         1  1970-01-01 00:00:00.000000012                  11.0  1970-01-01 00:00:00.000000013  1970-01-01 00:00:00.000000012                 11.0  1970-01-01 00:00:00.000000012                11.0  11.0  1970-01-01 00:00:00.000000012                  11.0            0.000000  1970-01-01 00:00:00.000000012
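The cross-series behavior above can be sketched in plain Python (illustrative only, not the library's implementation): the predicate finds maximal "ON" runs in the toggle series, and the discrete series is then sliced to those runs before statistics are computed.

```python
# Same data as the toggle_series and discrete_series examples above
toggle = [(0, "OFF"), (1, "ON"), (2, "OFF"), (3, "OFF"), (4, "ON"), (5, "ON"),
          (6, "ON"), (7, "OFF"), (8, "ON"), (9, "ON"), (10, "OFF"),
          (11, "OFF"), (12, "ON")]
discrete = dict([(0, 1.0), (1, 2.0), (2, 2.0), (3, 3.0), (4, 5.0), (5, 6.0),
                 (6, 4.0), (7, 2.0), (8, 6.0), (9, 7.0), (10, 8.0),
                 (11, 10.0), (12, 11.0)])

# Collect maximal runs of timestamps where the predicate (toggle == "ON") holds
runs, current = [], []
for ts, state in toggle:
    if state == "ON":
        current.append(ts)
    elif current:
        runs.append(current)
        current = []
if current:
    runs.append(current)

# Slice the interval_values series (discrete) to each run
sliced = [[(ts, discrete[ts]) for ts in run] for run in runs]
# Four intervals, matching the cross_series_search example above
```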