foundryts.functions.time_series_search

foundryts.functions.time_series_search(predicate, labels=None, start=None, end=None, interval_values=None, before='nearest', internal='default', after='nearest', min_duration=None, max_duration=None)

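Returns a function that searches for intervals on a time series using the given predicate.

The function returns the time intervals during which the predicate is true.

Each returned interval is summarized with the corresponding statistics(). The dsl() expression in interval_values determines the values from which the final statistics() are computed.

The specified interpolation strategies are used to fill in missing timestamps. See interpolate() for details on interpolation and the available strategies.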
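
To make the role of the before, internal, and after strategies concrete, here is a minimal pure-Python sketch of how a value might be filled in at a timestamp that has no point. It is an illustration only and not foundryts code: the helper `value_at` is hypothetical, and it covers just LINEAR and PREVIOUS between points and NEAREST outside the series.

```python
from bisect import bisect_right

def value_at(points, ts, internal="LINEAR"):
    """Hypothetical helper: estimate a numeric series' value at timestamp ts.

    points is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [t for t, _ in points]
    if ts <= timestamps[0]:
        return points[0][1]   # "before": NEAREST repeats the first value
    if ts >= timestamps[-1]:
        return points[-1][1]  # "after": NEAREST repeats the last value
    i = bisect_right(timestamps, ts)  # index of the first point strictly after ts
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    if ts == t0 or internal == "PREVIOUS":
        return v0             # exact hit, or carry the previous value forward
    # LINEAR: weight the neighbouring values by the position of ts between them
    return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)

series = [(0, 1.0), (4, 5.0), (8, 6.0)]
print(value_at(series, 2))                       # 3.0
print(value_at(series, 2, internal="PREVIOUS"))  # 1.0
print(value_at(series, -3))                      # 1.0
print(value_at(series, 20))                      # 6.0
```

The real strategies are defined by interpolate(); this sketch only shows why the choice matters when a predicate is evaluated between stored points.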

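The intervals produced by this function are equivalent to events in Quiver. This is particularly useful when a time series exhibits interval-like behavior and analyzing it requires access to those intervals. Each time series can be split into ranges with time_range(), and operations can then be applied independently to each time range.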

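Parameters:
- predicate (str) – A dsl() conditional program that serves as the predicate for finding intervals.
- labels (Union [str , List [str ] ] , optional) – Alias for each input time series, used to reference it in predicate and interval_values (default is ['a', 'b', …, 'aa', 'ab', …]).
- start (int | datetime | str , optional) – Timestamp (inclusive) at which to start evaluating the time series for intervals. An interval that overlaps the start timestamp is included in the output in full (default is pandas.Timestamp.min).
- end (int | datetime | str , optional) – Timestamp (exclusive) at which to stop evaluating the time series for intervals. An interval that overlaps the end timestamp is included in the output in full (default is pandas.Timestamp.max).
- interval_values (str , optional) – A dsl() program that transforms the values on which the interval statistics are computed. Required for non-numeric input time series, because statistics cannot be computed on non-numeric data (default is the first input time series).
- before (Union [str , List [str ] ] , optional) – Strategy for interpolating points before the first point in a series; can be given as a list with one strategy per series. Use a valid strategy from interpolate() (default is NEAREST).
- internal (Union [str , List [str ] ] , optional) – Strategy for interpolating points between existing points; can be given as a list with one strategy per series. Use a valid strategy from interpolate() (default is LINEAR for numeric time series and PREVIOUS for enum time series).
- after (Union [str , List [str ] ] , optional) – Strategy for interpolating points after the last point in a series; can be given as a list with one strategy per series. Use a valid strategy from interpolate() (default is NEAREST).
- min_duration (int | str | datetime.timedelta , optional) – Minimum duration for which the predicate must be true for a time range to qualify as an interval.
- max_duration (int | str | datetime.timedelta , optional) – Maximum duration for which the predicate may be true for a time range to qualify as an interval.
Returns: A function that returns statistics for the intervals of the input time series that satisfy the predicate.
Return type: (Union[FunctionNode, NodeCollections]) -> SummaryNode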
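
The default aliases listed for labels resemble spreadsheet column names. The sketch below generates such a sequence in plain Python. `default_labels` is a hypothetical helper, and the assumption that the sequence runs 'a' to 'z' and then continues with 'aa', 'ab', and so on is inferred from the documented default rather than taken from the library.

```python
from itertools import count, islice, product
from string import ascii_lowercase

def default_labels(n):
    """Hypothetical helper: the first n aliases 'a', 'b', ..., 'z', 'aa', 'ab', ..."""
    def generate():
        for width in count(1):
            # All lowercase strings of this width, in alphabetical order
            for letters in product(ascii_lowercase, repeat=width):
                yield "".join(letters)
    return list(islice(generate(), n))

print(default_labels(3))        # ['a', 'b', 'c']
print(default_labels(28)[-2:])  # ['aa', 'ab']
```

Under this assumption, two input series passed without labels would be referenced as a and b in predicate and interval_values.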

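Dataframe schema

Column name	Type	Description
count	int	Number of data points in the interval.
earliest_point.timestamp	datetime	Timestamp of the first data point in the interval.
earliest_point.value	float	Value of the first data point in the interval.
end_timestamp	datetime	End timestamp of the interval (exclusive).
largest_point.timestamp	datetime	Timestamp of the data point with the largest value in the interval.
largest_point.value	float	Largest value in the interval.
latest_point.timestamp	datetime	Timestamp of the latest data point in the interval.
latest_point.value	float	Value of the latest data point in the interval.
mean	float	Mean of all data points in the interval.
smallest_point.timestamp	datetime	Timestamp of the data point with the smallest value in the interval.
smallest_point.value	float	Smallest value in the interval.
start_timestamp	datetime	Timestamp of the first data point in the interval.
standard_deviation	float	Standard deviation of the data points in the interval.
duration.seconds	int	Whole-second part of the interval's duration.
duration.subsecond_nanos	int	Sub-second part of the interval's duration, in nanoseconds.

Examples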
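
As a rough illustration of what these columns contain, the following sketch computes them for a single interval with the Python standard library. `summarize` is a hypothetical helper, not part of foundryts. Two choices in it are assumptions drawn from the sample output in the examples on this page: the standard deviation is the population form, and ties for the largest or smallest value resolve to the later point.

```python
from statistics import fmean, pstdev

def summarize(points, start_ns, end_ns):
    """Hypothetical helper: summary statistics for one interval [start_ns, end_ns).

    points is a non-empty list of (timestamp_ns, value) pairs sorted by timestamp.
    """
    values = [v for _, v in points]
    seconds, subsecond_nanos = divmod(end_ns - start_ns, 1_000_000_000)
    return {
        "count": len(points),
        "earliest_point": points[0],
        "latest_point": points[-1],
        # Scanning in reverse makes the later point win on ties (assumption).
        "largest_point": max(reversed(points), key=lambda p: p[1]),
        "smallest_point": min(reversed(points), key=lambda p: p[1]),
        "mean": fmean(values),
        "standard_deviation": pstdev(values),  # population form (assumption)
        "duration.seconds": seconds,
        "duration.subsecond_nanos": subsecond_nanos,
    }

summary = summarize([(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)], start_ns=5, end_ns=9)
print(summary["mean"])                          # 4.5
print(round(summary["standard_deviation"], 6))  # 1.658312
print(summary["largest_point"])                 # (8, 6.0)
```

The duration is split with divmod so that an interval of 1.5 seconds would be reported as 1 second and 500000000 sub-second nanoseconds.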

>>> import foundryts.functions as F
>>> discrete_series = F.points(
...     (0, 1.0),
...     (1, 2.0),
...     (2, 2.0),
...     (3, 3.0),
...     (4, 5.0),
...     (5, 6.0),
...     (6, 4.0),
...     (7, 2.0),
...     (8, 6.0),
...     (9, 7.0),
...     (10, 8.0),
...     (11, 10.0),
...     (12, 11.0),
...     name="discrete",
... )
>>> discrete_series.to_pandas()
                       timestamp  value
0  1970-01-01 00:00:00.000000000    1.0
1  1970-01-01 00:00:00.000000001    2.0
2  1970-01-01 00:00:00.000000002    2.0
3  1970-01-01 00:00:00.000000003    3.0
4  1970-01-01 00:00:00.000000004    5.0
5  1970-01-01 00:00:00.000000005    6.0
6  1970-01-01 00:00:00.000000006    4.0
7  1970-01-01 00:00:00.000000007    2.0
8  1970-01-01 00:00:00.000000008    6.0
9  1970-01-01 00:00:00.000000009    7.0
10 1970-01-01 00:00:00.000000010    8.0
11 1970-01-01 00:00:00.000000011   10.0
12 1970-01-01 00:00:00.000000012   11.0

This code uses F.points to define the data points of a discrete series. Each point is given as a (timestamp, value) pair, and the series is named "discrete".

Calling discrete_series.to_pandas() then converts the series to a pandas DataFrame with timestamp and value columns. The integer timestamps are interpreted relative to the Unix epoch; as the output shows, each unit is one nanosecond.
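
The mapping from integer timestamps to the datetimes in the output can be reproduced with the standard library. The sketch below assumes, based on the output above, that each integer counts nanoseconds since the Unix epoch. `format_epoch_ns` is a hypothetical helper; because `datetime` only resolves microseconds, the nanosecond part is formatted separately.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def format_epoch_ns(ns):
    """Hypothetical helper: render nanoseconds since the Unix epoch with 9 fractional digits."""
    seconds, nanos = divmod(ns, 1_000_000_000)
    moment = EPOCH + timedelta(seconds=seconds)
    return moment.strftime("%Y-%m-%d %H:%M:%S") + f".{nanos:09d}"

print(format_epoch_ns(12))             # 1970-01-01 00:00:00.000000012
print(format_epoch_ns(1_500_000_000))  # 1970-01-01 00:00:01.500000000
```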

>>> even_search = F.time_series_search(
...     predicate="discrete % 2 == 0",  # true where the value is even
...     interval_values="discrete",     # compute the statistics on 'discrete'
...     labels="discrete",              # alias for the input series
... )(discrete_series)
# Three intervals are found:
# Interval 1: [(1, 2.0), (2, 2.0)]
# Interval 2: [(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)]
# Interval 3: [(10, 8.0), (11, 10.0)]
>>> even_search.to_pandas()
   count  duration.seconds  duration.subsecond_nanos      earliest_point.timestamp  earliest_point.value                      end_time       largest_point.timestamp  largest_point.value        latest_point.timestamp  latest_point.value  mean      smallest_point.timestamp  smallest_point.value  standard_deviation                    start_time
0      2                 0                         2 1970-01-01 00:00:00.000000001                   2.0 1970-01-01 00:00:00.000000003 1970-01-01 00:00:00.000000002                  2.0 1970-01-01 00:00:00.000000002                 2.0   2.0 1970-01-01 00:00:00.000000002                   2.0            0.000000 1970-01-01 00:00:00.000000001
1      4                 0                         4 1970-01-01 00:00:00.000000005                   6.0 1970-01-01 00:00:00.000000009 1970-01-01 00:00:00.000000008                  6.0 1970-01-01 00:00:00.000000008                 6.0   4.5 1970-01-01 00:00:00.000000007                   2.0            1.658312 1970-01-01 00:00:00.000000005
2      2                 0                         2 1970-01-01 00:00:00.000000010                   8.0 1970-01-01 00:00:00.000000012 1970-01-01 00:00:00.000000011                 10.0 1970-01-01 00:00:00.000000011                10.0   9.0 1970-01-01 00:00:00.000000010                   8.0            1.000000 1970-01-01 00:00:00.000000010

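This code searches discrete_series for intervals in which the values are even and displays the result as a pandas DataFrame. For each interval, the result includes the number of data points (count), the duration, the earliest point (earliest_point), the latest point (latest_point), and other statistics.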
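
The grouping of points into intervals can be approximated in plain Python. This is a simplified sketch, not the library's algorithm: the hypothetical helper `find_intervals` evaluates the predicate only at stored points and ignores interpolation between them. It also assumes that a run reaching the end of the data closes one unit after its last point.

```python
def find_intervals(points, predicate):
    """Hypothetical helper: group consecutive points whose value satisfies predicate.

    Returns (start, end, members) tuples, where end is exclusive.
    """
    intervals, run = [], []
    for ts, value in points:
        if predicate(value):
            run.append((ts, value))
        elif run:
            # The run closes at the first point where the predicate is false.
            intervals.append((run[0][0], ts, run))
            run = []
    if run:
        # Assumption: an open run closes one unit after its last point.
        intervals.append((run[0][0], run[-1][0] + 1, run))
    return intervals

discrete = [(0, 1.0), (1, 2.0), (2, 2.0), (3, 3.0), (4, 5.0), (5, 6.0), (6, 4.0),
            (7, 2.0), (8, 6.0), (9, 7.0), (10, 8.0), (11, 10.0), (12, 11.0)]

even_intervals = find_intervals(discrete, lambda v: v % 2 == 0)
for start, end, members in even_intervals:
    print(start, end, members)
# 1 3 [(1, 2.0), (2, 2.0)]
# 5 9 [(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)]
# 10 12 [(10, 8.0), (11, 10.0)]
```

The start and end values match the start_time and end_time columns in the output above, with the end exclusive.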

>>> search_formula = F.time_series_search(
...     predicate="discrete % 2 == 0",  # true where the value is even
...     interval_values="discrete * 2",  # compute the statistics on the doubled values
...     labels="discrete",  # alias for the input series
... )(discrete_series)
# Three intervals are found (values are doubled):
# Interval 1: [(1, 4.0), (2, 4.0)]
# Interval 2: [(5, 12.0), (6, 8.0), (7, 4.0), (8, 12.0)]
# Interval 3: [(10, 16.0), (11, 20.0)]
>>> search_formula.to_pandas()
   count  duration.seconds  duration.subsecond_nanos      earliest_point.timestamp  earliest_point.value                      end_time       largest_point.timestamp  largest_point.value        latest_point.timestamp  latest_point.value  mean      smallest_point.timestamp  smallest_point.value  standard_deviation                    start_time
0      2                 0                         2 1970-01-01 00:00:00.000000001                   4.0 1970-01-01 00:00:00.000000003 1970-01-01 00:00:00.000000002                  4.0 1970-01-01 00:00:00.000000002                 4.0   4.0 1970-01-01 00:00:00.000000002                   4.0            0.000000 1970-01-01 00:00:00.000000001
1      4                 0                         4 1970-01-01 00:00:00.000000005                  12.0 1970-01-01 00:00:00.000000009 1970-01-01 00:00:00.000000008                 12.0 1970-01-01 00:00:00.000000008                12.0   9.0 1970-01-01 00:00:00.000000007                   4.0            3.316625 1970-01-01 00:00:00.000000005
2      2                 0                         2 1970-01-01 00:00:00.000000010                  16.0 1970-01-01 00:00:00.000000012 1970-01-01 00:00:00.000000011                 20.0 1970-01-01 00:00:00.000000011                20.0  18.0 1970-01-01 00:00:00.000000010                  16.0            2.000000 1970-01-01 00:00:00.000000010

>>> min_duration_search = F.time_series_search(
...     predicate="discrete % 2 == 0",
...     interval_values="discrete",
...     labels="discrete",
...     min_duration="3ns",
... )(discrete_series)
# The first and last intervals are filtered out because their duration is less than 3 ns
# One interval remains, containing the points:
# [(5, 6.0), (6, 4.0), (7, 2.0), (8, 6.0)]
>>> min_duration_search.to_pandas()
   count  duration.seconds  duration.subsecond_nanos      earliest_point.timestamp  earliest_point.value                      end_time       largest_point.timestamp  largest_point.value        latest_point.timestamp  latest_point.value  mean      smallest_point.timestamp  smallest_point.value  standard_deviation                    start_time
0      4                 0                         4 1970-01-01 00:00:00.000000005                   6.0 1970-01-01 00:00:00.000000009 1970-01-01 00:00:00.000000008                  6.0 1970-01-01 00:00:00.000000008                 6.0   4.5 1970-01-01 00:00:00.000000007                   2.0            1.658312 1970-01-01 00:00:00.000000005

>>> max_duration_search = F.time_series_search(
...     predicate="discrete % 2 == 0",
...     interval_values="discrete",
...     labels="discrete",
...     max_duration="3ns",
... )(discrete_series)
# The second interval is filtered out because its duration exceeds 3 ns
# Two intervals remain, containing the points:
# Interval 1: [(1, 2.0), (2, 2.0)]
# Interval 2: [(10, 8.0), (11, 10.0)]
>>> max_duration_search.to_pandas()
   count  duration.seconds  duration.subsecond_nanos      earliest_point.timestamp  earliest_point.value                      end_time       largest_point.timestamp  largest_point.value        latest_point.timestamp  latest_point.value  mean      smallest_point.timestamp  smallest_point.value  standard_deviation                    start_time
0      2                 0                         2 1970-01-01 00:00:00.000000001                   2.0 1970-01-01 00:00:00.000000003 1970-01-01 00:00:00.000000002                  2.0 1970-01-01 00:00:00.000000002                 2.0   2.0 1970-01-01 00:00:00.000000002                   2.0                 0.0 1970-01-01 00:00:00.000000001
1      2                 0                         2 1970-01-01 00:00:00.000000010                   8.0 1970-01-01 00:00:00.000000012 1970-01-01 00:00:00.000000011                 10.0 1970-01-01 00:00:00.000000011                10.0   9.0 1970-01-01 00:00:00.000000010                   8.0                 1.0 1970-01-01 00:00:00.000000010
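
The effect of min_duration and max_duration in the two examples above can be sketched as a simple filter over interval lengths. `filter_by_duration` is a hypothetical helper. Treating both bounds as inclusive is an assumption, chosen to agree with the comments in the examples ("less than 3" and "exceeds 3").

```python
def filter_by_duration(intervals, min_duration=None, max_duration=None):
    """Hypothetical helper: keep (start, end) pairs whose length lies within the bounds.

    Both bounds are treated as inclusive, which is an assumption.
    """
    kept = []
    for start, end in intervals:
        duration = end - start
        if min_duration is not None and duration < min_duration:
            continue  # too short
        if max_duration is not None and duration > max_duration:
            continue  # too long
        kept.append((start, end))
    return kept

# [start, end) in nanoseconds, taken from the even-value search output
even_intervals = [(1, 3), (5, 9), (10, 12)]
print(filter_by_duration(even_intervals, min_duration=3))  # [(5, 9)]
print(filter_by_duration(even_intervals, max_duration=3))  # [(1, 3), (10, 12)]
```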

>>> toggle_series = F.points(
...     (0, "OFF"),
...     (1, "ON"),
...     (2, "OFF"),
...     (3, "OFF"),
...     (4, "ON"),
...     (5, "ON"),
...     (6, "ON"),
...     (7, "OFF"),
...     (8, "ON"),
...     (9, "ON"),
...     (10, "OFF"),
...     (11, "OFF"),
...     (12, "ON"),
...     name="toggle",
... )
>>> toggle_series.to_pandas()
                       timestamp value
0  1970-01-01 00:00:00.000000000   OFF
1  1970-01-01 00:00:00.000000001    ON
2  1970-01-01 00:00:00.000000002   OFF
3  1970-01-01 00:00:00.000000003   OFF
4  1970-01-01 00:00:00.000000004    ON
5  1970-01-01 00:00:00.000000005    ON
6  1970-01-01 00:00:00.000000006    ON
7  1970-01-01 00:00:00.000000007   OFF
8  1970-01-01 00:00:00.000000008    ON
9  1970-01-01 00:00:00.000000009    ON
10 1970-01-01 00:00:00.000000010   OFF
11 1970-01-01 00:00:00.000000011   OFF
12 1970-01-01 00:00:00.000000012    ON
>>> cross_series_search = F.time_series_search(
...     predicate='toggle == "ON"',
...     interval_values="discrete",
...     labels=["toggle", "discrete"],
... )([toggle_series, discrete_series])
# Four intervals of discrete_series, taken from the spans where the condition on toggle_series is true:
# Interval 1: [(1, 2.0)]
# Interval 2: [(4, 5.0), (5, 6.0), (6, 4.0)]
# Interval 3: [(8, 6.0), (9, 7.0)]
# Interval 4: [(12, 11.0)]
>>> cross_series_search.to_pandas()
   count  duration.seconds  duration.subsecond_nanos      earliest_point.timestamp  earliest_point.value                      end_time       largest_point.timestamp  largest_point.value        latest_point.timestamp  latest_point.value  mean      smallest_point.timestamp  smallest_point.value  standard_deviation                    start_time
0      1                 0                         1 1970-01-01 00:00:00.000000001                   2.0 1970-01-01 00:00:00.000000002 1970-01-01 00:00:00.000000001                  2.0 1970-01-01 00:00:00.000000001                 2.0   2.0 1970-01-01 00:00:00.000000001                   2.0            0.000000 1970-01-01 00:00:00.000000001
1      3                 0                         3 1970-01-01 00:00:00.000000004                   5.0 1970-01-01 00:00:00.000000007 1970-01-01 00:00:00.000000005                  6.0 1970-01-01 00:00:00.000000006                 4.0   5.0 1970-01-01 00:00:00.000000006                   4.0            0.816497 1970-01-01 00:00:00.000000004
2      2                 0                         2 1970-01-01 00:00:00.000000008                   6.0 1970-01-01 00:00:00.000000010 1970-01-01 00:00:00.000000009                  7.0 1970-01-01 00:00:00.000000009                 7.0   6.5 1970-01-01 00:00:00.000000008                   6.0            0.500000 1970-01-01 00:00:00.000000008
3      1                 0                         1 1970-01-01 00:00:00.000000012                  11.0 1970-01-01 00:00:00.000000013 1970-01-01 00:00:00.000000012                 11.0 1970-01-01 00:00:00.000000012                11.0  11.0 1970-01-01 00:00:00.000000012                  11.0            0.000000 1970-01-01 00:00:00.000000012
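
The cross-series search can be pictured as two steps: find the spans where the predicate holds on one series, then collect the points of the other series that fall inside each span. The sketch below does this in plain Python. `on_spans` is a hypothetical helper that looks only at stored points, and closing a trailing span one unit after the last point is an assumption made to mirror the end_time of the last row above.

```python
toggle = [(0, "OFF"), (1, "ON"), (2, "OFF"), (3, "OFF"), (4, "ON"), (5, "ON"), (6, "ON"),
          (7, "OFF"), (8, "ON"), (9, "ON"), (10, "OFF"), (11, "OFF"), (12, "ON")]
discrete = [(0, 1.0), (1, 2.0), (2, 2.0), (3, 3.0), (4, 5.0), (5, 6.0), (6, 4.0),
            (7, 2.0), (8, 6.0), (9, 7.0), (10, 8.0), (11, 10.0), (12, 11.0)]

def on_spans(series, target="ON"):
    """Hypothetical helper: [start, end) spans during which series holds target."""
    spans, start = [], None
    for ts, value in series:
        if value == target and start is None:
            start = ts                 # a span opens
        elif value != target and start is not None:
            spans.append((start, ts))  # the span closes, end exclusive
            start = None
    if start is not None:
        # Assumption: an open span closes one unit after the last point.
        spans.append((start, series[-1][0] + 1))
    return spans

spans = on_spans(toggle)
selected = [[p for p in discrete if start <= p[0] < end] for start, end in spans]
for span, members in zip(spans, selected):
    print(span, members)
# (1, 2) [(1, 2.0)]
# (4, 7) [(4, 5.0), (5, 6.0), (6, 4.0)]
# (8, 10) [(8, 6.0), (9, 7.0)]
# (12, 13) [(12, 11.0)]
```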
