foundryts.functions.statistics

foundryts.functions.statistics(start=None, end=None, window=None, **kwargs)

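Returns a function that splits a single series into windows and computes statistics for each window.

The series is split into windows using the window argument. Each window contains the statistics listed in the dataframe schema. Statistics are computed either over rolling windows created from a periodicity (the width of each window), or over a fixed number of windows whose width is calculated by dividing the total number of points by the number of buckets. Which of the two is used depends on whether the window or the window_count argument is passed.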
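
To make the fixed-width case concrete, here is a minimal pure-Python sketch of the windowing idea. It is not FoundryTS code: the helper name fixed_width_windows is hypothetical, and the assumptions that windows are half-open, aligned to multiples of the width, and omitted when empty are inferred from the example output further down this page.

```python
from collections import defaultdict

# (timestamp_ns, value) pairs: the same sample points used in the examples on this page
points = [(1, 8.0), (101, 4.0), (200, 2.0), (201, 1.0),
          (299, 35.0), (300, 16.0), (350, 32.0), (1000, 64.0)]


def fixed_width_windows(points, width):
    """Group points into half-open windows [k * width, (k + 1) * width).

    Assumption: windows are aligned to multiples of `width` and windows
    without points are omitted. Illustration only, not the library's code.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp // width * width
        buckets[window_start].append(value)
    return [
        {
            "start_timestamp": start,
            "end_timestamp": start + width,
            "count": len(values),
            "mean": sum(values) / len(values),
        }
        for start, values in sorted(buckets.items())
    ]


windows = fixed_width_windows(points, width=100)
for window in windows:
    print(window)
```

With a width of 100, the sketch yields five non-empty windows whose counts and means agree with the count and mean columns of the window="100ns" example.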

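Parameters:
- start (Union [int , datetime , str ] , optional) – Timestamp (inclusive) at which to start splitting the provided series into windows. (Defaults to the entire series)
- end (Union [int , datetime , str ] , optional) – Timestamp (inclusive) at which to stop splitting the provided series into windows. (Defaults to the entire series)
- window (Union [int , datetime , str ] , optional) – Timedelta giving the width of each window, used as the size when splitting the series into windows. (Defaults to the entire series)
- **kwargs – Flags that determine the windowing behavior and the output type.
Keyword arguments:
- include_std_dev (bool , False) – When set to True, the output includes the standard deviation.
- window_count (int , optional) – Number of windows over which to compute statistics (instead of the size of each window).
Returns: A function that takes a single series as input, splits it into windows, and provides statistics for each window.
Return type: (FunctionNode) -> SummarizerNode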

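Dataframe schema

Column name	Type	Description
count	int	Number of data points in the window of the input series.
earliest_point.timestamp	datetime	Timestamp of the first data point in the window of the input series.
earliest_point.value	float	Value of the first data point in the window of the input series.
end_timestamp	datetime	Timestamp at which the window ends.
largest_point.timestamp	datetime	Timestamp of the data point with the largest value in the window of the input series.
largest_point.value	float	Largest value in the window of the input series.
latest_point.timestamp	datetime	Timestamp of the most recent data point in the window of the input series.
latest_point.value	float	Value of the most recent data point in the window of the input series.
mean	float	Mean of all data points in the window of the input series.
smallest_point.timestamp	datetime	Timestamp of the data point with the smallest value in the window of the input series.
smallest_point.value	float	Smallest value in the window of the input series.
start_timestamp	datetime	Timestamp at which the window starts.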
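
As a rough illustration of how the point-based columns relate to the data in one window, the sketch below computes them in plain Python. The helper summarize_window is hypothetical and is not part of FoundryTS; how the library breaks ties between equal values is not stated on this page, so the sketch simply keeps whichever point Python's min and max return.

```python
def summarize_window(points):
    """Compute the point-based schema columns for the points of one window.

    `points` is a non-empty list of (timestamp, value) pairs.
    Illustration only; tie-breaking between equal values is an assumption.
    """
    earliest = min(points, key=lambda p: p[0])
    latest = max(points, key=lambda p: p[0])
    largest = max(points, key=lambda p: p[1])
    smallest = min(points, key=lambda p: p[1])
    values = [value for _, value in points]
    return {
        "count": len(points),
        "earliest_point.timestamp": earliest[0],
        "earliest_point.value": earliest[1],
        "largest_point.timestamp": largest[0],
        "largest_point.value": largest[1],
        "latest_point.timestamp": latest[0],
        "latest_point.value": latest[1],
        "mean": sum(values) / len(values),
        "smallest_point.timestamp": smallest[0],
        "smallest_point.value": smallest[1],
    }


# The three sample points that fall between 200 ns and 300 ns in the examples
summary = summarize_window([(200, 2.0), (201, 1.0), (299, 35.0)])
for column, value in summary.items():
    print(f"{column}: {value}")
```

The start_timestamp and end_timestamp columns are left out because they describe the window itself rather than the points inside it.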

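Notes

This function applies only to numeric series.

In the future, the include_std_dev argument will be deprecated because the feature will be enabled by default.

window_count can only be used together with include_std_dev, in which case it overrides window. If window_count is passed without include_std_dev, it is ignored.

Examples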

>>> import foundryts.functions as F
>>> series = F.points(
...     (1, 8.0),
...     (101, 4.0),
...     (200, 2.0),
...     (201, 1.0),
...     (299, 35.0),
...     (300, 16.0),
...     (350, 32.0),
...     (1000, 64.0),
... )
>>> series.to_pandas()
                      timestamp  value
0 1970-01-01 00:00:00.000000001    8.0
1 1970-01-01 00:00:00.000000101    4.0
2 1970-01-01 00:00:00.000000200    2.0
3 1970-01-01 00:00:00.000000201    1.0
4 1970-01-01 00:00:00.000000299   35.0
5 1970-01-01 00:00:00.000000300   16.0
6 1970-01-01 00:00:00.000000350   32.0
7 1970-01-01 00:00:00.000001000   64.0

This code uses F.points to build a series from (timestamp, value) pairs, with timestamps given in nanoseconds. The table lists each timestamp with its value; timestamps are shown as time elapsed since January 1, 1970.

>>> stats = F.statistics(window="100ns")(series)  # compute statistics over time-based windows
>>> stats.to_pandas()
   count      earliest_point.timestamp  earliest_point.value                 end_timestamp       largest_point.timestamp  largest_point.value        latest_point.timestamp  latest_point.value       mean      smallest_point.timestamp  smallest_point.value               start_timestamp
0      1 1970-01-01 00:00:00.000000001                   8.0 1970-01-01 00:00:00.000000100 1970-01-01 00:00:00.000000001                  8.0 1970-01-01 00:00:00.000000001                 8.0   8.000000 1970-01-01 00:00:00.000000001                   8.0 1970-01-01 00:00:00.000000000
1      1 1970-01-01 00:00:00.000000101                   4.0 1970-01-01 00:00:00.000000200 1970-01-01 00:00:00.000000101                  4.0 1970-01-01 00:00:00.000000101                 4.0   4.000000 1970-01-01 00:00:00.000000101                   4.0 1970-01-01 00:00:00.000000100
2      3 1970-01-01 00:00:00.000000200                   2.0 1970-01-01 00:00:00.000000300 1970-01-01 00:00:00.000000299                 35.0 1970-01-01 00:00:00.000000299                35.0  12.666667 1970-01-01 00:00:00.000000201                   1.0 1970-01-01 00:00:00.000000200
3      2 1970-01-01 00:00:00.000000300                  16.0 1970-01-01 00:00:00.000000400 1970-01-01 00:00:00.000000350                 32.0 1970-01-01 00:00:00.000000350                32.0  24.000000 1970-01-01 00:00:00.000000300                  16.0 1970-01-01 00:00:00.000000300
4      1 1970-01-01 00:00:00.000001000                  64.0 1970-01-01 00:00:00.000001100 1970-01-01 00:00:00.000001000                 64.0 1970-01-01 00:00:00.000001000                64.0  64.000000 1970-01-01 00:00:00.000001000                  64.0 1970-01-01 00:00:00.000001000

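This code splits the time series into 100-nanosecond windows and computes statistics for each one. stats.to_pandas() returns the result as a pandas DataFrame. Windows that contain no points, such as those between 400 ns and 1000 ns, do not appear in the output. The DataFrame contains the following statistics:

count: number of data points in the window
earliest_point.timestamp: timestamp of the oldest data point in the window
earliest_point.value: value of the oldest data point in the window
end_timestamp: end timestamp of the window
largest_point.timestamp: timestamp of the data point with the largest value in the window
largest_point.value: largest value in the window
latest_point.timestamp: timestamp of the most recent data point in the window
latest_point.value: value of the most recent data point in the window
mean: mean of the values in the window
smallest_point.timestamp: timestamp of the data point with the smallest value in the window
smallest_point.value: smallest value in the window
start_timestamp: start timestamp of the window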


```pycon
>>> # Compute statistics over 100 ns windows, including the standard deviation
>>> stats_with_std_dev = F.statistics(window="100ns", include_std_dev=True)(series)
>>> # Convert the result to a pandas DataFrame and display it
>>> stats_with_std_dev.to_pandas()
   count      earliest_point.timestamp  earliest_point.value                 end_timestamp       largest_point.timestamp  largest_point.value        latest_point.timestamp  latest_point.value       mean      smallest_point.timestamp  smallest_point.value  standard_deviation               start_timestamp
0      1 1970-01-01 00:00:00.000000001                   8.0 1970-01-01 00:00:00.000000100 1970-01-01 00:00:00.000000001                  8.0 1970-01-01 00:00:00.000000001                 8.0   8.000000 1970-01-01 00:00:00.000000001                   8.0            0.000000 1970-01-01 00:00:00.000000000
1      1 1970-01-01 00:00:00.000000101                   4.0 1970-01-01 00:00:00.000000200 1970-01-01 00:00:00.000000101                  4.0 1970-01-01 00:00:00.000000101                 4.0   4.000000 1970-01-01 00:00:00.000000101                   4.0            0.000000 1970-01-01 00:00:00.000000100
2      3 1970-01-01 00:00:00.000000200                   2.0 1970-01-01 00:00:00.000000300 1970-01-01 00:00:00.000000299                 35.0 1970-01-01 00:00:00.000000299                35.0  12.666667 1970-01-01 00:00:00.000000201                   1.0           15.797327 1970-01-01 00:00:00.000000200
3      2 1970-01-01 00:00:00.000000300                  16.0 1970-01-01 00:00:00.000000400 1970-01-01 00:00:00.000000350                 32.0 1970-01-01 00:00:00.000000350                32.0  24.000000 1970-01-01 00:00:00.000000300                  16.0            8.000000 1970-01-01 00:00:00.000000300
4      1 1970-01-01 00:00:00.000001000                  64.0 1970-01-01 00:00:00.000001100 1970-01-01 00:00:00.000001000                 64.0 1970-01-01 00:00:00.000001000                64.0  64.000000 1970-01-01 00:00:00.000001000                  64.0            0.000000 1970-01-01 00:00:00.000001000
```

Each row holds the statistics for one window. The columns are the same as in the previous example, plus standard_deviation, the standard deviation of the values in the window.
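
The standard_deviation values shown above are consistent with a population standard deviation (dividing by the number of points rather than by one less). The following sketch checks this with Python's standard library; the grouping of values per window is copied by hand from the sample series, and the conclusion is drawn from this example output only, not from a documented guarantee.

```python
import statistics

# Values per 100 ns window, keyed by window start (taken from the sample series)
window_values = {
    0: [8.0],
    100: [4.0],
    200: [2.0, 1.0, 35.0],
    300: [16.0, 32.0],
    1000: [64.0],
}

# Population standard deviation for each window
std_devs = {
    start: statistics.pstdev(values)
    for start, values in window_values.items()
}

for start, std_dev in std_devs.items():
    print(f"window starting at {start} ns: {std_dev:.6f}")
```

A sample standard deviation (statistics.stdev) would give a different result for the two-point window, and is undefined for single-point windows, which is why the population form fits the output here.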

>>> # Compute statistics over a fixed number of windows (3), including the standard deviation
>>> stats_fixed_window_count = F.statistics(include_std_dev=True, window_count=3)(series)
>>> # Retrieve the result as a pandas DataFrame
>>> stats_fixed_window_count.to_pandas()
   count      earliest_point.timestamp  earliest_point.value                 end_timestamp       largest_point.timestamp  largest_point.value        latest_point.timestamp  latest_point.value  mean      smallest_point.timestamp  smallest_point.value  standard_deviation               start_timestamp
0      6 1970-01-01 00:00:00.000000001                   8.0 1970-01-01 00:00:00.000000335 1970-01-01 00:00:00.000000299                 35.0 1970-01-01 00:00:00.000000300                16.0  11.0 1970-01-01 00:00:00.000000201                   1.0            11.83216 1970-01-01 00:00:00.000000001
1      1 1970-01-01 00:00:00.000000350                  32.0 1970-01-01 00:00:00.000000669 1970-01-01 00:00:00.000000350                 32.0 1970-01-01 00:00:00.000000350                32.0  32.0 1970-01-01 00:00:00.000000350                  32.0             0.00000 1970-01-01 00:00:00.000000335
2      1 1970-01-01 00:00:00.000001000                  64.0 1970-01-01 00:00:00.000001003 1970-01-01 00:00:00.000001000                 64.0 1970-01-01 00:00:00.000001000                64.0  64.0 1970-01-01 00:00:00.000001000                  64.0             0.00000 1970-01-01 00:00:00.000000669

The first window contains 6 data points, with a mean of 11.0 and a standard deviation of 11.83216. The second and third windows each contain a single data point (32.0 and 64.0 respectively), so the mean, largest, and smallest values coincide and the standard deviation is 0.0.
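
The window boundaries in this output (1, 335, 669, and 1003 ns) can be reproduced with a simple rule, sketched below. The rule is an assumption inferred from this one example, not documented FoundryTS behavior: the width is taken as the inclusive time span of the series divided by window_count, rounded up, and windows are half-open starting at the first timestamp.

```python
import math

timestamps = [1, 101, 200, 201, 299, 300, 350, 1000]  # sample series, in ns
window_count = 3

first, last = min(timestamps), max(timestamps)

# Assumed rule (inferred from the example output only):
# inclusive span divided by the number of windows, rounded up
width = math.ceil((last - first + 1) / window_count)

# window_count + 1 boundaries delimit window_count half-open windows
boundaries = [first + i * width for i in range(window_count + 1)]

# Number of points falling in each window [lower, upper)
counts = [
    sum(lower <= t < upper for t in timestamps)
    for lower, upper in zip(boundaries, boundaries[1:])
]

print("width:", width)
print("boundaries:", boundaries)
print("counts:", counts)
```

Under this assumption the computed counts per window agree with the count column above. Other rules could produce the same boundaries for this series, so treat the formula as a reading aid rather than a specification.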
