foundryts.functions.statistics

foundryts.functions.statistics(start=None, end=None, window=None, **kwargs)

Returns a function that will partition a single series into windows and compute statistics over each window.

The series is partitioned into consecutive, non-overlapping windows, and each window yields one row containing the statistics listed in the Dataframe schema. Windows are defined in one of two ways, depending on which argument is passed. With window, you set the periodicity, which is the width of each window. With window_count, you set a fixed number of windows, and the width is derived by dividing the time range of the series by that count. In the window_count example below, timestamps 1 ns to 1000 ns are split into three windows, each 334 ns wide.
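To make the time-based (window) mode concrete, here is a minimal plain-Python sketch. It is not the foundryts implementation. window_statistics is a hypothetical helper, and timestamps are plain integers in nanoseconds. Windows are assumed to be aligned to multiples of the width, [k*width, (k+1)*width), which is consistent with the "100ns" example output on this page. Only a subset of the schema columns is computed.

```python
from collections import defaultdict


def window_statistics(points, width):
    """Hypothetical helper (not part of foundryts): split (timestamp, value)
    pairs into non-overlapping windows of `width` time units and compute a
    subset of the schema's statistics for each non-empty window.

    Assumes windows are aligned to multiples of `width`, i.e. a point at
    timestamp ts falls into [k * width, (k + 1) * width) with k = ts // width.
    """
    buckets = defaultdict(list)
    for ts, value in sorted(points):
        buckets[ts // width].append((ts, value))

    rows = []
    for k in sorted(buckets):  # windows with no points never get a bucket
        window = buckets[k]
        values = [value for _, value in window]
        rows.append({
            "start_timestamp": k * width,        # inclusive
            "end_timestamp": (k + 1) * width,    # exclusive
            "count": len(window),
            "earliest_point.value": window[0][1],
            "latest_point.value": window[-1][1],
            "smallest_point.value": min(values),
            "largest_point.value": max(values),
            "mean": sum(values) / len(values),
        })
    return rows


# Same points as the example series on this page (timestamps in ns).
points = [(1, 8.0), (101, 4.0), (200, 2.0), (201, 1.0),
          (299, 35.0), (300, 16.0), (350, 32.0), (1000, 64.0)]

for row in window_statistics(points, width=100):
    print(row["start_timestamp"], row["end_timestamp"], row["count"], round(row["mean"], 6))
# 0 100 1 8.0
# 100 200 1 4.0
# 200 300 3 12.666667
# 300 400 2 24.0
# 1000 1100 1 64.0
```

In this sketch, windows that contain no points produce no row. This matches the absence of rows between 400 ns and 1000 ns in the "100ns" example.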

Parameters:
- start (Union[int, datetime, str], optional) – Timestamp (inclusive) to start partitioning windows from the provided series. (default is the entire series)
- end (Union[int, datetime, str], optional) – Timestamp (inclusive) to end partitioning windows from the provided series. (default is the entire series)
- window (Union[int, datetime, str], optional) – The width of each window as a timedelta (for example, "100ns"). The series is divided into consecutive windows of this width. (default is the entire series)
- **kwargs – Flags for determining the window behavior and the output type.
Keyword Arguments:
- include_std_dev (bool, default False) – If True, the output includes a standard_deviation column.
- window_count (int, optional) – Number of windows to compute the statistics over, used instead of a window width.
Returns: A function that accepts a single series, partitions it into windows, and returns the statistics for each window.
Return type: (FunctionNode) -> SummarizerNode

Dataframe schema

Column name	Type	Description
count	int	Number of data points in the window of the input series.
earliest_point.timestamp	datetime	Timestamp of the first data point in the window of the input series.
earliest_point.value	float	Value of the first data point in the window of the input series.
end_timestamp	datetime	End of the window (exclusive).
largest_point.timestamp	datetime	Timestamp of the data point with the largest value in the window of the input series.
largest_point.value	float	Largest value in the window of the input series.
latest_point.timestamp	datetime	Timestamp of the most recent data point in the window of the input series.
latest_point.value	float	Value of the most recent data point in the window of the input series.
mean	float	Average value of all data points in the window of the input series.
smallest_point.timestamp	datetime	Timestamp of the data point with the smallest value in the window of the input series.
smallest_point.value	float	Smallest value in the window of the input series.
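standard_deviation	float	Population standard deviation of the values in the window of the input series. Only present when include_std_dev=True.
start_timestamp	datetime	Start of the window (inclusive).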

Notes

This function is only applicable to numeric series.

In a future release, the include_std_dev kwarg will be deprecated and the standard deviation will be included by default.

window_count takes effect only when include_std_dev=True, in which case it overrides window. If window_count is passed without include_std_dev=True, it is ignored.
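The precedence rules in these notes can be summarized as a small decision function. resolve_window_mode is hypothetical and only illustrates the documented behavior. foundryts does not expose such a helper.

```python
def resolve_window_mode(window=None, window_count=None, include_std_dev=False):
    """Hypothetical helper (not part of foundryts): report which windowing
    option takes effect under the rules described in the Notes."""
    if window_count is not None and include_std_dev:
        # window_count only applies together with include_std_dev=True,
        # and then it overrides `window`.
        return ("window_count", window_count)
    if window is not None:
        # Any window_count passed without include_std_dev=True is ignored.
        return ("window", window)
    # Neither option applies: the whole series is treated as one window.
    return ("entire_series", None)


print(resolve_window_mode(window="100ns", window_count=3, include_std_dev=True))
# ('window_count', 3)
print(resolve_window_mode(window="100ns", window_count=3))
# ('window', '100ns')
```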

Examples

>>> import foundryts.functions as F
>>> series = F.points(
...     (1, 8.0),
...     (101, 4.0),
...     (200, 2.0),
...     (201, 1.0),
...     (299, 35.0),
...     (300, 16.0),
...     (350, 32.0),
...     (1000, 64.0),
... )
>>> series.to_pandas()
                      timestamp  value
0 1970-01-01 00:00:00.000000001    8.0
1 1970-01-01 00:00:00.000000101    4.0
2 1970-01-01 00:00:00.000000200    2.0
3 1970-01-01 00:00:00.000000201    1.0
4 1970-01-01 00:00:00.000000299   35.0
5 1970-01-01 00:00:00.000000300   16.0
6 1970-01-01 00:00:00.000000350   32.0
7 1970-01-01 00:00:00.000001000   64.0

>>> stats = F.statistics(window="100ns")(series) # use time-based window
>>> stats.to_pandas()
   count      earliest_point.timestamp  earliest_point.value                 end_timestamp       largest_point.timestamp  largest_point.value        latest_point.timestamp  latest_point.value       mean      smallest_point.timestamp  smallest_point.value               start_timestamp
0      1 1970-01-01 00:00:00.000000001                   8.0 1970-01-01 00:00:00.000000100 1970-01-01 00:00:00.000000001                  8.0 1970-01-01 00:00:00.000000001                 8.0   8.000000 1970-01-01 00:00:00.000000001                   8.0 1970-01-01 00:00:00.000000000
1      1 1970-01-01 00:00:00.000000101                   4.0 1970-01-01 00:00:00.000000200 1970-01-01 00:00:00.000000101                  4.0 1970-01-01 00:00:00.000000101                 4.0   4.000000 1970-01-01 00:00:00.000000101                   4.0 1970-01-01 00:00:00.000000100
2      3 1970-01-01 00:00:00.000000200                   2.0 1970-01-01 00:00:00.000000300 1970-01-01 00:00:00.000000299                 35.0 1970-01-01 00:00:00.000000299                35.0  12.666667 1970-01-01 00:00:00.000000201                   1.0 1970-01-01 00:00:00.000000200
3      2 1970-01-01 00:00:00.000000300                  16.0 1970-01-01 00:00:00.000000400 1970-01-01 00:00:00.000000350                 32.0 1970-01-01 00:00:00.000000350                32.0  24.000000 1970-01-01 00:00:00.000000300                  16.0 1970-01-01 00:00:00.000000300
4      1 1970-01-01 00:00:00.000001000                  64.0 1970-01-01 00:00:00.000001100 1970-01-01 00:00:00.000001000                 64.0 1970-01-01 00:00:00.000001000                64.0  64.000000 1970-01-01 00:00:00.000001000                  64.0 1970-01-01 00:00:00.000001000

>>> stats_with_std_dev = F.statistics(window="100ns", include_std_dev=True)(series)
>>> stats_with_std_dev.to_pandas()
   count      earliest_point.timestamp  earliest_point.value                 end_timestamp       largest_point.timestamp  largest_point.value        latest_point.timestamp  latest_point.value       mean      smallest_point.timestamp  smallest_point.value  standard_deviation               start_timestamp
0      1 1970-01-01 00:00:00.000000001                   8.0 1970-01-01 00:00:00.000000100 1970-01-01 00:00:00.000000001                  8.0 1970-01-01 00:00:00.000000001                 8.0   8.000000 1970-01-01 00:00:00.000000001                   8.0            0.000000 1970-01-01 00:00:00.000000000
1      1 1970-01-01 00:00:00.000000101                   4.0 1970-01-01 00:00:00.000000200 1970-01-01 00:00:00.000000101                  4.0 1970-01-01 00:00:00.000000101                 4.0   4.000000 1970-01-01 00:00:00.000000101                   4.0            0.000000 1970-01-01 00:00:00.000000100
2      3 1970-01-01 00:00:00.000000200                   2.0 1970-01-01 00:00:00.000000300 1970-01-01 00:00:00.000000299                 35.0 1970-01-01 00:00:00.000000299                35.0  12.666667 1970-01-01 00:00:00.000000201                   1.0           15.797327 1970-01-01 00:00:00.000000200
3      2 1970-01-01 00:00:00.000000300                  16.0 1970-01-01 00:00:00.000000400 1970-01-01 00:00:00.000000350                 32.0 1970-01-01 00:00:00.000000350                32.0  24.000000 1970-01-01 00:00:00.000000300                  16.0            8.000000 1970-01-01 00:00:00.000000300
4      1 1970-01-01 00:00:00.000001000                  64.0 1970-01-01 00:00:00.000001100 1970-01-01 00:00:00.000001000                 64.0 1970-01-01 00:00:00.000001000                64.0  64.000000 1970-01-01 00:00:00.000001000                  64.0            0.000000 1970-01-01 00:00:00.000001000
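The standard_deviation values in this output match the population standard deviation, which divides by n rather than n - 1. The check below uses Python's standard statistics module on the values of the window starting at 200 ns (2.0, 1.0, 35.0). It is an independent verification of the numbers shown, not foundryts code.

```python
from statistics import mean, pstdev, stdev

# Values in the window [200 ns, 300 ns) of the example series.
window_values = [2.0, 1.0, 35.0]

print(round(mean(window_values), 6))    # 12.666667, the "mean" column
print(round(pstdev(window_values), 6))  # 15.797327, the "standard_deviation" column
print(round(stdev(window_values), 2))   # 19.35, sample std, which does NOT match
```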

>>> stats_fixed_window_count = F.statistics(include_std_dev=True, window_count=3)(series)
>>> stats_fixed_window_count.to_pandas()
   count      earliest_point.timestamp  earliest_point.value                 end_timestamp       largest_point.timestamp  largest_point.value        latest_point.timestamp  latest_point.value  mean      smallest_point.timestamp  smallest_point.value  standard_deviation               start_timestamp
0      6 1970-01-01 00:00:00.000000001                   8.0 1970-01-01 00:00:00.000000335 1970-01-01 00:00:00.000000299                 35.0 1970-01-01 00:00:00.000000300                16.0  11.0 1970-01-01 00:00:00.000000201                   1.0            11.83216 1970-01-01 00:00:00.000000001
1      1 1970-01-01 00:00:00.000000350                  32.0 1970-01-01 00:00:00.000000669 1970-01-01 00:00:00.000000350                 32.0 1970-01-01 00:00:00.000000350                32.0  32.0 1970-01-01 00:00:00.000000350                  32.0             0.00000 1970-01-01 00:00:00.000000335
2      1 1970-01-01 00:00:00.000001000                  64.0 1970-01-01 00:00:00.000001003 1970-01-01 00:00:00.000001000                 64.0 1970-01-01 00:00:00.000001000                64.0  64.0 1970-01-01 00:00:00.000001000                  64.0             0.00000 1970-01-01 00:00:00.000000669
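The window boundaries in this window_count example are 1, 335, 669, and 1003 ns. The sketch below reproduces them. Its width formula is an assumption, not taken from the foundryts source: it takes the inclusive time span of the series (first to last timestamp), divides it by window_count, and rounds up. The function window_count_boundaries is hypothetical.

```python
import math


def window_count_boundaries(first_ts, last_ts, window_count):
    """Hypothetical helper: return [(start, end), ...] for `window_count`
    equal-width windows starting at the first timestamp.

    ASSUMPTION: width = ceil(inclusive span / window_count). This was chosen
    only because it reproduces the example output above.
    """
    span = last_ts - first_ts + 1           # inclusive time span in ns
    width = math.ceil(span / window_count)  # assumed rounding: ceiling
    return [(first_ts + i * width, first_ts + (i + 1) * width)
            for i in range(window_count)]


boundaries = window_count_boundaries(1, 1000, 3)
print(boundaries)  # [(1, 335), (335, 669), (669, 1003)]

# Count the example's points per window, treating start as inclusive, end as exclusive.
timestamps = [1, 101, 200, 201, 299, 300, 350, 1000]
print([sum(start <= t < end for t in timestamps) for start, end in boundaries])
# [6, 1, 1]
```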