foundryts.functions.statistics

foundryts.functions.statistics(start=None, end=None, window=None, **kwargs)

Returns a function that will partition a single series into windows and compute statistics over each window.

The series is partitioned into windows according to the window argument, and each window yields the statistics listed in the Dataframe schema. The statistics are calculated over rolling windows built in one of two ways: by periodicity, where window sets the width of each window, or by a fixed number of windows, where the width is calculated using round(total number of points / number of buckets). Which option applies depends on whether the window or the window_count argument is passed.
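
As a rough illustration of the time-based option, the sketch below groups integer timestamps into fixed-width, half-open windows and reports the count and mean of each non-empty window. It is plain Python, not the foundryts implementation. The helper name partition_by_width is hypothetical, and aligning window starts to multiples of the width is an assumption inferred from the Examples section.

```python
from collections import defaultdict


def partition_by_width(points, width):
    """Group (timestamp, value) pairs into half-open windows [start, start + width).

    Hypothetical helper for illustration only; window starts are assumed
    to align to multiples of `width`.
    """
    windows = defaultdict(list)
    for timestamp, value in points:
        start = (timestamp // width) * width  # floor to the window start
        windows[start].append(value)
    # One summary per non-empty window, ordered by window start.
    return [
        {
            "start": start,
            "end": start + width,
            "count": len(values),
            "mean": sum(values) / len(values),
        }
        for start, values in sorted(windows.items())
    ]


points = [(1, 8.0), (101, 4.0), (200, 2.0), (201, 1.0),
          (299, 35.0), (300, 16.0), (350, 32.0), (1000, 64.0)]
summary = partition_by_width(points, width=100)
for row in summary:
    print(row)
```

Empty windows are skipped in this sketch, so the eight points above produce five summaries.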

  • Parameters:
    • start (Union[int, datetime, str], optional) – Timestamp (inclusive) to start partitioning windows from the provided series. (default is the entire series)

    • end (Union[int, datetime, str], optional) – Timestamp (inclusive) to end partitioning windows from the provided series. (default is the entire series)

    • window (Union[int, datetime, str], optional) – The timedelta that sets the width of each window. The series is divided into consecutive windows of this width. (default is the entire series)

    • **kwargs – Flags for determining the window behavior and the output type.

  • Keyword Arguments:
    • include_std_dev (bool, False) – If set to True, the output will include the standard deviation.
    • window_count (int, optional) – Number of windows to compute the statistics over (instead of the size of each window).
  • Returns: A function that accepts a single series as input and partitions it into windows, computing statistics over each window.
  • Return type: (FunctionNode) -> SummarizerNode

Dataframe schema

Column name               Type      Description
count                     int       Number of data points in the window of the input series.
earliest_point.timestamp  datetime  Timestamp of the first data point in the window of the input series.
earliest_point.value      float     Value of the first data point in the window of the input series.
end_timestamp             datetime  Timestamp at which the window ends (an exclusive bound in the examples below).
largest_point.timestamp   datetime  Timestamp of the data point with the largest value in the window of the input series.
largest_point.value       float     Largest value in the window of the input series.
latest_point.timestamp    datetime  Timestamp of the most recent data point in the window of the input series.
latest_point.value        float     Value of the most recent data point in the window of the input series.
mean                      float     Average value of all data points in the window of the input series.
smallest_point.timestamp  datetime  Timestamp of the data point with the smallest value in the window of the input series.
smallest_point.value      float     Smallest value in the window of the input series.
start_timestamp           datetime  Timestamp at which the window starts (an inclusive bound in the examples below).
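
To make the per-window fields concrete, here is a minimal sketch that derives them for a single window using plain Python. It is not the library's code. The helper name summarize_window is hypothetical, and the tie-breaking for equal values (first occurrence wins) is an assumption. start_timestamp and end_timestamp are left out because they describe the window itself rather than the points in it.

```python
def summarize_window(points):
    """Compute per-window fields for a non-empty list of (timestamp, value) pairs.

    Hypothetical helper for illustration; ties on value resolve to the
    earliest matching point, which is an assumption.
    """
    ordered = sorted(points)  # order by timestamp
    largest = max(ordered, key=lambda p: p[1])
    smallest = min(ordered, key=lambda p: p[1])
    return {
        "count": len(ordered),
        "earliest_point.timestamp": ordered[0][0],
        "earliest_point.value": ordered[0][1],
        "largest_point.timestamp": largest[0],
        "largest_point.value": largest[1],
        "latest_point.timestamp": ordered[-1][0],
        "latest_point.value": ordered[-1][1],
        "mean": sum(value for _, value in ordered) / len(ordered),
        "smallest_point.timestamp": smallest[0],
        "smallest_point.value": smallest[1],
    }


# Points that fall in one 100 ns window of the series used in the Examples section.
window_points = [(200, 2.0), (201, 1.0), (299, 35.0)]
stats_row = summarize_window(window_points)
for name, value in stats_row.items():
    print(name, value)
```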

Notes

This function is only applicable to numeric series.

In the future, the include_std_dev kwarg will be deprecated as this feature will be made the default.

window_count takes effect only when it is passed together with include_std_dev, in which case it overrides window. If passed without include_std_dev, window_count is ignored.
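
The precedence described in this note can be sketched as a small decision function. This is only a reading of the rule as stated here, not code from foundryts. The name resolve_window_mode and its return values are hypothetical.

```python
def resolve_window_mode(window=None, include_std_dev=False, window_count=None):
    """Return which windowing option applies under the rule in the Notes.

    Hypothetical helper: returns ("count", n) when window_count is honored,
    otherwise ("width", window).
    """
    if window_count is not None and include_std_dev:
        # window_count overrides window when include_std_dev is set.
        return ("count", window_count)
    # Without include_std_dev, window_count is ignored.
    return ("width", window)


print(resolve_window_mode(window="100ns", include_std_dev=True, window_count=3))
print(resolve_window_mode(window="100ns", window_count=3))
```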

Examples

>>> from foundryts import functions as F
>>> series = F.points(
...     (1, 8.0),
...     (101, 4.0),
...     (200, 2.0),
...     (201, 1.0),
...     (299, 35.0),
...     (300, 16.0),
...     (350, 32.0),
...     (1000, 64.0),
... )
>>> series.to_pandas()
                      timestamp  value
0 1970-01-01 00:00:00.000000001    8.0
1 1970-01-01 00:00:00.000000101    4.0
2 1970-01-01 00:00:00.000000200    2.0
3 1970-01-01 00:00:00.000000201    1.0
4 1970-01-01 00:00:00.000000299   35.0
5 1970-01-01 00:00:00.000000300   16.0
6 1970-01-01 00:00:00.000000350   32.0
7 1970-01-01 00:00:00.000001000   64.0

>>> stats = F.statistics(window="100ns")(series)  # use time-based window
>>> stats.to_pandas()
   count       earliest_point.timestamp  earliest_point.value                  end_timestamp        largest_point.timestamp  largest_point.value         latest_point.timestamp  latest_point.value       mean       smallest_point.timestamp  smallest_point.value                start_timestamp
0      1  1970-01-01 00:00:00.000000001                   8.0  1970-01-01 00:00:00.000000100  1970-01-01 00:00:00.000000001                  8.0  1970-01-01 00:00:00.000000001                 8.0   8.000000  1970-01-01 00:00:00.000000001                   8.0  1970-01-01 00:00:00.000000000
1      1  1970-01-01 00:00:00.000000101                   4.0  1970-01-01 00:00:00.000000200  1970-01-01 00:00:00.000000101                  4.0  1970-01-01 00:00:00.000000101                 4.0   4.000000  1970-01-01 00:00:00.000000101                   4.0  1970-01-01 00:00:00.000000100
2      3  1970-01-01 00:00:00.000000200                   2.0  1970-01-01 00:00:00.000000300  1970-01-01 00:00:00.000000299                 35.0  1970-01-01 00:00:00.000000299                35.0  12.666667  1970-01-01 00:00:00.000000201                   1.0  1970-01-01 00:00:00.000000200
3      2  1970-01-01 00:00:00.000000300                  16.0  1970-01-01 00:00:00.000000400  1970-01-01 00:00:00.000000350                 32.0  1970-01-01 00:00:00.000000350                32.0  24.000000  1970-01-01 00:00:00.000000300                  16.0  1970-01-01 00:00:00.000000300
4      1  1970-01-01 00:00:00.000001000                  64.0  1970-01-01 00:00:00.000001100  1970-01-01 00:00:00.000001000                 64.0  1970-01-01 00:00:00.000001000                64.0  64.000000  1970-01-01 00:00:00.000001000                  64.0  1970-01-01 00:00:00.000001000

>>> stats_with_std_dev = F.statistics(window="100ns", include_std_dev=True)(series)
>>> stats_with_std_dev.to_pandas()
   count       earliest_point.timestamp  earliest_point.value                  end_timestamp        largest_point.timestamp  largest_point.value         latest_point.timestamp  latest_point.value       mean       smallest_point.timestamp  smallest_point.value  standard_deviation                start_timestamp
0      1  1970-01-01 00:00:00.000000001                   8.0  1970-01-01 00:00:00.000000100  1970-01-01 00:00:00.000000001                  8.0  1970-01-01 00:00:00.000000001                 8.0   8.000000  1970-01-01 00:00:00.000000001                   8.0            0.000000  1970-01-01 00:00:00.000000000
1      1  1970-01-01 00:00:00.000000101                   4.0  1970-01-01 00:00:00.000000200  1970-01-01 00:00:00.000000101                  4.0  1970-01-01 00:00:00.000000101                 4.0   4.000000  1970-01-01 00:00:00.000000101                   4.0            0.000000  1970-01-01 00:00:00.000000100
2      3  1970-01-01 00:00:00.000000200                   2.0  1970-01-01 00:00:00.000000300  1970-01-01 00:00:00.000000299                 35.0  1970-01-01 00:00:00.000000299                35.0  12.666667  1970-01-01 00:00:00.000000201                   1.0           15.797327  1970-01-01 00:00:00.000000200
3      2  1970-01-01 00:00:00.000000300                  16.0  1970-01-01 00:00:00.000000400  1970-01-01 00:00:00.000000350                 32.0  1970-01-01 00:00:00.000000350                32.0  24.000000  1970-01-01 00:00:00.000000300                  16.0            8.000000  1970-01-01 00:00:00.000000300
4      1  1970-01-01 00:00:00.000001000                  64.0  1970-01-01 00:00:00.000001100  1970-01-01 00:00:00.000001000                 64.0  1970-01-01 00:00:00.000001000                64.0  64.000000  1970-01-01 00:00:00.000001000                  64.0            0.000000  1970-01-01 00:00:00.000001000
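
The standard_deviation values in this output are consistent with a population standard deviation (dividing by N rather than N - 1). That is an observation drawn from the example numbers, not a documented guarantee. The check below uses only the Python standard library.

```python
import statistics

# Values falling in the window that starts at 200 ns in the example series.
window_values = [2.0, 1.0, 35.0]

population_sd = statistics.pstdev(window_values)  # divides by N
sample_sd = statistics.stdev(window_values)       # divides by N - 1

print("population:", round(population_sd, 6))
print("sample:", round(sample_sd, 6))

# The two-point window starting at 300 ns.
two_point_sd = statistics.pstdev([16.0, 32.0])
print("two-point population:", two_point_sd)
```

The population figure lines up with the 15.797327 shown for that window, while the sample figure is larger.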

>>> stats_fixed_window_count = F.statistics(include_std_dev=True, window_count=3)(series)
>>> stats_fixed_window_count.to_pandas()
   count       earliest_point.timestamp  earliest_point.value                  end_timestamp        largest_point.timestamp  largest_point.value         latest_point.timestamp  latest_point.value  mean       smallest_point.timestamp  smallest_point.value  standard_deviation                start_timestamp
0      6  1970-01-01 00:00:00.000000001                   8.0  1970-01-01 00:00:00.000000335  1970-01-01 00:00:00.000000299                 35.0  1970-01-01 00:00:00.000000300                16.0  11.0  1970-01-01 00:00:00.000000201                   1.0            11.83216  1970-01-01 00:00:00.000000001
1      1  1970-01-01 00:00:00.000000350                  32.0  1970-01-01 00:00:00.000000669  1970-01-01 00:00:00.000000350                 32.0  1970-01-01 00:00:00.000000350                32.0  32.0  1970-01-01 00:00:00.000000350                  32.0             0.00000  1970-01-01 00:00:00.000000335
2      1  1970-01-01 00:00:00.000001000                  64.0  1970-01-01 00:00:00.000001003  1970-01-01 00:00:00.000001000                 64.0  1970-01-01 00:00:00.000001000                64.0  64.0  1970-01-01 00:00:00.000001000                  64.0             0.00000  1970-01-01 00:00:00.000000669
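
The counts in this last output can be cross-checked against the input by treating each window as a half-open interval. The sketch below takes the boundaries straight from the start_timestamp and end_timestamp columns (in nanoseconds) and does not attempt to reproduce how the library derives them.

```python
from bisect import bisect_right

# start_timestamp of each window plus the final end_timestamp, copied from the output.
edges = [1, 335, 669, 1003]
timestamps = [1, 101, 200, 201, 299, 300, 350, 1000]

counts = [0] * (len(edges) - 1)
for ts in timestamps:
    index = bisect_right(edges, ts) - 1  # window i covers [edges[i], edges[i + 1])
    counts[index] += 1

widths = [upper - lower for lower, upper in zip(edges, edges[1:])]
print(counts)
print(widths)
```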