foundryts.functions.distribution

foundryts.functions.distribution(start=None, end=None, start_value=None, end_value=None, bins=None)

Returns a function that evaluates the distribution of values in one or more time series.

A distribution groups points into bins that partition the requested range of values. Evaluating the distribution returns one entry per bin, giving the number of points whose values fall in that bin along with the start and end of the bin's range.

The distribution can be applied to a single series or to multiple series. For multiple series, each bin in the final dataframe counts the union of values from all series.

Every bin covers a value range of the same width, delta, calculated as (max value - min value) / (number of bins).
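As a rough illustration of the equal-width binning described above, the following pure-Python sketch computes delta and the per-bin counts for a pooled list of values. It is not the library's implementation: the helper name `equal_width_distribution` is hypothetical, skipping values outside the range is an assumption, and placing a value equal to the upper bound in the last bin is an assumption chosen to match the counts shown in the Examples section.

```python
def equal_width_distribution(values, bins=10, start_value=None, end_value=None):
    """Illustrative equal-width binning (hypothetical helper, not part of foundryts)."""
    lower = min(values) if start_value is None else start_value
    upper = max(values) if end_value is None else end_value
    delta = (upper - lower) / bins  # constant width shared by every bin

    counts = [0] * bins
    for value in values:
        if value < lower or value > upper:
            continue  # assumption: values outside the range are not counted
        if delta > 0:
            # Assumption: a value equal to the upper bound goes in the last bin.
            index = min(int((value - lower) / delta), bins - 1)
        else:
            index = 0  # all values identical, so everything lands in one bin
        counts[index] += 1

    return [
        {"start": lower + i * delta, "end": lower + (i + 1) * delta, "count": counts[i]}
        for i in range(bins)
    ]


# Values of the two series used in the Examples section.
series_1_values = [0.0, 10.2, 11.3, 11.1, 11.2, 12.0, 11.7, 16.0, 11.8]
series_2_values = [0.5, 0.2, 1.3, 0.1, 1.2, 1.4, 1.0, 2.0, 1.0]

single = equal_width_distribution(series_1_values, bins=3)
pooled = equal_width_distribution(series_1_values + series_2_values, bins=3)

print([b["count"] for b in single])  # [1, 1, 7]
print([b["count"] for b in pooled])  # [10, 1, 7]
```

Pooling the values before binning mirrors the "union of values from all series" behavior: the bin edges are derived once from the combined minimum and maximum, so both series share the same bins.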

  • Parameters:
    • start (Union[int, datetime, str], optional) – Timestamp (inclusive) to start evaluating a distribution over the provided series (default is the earliest timestamp in any of the input time series).
    • end (Union[int, datetime, str], optional) – Timestamp (exclusive) to end evaluating a distribution over the provided series (default is the latest timestamp in any of the input time series).
    • start_value (float, optional) – Lower bound (inclusive) of the value range to evaluate the distribution over (default is the minimum value of any of the input time series).
    • end_value (float, optional) – Upper bound (exclusive) of the value range to evaluate the distribution over (default is the maximum value of any of the input time series).
    • bins (int, optional) – Number of value-bins to distribute points over (default is 10).
  • Returns: A function that accepts one or more series as inputs and generates the distribution over all points in the specified or default number of bins.
  • Return type: (Union[FunctionNode, NodeCollection]) -> SummarizerNode

Dataframe schema

Column name                 Type      Description
start_timestamp             datetime  Start time of the distribution (inclusive)
end_timestamp               datetime  End time of the distribution (exclusive)
start                       float     Lower bound of values (inclusive)
end                         float     Upper bound of values (exclusive)
delta                       float     The difference between the min and max values of each bin. Given how bins are calculated, delta is fixed for all bins.
distribution_values.start   float     Start value of a distribution bin
distribution_values.end     float     End value of a distribution bin
distribution_values.count   int       Number of instances in a distribution bin
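
To make the schema concrete, here is a small sketch of how the per-bin columns relate to the summary columns. The rows are copied from the multiple-series output in the Examples section; representing them as a list of dicts and the helper name `bin_shares` are assumptions made for illustration only (in practice the rows come from the dataframe returned by `to_pandas()`).

```python
# Rows copied from the multiple-series example output (values as printed there).
rows = [
    {"delta": 5.333333, "start": 0.0, "end": 16.0,
     "distribution_values.start": 0.0,
     "distribution_values.end": 5.333333,
     "distribution_values.count": 10},
    {"delta": 5.333333, "start": 0.0, "end": 16.0,
     "distribution_values.start": 5.333333,
     "distribution_values.end": 10.666667,
     "distribution_values.count": 1},
    {"delta": 5.333333, "start": 0.0, "end": 16.0,
     "distribution_values.start": 10.666667,
     "distribution_values.end": 16.0,
     "distribution_values.count": 7},
]


def bin_shares(rows):
    """Hypothetical helper: fraction of all counted points that fall in each bin."""
    total = sum(row["distribution_values.count"] for row in rows)
    return [row["distribution_values.count"] / total for row in rows]


# Each bin's width matches the shared delta (up to the rounding in the printed output).
for row in rows:
    width = row["distribution_values.end"] - row["distribution_values.start"]
    assert abs(width - row["delta"]) < 1e-5

total_points = sum(row["distribution_values.count"] for row in rows)
shares = bin_shares(rows)
print(total_points)  # 18
```

The summary columns (start, end, delta and the two timestamps) repeat on every row, so only the distribution_values.* columns vary from bin to bin.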
Note

This function is only applicable to numeric series.

Examples

>>> series_1 = F.points(
...     (1, 0.0),
...     (101, 10.2),
...     (200, 11.3),
...     (201, 11.1),
...     (299, 11.2),
...     (300, 12.0),
...     (400, 11.7),
...     (500, 16.0),
...     (123450, 11.8),
...     name="series-1",
... )
>>> series_2 = F.points(
...     (1, 0.5),
...     (101, 0.2),
...     (200, 1.3),
...     (201, 0.1),
...     (299, 1.2),
...     (300, 1.4),
...     (400, 1.0),
...     (500, 2.0),
...     (123450, 1.0),
...     name="series-2",
... )
>>> series_1.to_pandas()
                      timestamp  value
0 1970-01-01 00:00:00.000000001    0.0
1 1970-01-01 00:00:00.000000101   10.2
2 1970-01-01 00:00:00.000000200   11.3
3 1970-01-01 00:00:00.000000201   11.1
4 1970-01-01 00:00:00.000000299   11.2
5 1970-01-01 00:00:00.000000300   12.0
6 1970-01-01 00:00:00.000000400   11.7
7 1970-01-01 00:00:00.000000500   16.0
8 1970-01-01 00:00:00.000123450   11.8
>>> series_2.to_pandas()
                      timestamp  value
0 1970-01-01 00:00:00.000000001    0.5
1 1970-01-01 00:00:00.000000101    0.2
2 1970-01-01 00:00:00.000000200    1.3
3 1970-01-01 00:00:00.000000201    0.1
4 1970-01-01 00:00:00.000000299    1.2
5 1970-01-01 00:00:00.000000300    1.4
6 1970-01-01 00:00:00.000000400    1.0
7 1970-01-01 00:00:00.000000500    2.0
8 1970-01-01 00:00:00.000123450    1.0
>>> nc = NodeCollection(series_1, series_2)

>>> single_dist = F.distribution(bins=3)(series_1)  # single series distribution
>>> single_dist.to_pandas()
      delta  distribution_values.count  distribution_values.end  distribution_values.start   end  end_timestamp  start                start_timestamp
0  5.333333                          1                 5.333333                   0.000000  16.0     2262-01-01    0.0  1677-09-21 00:12:43.145225216
1  5.333333                          1                10.666667                   5.333333  16.0     2262-01-01    0.0  1677-09-21 00:12:43.145225216
2  5.333333                          7                16.000000                  10.666667  16.0     2262-01-01    0.0  1677-09-21 00:12:43.145225216

>>> multiple_dist = F.distribution(bins=3)(nc)  # multiple series distribution
>>> multiple_dist.to_pandas()
      delta  distribution_values.count  distribution_values.end  distribution_values.start   end  end_timestamp  start                start_timestamp
0  5.333333                         10                 5.333333                   0.000000  16.0     2262-01-01    0.0  1677-09-21 00:12:43.145225216
1  5.333333                          1                10.666667                   5.333333  16.0     2262-01-01    0.0  1677-09-21 00:12:43.145225216
2  5.333333                          7                16.000000                  10.666667  16.0     2262-01-01    0.0  1677-09-21 00:12:43.145225216