Returns a function that will evaluate the distribution of one or more time-series.
A distribution breaks the points down into bins that partition the requested range of values. Evaluating the distribution returns one row per bin, giving the start and end of the bin's value range and the number of points that fall within it.
The distribution can be applied to a single series or to multiple series. With multiple series, each bin in the final dataframe counts values from the union of all series.
The width (delta) of every bin is constant and is calculated as (max value - min value) / (number of bins).
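The bin arithmetic can be sketched in plain Python. This is an illustration of the formula only, not the library's implementation: `equal_width_bins` is a hypothetical helper, and it assumes that a value equal to the maximum is counted in the last bin.

```python
def equal_width_bins(values, bins):
    """Return (delta, rows), where each row is (bin_start, bin_end, count)."""
    lo, hi = min(values), max(values)
    delta = (hi - lo) / bins  # (max value - min value) / (number of bins)
    if delta == 0:
        raise ValueError("all values are identical; bin width would be zero")

    counts = [0] * bins
    for v in values:
        # Assumption: the maximum value is counted in the last bin.
        idx = min(int((v - lo) / delta), bins - 1)
        counts[idx] += 1

    rows = [(lo + i * delta, lo + (i + 1) * delta, c) for i, c in enumerate(counts)]
    return delta, rows


values = [0.0, 10.2, 11.3, 11.1, 11.2, 12.0, 11.7, 16.0, 11.8]
delta, rows = equal_width_bins(values, bins=3)
print(round(delta, 6))           # 5.333333
print([c for _, _, c in rows])   # [1, 1, 7]
```

With a minimum of 0.0, a maximum of 16.0, and 3 bins, delta is 16 / 3, so the bins are roughly [0, 5.33), [5.33, 10.67), and [10.67, 16].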
Column name | Type | Description |
---|---|---|
start_timestamp | datetime | Start time of the distribution (inclusive) |
end_timestamp | datetime | End time of the distribution (exclusive) |
start | float | Lower bound of values (inclusive) |
end | float | Upper bound of values (exclusive) |
delta | float | Width of each bin (bin end minus bin start). Because bins are equal-width, delta is the same for all bins. |
distribution_values.start | float | Start value of a distribution bin |
distribution_values.end | float | End value of a distribution bin |
distribution_values.count | int | Number of instances in a distribution bin |
This function is only applicable to numeric series.
    >>> series_1 = F.points(
    ...     (1, 0.0),
    ...     (101, 10.2),
    ...     (200, 11.3),
    ...     (201, 11.1),
    ...     (299, 11.2),
    ...     (300, 12.0),
    ...     (400, 11.7),
    ...     (500, 16.0),
    ...     (123450, 11.8),
    ...     name="series-1",
    ... )
    >>> series_2 = F.points(
    ...     (1, 0.5),
    ...     (101, 0.2),
    ...     (200, 1.3),
    ...     (201, 0.1),
    ...     (299, 1.2),
    ...     (300, 1.4),
    ...     (400, 1.0),
    ...     (500, 2.0),
    ...     (123450, 1.0),
    ...     name="series-2",
    ... )
    >>> series_1.to_pandas()
                          timestamp  value
    0 1970-01-01 00:00:00.000000001    0.0
    1 1970-01-01 00:00:00.000000101   10.2
    2 1970-01-01 00:00:00.000000200   11.3
    3 1970-01-01 00:00:00.000000201   11.1
    4 1970-01-01 00:00:00.000000299   11.2
    5 1970-01-01 00:00:00.000000300   12.0
    6 1970-01-01 00:00:00.000000400   11.7
    7 1970-01-01 00:00:00.000000500   16.0
    8 1970-01-01 00:00:00.000123450   11.8
    >>> series_2.to_pandas()
                          timestamp  value
    0 1970-01-01 00:00:00.000000001    0.5
    1 1970-01-01 00:00:00.000000101    0.2
    2 1970-01-01 00:00:00.000000200    1.3
    3 1970-01-01 00:00:00.000000201    0.1
    4 1970-01-01 00:00:00.000000299    1.2
    5 1970-01-01 00:00:00.000000300    1.4
    6 1970-01-01 00:00:00.000000400    1.0
    7 1970-01-01 00:00:00.000000500    2.0
    8 1970-01-01 00:00:00.000123450    1.0
    >>> nc = NodeCollection(series_1, series_2)
    >>> single_dist = F.distribution(bins=3)(series_1)  # single series distribution
    >>> single_dist.to_pandas()
          delta  distribution_values.count  distribution_values.end  distribution_values.start   end end_timestamp  start               start_timestamp
    0  5.333333                          1                 5.333333                   0.000000  16.0    2262-01-01    0.0 1677-09-21 00:12:43.145225216
    1  5.333333                          1                10.666667                   5.333333  16.0    2262-01-01    0.0 1677-09-21 00:12:43.145225216
    2  5.333333                          7                16.000000                  10.666667  16.0    2262-01-01    0.0 1677-09-21 00:12:43.145225216
    >>> multiple_dist = F.distribution(bins=3)(nc)  # multiple series distribution
    >>> multiple_dist.to_pandas()
          delta  distribution_values.count  distribution_values.end  distribution_values.start   end end_timestamp  start               start_timestamp
    0  5.333333                         10                 5.333333                   0.000000  16.0    2262-01-01    0.0 1677-09-21 00:12:43.145225216
    1  5.333333                          1                10.666667                   5.333333  16.0    2262-01-01    0.0 1677-09-21 00:12:43.145225216
    2  5.333333                          7                16.000000                  10.666667  16.0    2262-01-01    0.0 1677-09-21 00:12:43.145225216
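As a cross-check on the multiple-series counts, the values of both series can be pooled into one list and binned by hand. This is a plain-Python sketch that does not call the library. It assumes the union keeps repeated values (1.0 occurs twice in series-2) and that the maximum value is counted in the last bin.

```python
series_1_values = [0.0, 10.2, 11.3, 11.1, 11.2, 12.0, 11.7, 16.0, 11.8]
series_2_values = [0.5, 0.2, 1.3, 0.1, 1.2, 1.4, 1.0, 2.0, 1.0]

# Pool the values of all series; the bins span the pooled min and max.
pooled = series_1_values + series_2_values
bins = 3
lo, hi = min(pooled), max(pooled)
delta = (hi - lo) / bins

counts = [0] * bins
for v in pooled:
    # Assumption: the maximum value is counted in the last bin.
    idx = min(int((v - lo) / delta), bins - 1)
    counts[idx] += 1

print(counts)  # [10, 1, 7]
```

All nine values of series-2 fall below 5.333333, so they add to the first bin only, which is why its count rises from 1 to 10 while the other two bins are unchanged.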