foundryts.functions.scatter

foundryts.functions.scatter(start=None, end=None, before='NONE', internal='LINEAR', after='NONE', regression=None, regression_fit=None)

Returns a function that will generate the list of aligned (x,y) points for exactly two time series.

A scatter plot consists of (x,y) coordinates. For two given time series, an (x,y) coordinate will consist of a point from each series where the timestamps match. For points where the underlying series timestamps do not match, the configured interpolation strategy will be used for the series missing a point at that timestamp.

Read about supported interpolation strategies for internal, before and after in interpolate()

Additionally, you can pass a regression function to find the best fit line across the points in the graph.

  • Parameters:
    • start (int | datetime | str , optional) – Timestamp (inclusive) to start aligning points (default is pandas.Timestamp.min)
    • end (int | datetime | str , optional) – Timestamp (exclusive) to finish aligning points (default is pandas.Timestamp.max)
    • before (str | List [str ] , optional) – Name of the interpolation strategy to use for aligning the first point, use interpolation available strategies from interpolate() (default is NONE)
    • internal (str | List [str ] , optional) – Name of the interpolation strategy to use for aligning all points within the series, use interpolation available strategies from interpolate() (default is LINEAR)
    • after (str | List [str ] , optional) – Name of the interpolation strategy to use for aligning the last point, use interpolation available strategies from interpolate() (default is NONE)
    • regression (linear_regression() | polynomial_regression() | exponential_regression(), optional) – Output of one of the regression functions, this will provide points for the best fit line (as well as other related metrics) line between the two input series (defaults to no regression).
  • Returns: Returns a function that accepts exactly two series as input, and returns the aligned points for the scatterplot. Each row in the resulting dataframe represents an aligned point.
  • Return type: (NodeCollection) -> SummarizerNode

Dataframe schema

Column nameTypeDescription
is_truncatedboolThis field is deprecated and should be ignored.
If the output was truncated for a large series.
points.first_valuefloatValue of the point in the first series.
points.second_valuefloatValue of point in the second series.
points.timestampdatetimeTimestamp of the points.
regression.*floatColumns from the regression function (if
regression is used).
Note

This function is only applicable to numeric series.

Examples

Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 >>> series_1 = F.points((11, 21.0), (13, 23.0), (15, 25.0), (17, 27.0), name="series-1") >>> series_2 = F.points((11, 21.0), (13, 23.0), (17, 37.0), (37, 47.0), name="series-2") >>> series_1.to_pandas() timestamp value 0 1970-01-01 00:00:00.000000011 21.0 1 1970-01-01 00:00:00.000000013 23.0 2 1970-01-01 00:00:00.000000015 25.0 3 1970-01-01 00:00:00.000000017 27.0 >>> series_2.to_pandas() timestamp value 0 1970-01-01 00:00:00.000000011 21.0 1 1970-01-01 00:00:00.000000013 23.0 2 1970-01-01 00:00:00.000000017 37.0 3 1970-01-01 00:00:00.000000037 47.0 >>> nc = NodeCollection([series_1, series_2])
Copied!
1 2 3 4 5 6 7 8 9 10 11 12 >>> scatter_plot = F.scatter( # scatter plot with interpolation ... before="NEAREST", ... internal="LINEAR", ... after="NEAREST", ... )(nc) >>> scatter_plot.to_pandas() is_truncated points.first_value points.second_value points.timestamp 0 False 21.0 21.0 1970-01-01 00:00:00.000000011 1 False 23.0 23.0 1970-01-01 00:00:00.000000013 2 False 25.0 30.0 1970-01-01 00:00:00.000000015 3 False 27.0 37.0 1970-01-01 00:00:00.000000017 4 False 27.0 47.0 1970-01-01 00:00:00.000000037
Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 >>> lin_regression_scatter_plot = F.scatter( ... before="NEAREST", ... internal="LINEAR", ... after="NEAREST", ... regression=F.linear_regression(), ... )(nc) >>> lin_regression_scatter_plot.to_pandas() is_truncated points.first_value points.second_value points.timestamp regression.max_bounds.first_value regression.max_bounds.second_value regression.min_bounds.first_value regression.min_bounds.second_value regression.regression_fit_function.linear_regression_fit.intercept regression.regression_fit_function.linear_regression_fit.slope regression.regression_fit_function.linear_regression_fit.statistics.rsquared 0 False 21.0 21.0 1970-01-01 00:00:00.000000011 27.0 47.0 21.0 21.0 -59.926471 3.720588 0.827161 1 False 23.0 23.0 1970-01-01 00:00:00.000000013 27.0 47.0 21.0 21.0 -59.926471 3.720588 0.827161 2 False 25.0 30.0 1970-01-01 00:00:00.000000015 27.0 47.0 21.0 21.0 -59.926471 3.720588 0.827161 3 False 27.0 37.0 1970-01-01 00:00:00.000000017 27.0 47.0 21.0 21.0 -59.926471 3.720588 0.827161 4 False 27.0 47.0 1970-01-01 00:00:00.000000037 27.0 47.0 21.0 21.0 -59.926471 3.720588 0.827161