The Logic Preview run panel is useful for one-off testing, but to build confidence in your Logic functions, you should test them against many inputs.
This tutorial will walk you through creating an evaluation suite for a simple Logic function with AIP Evals.
For this example, we have a Logic function that takes restaurant reviews as input and categorizes them as positive or negative based on review sentiment. We want to create test cases with review inputs and the expected sentiment output to validate our function.
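For reference, the function's contract is simple: a free-text review goes in, and a sentiment label comes out. The TypeScript sketch below is a hypothetical stand-in for that contract only; the real function is authored in the Logic editor, and the function name and trivial keyword check here are illustrative assumptions, not how the Logic function actually determines sentiment.

```typescript
// Hypothetical stand-in for the Logic function's contract: review text in,
// sentiment label out. The real function is built in the Logic editor, not in code.
type Sentiment = "positive" | "negative";

function classifyReviewSentiment(review: string): Sentiment {
  // Placeholder heuristic purely for illustration; the actual Logic function
  // performs the real sentiment analysis.
  const negativeCues = ["terrible", "cold", "rude", "slow", "bad"];
  const text = review.toLowerCase();
  return negativeCues.some((cue) => text.includes(cue)) ? "negative" : "positive";
}

// Example: classifyReviewSentiment("The waiter was rude and the soup was cold.")
// returns "negative".
```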
Start by creating an evaluation suite. Once the Logic function has been saved for the first time, select the hammer icon in the right toolbar to open the AIP Evals panel and select Set up tests. This will create an empty evaluation suite. Alternatively, you can create a new evaluation suite and simultaneously add a first test case by selecting Add as test case in the Preview run panel.
After creating the evaluation suite, you can add test cases by selecting Edit tests in the AIP Evals panel. This will open the test case editor, where you can add inputs for each test case and save the evaluation suite.
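Conceptually, each test case pairs a review input with the sentiment you expect the function to return. The sketch below shows a few such pairs; the type name and shape are illustrative assumptions, not the format AIP Evals stores internally.

```typescript
// Illustrative test cases: each pairs a review input with the expected label.
// The shape is a sketch for this tutorial, not the AIP Evals storage format.
interface ReviewTestCase {
  review: string;
  expectedSentiment: "positive" | "negative";
}

const testCases: ReviewTestCase[] = [
  { review: "Amazing pasta and friendly staff, will come back!", expectedSentiment: "positive" },
  { review: "Waited 45 minutes and the soup arrived cold.", expectedSentiment: "negative" },
  { review: "Great value for money and a lovely patio.", expectedSentiment: "positive" },
];
```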
After adding test cases, you can run the evaluation suite by selecting Run evaluation suite in the AIP Evals panel. This will run all test cases in the suite. Once the run completes, review the results by selecting the card in the Most recent result section.
By default, AIP Evals will output the function's return value, but will not provide aggregated performance metrics. Reviewing each output by hand does not scale, so add an evaluator to compare the outputs produced by the Logic function against the expected values and calculate aggregate metrics. For this example, use the built-in Exact string match evaluator. In practice, depending on the nature of your function, you may need to use other evaluators or write custom ones.
To add an evaluator, select + Add in the test case configuration header, then select Exact string match > Add. This will add the evaluator and open the evaluator editor, where you can map evaluator inputs to function outputs and test case columns. In this case, map the function output to the actual value and create a new parameter for the expected value. This will add a new column to the test case editor where you can input the expected sentiment for each test case.
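To make the evaluator's behavior concrete: an exact string match simply checks whether the actual output equals the expected value, and the aggregate metric is the pass rate across all test cases. The sketch below illustrates that computation; the function names and result shape are assumptions for illustration, not the AIP Evals implementation.

```typescript
// Illustrative sketch of what an exact-string-match evaluator computes:
// a per-case pass/fail plus an aggregate pass rate. Names and shapes are
// assumptions for illustration, not the AIP Evals implementation.
interface EvaluationResult {
  passed: boolean;
  actual: string;
  expected: string;
}

function exactStringMatch(actual: string, expected: string): EvaluationResult {
  return { passed: actual === expected, actual, expected };
}

function passRate(results: EvaluationResult[]): number {
  const passed = results.filter((r) => r.passed).length;
  return results.length === 0 ? 0 : passed / results.length;
}

// Example: two cases, one correct and one incorrect, give a 50% pass rate.
const results = [
  exactStringMatch("positive", "positive"),
  exactStringMatch("negative", "positive"),
];
console.log(`Pass rate: ${(passRate(results) * 100).toFixed(0)}%`); // Pass rate: 50%
```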
After saving, you can run the evaluation suite again to see the aggregated metrics for your function.
Note that you do not have to run the entire suite every time you make a change to your function. You can run individual test cases by selecting the play icon next to the test case in the sidebar. This is useful for debugging and quickly iterating on your function.
After creating an evaluation suite, learn more about evaluation suite run configurations.