The Logic Preview run panel is useful for one-off testing, but to build confidence in your Logic functions, you should test them against many inputs.
This tutorial will walk you through creating an evaluation suite for a simple Logic function with AIP Evals.
For this example, we have a Logic function that takes restaurant reviews as input and categorizes them as positive or negative based on review sentiment. We want to create test cases with review inputs and the expected sentiment output to validate our function.
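For reference, the function's contract is simple: a free-text review goes in, and a sentiment label comes out. The TypeScript sketch below is a hypothetical stand-in for that contract only; the real function is authored in the Logic editor, and the function name and trivial keyword check here are illustrative assumptions, not how the Logic function actually determines sentiment.

```typescript
// Hypothetical stand-in for the Logic function's contract: review text in,
// sentiment label out. The real function is built in the Logic editor, not in code.
type Sentiment = "positive" | "negative";

function classifyReviewSentiment(review: string): Sentiment {
  // Placeholder heuristic purely for illustration; the actual Logic function
  // performs the real sentiment analysis.
  const negativeCues = ["terrible", "cold", "rude", "slow", "bad"];
  const text = review.toLowerCase();
  return negativeCues.some((cue) => text.includes(cue)) ? "negative" : "positive";
}

// Example: classifyReviewSentiment("The waiter was rude and the soup was cold.")
// returns "negative".
```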
Start by creating an evaluation suite. Once the Logic function has been saved for the first time, select the hammer icon in the right toolbar to open the AIP Evals panel and select Set up tests. This will create an empty evaluation suite. Alternatively, you can create a new evaluation suite and simultaneously add a first test case by selecting Add as test case in the Preview run panel.
After creating the evaluation suite, you can add test cases by selecting Edit tests in the AIP Evals panel. This will open the test case editor, where you can add inputs for each test case and save the evaluation suite.
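Conceptually, each test case pairs a review input with the sentiment you expect the function to return. The sketch below shows a few such pairs; the type name and shape are illustrative assumptions, not the format AIP Evals stores internally.

```typescript
// Illustrative test cases: each pairs a review input with the expected label.
// The shape is a sketch for this tutorial, not the AIP Evals storage format.
interface ReviewTestCase {
  review: string;
  expectedSentiment: "positive" | "negative";
}

const testCases: ReviewTestCase[] = [
  { review: "Amazing pasta and friendly staff, will come back!", expectedSentiment: "positive" },
  { review: "Waited 45 minutes and the soup arrived cold.", expectedSentiment: "negative" },
  { review: "Great value for money and a lovely patio.", expectedSentiment: "positive" },
];
```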
After adding test cases, you can run the evaluation suite by selecting Run evaluation suite in the AIP Evals panel. This will run all test cases in the suite. Once the run completes, review the results by selecting the card in the Most recent result section.
By default, AIP Evals will output the function's return value, but will not provide aggregated performance metrics. Reviewing each output by hand does not scale, so add an evaluator to compare the outputs produced by the Logic function against the expected values and calculate aggregate metrics. For this example, use the built-in Exact string match evaluator. In practice, depending on the nature of your function, you may need to use other evaluators or write custom ones.
To add an evaluator, select + Add in the test case configuration header, then select Exact string match > Add. This will add the evaluator and open the evaluator editor, where you can map evaluator inputs to function outputs and test case columns. In this case, map the function output to the actual value and create a new parameter for the expected value. This will add a new column to the test case editor where you can input the expected sentiment for each test case.
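To make the evaluator's behavior concrete: an exact string match simply checks whether the actual output equals the expected value, and the aggregate metric is the pass rate across all test cases. The sketch below illustrates that computation; the function names and result shape are assumptions for illustration, not the AIP Evals implementation.

```typescript
// Illustrative sketch of what an exact-string-match evaluator computes:
// a per-case pass/fail plus an aggregate pass rate. Names and shapes are
// assumptions for illustration, not the AIP Evals implementation.
interface EvaluationResult {
  passed: boolean;
  actual: string;
  expected: string;
}

function exactStringMatch(actual: string, expected: string): EvaluationResult {
  return { passed: actual === expected, actual, expected };
}

function passRate(results: EvaluationResult[]): number {
  const passed = results.filter((r) => r.passed).length;
  return results.length === 0 ? 0 : passed / results.length;
}

// Example: two cases, one correct and one incorrect, give a 50% pass rate.
const results = [
  exactStringMatch("positive", "positive"),
  exactStringMatch("negative", "positive"),
];
console.log(`Pass rate: ${(passRate(results) * 100).toFixed(0)}%`); // Pass rate: 50%
```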
After saving, you can run the evaluation suite again to see the aggregated metrics for your function.
Note that you do not have to run the entire suite every time you make a change to your function. You can run individual test cases by selecting the play icon next to the test case in the sidebar. This is useful for debugging and quickly iterating on your function.
After creating an evaluation suite, learn more about evaluation suite run configurations.