Evaluations

AIP Logic Evaluations enable you to write detailed tests for your Logic functions. You can use Evaluations to:

  • Debug and improve Logic functions and prompts.
  • Compare the performance of different models (for example, GPT-4 versus GPT-3.5) on your functions.
  • Examine variance across multiple runs of Logic functions.

Core concepts

Evaluation function: The method used to compare the actual output of a Logic function against the expected output(s).
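
As a conceptual illustration only (not the AIP Logic interface or API), an evaluation function can be as strict as an exact-match check or as lenient as a similarity score. The sketch below uses hypothetical `exact_match` and `fuzzy_match` helpers in plain Python to show the idea of scoring an actual output against an expected output:

```python
from difflib import SequenceMatcher

# Hypothetical evaluation functions for illustration only; AIP Logic
# provides and configures its own evaluation functions in the platform.

def exact_match(actual: str, expected: str) -> float:
    """Return 1.0 if the Logic function output equals the expected output, else 0.0."""
    return 1.0 if actual.strip() == expected.strip() else 0.0

def fuzzy_match(actual: str, expected: str) -> float:
    """Return a similarity score in [0, 1] to credit partially correct outputs."""
    return SequenceMatcher(None, actual.strip(), expected.strip()).ratio()
```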

Evaluation suite: The collection of evaluation functions and test cases used to build performance benchmarks for AIP Logic functions.

Test cases: Defined sets of inputs and expected outputs that are passed into evaluation functions during evaluation suite runs.

Metrics: The results of evaluation functions. Metrics are produced per test case and can be compared in aggregate or individually between runs.
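
To make these concepts concrete, the following sketch (plain Python, not AIP's interface) shows how a suite run conceptually produces a metric per test case and an aggregate across the suite. The `logic_fn` callable and the test-case structure are placeholders introduced for illustration:

```python
from statistics import mean
from typing import Callable

# Hypothetical test cases for illustration; in practice, test cases are
# defined and stored in the AIP Logic Evaluations interface.
test_cases = [
    {"input": "Summarize: The meeting is at 3pm.", "expected": "Meeting at 3pm."},
    {"input": "Summarize: Budget approved for Q3.", "expected": "Q3 budget approved."},
]

def run_suite(logic_fn: Callable[[str], str],
              evaluate: Callable[[str, str], float]) -> dict:
    """Apply an evaluation function to every test case and aggregate the metrics."""
    per_case = [
        evaluate(logic_fn(case["input"]), case["expected"])
        for case in test_cases
    ]
    # Per-test-case metrics can be inspected individually; the mean gives
    # an aggregate benchmark to compare between runs or models.
    return {"per_test_case": per_case, "mean_score": mean(per_case)}
```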