Evaluations

AIP Logic Evaluations enable you to write detailed tests for your Logic functions. You can use Evaluations to:

  • Debug and improve Logic functions and prompts.
  • Compare different models, such as GPT-4 versus GPT-3.5, on your functions.
  • Examine variance across multiple runs of Logic functions.

Core concepts

Evaluation function

The method used to compare the actual output of a Logic function against the expected output(s).
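As a concrete illustration only, the sketch below shows what an evaluation function conceptually does; the function name and return shape are hypothetical and do not represent the AIP Logic API.

```python
def exact_match(actual: str, expected: str) -> dict:
    """Score a Logic function's actual output against the expected output."""
    passed = actual.strip().lower() == expected.strip().lower()
    # One metric per comparison: a name, a pass/fail flag, and a numeric score.
    return {"metric": "exact_match", "passed": passed, "score": 1.0 if passed else 0.0}
```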

Evaluation suite

The collection of Evaluation functions and test cases used to build performance benchmarks for AIP Logic functions.

Test cases

Defined sets of inputs and expected outputs that are passed into Evaluation functions during evaluation suite runs.
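Continuing the illustration above, test cases can be thought of as plain input/expected-output pairs; the shape below is a hypothetical sketch, not a defined AIP Logic format.

```python
# Each test case pairs the inputs passed to the Logic function with the
# output the evaluation function should expect.
test_cases = [
    {"inputs": {"question": "What is the capital of France?"}, "expected": "Paris"},
    {"inputs": {"question": "What is 2 + 2?"}, "expected": "4"},
]
```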

Metrics

The results of Evaluation functions. Metrics are produced per test case and can be compared in aggregate or individually between runs.
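Putting the hypothetical pieces above together, a suite run scores each test case and yields one metric per case, which can then be aggregated across the run. This sketch reuses the `exact_match` function and `test_cases` list defined above; `run_logic_function` is a stub standing in for the Logic function under test, not a real AIP call.

```python
def run_logic_function(inputs: dict) -> str:
    """Stub for invoking the Logic function under test."""
    return "Paris"  # a real run would return the function's actual output

def run_suite(test_cases: list[dict]) -> list[dict]:
    """Produce one metric per test case."""
    return [
        exact_match(run_logic_function(case["inputs"]), case["expected"])
        for case in test_cases
    ]

# Metrics can be inspected individually per test case or aggregated,
# for example as a pass rate to compare between runs.
metrics = run_suite(test_cases)
pass_rate = sum(m["score"] for m in metrics) / len(metrics)
print(f"Pass rate across the suite: {pass_rate:.0%}")
```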