An experiment is an artifact that represents a collection of metrics produced during a model training job. Experiments allow developers to log hyperparameters and metrics during a training job, visualize them on the model page, and compare them across different model versions.
The model development process is inherently iterative, and it can be difficult to keep track of different attempts at producing a model. Experiments provide a lightweight Python API for logging details related to those different attempts, including metrics and hyperparameters. Those metrics and hyperparameters can be visualized and compared across model versions to better understand how different parameters affect the model's performance. Below is an overview of how to create and write to experiments.
The `ModelOutput` class used to publish models from Jupyter® Code Workspaces and Code Repositories provides hooks for creating experiments.
Create experiments in Code Workspaces:
```python
from palantir_models.code_workspaces import ModelOutput

# `my-alias` is an alias to a model in the current workspace
model_output = ModelOutput("my-alias")
experiment = model_output.create_experiment(name="my-experiment")
```
Create experiments in Code Repositories:
```python
from transforms.api import configure, transform, Input
from palantir_models.transforms import ModelOutput

@transform(
    input_data=Input("..."),
    model_output=ModelOutput("..."),
)
def compute(input_data, model_output):
    experiment = model_output.create_experiment(name="my-experiment")
```
If any two experiments for a given model use the same name, they will be automatically deduplicated, allowing for the same code to be used multiple times without worrying about renaming the experiment.
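For example, training code can reuse a fixed experiment name across repeated runs. Below is a minimal sketch, assuming the `model_output` object from the examples above:

```python
# Safe to run repeatedly: if "my-experiment" already exists for this model,
# the new experiment is deduplicated automatically, so no manual renaming is needed.
experiment = model_output.create_experiment(name="my-experiment")
```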
Occasionally, model training code may fail due to network errors when writing to the experiment, or because a series exceeds its maximum size. While the Python client aims to handle these errors as gracefully as possible, there may be times when that is not possible. Clients can choose how errors should be handled by selecting one of three error handling variants:
- `FAIL`: Instantly re-raises the error, and the code will fail.
- `WARN` (default): Logs a warning for the error, then suppresses all future errors.
- `SUPPRESS`: Does not log anything.

The error handler mode can be set during experiment creation as shown below:
```python
from palantir_models.experiments import ErrorHandlerType

experiment = model_output.create_experiment(name="my-experiment", error_handler_type=ErrorHandlerType.FAIL)
```
In order for experiments to be displayed on the model page, they must be published alongside a model version, as shown below.
```python
model_output.publish(model_adapter, experiment=experiment)
```
Learn more about visualizing experiments after publishing.
Experiments support three types of logs: hyperparameters, metrics, and images.
Hyperparameters can be logged using the `Experiment.log_param` and `Experiment.log_params` functions. Hyperparameters are single key-value pairs used for storing static data associated with a model training job.
```python
experiment.log_param("learning_rate", 1e-3)
experiment.log_param("model_type", "CNN")
experiment.log_params({
    "batch_size": 12,
    "parallel": True
})
```
Experiments currently support logging hyperparameters of the following types:
Metrics can be logged using the `Experiment.log_metric` and `Experiment.log_metrics` functions. Metrics are logged to a series, which tracks each logged value as a time series. Metric values must be numeric, and the step must be strictly increasing.
When logging metrics, if the metric series has not yet been created, a new series will be created. Additionally, callers may pass a `step` parameter to set the step to log to.
```python
experiment.log_metric("train/acc", 1.5)
experiment.log_metric("test/acc", 15, step=1)
experiment.log_metrics({
    "train/acc": 5,
    "train/loss": 0.9
})
```
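As a sketch of typical usage, metrics are often logged once per epoch, using the epoch index as the step so that each series is strictly increasing. The training-loop helpers below (`train_one_epoch`, `evaluate`) and their inputs are hypothetical stand-ins for your own training code; only the `experiment.log_metric` calls use the API described above.

```python
# Hypothetical training loop; `train_one_epoch` and `evaluate` are placeholders.
for epoch in range(num_epochs):
    train_loss = train_one_epoch(model, train_data)
    test_acc = evaluate(model, test_data)

    # Use the epoch index as the step so each metric series is strictly increasing.
    experiment.log_metric("train/loss", train_loss, step=epoch)
    experiment.log_metric("test/acc", test_acc, step=epoch)
```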
Images can be logged using `Experiment.log_image`. Images are logged to a series, which tracks each logged image as a time series. Images must be in PNG format or a Pillow ↗ image to be logged; other image formats will be rejected.
When logging images, if the image series has not yet been created, a new series will be created. Additionally, callers may pass a `step` parameter to set the step to log to.
```python
experiment.log_image("train/bounding_boxes", pillow_image)
experiment.log_image("test/bounding_boxes", image_bytes_arr)
experiment.log_image(
    "test/segmentation",
    "path/to/image.png",
    caption="Segmentation Image",
    step=1)
```
Image logging can also serve as a way to log custom charts.
```python
import matplotlib.pyplot as plt

plt.scatter(x_data, y_data)
plt.savefig("path/to/image.png")
experiment.log_image("scatter", "path/to/image.png")
```
MLflow ↗ is an open source toolkit for tracking model training metrics, with broad built-in logging support for many machine learning libraries. Users can leverage MLflow and its autologging capabilities ↗ to integrate experiments into their model training code with minimal changes.
After creating an experiment, users can set that experiment as the active MLflow run, and then use MLflow's Python API to write logs to the experiment.
```python
import mlflow

experiment = model_output.create_experiment(name="my-experiment")

with experiment.as_mlflow_run():
    mlflow.sklearn.autolog()

    # training code
    model_adapter = MyModelAdapter(trained_model)

model_output.publish(model_adapter, experiment=experiment)
```
MLflow also provides hooks for more advanced machine learning libraries (such as Keras) that may require callbacks to log metrics:
```python
import keras
import mlflow

experiment = model_output.create_experiment(name="my-experiment")

model = keras.Sequential(
    [
        keras.Input([28, 28, 3]),
        keras.layers.Flatten(),
        keras.layers.Dense(2),
    ]
)
model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(0.001),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

with experiment.as_mlflow_run():
    model.fit(
        data,
        label,
        batch_size=16,
        epochs=8,
        callbacks=[mlflow.keras.MlflowCallback()],
    )
    # Wrap the trained Keras model in the model adapter before publishing.
    model_adapter = MyModelAdapter(model)

model_output.publish(model_adapter, experiment=experiment)
```
Currently, MLflow integration with experiments does not support the full suite of MLflow tooling. The following limitations apply:
The table below lists limits related to experiments in Foundry.
| Description | Limit |
| --- | --- |
| Experiment/metric/hyperparameter name max length | 100 characters |
| Maximum values across all metric series in an experiment | 100,000 |
| Maximum number of hyperparameters per experiment | 500 |
To increase these limits, contact Palantir support.
Review the model experiments Python API reference for more information.