Dataset-backed models

Sunsetted functionality

The documentation below describes the foundry_ml library, which is no longer recommended for use in the platform. Use the palantir_models library instead. You can also learn how to migrate a model from the foundry_ml framework to the palantir_models framework through an example.

The foundry_ml library will be removed on October 31, 2025, corresponding with the planned deprecation of Python 3.9.

A dataset-backed model is a model that is developed with the foundry_ml Python library in Foundry. Dataset-backed models are units of data transformation capable of being serialized for inference in Foundry.

A model consists of a series of stages, each of which contains a series of stateful transformations, often trained using a machine learning algorithm. Models and stages provide a standardized interface, so that models built with disparate libraries and algorithms can have consistent semantics. This allows models to be used interchangeably across various applications in Foundry.

To create a model in Python, pass the stages you want to apply to the model constructor. For example, Model(stage1, stage2) transforms data by applying stage1 to the input data, then applying stage2 to stage1's output, and returns stage2's output. Each model also saves an API: a technical definition of the inputs the model expects and the output it produces.
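The ordering described above can be illustrated without foundry_ml at all. The following is a minimal sketch in plain Python, assuming each stage is simply a callable; apply_stages, add_one, and double are hypothetical names used only for illustration and are not part of the foundry_ml API.

```python
from functools import reduce


def apply_stages(stages, data):
    """Feed the data through each stage in order; each stage sees the previous output."""
    return reduce(lambda current, stage: stage(current), stages, data)


# Two hypothetical stages, modelled here as plain functions over a list of numbers.
def add_one(rows):
    return [r + 1 for r in rows]


def double(rows):
    return [r * 2 for r in rows]


# Mirrors Model(stage1, stage2): add_one runs first, then double runs on its output.
result = apply_stages([add_one, double], [1, 2, 3])
print(result)  # [4, 6, 8]
```

Swapping the order of the two stages gives a different result, which is why the order of the arguments passed to the constructor matters.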


Interface

Because a model may wrap stages from distinct, incompatible frameworks, the common Model interface makes all models transparently swappable.

These methods and properties are available on all Foundry ML models:

  • transform(data): Serially applies stages to produce a scored output.
  • append_stage(stage): Appends an additional stage to the model after the current stages.
  • input_spec: Describes the format of the expected input.
  • output_spec: Describes the format of the output.
  • stages: Returns the list of stages in the model.
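To make the shape of this interface concrete, here is a toy stand-in written in plain Python. It is a sketch only: ToyModel is a hypothetical class, not the foundry_ml implementation, and both the dictionary format used for the specs and the in-place behavior of append_stage are assumptions made for illustration.

```python
class ToyModel:
    """Illustrative stand-in exposing the five members listed above."""

    def __init__(self, *stages, input_spec=None, output_spec=None):
        self._stages = list(stages)
        # Assumed spec format: a mapping of column name to type name.
        self.input_spec = input_spec or {}
        self.output_spec = output_spec or {}

    @property
    def stages(self):
        # Return a copy so callers cannot mutate the pipeline by accident.
        return list(self._stages)

    def append_stage(self, stage):
        # Assumption: the new stage is added in place, after the existing ones.
        self._stages.append(stage)

    def transform(self, data):
        for stage in self._stages:
            data = stage(data)
        return data


def strip(values):
    return [v.strip() for v in values]


def upper(values):
    return [v.upper() for v in values]


model = ToyModel(strip, input_spec={"text": "string"}, output_spec={"text": "string"})
model.append_stage(upper)
scored = model.transform(["  a ", "b  "])
print(scored)             # ['A', 'B']
print(len(model.stages))  # 2
```

Code that only touches transform, append_stage, input_spec, output_spec, and stages does not need to know which framework produced the stages, which is the sense in which models are swappable.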

Serialization

Most modeling frameworks have their own serialization format and methodology. In particular, distributed and multi-language frameworks often have serialization methods that are difficult to understand. Foundry's model interface simplifies the serialization process with Architecture V2.

Architecture V2

Architecture V2 is a serialization format that adds support for a number of new features:

  • Typesafe serialization of non-Python models (for example, containerized, TypeScript, and external models).
  • Typesafe support for new model stage types.
  • Serialization of model API for use with modeling objectives and ensuring safe data inputs.
  • Tracking of model Conda dependencies.
  • Improved file size to reduce memory pressure.
  • Removal of the legacy typespec and auto-converter system, which produced uninterpretable errors.

Templates

In Python Transforms, you can use the following templates:

```python
from transforms.api import transform, Input, Output
from foundry_ml import Model


@transform(
    training_data=Input("/path/to/input/training/data"),
    out_model=Output("/path/to/output/model"),
)
def create_model(training_data, out_model):
    df = training_data.dataframe()

    # Train the model here using df.
    # Placeholder: replace the empty list with the stages you trained.
    model_stages = []
    model = Model(*model_stages)

    # Persist the model to the output dataset.
    model.save(out_model)


@transform(
    in_model=Input("/path/to/output/model"),
    test_data=Input("/path/to/input/test/data"),
    out_data=Output("/path/to/scores/output"),
)
def apply_model(in_model, test_data, out_data):
    # Load the saved model and score the test data.
    model = Model.load(in_model)
    output_df = model.transform(test_data.dataframe())
    out_data.write_dataframe(output_df)
```

In a Code Workbook, save a model by returning an instance of foundry_ml.Model from your code block's function. Load a saved model by declaring it as a parameter of a code block's function.
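The Code Workbook pattern might look like the sketch below. It assumes that foundry_ml is available in the workbook environment and that the function and parameter names (trained_model, training_data, scored_data, test_data) match the names of your workbook nodes; all of these names are hypothetical. The import sits inside the function so that the definitions themselves do not require the library.

```python
def trained_model(training_data):
    # Assumption: foundry_ml is importable in the Code Workbook environment.
    from foundry_ml import Model

    # Train the model here using training_data.
    # Placeholder: replace the empty list with the stages you trained.
    model_stages = []

    # Returning a Model instance is what saves the model.
    return Model(*model_stages)


def scored_data(trained_model, test_data):
    # Declaring trained_model as a parameter loads the saved model.
    return trained_model.transform(test_data)
```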

Stage

A model is a linear pipeline of individual transforms known as stages. All stages follow a common interface to perform their transformation.

Stage types

The basic stage contract is minimal, with a single type.

A stage, created via Stage(), denotes a computational component of a model pipeline that wraps an external object (such as a scikit-learn model). Every call to transform() is mapped to a registered function on the hosted model.

A stage has the following responsibilities:

  • Host an external model object and provide a consistent interface (Stage.transform).
  • Promote parameters and specify its input and output.
  • Serialize and deserialize the hosted model to a file.
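These responsibilities can be sketched with a toy wrapper in plain Python. This is an illustration under stated assumptions, not the foundry_ml Stage class: ToyStage, MeanCenterer, the method_name argument, and the JSON file layout are all hypothetical, and real stages use their own serialization formats.

```python
import json
import tempfile
from pathlib import Path


class ToyStage:
    """Hosts an external object and exposes it through a uniform transform()."""

    def __init__(self, hosted, method_name):
        self.hosted = hosted
        self.method_name = method_name  # which method of the hosted object to call

    def transform(self, data):
        # Map the uniform call onto the hosted object's own method.
        return getattr(self.hosted, self.method_name)(data)

    def save(self, path):
        # Simplification: assumes the hosted object's state is JSON-serializable.
        payload = {"method_name": self.method_name, "state": vars(self.hosted)}
        Path(path).write_text(json.dumps(payload))

    @classmethod
    def load(cls, path, hosted_cls):
        payload = json.loads(Path(path).read_text())
        hosted = hosted_cls()
        vars(hosted).update(payload["state"])
        return cls(hosted, payload["method_name"])


class MeanCenterer:
    """A tiny 'external' model with its own method names."""

    def fit(self, values):
        self.mean_ = sum(values) / len(values)
        return self

    def predict(self, values):
        return [v - self.mean_ for v in values]


stage = ToyStage(MeanCenterer().fit([1.0, 2.0, 3.0]), "predict")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stage.json"
    stage.save(path)
    restored = ToyStage.load(path, MeanCenterer)

centered = restored.transform([2.0, 4.0])
print(centered)  # [0.0, 2.0]
```

The caller only ever sees transform, save, and load, regardless of what the hosted object calls its own prediction method.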

Supported stages

In most cases, minimal customization of a stage is required outside the transformation object. Foundry provides a stage registry that automates creation of stages by type for supported libraries.

Use the help function to see which library classes are currently supported by foundry_ml and have registered transformation functions and serialization formats.
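The idea of a registry that picks a transformation function by the type of the wrapped object can be sketched as follows. This is a generic illustration in plain Python, not the foundry_ml registry: register, lookup, supported_types, and Scaler are hypothetical names.

```python
from typing import Any, Callable, Dict

# Maps a class to the function that knows how to run objects of that class.
_REGISTRY: Dict[type, Callable[[Any, Any], Any]] = {}


def register(cls):
    """Decorator that registers a transform function for a class."""
    def decorator(fn):
        _REGISTRY[cls] = fn
        return fn
    return decorator


def lookup(obj):
    """Find the transform function for obj, honoring subclasses."""
    for klass in type(obj).__mro__:
        if klass in _REGISTRY:
            return _REGISTRY[klass]
    raise TypeError(f"No transform registered for {type(obj).__name__}")


def supported_types():
    """List the names of the classes that have a registered transform."""
    return sorted(c.__name__ for c in _REGISTRY)


class Scaler:
    def __init__(self, factor):
        self.factor = factor


@register(Scaler)
def _scale(obj, data):
    return [x * obj.factor for x in data]


scaler = Scaler(3)
out = lookup(scaler)(scaler, [1, 2])
print(out)                # [3, 6]
print(supported_types())  # ['Scaler']
```

With a registry like this, wrapping a supported object needs no per-stage customization, while an unsupported type fails early with a clear error.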

Parameters

A stage can contain many configurable parameters. Parameters are used to control the application of the transform function.

To support the standardized model and interfaces, parameters can be used to promote relevant information between stages.

For foundry_ml supported stages, required, default, and optional parameters have been dynamically generated. Learn more about available parameters.