The below documentation describes the foundry_ml library, which is no longer recommended for use in the platform. Instead, use the palantir_models library. You can also learn how to migrate a model from the foundry_ml to the palantir_models framework through an example.
The foundry_ml library will be removed on October 31, 2025, corresponding with the planned deprecation of Python 3.9.
Dataset models can be created from Code Workbooks and Code Repositories. Whereas Code Workbooks are more interactive and come with advanced plotting features, Code Repositories support full Git functionality, PR workflows, and meta-programming.
This workflow assumes familiarity with Code Repositories and Python transforms.
Once you've created a new repository (or opened an existing one), make sure that foundry_ml and scikit-learn are available in the conda environment. You can validate this by inspecting the meta.yaml file. To run the code below, here is an example of the meta.yaml file you will need:
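The exact file depends on your repository template, so treat the following as a minimal sketch with template tokens and version pins omitted. foundry_ml_sklearn is listed explicitly because the training example below uses its extract_matrix helper, though it may already be pulled in transitively by foundry_ml:

```yaml
package:
  name: "{{ PACKAGE_NAME }}"
  version: "{{ PACKAGE_VERSION }}"

source:
  path: ../src

requirements:
  build:
    - python
    - setuptools

  run:
    - python
    - transforms
    - foundry_ml
    - foundry_ml_sklearn
    - scikit-learn
```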
Use the transform decorator to save the model using the model.save(transform_output) instance-method of your Model object.
Note: In the code shown below, multiple Stage objects are created and combined into a single Model within a single transform, unlike the separately saved stages in the Code Workbook-based Getting Started tutorial. Either approach works.
```python
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from transforms.api import transform, Input, Output
from foundry_ml import Model, Stage
from foundry_ml_sklearn.utils import extract_matrix


@transform(
    iris=Input("/path/to/input/iris_dataset"),
    out_model=Output("/path/to/output/model"),
)
def create_model(iris, out_model):
    df = iris.dataframe().toPandas()

    column_transformer = make_column_transformer(
        ('passthrough', ['sepal_width', 'sepal_length', 'petal_width', 'petal_length'])
    )

    # Fit the column transformer to act as a vectorizer
    column_transformer.fit(df[['sepal_width', 'sepal_length', 'petal_width', 'petal_length']])

    # Wrap the vectorizer as a Stage to indicate this is the transformation to be applied in the Model
    vectorizer = Stage(column_transformer)

    # Apply the vectorizer to produce a dataframe with all original columns plus the column
    # of vectorized data; the default column name is "features"
    training_df = vectorizer.transform(df)

    # Helper function that converts the column of vectors into a NumPy matrix and handles sparsity
    X = extract_matrix(training_df, 'features')
    y = training_df['is_setosa']

    # Train a logistic regression model - specify the solver to prevent a future warning
    clf = LogisticRegression(solver='lbfgs')
    clf.fit(X, y)

    # Return a Model object that now contains the pipeline of transformations
    model = Model(vectorizer, Stage(clf, input_column_name='features'))

    # Syntax for saving down a Model
    model.save(out_model)
```
Use the transform decorator to load the model using the Model.load(transform_input) class-method of the Model class.
```python
from transforms.api import transform, Input, Output
from foundry_ml import Model


@transform(
    in_model=Input("/path/to/output/model"),
    test_data=Input("/path/to/data"),
    out_data=Output("/path/to/output/data/from/applying/model"),
)
def apply_model(in_model, test_data, out_data):
    # Load the saved model from above
    model = Model.load(in_model)

    # Apply the model to the scoring or testing dataset
    pandas_df = test_data.dataframe().toPandas()
    output_df = model.transform(pandas_df)

    # The output of the transformation contains a column holding a NumPy array; cast it before saving
    output_df['features'] = output_df['features'].apply(lambda x: x.tolist())

    # Write the results of applying the model
    out_data.write_pandas(output_df)
```
MetricSets are saved and loaded using the same syntax as a Model, i.e. metric_set.save(transform_output) and MetricSet.load(transform_input).
You can generate MetricSets from the same transform that generates a Model (e.g. training-time hold-out metrics), or from a downstream transform that loads and runs the Model.
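As an illustrative sketch of the training-time case, the snippet below assumes the foundry_ml_metrics package, an extra out_metrics Output on the training transform, and a hold-out split (X_holdout, y_holdout); the MetricSet constructor and add arguments shown here should be verified against the foundry_ml metrics documentation:

```python
from foundry_ml_metrics import MetricSet

# Inside the training transform, after fitting clf and building the Model:
# associate the metrics with the trained model and its input dataset
# (constructor arguments are indicative; verify against the metrics docs)
metric_set = MetricSet(model=model, input_data=iris)

# Record a hold-out metric; sklearn's clf.score() returns mean accuracy for classifiers
metric_set.add(name='holdout_accuracy', value=float(clf.score(X_holdout, y_holdout)))

# Saved with the same syntax as a Model
metric_set.save(out_metrics)
```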
This is an experimental feature that is being actively developed. The user interface is subject to change in upcoming versions.
In Code Repositories, models can be configured to automatically submit to a Modeling objective when built. Once configured, the Modeling objective subscribes to the model and creates a new submission whenever a transaction is successfully committed to that model.
To use this feature, you must first:
Create a Transforms Python Code Repository. Refer to the Python transforms tutorial for information on setting up a repository.
Add the foundry_ml package as a runtime dependency to your meta.yaml file and commit the changes. Once committed, you will see the Modeling objective helper panel alongside the other panels at the bottom of the interface, as shown in the screenshot below.
Automatic submission can only be configured for one model at a time by selecting each model individually. If multiple models are selected, you can view the list of objectives to which the models are submitting, but cannot configure automatic submission in bulk.
To configure automatic submission for a model, select the model from the Models panel on the left-hand side of the interface, click the Select modeling objective button, and choose the objective to which you want the model to submit.
The model will now automatically submit to that objective once built. To add an additional objective to submit to, click Add additional modeling objective and repeat these steps.
To stop submitting a particular model to an objective, select the model from the Models panel on the left-hand side of the interface and click on the three dots (...) beside the objective that you want to remove.
Select Remove objective and confirm your decision.
If configured, a model is automatically submitted whenever a transaction is successfully committed to that model. However, since you will not always want to resubmit a model after every retraining, model retraining and automatic submission can be aborted based on your desired criteria.
As an example, you can stop an automatic submission after retraining if the model has lower hold-out metrics at training time than a previous submission. To do this, in your model retraining transform, compute the new model's hold-out metrics, compare them against those of the previous submission, and abort the transaction if the new metrics are worse; only when the check passes should you call model.save(transform_output). The above steps can be easily repurposed to perform many live checks during model retraining. A sketch of this pattern follows.
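A minimal sketch of such a gated retraining transform, under some assumptions: the dataset paths and the holdout_accuracy column schema are hypothetical, train_and_evaluate is a hypothetical helper following the training example above, and abort() on the output is used to abort the transaction (verify its availability in your Transforms Python API version):

```python
from transforms.api import transform, Input, Output


@transform(
    training_data=Input("/path/to/training_data"),        # hypothetical path
    previous_metrics=Input("/path/to/previous/metrics"),  # hypothetical metrics dataset
    out_model=Output("/path/to/output/model"),
)
def retrain_model(training_data, previous_metrics, out_model):
    df = training_data.dataframe().toPandas()

    # Hypothetical helper: trains a Model as in the earlier example and
    # returns it along with its hold-out accuracy
    model, new_accuracy = train_and_evaluate(df)

    # Hold-out accuracy of the previous submission (schema assumed for illustration)
    old_accuracy = previous_metrics.dataframe().toPandas()['holdout_accuracy'].iloc[-1]

    if new_accuracy < old_accuracy:
        # Abort the transaction: nothing is committed to the model dataset,
        # so the Modeling objective does not create a new submission
        out_model.abort()
    else:
        model.save(out_model)
```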
Automatic submission, in combination with build schedules, enables you to set up systematic model retraining.
Once you've set up automatic submission on a model, add a schedule to that same model. You can also include upstream datasets if needed.
You can now set any combination of time- and event-based triggers within your schedule.
Common configurations include:
Time-based triggers, such as retraining on a regular cadence.
Event-based triggers, such as retraining when new data is committed to an input dataset.
AND or OR combinations of the above.
We recommend familiarizing yourself with incremental transforms before proceeding with advanced retraining triggers.
In some cases, such as computationally expensive training jobs, you can selectively retrain a model by configuring the model build schedule to trigger off of other datasets along the model lifecycle. This enables you to trigger retraining based on feature drift, score drift, observed model degradation, user feedback, or other indicators.
To do this, we recommend creating and scheduling an "alerts" dataset. This can use a combination of factors appropriate for your use case, such as the drift and degradation indicators described above.
You can use code-based or low/no-code tools (such as Contour) to encode your logic.
Downstream of the alerting logic, generate an append-only incremental "alerts" dataset using Python transforms within a code repository. In your transforms logic, abort the transaction whenever there are no new alerts.
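A minimal sketch of such an alerts transform, assuming a hypothetical monitoring dataset with a score_drift column and an illustrative threshold; with the incremental decorator, the input read returns only rows added since the last build, and abort() prevents an empty transaction from being committed:

```python
from transforms.api import transform, incremental, Input, Output


@incremental()
@transform(
    monitoring=Input("/path/to/monitoring/scores"),  # hypothetical monitoring dataset
    alerts=Output("/path/to/alerts"),
)
def build_alerts(monitoring, alerts):
    # In incremental mode, dataframe() returns only the rows added since the last build
    new_rows = monitoring.dataframe()

    # Hypothetical alerting rule: flag rows whose score drift exceeds a threshold
    new_alerts = new_rows.filter(new_rows['score_drift'] > 0.2)

    if new_alerts.count() == 0:
        # Abort the transaction so nothing is committed and downstream
        # "Transaction committed" triggers do not fire
        alerts.abort()
    else:
        # Incremental writes append the new alerts to the dataset
        alerts.write_dataframe(new_alerts)
```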
You can now trigger your model retraining schedule on a "Transaction committed" event-trigger.
You can also set health checks on the alerts dataset, as well as use it in a custom review application.