Add support for additional libraries

Sunsetted functionality

The documentation below describes the foundry_ml library, which is no longer recommended for use in the platform. Instead, use the palantir_models library. You can also learn how to migrate a model from the foundry_ml to the palantir_models framework through an example.

The foundry_ml library will be removed on October 31, 2025, corresponding with the planned deprecation of Python 3.9.

If Foundry's standard Model functions are insufficient for a particular use case, or a particular class or library is not supported, you can override existing functions or register the functions required by the Stage interface. For example, you can create a custom transform function for a Stage, which is then serialized into Foundry.

If you would like to support a third-party library, you can create your own implementations of Stage.

To walk through an example, see the tutorial on how to leverage a pre-trained spaCy model for named entity recognition.

Requirements

Custom Stage implementations should be written in a shared Transforms Python library that can be added to your Code Repository or Code Workbook environment. Once imported, your custom Stage implementations are automatically integrated into foundry_ml.

The custom implementations describe:

  • What operation a model should perform when model.transform() is called.
  • How a model should be serialized and saved in Foundry.
  • How a model should be deserialized so that it can be used in downstream transforms.

Since the Stage classes need to be available at deserialization time, they must be available as a module in your Python environment.

Register transform

To use your model class, you will need to ensure that the class has a registered transformation function and a serialization format. Suppose you have a model that contains a CustomModel Stage; you need to define the function that is applied when model.transform() is called.

The defined transform function must operate on either a Spark or Pandas DataFrame.

from foundry_ml.stage.flexible import stage_transform, register_stage_transform_for_class


class CustomModel(object):
    def __init__(self, name):
        ...

    def custom_transform(self, df):
        ...
        return df


# Annotate a function that will wrap the model and data passed between stages.
@stage_transform()
def _transform(model, df):
    # This calls the model's transformation function defined above
    return model.custom_transform(df)


# Call this to send to Foundry ML Stage Registry, force=True to override any existing registered transform
register_stage_transform_for_class(CustomModel, _transform, force=True)

Register serializer and deserializer

Now that you have registered a transformation function for the class, you need to tell Foundry how to serialize and deserialize the model code. When using a custom-written model stage, it's important that the stage be written in a shared Python library and imported as a dependency.

This is because the Stage class needs to be available at deserialization time. Otherwise, if you write the Stage class in a Code Workbook and then try to load the saved model from a different Code Workbook, the model will be unable to load.

The example below assumes that CustomModel can be pickled using dill. It leverages two Foundry helper functions, load_data and safe_write_data, for reading and writing models safely to and from the filesystem. The spaCy example shows a different implementation.

import dill
from foundry_object.utils import safe_write_data, load_data
from foundry_ml.stage.serialization import deserializer, serializer, register_serializer_for_class


# Deserializer decorator
@deserializer("custom_model.dill", force=True)
def _deserializer(filesystem, path):
    # Loading pickled file
    return dill.loads(load_data(filesystem, path, True), encoding='latin1')


# Serializer decorator
@serializer(_deserializer)
def _serializer(filesystem, value):
    path = 'custom_model.dill'
    safe_write_data(filesystem, path, dill.dumps(value), base64_encode=True)
    return path


register_serializer_for_class(CustomModel, _serializer, force=True)

Now that you have properly registered the CustomModel, it can be used just like any other Stage with the syntax model = Model(Stage(CustomModel(...))) and be executed with model.transform(dataframe).
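As a concrete illustration, a minimal usage sketch follows; the import path custom_plugin.model and the dataframe variable are assumptions made for this example rather than names from the library.

from foundry_ml import Model, Stage
from custom_plugin.model import CustomModel  # assumed import path for the shared plugin library

# Wrap the custom stage in a Model; transform() dispatches to the registered _transform.
model = Model(Stage(CustomModel(name='example')))
scored_df = model.transform(dataframe)  # dataframe is a Spark or Pandas DataFrame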

Serialize functions in models

Some model stages (particularly simulation wrappers) contain user-authored functions alongside the main transform function. For the model to be executable, the user-authored functions must also be serialized into the model state. Internally, Python uses the pickle package to save the functions; the pickle package requires additional considerations to properly serialize functions.

Upon loading the stage, you may get errors such as ModuleNotFoundError: No module named '...'. This may occur when the user-authored function is serialized by reference instead of by value. This means that instead of serializing the Python byte code directly, pickle serialized the name of the function.
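As a minimal, Foundry-agnostic illustration of serialization by reference: pickling a module-level function stores only a reference to it, so loading the pickle requires the defining module to be importable.

import pickle

def module_level_fn(x):
    return x + 1

# pickle records the module path and qualified name of the function rather than
# its byte code, so unpickling re-imports the module to resolve the function.
payload = pickle.dumps(module_level_fn)
restored = pickle.loads(payload)  # succeeds here only because the defining module is importable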

To force serialization by value, you can move the code directly into your transform function.

As an example:

class SimModel(SimulationWrapper):
    def run(self, data, parameters):
        ...

@transform(...)
def my_model(...):
    return Model(Stage(
        SimModel(parameters)
    ))

Should be written as:

@transform(...)
def my_model(...):
    class SimModel(SimulationWrapper):
        def run(self, data, parameters):
            ...

    return Model(Stage(
        SimModel(parameters)
    ))

This rule also applies to any other functions that your custom code may call. If your serialized function or class has many dependencies that are also serialized by value, the recommended path is to pull out the dependencies into a Python library and add it to the model as a dependency.
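A hedged sketch of that pattern is shown below; custom_plugin.sim_helpers and run_simulation are assumed names for a module and helper function in the shared library, not part of foundry_ml:

from custom_plugin.sim_helpers import run_simulation  # assumed helper in the shared plugin library

@transform(...)
def my_model(...):
    class SimModel(SimulationWrapper):
        def run(self, data, parameters):
            # Only this thin wrapper is serialized by value; the heavy logic is
            # resolved by import from the plugin package at load time.
            return run_simulation(data, parameters)

    return Model(Stage(SimModel(parameters)))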

Configure the shared library

Assuming all the above code is placed in model.py, the repository will have the following structure:

├── README.md
├── build.gradle
├── ci.yml
├── conda_recipe
│   └── meta.yaml
├── gradle.properties
├── gradlew
├── gradlew.bat
├── settings.gradle
├── src
│   ├── custom_plugin
│   │   ├── __init__.py
│   │   └── model.py
│   ├── setup.cfg
│   └── setup.py
└── templateConfig.json

For Foundry to discover the plugin, you must first modify __init__.py to import the contents of model.py into the top level of the package:

from .model import *

In addition, you need to add the following to setup.py for the Model plugin registry to discover the new plugin:

entry_points={'foundry_ml.plugins': ['plugin = custom_plugin']},
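For context, a minimal setup.py sketch is shown below; the name and version values are illustrative placeholders rather than the repository template's exact contents:

from setuptools import find_packages, setup

setup(
    name='custom_plugin',   # illustrative placeholder
    version='0.1.0',        # illustrative placeholder
    packages=find_packages(),
    entry_points={'foundry_ml.plugins': ['plugin = custom_plugin']},
)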

Once you commit, build, and tag a release, your new model class should be available to leverage in Code Workbook or Code Repositories.

Override transform functions in library classes

If Foundry's standard functions are not sufficient for a particular use case, you can override the transform function for an existing Model class. The steps are the same as in the section above to register a transform for a custom class.

However, note that the registry operates at the class level. This means that if you override the transform() function for a particular library class (such as sklearn's LogisticRegression), every instance of that class will use your overridden transform function whenever you import the library containing your overrides.

If this behavior is undesired, you can solve this by:

  • Creating a wrapper class (for example, LogisticRegressionCustom) that wraps the library class
  • Following the steps above to register it with the desired transform function

Then, you can use this new class without modifying the behavior of any calls into the library function.
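A hedged sketch of this approach, with an illustrative override that appends class probabilities, might look as follows:

from foundry_ml.stage.flexible import stage_transform, register_stage_transform_for_class
from sklearn.linear_model import LogisticRegression

# Wrapper class so the custom transform applies only to LogisticRegressionCustom,
# leaving plain LogisticRegression stages untouched.
class LogisticRegressionCustom(object):
    def __init__(self, model: LogisticRegression):
        self.model = model

@stage_transform()
def _transform(wrapper, df):
    # Illustrative override: append the positive-class probability instead of predicting labels.
    output = df.copy()
    output['probability'] = wrapper.model.predict_proba(df)[:, 1]
    return output

register_stage_transform_for_class(LogisticRegressionCustom, _transform, force=True)

Remember to also register a serializer for the wrapper class, as described above, so that models containing it can be saved and loaded.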

Troubleshooting

No stage_transform registered for stage

When trying to use a custom stage in a serialized model, you may encounter the error foundry_ml_core.stage.flexible._flexible_stage.FlexibleStageException: No stage_transform registered for stage type: <class 'NoneType'>.

This error can often be resolved with the following steps:

  • Make sure the plugin package containing the custom stage is part of the environment.
  • Ensure that the package is configured correctly. In particular, be sure that the package is registered as a plugin and that the __init__.py files import the class as described above.
  • Occasionally, Conda will resolve to an outdated version of the package. To ensure the correct version of the package is used, we recommend pinning the Conda environment to the specific plugin version required.

PyPI packages

We currently do not support PyPI packages in Foundry models, as dependencies must be solved from Conda.