The documentation below describes the foundry_ml library, which is no longer recommended for use in the platform. Instead, use the palantir_models library. You can also learn how to migrate a model from the foundry_ml to the palantir_models framework through an example.
The foundry_ml library will be removed on October 31, 2025, corresponding with the planned deprecation of Python 3.9.
If Foundry's standard Model functions are insufficient for a particular use case, or a particular class or library isn't supported, you can overwrite certain functions or register the ones necessary for the Stage interface. For example, you can create a custom transform function for a Stage, which is then serialized into Foundry.
If you would like to support a third-party library, you can create your own implementations of Stage. To walk through an example, see the tutorial on how to leverage a pre-trained spaCy model for named entity recognition.
These should be written in a shared Transforms Python library that can be added to your Code Repository or Code Workbook environment. Once imported, your custom Stage implementations are automatically integrated into foundry_ml.
The custom implementation options describe how the model code is applied when model.transform() is called, and how the model is serialized and deserialized into Foundry. Since the Stage classes need to be available at deserialization time, they must be available as a module in your Python environment.
To use your model class, you will need to ensure that the class has a registered transformation function and a serialization format. Suppose you have a model that contains a CustomModel Stage; you first need to define the function that is applied when model.transform() is called.
The defined transform function must operate on either a Spark or Pandas DataFrame.
from foundry_ml.stage.flexible import stage_transform, register_stage_transform_for_class


class CustomModel(object):
    def __init__(self, name):
        ...

    def custom_transform(self, df):
        ...
        return df


# Annotate a function that will wrap the model and data passed between stages.
@stage_transform()
def _transform(model, df):
    # This calls the model's transformation function defined above
    return model.custom_transform(df)


# Call this to send to Foundry ML Stage Registry, force=True to override any existing registered transform
register_stage_transform_for_class(CustomModel, _transform, force=True)
Now that you have registered a transformation function for the class, you need to tell Foundry how to serialize and deserialize the model code. When using a custom-written model stage, it's important that the stage be written in a shared Python library and imported as a dependency. This is because the Stage class needs to be available at deserialization time. Otherwise, if you write the Stage class in a Code Workbook and then try to load the saved model from a different Code Workbook, the model will be unable to load.
The example below assumes that CustomModel can be pickled using dill and leverages two Foundry helper functions, load_data and safe_write_data, for safely reading and writing the model to and from the filesystem. The spaCy example shows a different implementation.
import dill
from foundry_object.utils import safe_write_data, load_data
from foundry_ml.stage.serialization import deserializer, serializer, register_serializer_for_class


# Deserializer decorator
@deserializer("custom_model.dill", force=True)
def _deserializer(filesystem, path):
    # Loading pickled file
    return dill.loads(load_data(filesystem, path, True), encoding='latin1')


# Serializer decorator
@serializer(_deserializer)
def _serializer(filesystem, value):
    path = 'custom_model.dill'
    safe_write_data(filesystem, path, dill.dumps(value), base64_encode=True)
    return path


register_serializer_for_class(CustomModel, _serializer, force=True)
Now that you have properly registered the CustomModel, it can be used just like any other Stage with the syntax model = Model(Stage(CustomModel(...))) and executed with model.transform(dataframe).
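For instance, a minimal sketch of instantiating and applying the registered stage; the constructor argument and the DataFrame contents below are hypothetical:
import pandas as pd
from foundry_ml import Model, Stage

# Wrap the custom class in a Stage just like a built-in model stage
model = Model(Stage(CustomModel("my-model")))

# model.transform dispatches to the _transform function registered above
df = pd.DataFrame({"value": [1, 2, 3]})
output = model.transform(df)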
Some model stages (particularly simulation wrappers) contain user-authored functions alongside the main transform function. For the model to be executable, these user-authored functions must also be serialized into the model state. Internally, Python uses the pickle package to save the functions, and pickle requires additional considerations to properly serialize functions.
Upon loading the stage, you may get errors such as ModuleNotFoundError: No module named '...'. This may occur when the user-authored function is serialized by reference instead of by value, meaning that instead of serializing the Python byte code directly, pickle serialized the name of the function.
To force serialization by value, you can move the code directly into your transform function.
As an example:
class SimModel(SimulationWrapper):
    def run(self, data, parameters):
        ...


@transform(...)
def my_model(...):
    return Model(Stage(
        SimModel(parameters)
    ))
Should be written as:
@transform(...)
def my_model(...):
    class SimModel(SimulationWrapper):
        def run(self, data, parameters):
            ...

    return Model(Stage(
        SimModel(parameters)
    ))
This rule also applies to any other functions that your custom code may call. If your serialized function or class has many dependencies that are also serialized by value, the recommended path is to pull out the dependencies into a Python library and add it to the model as a dependency.
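For illustration, a helper that lives in the shared library can then be imported where it is used instead of being defined (and pickled by value) inside the transform; the custom_plugin.helpers module and preprocess function below are hypothetical:
# Hypothetical helper living in the shared library, e.g. src/custom_plugin/helpers.py:
#
#     def preprocess(df):
#         ...
#         return df
#
# Importing it from the installed package means pickle stores only a reference
# to the module, which resolves as long as the library is a dependency.
@transform(...)
def my_model(...):
    from custom_plugin.helpers import preprocess

    class SimModel(SimulationWrapper):
        def run(self, data, parameters):
            return preprocess(data)

    return Model(Stage(
        SimModel(parameters)
    ))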
Assuming all the above code is placed in model.py
, then the repository will have the following structure:
├── README.md
├── build.gradle
├── ci.yml
├── conda_recipe
│   └── meta.yaml
├── gradle.properties
├── gradlew
├── gradleww
├── settings.gradle
├── src
│   ├── custom_plugin
│   │   ├── __init__.py
│   │   └── model.py
│   ├── setup.cfg
│   └── setup.py
└── templateConfig.json
In order for Foundry to be able to discover the plugin, you must first modify __init__.py to import the contents of model.py to the top level of the package:
from .model import *
In addition, you need to add the following to setup.py so that the Model plugin registry can discover the new plugin:
entry_points={'foundry_ml.plugins': ['plugin = custom_plugin']},
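For context, a minimal sketch of where that argument might sit in a setuptools-based setup.py; the name and version values below are placeholders rather than values from the repository template:
from setuptools import find_packages, setup

setup(
    name='custom_plugin',    # placeholder package name
    version='0.1.0',         # placeholder version
    packages=find_packages(),
    # Registers this package with the foundry_ml plugin registry
    entry_points={'foundry_ml.plugins': ['plugin = custom_plugin']},
)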
Once you commit, build, and tag a release, your new model class should be available to leverage in Code Workbook or Code Repositories.
If Foundry's standard functions are not sufficient for a particular use case, you can override the transform function for an existing Model class. The steps are the same as in the section above for registering a transform for a custom class.
However, note that the registry operates at the class level. This means that if you override the transform() function for a particular library class (such as sklearn's LogisticRegression), every instance of that class will use your overridden transform function whenever you import the library containing your overrides.
If this behavior is undesired, you can instead create a new class (for example, LogisticRegressionCustom) that wraps the library function, and register your custom transform for that wrapper class. You can then use this new class without modifying the behavior of any calls into the library function, as sketched below.
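A minimal sketch of this wrapper approach, assuming sklearn's LogisticRegression; the wrapper name, prediction column, and transform behavior are illustrative, and a serializer would still need to be registered for the wrapper class as shown earlier:
from sklearn.linear_model import LogisticRegression
from foundry_ml.stage.flexible import stage_transform, register_stage_transform_for_class


# Wrapper class, so the custom transform applies only to this class
# and not to every LogisticRegression stage.
class LogisticRegressionCustom(object):
    def __init__(self, model: LogisticRegression):
        self.model = model


@stage_transform()
def _custom_lr_transform(wrapper, df):
    # Illustrative behavior: append a prediction column to a pandas DataFrame
    df = df.copy()
    df['custom_prediction'] = wrapper.model.predict(df.values)
    return df


register_stage_transform_for_class(LogisticRegressionCustom, _custom_lr_transform, force=True)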
When trying to use a custom stage in a serialized model, you may encounter the error foundry_ml_core.stage.flexible._flexible_stage.FlexibleStageException: No stage_transform registered for stage type: <class 'NoneType'>.
This error can often be resolved by ensuring that the custom Stage class is written in a shared Python library that is added to your environment as a dependency, and that the __init__.py files import the class as described above. Note that we currently do not support PyPI packages in Foundry models, as dependencies must be solved from Conda.