Model assets do not currently support the SparkML library. We recommend switching to a single-node machine learning framework like PyTorch, TensorFlow, XGBoost, LightGBM, or scikit-learn.
Models can be trained in a Jupyter® notebook in Code Workspaces. To train a model, complete the steps described below.
The supervised model training tutorial provides additional instruction on model training in Jupyter® code workspaces.
After establishing a workspace, you can create a new notebook to import data and begin writing model training code.
Code Workspaces grants access to the packages available in other Foundry code authoring environments, such as Code Repositories. To add a new package, open the Packages tab in the left sidebar of your workspace, search for the package you need, then select Latest (or another available version) to open a terminal and run the corresponding install command.
The Code Workspaces application enables users to import existing Foundry datasets for use as training data. Training data used in Code Workspaces needs a human-readable alias as its resource identifier and can be read as a pandas DataFrame. To copy the generated code snippet into your notebook, select the clipboard icon in the upper right corner of the code snippet, then select Done. Below is an example of a code snippet generated by Code Workspaces:

```python
from foundry.transforms import Dataset

training_data = Dataset.get("my-alias").read_table(format="pandas")
```
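Because `read_table(format="pandas")` returns a standard pandas DataFrame, the usual pandas API applies to the imported data. A minimal sketch, using a hand-constructed DataFrame as a stand-in for the Foundry dataset (the column names mirror the housing example used later in this guide):

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by
# Dataset.get("my-alias").read_table(format="pandas")
training_dataframe = pd.DataFrame({
    "median_income": [8.3252, 8.3014, 7.2574],
    "housing_median_age": [41.0, 21.0, 52.0],
    "total_rooms": [880.0, 7099.0, 1467.0],
    "median_house_value": [452600.0, 358500.0, 352100.0],
})

# Standard pandas inspection works on the returned DataFrame
print(training_dataframe.shape)          # (3, 4)
print(list(training_dataframe.columns))
```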
Open a new notebook from the Launcher panel by selecting Python [user-default], and paste the code snippet into the first cell. The open source tools available for model development in Code Workspaces allow you to train your model for a wide array of analytical use cases, such as regression or classification. Below is a sample linear regression model that predicts median house value using scikit-learn.
Install scikit-learn in your workspace by selecting the Packages icon under Data in the left sidebar. Choose the Conda or PyPI package manager in the dropdown to the left of the search bar, then search for scikit-learn. Alternatively, install the package from the terminal with the maestro env conda install or maestro env pip install commands.

Package installation using the sidebar:

Package installation from the terminal:
After writing and running your model in Code Workspaces, you can publish it to Foundry for integration across other applications. Below is an example of model training code:
```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

numeric_features = ['median_income', 'housing_median_age', 'total_rooms']
numeric_transformer = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler())
    ]
)

model = Pipeline(
    steps=[
        ("preprocessor", numeric_transformer),
        ("classifier", LinearRegression())
    ]
)

X_train = training_dataframe[numeric_features]
y_train = training_dataframe['median_house_value']

model.fit(X_train, y_train)
```
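Before publishing, it can help to sanity-check the fitted pipeline on held-out rows. A minimal sketch, using synthetic data as a stand-in for training_dataframe (the column names follow the example above; the generated values and the train/test split are illustrative only):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for training_dataframe
df = pd.DataFrame({
    "median_income": rng.uniform(1, 10, n),
    "housing_median_age": rng.uniform(1, 50, n),
    "total_rooms": rng.uniform(100, 5000, n),
})
df["median_house_value"] = 50_000 * df["median_income"] + rng.normal(0, 1_000, n)

numeric_features = ["median_income", "housing_median_age", "total_rooms"]
model = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
        ("regressor", LinearRegression()),
    ]
)

# Hold out a quarter of the rows and score the fit on them
X_train, X_test, y_train, y_test = train_test_split(
    df[numeric_features], df["median_house_value"], random_state=0
)
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R^2 on held-out rows
print(round(r2, 3))
```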
To make a model available outside of Code Workspaces, you must add a new model output to the workspace. After you create a new model output, Code Workspaces will automatically create and store a new .py file in your existing workspace, which you can use to implement a model adapter. Model adapters provide a standard interface for all models in Foundry, ensuring the platform's production applications can consume models immediately after they are created. Foundry infrastructure will load the model, configure its Python dependencies, expose its API(s), and enable model interfacing.
After you name and save your model, you will be prompted to Publish a new model in the left panel of your workspace. Complete Step 1: Install palantir_models by copying the code snippet to your clipboard and running it in your original .ipynb notebook file.
After you successfully install palantir_models, create and develop your model adapter in Step 2: Develop your model adapter. A model adapter must implement the following methods:

- save and load: To reuse your model, you must define how it should be saved and loaded. Palantir provides default serialization (saving) methods; in more complex cases, you can implement custom serialization logic.
- api: Defines the API of your model and tells Foundry what type of input data your model requires.
- predict: Called by Foundry to provide data to your model. This is where you pass input data to the model and generate inferences (predictions).

Refer to the model adapter API reference for more details.
The code sample below implements the functions described above to develop an adapter for a linear regression model using scikit-learn:
```python
import palantir_models as pm
from palantir_models_serializers import DillSerializer

class LinearRegressionModelAdapter(pm.ModelAdapter):
    @pm.auto_serialize(
        model=DillSerializer()
    )
    def __init__(self, model):
        self.model = model

    @classmethod
    def api(cls):
        columns = [
            ('median_income', float),
            ('housing_median_age', float),
            ('total_rooms', float),
        ]
        return {"df_in": pm.Pandas(columns)}, \
               {"df_out": pm.Pandas(columns + [('prediction', float)])}

    def predict(self, df_in):
        df_in['prediction'] = self.model.predict(
            df_in[['median_income', 'housing_median_age', 'total_rooms']]
        )
        return df_in
```
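The predict method above only wraps the underlying scikit-learn model, so you can sanity-check the same logic locally with plain pandas before publishing. This sketch skips palantir_models entirely; the tiny training frame and the standalone predict function are illustrative stand-ins for the real adapter:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ["median_income", "housing_median_age", "total_rooms"]

# Tiny illustrative training frame (values are made up)
train = pd.DataFrame({
    "median_income": [2.0, 4.0, 6.0, 8.0],
    "housing_median_age": [10.0, 20.0, 30.0, 40.0],
    "total_rooms": [500.0, 1000.0, 1500.0, 2000.0],
    "median_house_value": [100.0, 200.0, 300.0, 400.0],
})
sklearn_model = LinearRegression().fit(train[features], train["median_house_value"])

def predict(model, df_in):
    # Mirrors the adapter's predict: append a prediction column to the input
    df_in["prediction"] = model.predict(df_in[features])
    return df_in

df_out = predict(sklearn_model, train[features].copy())
print("prediction" in df_out.columns)  # True
```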
Refer to the model adapter documentation for more guidance.
To publish the model to Foundry, copy the snippet for the model you wish to publish from the left sidebar under Step 3: Publish your model, paste it into your notebook, and run the cell. Here is an example snippet that publishes a linear regression model using the LinearRegressionModelAdapter written above:
```python
from palantir_models.code_workspaces import ModelOutput

# Model adapter has been defined in linear_regression_model_adapter.py
from linear_regression_model_adapter import LinearRegressionModelAdapter

# sklearn_model is a model trained in another cell
linear_regression_model_adapter = LinearRegressionModelAdapter(sklearn_model)

# "linear_regression_model" is the alias for this example model
model_output = ModelOutput("linear_regression_model")
model_output.publish(linear_regression_model_adapter)
```
The snippet works as is, except that you must pass the model you trained to the adapter's initialization. Once the code is ready, run the cell to publish the model to Foundry.
Models can be consumed by submitting them to a modeling objective.
Models can also be consumed using model deployments, which provide an alternative model hosting system to modeling objectives.
Jupyter® and JupyterLab® are trademarks or registered trademarks of NumFOCUS.
All third-party trademarks referenced remain the property of their respective owners. No affiliation or endorsement is implied.