The `ModelInput` class allows you to load and use models within Python transforms, making it easy to incorporate model inference logic into your data pipelines. To learn more about using models in code workspaces, you can review details on the `ModelInput` class in Jupyter® Code Workspaces.
```python
from palantir_models.transforms import ModelInput

ModelInput(
    alias,                    # (string) Path or RID of model to load
    model_version=None,       # (Optional) RID of specific model version
    use_sidecar=False,        # (Optional) Run model in separate container
    sidecar_resources=None    # (Optional) Resource configuration for sidecar
)
```
| Parameter | Type | Description | Version / Notes |
|---|---|---|---|
| `alias` | `str` | Path or resource ID (RID) of the model resource to load from. | |
| `model_version` | `Optional[str]` | RID or semantic version of the specific model version to use. If not specified, the latest version will be used. | |
| `use_sidecar` | `Optional[bool]` | When `True`, runs the model in a separate container to prevent dependency conflicts between the model adapter and transform environment. Note that Lightweight transforms do not support using sidecars. | Introduced in `palantir_models` version 0.1673.0 |
| `sidecar_resources` | `Optional[Dict[str, Union[float, int]]]` | Resource configuration for the sidecar container. This parameter can only be used when `use_sidecar` is set to `True`. Supports the `cpus`, `memory_gb`, and `gpus` options shown in the example below. | Introduced in `palantir_models` version 0.1673.0 |
The code snippet below demonstrates the usage of a model in a transform. The platform will create an instance of the model adapter class that was defined for the model version, giving you access to the methods defined in the adapter. The following example assumes the adapter's API declares a single Pandas input and a single Pandas output DataFrame called `output_df`. The `transform` method on the model adapter, which leverages your provided `predict` method, automatically converts `data_in`, a `TransformInput` instance, into the tabular input (either a Spark or Pandas DataFrame) expected by your model adapter as defined in the API.
```python
from transforms.api import Input, lightweight, Output, transform, TransformInput, TransformOutput
from palantir_models import ModelAdapter
from palantir_models.transforms import ModelInput


# Use Lightweight if the model does not require Spark
@lightweight
@transform(
    out=Output('path/to/output'),
    model_input=ModelInput(
        "path/to/my/model",
        # Use specific model version.
        # The model version can be copied from the left sidebar on the model page.
        model_version="ri.models.main.model-version.74b03bd6-5715-4904-85f8-4a29499e05a3"
    ),
    data_in=Input("path/to/input")
)
def my_transform(out: TransformOutput, model_input: ModelAdapter, data_in: TransformInput) -> None:
    inference_results = model_input.transform(data_in)
    predictions = inference_results.output_df

    # Alternatively, you can use the predict method on
    # a Pandas DataFrame instance directly:
    # predictions = model_input.predict(data_in.pandas())

    out.write_pandas(predictions)
```
By default, transforms run on Spark, in which case the model adapter instance is loaded on the driver and requires additional logic to distribute work over executors. The only exception among the serializers defined in `palantir_models` is the `SparkMLAutoSerializer` class designed for Spark ML models. The `SparkMLAutoSerializer` specifically handles distributing the model to each executor, resulting in a model instance that natively runs on executors over Spark DataFrame inputs.
For this reason, we recommend simply using the `lightweight` decorator for most use cases. If required, and if your model expects a single Pandas DataFrame input and output, you can use the `DistributedInferenceWrapper` to handle the distribution of the model over the executors, as described below.
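For additional context on the `SparkMLAutoSerializer` mentioned above, the snippet below is a minimal sketch of how a Spark ML model might be serialized with it when authoring a model adapter. The import path `palantir_models_serializers`, the `auto_serialize` decorator pattern, and the class name `SparkMLExampleAdapter` are assumptions not stated on this page; the adapter's `api()` and `predict()` methods are omitted.

```python
# Hedged sketch: import path and decorator below are assumptions, not taken
# verbatim from this page.
import palantir_models as pm
from palantir_models_serializers import SparkMLAutoSerializer


class SparkMLExampleAdapter(pm.ModelAdapter):
    @pm.auto_serialize(model=SparkMLAutoSerializer())
    def __init__(self, model):
        # `model` is a fitted Spark ML model (for example, a pyspark.ml
        # PipelineModel). SparkMLAutoSerializer distributes it to the
        # executors, so inference can run natively over Spark DataFrames
        # (for example, by calling self.model.transform(spark_df)).
        self.model = model
```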
To instantiate the model adapter class, the environment must have access to the model adapter code. In particular, if the model was created in a different repository, the adapter code, which is packaged alongside the model as a Python library, needs to be imported as a dependency in your repository. The application will prompt you to do this, as shown in the screenshot below.
You can specify a particular model version using the `model_version` parameter. This is especially recommended if the model is not being retrained on a regular schedule, as it helps prevent an unintended or problematic model from reaching production. If you do not specify a model version, the system will use the latest model available on the build's branch by default.
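For illustration, the snippet below contrasts a pinned and an unpinned model input; the path and version RID reuse the placeholder values from the example above.

```python
from palantir_models.transforms import ModelInput

# Pinned: builds keep using this exact model version until you change it.
pinned_model = ModelInput(
    "path/to/my/model",
    model_version="ri.models.main.model-version.74b03bd6-5715-4904-85f8-4a29499e05a3",
)

# Unpinned: each build resolves the latest model version on the build's branch.
latest_model = ModelInput("path/to/my/model")
```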
Note that if no version is specified, each transform run will automatically fetch the latest model files for the model input, but it will not automatically update the adapter library version (containing the adapter logic you authored for that version and its Python dependencies) in the repository if the model was generated outside of the repository where it is being used. To update the library version, you will need to select the appropriate adapter version in the repository’s Libraries sidebar and verify that all checks pass. The adapter version corresponding to each model version can be found on the model’s page under Inference configuration.
If this workflow does not suit your needs, consider either using the model within the same repository where it is created or setting `use_sidecar` to `True`, as explained below.
Running a model in a sidecar container (`use_sidecar=True`) is recommended for most use cases where the model was built in a code workspace or in a repository different from the one being used for generating predictions.
The main benefit of running the model as a sidecar is that the exact same library versions used to produce the model will also be used to run inference with it. In contrast, importing the adapter code as prompted by the repository user interface will create a new environment solve that merges the constraints from the adapter code and the repository. This may result in different library versions being used.
Additionally, when using a sidecar container to run the model, the adapter code corresponding to the model version being used will automatically be loaded in the sidecar without the user having to manually update the dependency and run checks in the repository.
When using a sidecar, `predict()` requests are automatically routed to the sidecar container without any additional code changes required:
```python
from transforms.api import Input, Output, transform, TransformInput, TransformOutput
from palantir_models import ModelAdapter
from palantir_models.transforms import ModelInput


# Lightweight is not supported with `use_sidecar`
@transform(
    out=Output('path/to/output'),
    model_input=ModelInput(
        "path/to/my/model",
        use_sidecar=True,
    ),
    data_in=Input("path/to/input")
)
def my_transform(out: TransformOutput, model_input: ModelAdapter, data_in: TransformInput) -> None:
    inference_results = model_input.transform(data_in)
    out.write_pandas(inference_results.output_df)
```
The example below will provision a sidecar alongside the driver and each executor, each sidecar with 1 GPU, 2 CPUs, and 4 GB of memory.
```python
from transforms.api import Input, Output, transform, TransformInput, TransformOutput
from palantir_models import ModelAdapter
from palantir_models.transforms import ModelInput


@transform(
    out=Output('path/to/output'),
    model_input=ModelInput(
        "path/to/my/model",
        use_sidecar=True,
        sidecar_resources={
            "cpus": 2.0,
            "memory_gb": 4.0,
            "gpus": 1
        }
    ),
    data_in=Input("path/to/input")
)
def my_transform(out: TransformOutput, model_input: ModelAdapter, data_in: TransformInput) -> None:
    ...
```
You can run distributed model inference using Spark executors. This approach can be beneficial for batch inference involving computationally heavy models or large datasets, with near-linear scalability.
Consider the following code snippet demonstrating how you can wrap an existing model for distributed inference:
```python
from transforms.api import transform, Input, Output
from palantir_models.transforms import ModelInput, DistributedInferenceWrapper


@transform(
    input_df=Input("ri.foundry.main.dataset.3cd098b3-aae0-455a-9383-4eec810e0ac0"),
    model_input=ModelInput("ri.models.main.model.5b758039-370c-4cfc-835e-5bd3f213454c"),
    output=Output("ri.foundry.main.dataset.c0a3edbc-c917-4f20-88f1-d797ebf27cb2"),
)
def compute(ctx, input_df, model_input, output):
    # Wrap the model adapter so inference is distributed over the Spark executors
    model_input = DistributedInferenceWrapper(model_input, ctx, 'auto')
    inference_outputs = model_input.predict(input_df.dataframe())
    output.write_dataframe(inference_outputs)
```
The `DistributedInferenceWrapper` class is initialized with the following parameters:
| Parameter | Type | Description | Notes |
|---|---|---|---|
| `model` | `ModelAdapter` | The model adapter instance to be wrapped. This is typically the `model_input` provided by `ModelInput`. | |
| `ctx` | `TransformContext` | The transform context, used to access Spark session information. This is typically the `ctx` argument of your transform function. | |
| `num_partitions` | `Union[Literal["auto"], int]` | Number of partitions to use for the Spark DataFrame. If `'auto'`, it will be set to match the number of Spark executors. If you experience Out Of Memory (OOM) errors, try increasing this value. | Default: `'auto'` |
| `max_rows_per_chunk` | `int` | Spark splits each partition into chunks before sending it to the model. This parameter sets the maximum number of rows allowed per chunk. More rows per chunk means less overhead but more memory usage. | Default: 1,000,000 |
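As a hedged sketch, assuming `num_partitions` and `max_rows_per_chunk` can be passed as keyword arguments (the table above lists them as the wrapper's parameters), you could tune partitioning explicitly. The paths and the specific values below are illustrative placeholders.

```python
from transforms.api import transform, Input, Output
from palantir_models.transforms import ModelInput, DistributedInferenceWrapper


@transform(
    input_df=Input("path/to/input"),
    model_input=ModelInput("path/to/my/model"),
    output=Output("path/to/output"),
)
def compute(ctx, input_df, model_input, output):
    # Illustrative values: more partitions and fewer rows per chunk reduce
    # memory pressure per executor at the cost of extra scheduling overhead.
    wrapped = DistributedInferenceWrapper(
        model_input,
        ctx,
        num_partitions=16,
        max_rows_per_chunk=100_000,
    )
    output.write_dataframe(wrapped.predict(input_df.dataframe()))
```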
Usage notes:

- The `use_sidecar` parameter (described above) is optional.