The `ModelInput` class allows you to load and use models within Python transforms, making it easy to incorporate model inference logic into your data pipelines. To learn more about using models in code workspaces, you can review details on the `ModelInput` class in Jupyter® Code Workspaces.
```python
from palantir_models.transforms import ModelInput

ModelInput(
    alias,                    # (string) Path or RID of model to load
    model_version=None,       # (Optional) RID of specific model version
    use_sidecar=False,        # (Optional) Run model in separate container
    sidecar_resources=None    # (Optional) Resource configuration for sidecar
)
```
| Parameter | Type | Description | Version / Notes |
|---|---|---|---|
| `alias` | `str` | Path or resource ID (RID) of the model resource to load from. | |
| `model_version` | `Optional[str]` | RID or semantic version of the specific model version to use. If not specified, the latest version will be used. | |
| `use_sidecar` | `Optional[bool]` | When `True`, runs the model in a separate container to prevent dependency conflicts between the model adapter and transform environment. Note that Lightweight transforms do not support using sidecars. | Introduced in `palantir_models` version 0.1673.0 |
| `sidecar_resources` | `Optional[Dict[str, Union[float, int]]]` | Resource configuration for the sidecar container. This parameter can only be used when `use_sidecar` is set to `True`. Supports the `cpus`, `memory_gb`, and `gpus` options shown in the example below. | Introduced in `palantir_models` version 0.1673.0 |
The code snippet below demonstrates the usage of a model in a transform. The platform will create an instance of the model adapter class that was defined for the model version, giving you access to the methods defined in the adapter. The following example assumes the adapter's API declares a single Pandas input and a single Pandas output DataFrame called `output_df`. The `transform` method on the model adapter, which leverages your provided `predict` method, automatically converts `data_in`, a `TransformInput` instance, into the tabular input (either a Spark or Pandas DataFrame) expected by your model adapter as defined in the API.
```python
from transforms.api import Input, lightweight, Output, transform, TransformInput, TransformOutput
from palantir_models import ModelAdapter
from palantir_models.transforms import ModelInput


# Use Lightweight if the model does not require Spark
@lightweight
@transform(
    out=Output('path/to/output'),
    model_input=ModelInput(
        "path/to/my/model",
        # Use specific model version.
        # The model version can be copied from the left sidebar on the model page.
        model_version="ri.models.main.model-version.74b03bd6-5715-4904-85f8-4a29499e05a3"
    ),
    data_in=Input("path/to/input")
)
def my_transform(out: TransformOutput, model_input: ModelAdapter, data_in: TransformInput) -> None:
    inference_results = model_input.transform(data_in)
    predictions = inference_results.output_df

    # Alternatively, you can use the predict method on
    # a Pandas DataFrame instance directly:
    # predictions = model_input.predict(data_in.pandas())

    out.write_pandas(predictions)
```
By default, transforms run on Spark, in which case the model adapter instance is loaded on the driver and requires additional logic to distribute work over executors. The only exception among the serializers defined in `palantir_models` is the `SparkMLAutoSerializer` class designed for Spark ML models. The `SparkMLAutoSerializer` specifically handles distributing the model to each executor, resulting in a model instance that natively runs on executors over Spark DataFrame inputs.
For this reason, we recommend simply using the `lightweight` decorator for most use cases. If required, and if your model expects a single Pandas DataFrame input and output, you can use the `DistributedInferenceWrapper` to handle the distribution of the model over the executors, as described below.
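For additional context on the `SparkMLAutoSerializer` mentioned above, the snippet below is a minimal sketch of how a Spark ML model might be serialized with it when authoring a model adapter. The import path `palantir_models_serializers`, the `auto_serialize` decorator pattern, and the class name `SparkMLExampleAdapter` are assumptions not stated on this page; the adapter's `api()` and `predict()` methods are omitted.

```python
# Hedged sketch: import path and decorator below are assumptions, not taken
# verbatim from this page.
import palantir_models as pm
from palantir_models_serializers import SparkMLAutoSerializer


class SparkMLExampleAdapter(pm.ModelAdapter):
    @pm.auto_serialize(model=SparkMLAutoSerializer())
    def __init__(self, model):
        # `model` is a fitted Spark ML model (for example, a pyspark.ml
        # PipelineModel). SparkMLAutoSerializer distributes it to the
        # executors, so inference can run natively over Spark DataFrames
        # (for example, by calling self.model.transform(spark_df)).
        self.model = model
```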
To instantiate the model adapter class, the environment must have access to the model adapter code. In particular, if the model was created in a different repository, the adapter code, which is packaged alongside the model as a Python library, needs to be imported as a dependency in your repository. The application will prompt you to do this, as shown in the screenshot below.
You can specify a particular model version using the `model_version` parameter. This is especially recommended if the model is not being retrained on a regular schedule, as it helps prevent an unintended or problematic model from reaching production. If you do not specify a model version, the system will use the latest model available on the build's branch by default.
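For illustration, the snippet below contrasts a pinned and an unpinned model input; the path and version RID reuse the placeholder values from the example above.

```python
from palantir_models.transforms import ModelInput

# Pinned: builds keep using this exact model version until you change it.
pinned_model = ModelInput(
    "path/to/my/model",
    model_version="ri.models.main.model-version.74b03bd6-5715-4904-85f8-4a29499e05a3",
)

# Unpinned: each build resolves the latest model version on the build's branch.
latest_model = ModelInput("path/to/my/model")
```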
Note that if no version is specified, each transform run will automatically fetch the latest model files for the model input, but it will not automatically update the adapter library version (containing the adapter logic you authored for that version and its Python dependencies) in the repository if the model was generated outside of the repository where it is being used. To update the library version, you will need to select the appropriate adapter version in the repository’s Libraries sidebar and verify that all checks pass. The adapter version corresponding to each model version can be found on the model’s page under Inference configuration.
If this workflow does not suit your needs, consider either using the model within the same repository where it is created or setting `use_sidecar` to `True`, as explained below.
Running a model in a sidecar container (`use_sidecar=True`) is recommended for most use cases where the model was built in a code workspace or in a repository different from the one being used for generating predictions.
The main benefit of running the model as a sidecar is that the exact same library versions used to produce the model will also be used to run inference with it. In contrast, importing the adapter code as prompted by the repository user interface will create a new environment solve that merges the constraints from the adapter code and the repository. This may result in different library versions being used.
Additionally, when using a sidecar container to run the model, the adapter code corresponding to the model version being used will automatically be loaded in the sidecar without the user having to manually update the dependency and run checks in the repository.
When using a sidecar, `predict()` requests are automatically routed to the sidecar container without any additional code changes required:
```python
from transforms.api import Input, Output, transform, TransformInput, TransformOutput
from palantir_models import ModelAdapter
from palantir_models.transforms import ModelInput


# Lightweight is not supported with `use_sidecar`
@transform(
    out=Output('path/to/output'),
    model_input=ModelInput(
        "path/to/my/model",
        use_sidecar=True,
    ),
    data_in=Input("path/to/input")
)
def my_transform(out: TransformOutput, model_input: ModelAdapter, data_in: TransformInput) -> None:
    inference_results = model_input.transform(data_in)
    out.write_pandas(inference_results.output_df)
```
The example below will provision a sidecar alongside the driver and each executor, each sidecar with 1 GPU, 2 CPUs, and 4 GB of memory.
```python
from transforms.api import Input, Output, transform, TransformInput, TransformOutput
from palantir_models import ModelAdapter
from palantir_models.transforms import ModelInput


@transform(
    out=Output('path/to/output'),
    model_input=ModelInput(
        "path/to/my/model",
        use_sidecar=True,
        sidecar_resources={
            "cpus": 2.0,
            "memory_gb": 4.0,
            "gpus": 1
        }
    ),
    data_in=Input("path/to/input")
)
def my_transform(out: TransformOutput, model_input: ModelAdapter, data_in: TransformInput) -> None:
    ...
```
You can run distributed model inference using Spark executors. This approach can be beneficial for batch inference involving computationally heavy models or large datasets, with near-linear scalability.
Consider the following code snippet demonstrating how you can wrap an existing model for distributed inference:
```python
from transforms.api import transform, Input, Output
from palantir_models.transforms import ModelInput, DistributedInferenceWrapper


@transform(
    input_df=Input("ri.foundry.main.dataset.3cd098b3-aae0-455a-9383-4eec810e0ac0"),
    model_input=ModelInput("ri.models.main.model.5b758039-370c-4cfc-835e-5bd3f213454c"),
    output=Output("ri.foundry.main.dataset.c0a3edbc-c917-4f20-88f1-d797ebf27cb2"),
)
def compute(ctx, input_df, model_input, output):
    # Wrap the model adapter so inference is distributed over the Spark executors
    model_input = DistributedInferenceWrapper(model_input, ctx, 'auto')
    inference_outputs = model_input.predict(input_df.dataframe())
    output.write_dataframe(inference_outputs)
```
The `DistributedInferenceWrapper` class is initialized with the following parameters:
| Parameter | Type | Description | Notes |
|---|---|---|---|
| `model` | `ModelAdapter` | The model adapter instance to be wrapped. This is typically the `model_input` provided by `ModelInput`. | |
| `ctx` | `TransformContext` | The transform context, used to access Spark session information. This is typically the `ctx` argument of your transform function. | |
| `num_partitions` | `Union[Literal["auto"], int]` | Number of partitions to use for the Spark DataFrame. If `'auto'`, it will be set to match the number of Spark executors. If you experience Out Of Memory (OOM) errors, try increasing this value. | Default: `'auto'` |
| `max_rows_per_chunk` | `int` | Spark splits each partition into chunks before sending it to the model. This parameter sets the maximum number of rows allowed per chunk. More rows per chunk means less overhead but more memory usage. | Default: 1,000,000 |
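As a hedged sketch, assuming `num_partitions` and `max_rows_per_chunk` can be passed as keyword arguments (the table above lists them as the wrapper's parameters), you could tune partitioning explicitly. The paths and the specific values below are illustrative placeholders.

```python
from transforms.api import transform, Input, Output
from palantir_models.transforms import ModelInput, DistributedInferenceWrapper


@transform(
    input_df=Input("path/to/input"),
    model_input=ModelInput("path/to/my/model"),
    output=Output("path/to/output"),
)
def compute(ctx, input_df, model_input, output):
    # Illustrative values: more partitions and fewer rows per chunk reduce
    # memory pressure per executor at the cost of extra scheduling overhead.
    wrapped = DistributedInferenceWrapper(
        model_input,
        ctx,
        num_partitions=16,
        max_rows_per_chunk=100_000,
    )
    output.write_dataframe(wrapped.predict(input_df.dataframe()))
```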
Usage notes:

- The `use_sidecar` parameter (described above) is optional.