Create a direct model deployment

Direct model deployments are live hosted endpoints that immediately connect models to user applications such as Workshop and Slate. Direct model deployments are queried in TypeScript through Functions on models or from an external system through a REST API call.
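
As a minimal sketch, an external system could query a deployment over REST with a POST request. The hostname, route, token, and payload shape below are placeholders and assumptions, not the platform's documented API; consult your deployment's endpoint details for the exact format.

```python
import requests

# Placeholder values: the hostname, route, token, and payload shape are assumptions.
ENDPOINT = "https://<your-foundry-host>/api/<deployment-endpoint>"

response = requests.post(
    ENDPOINT,
    headers={"Authorization": "Bearer <token>"},
    json={"inputs": [{"latitude": 51.5, "longitude": -0.12}]},  # example feature payload
    timeout=30,
)
response.raise_for_status()
print(response.json())  # model predictions
```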

The following sections explain how to create, configure, and publish a direct model deployment and describe some debugging steps and feature considerations to review before getting started.

1. Create a direct model deployment

To create and start a direct model deployment, navigate to the model and select Start Deployment under Live Inference at the top of the model page. Once the deployment is running, you can test it interactively by selecting Run.

Create a new direct model deployment in a model.

2. Configure a direct model deployment

To configure the resources of a direct model deployment, select the Configure button in the top right of the Live Inference panel. Direct model deployments can be configured to scale from zero. When a deployment reaches 70% capacity, it adds an additional replica, continuing until it reaches the maximum replica count specified in the runtime scaling configuration. Deployments also automatically scale down after 45 minutes without a live request.
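
The scaling behavior described above can be read as a simple policy. The following is an illustrative sketch only; the function and parameter names are hypothetical and do not correspond to a platform API.

```python
def desired_replicas(current: int, utilization: float,
                     max_replicas: int, idle_minutes: float) -> int:
    """Hypothetical sketch of the documented scaling policy."""
    if idle_minutes >= 45:
        return 0  # scale down after 45 minutes without a live request
    if utilization >= 0.70 and current < max_replicas:
        return current + 1  # add a replica when 70% capacity is reached
    return current
```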

Configure a direct model deployment in a model.

3. Publish a Function

To register a model as a no-code Function, select the plus icon in the model artifact sidebar and provide a model Function name. A user can register one Function per branch.

We recommend naming your Functions similarly between branches. For example, you could use Plane Detection Model on the main branch and Plane Detection Model [Develop] on the develop branch.

Publish a Function direct model deployment from a model.

Automatic upgrades

One direct model deployment can be created for each branch of a model. When a new model version is published to that branch, the direct model deployment will automatically upgrade to the new endpoint with no downtime. If you do not want automatic upgrades, consider using a Modeling Objective live deployment.

Automatic horizontal scaling

Direct model deployments are backed by compute modules and therefore support automatic horizontal scaling between a user-specified minimum and maximum replica range.

Model API type safety

Direct model deployments enforce type safety for all inference requests to ensure that inputs match the model API types. Type safety is respected for all input types, particularly the following (illustrated in the sketch after this list):

  • Numeric values: If a model API field is defined as type int and a value of 3.6 is passed to the model, the fractional part is truncated and the model receives 3.
  • Dates and timestamps: Direct model deployments cast date and timestamp types before they are provided to the predict() method. Timestamp fields expect a string in ISO 8601 format.
  • Enforced API structure: Direct model deployments explicitly require fields marked as required in the model API.
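
As a rough illustration of these casting rules, consider the Python sketch below. The coercion itself is performed by the deployment before predict() is called; the coerce helper here is hypothetical and not part of any platform API.

```python
from datetime import datetime

def coerce(value, api_type: str):
    """Hypothetical helper mimicking the documented casting rules."""
    if api_type == "int":
        return int(value)  # fractional part is truncated: 3.6 -> 3
    if api_type == "timestamp":
        # Timestamp fields expect an ISO 8601 string, e.g. "2024-01-15T09:30:00Z"
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    return value

assert coerce(3.6, "int") == 3
print(coerce("2024-01-15T09:30:00Z", "timestamp"))  # 2024-01-15 09:30:00+00:00
```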

This type safety behavior differs from Modeling Objective live deployments, which do not currently support type casting.

Debug a direct model deployment

To view debugging information and logs for your direct model deployment, select the Deployment health tab at the top of the model page. Here you can find the deployment's running build, health information about replicas, logs, and metrics about each replica's state.

Debug a direct model deployment from the model artifact page.

You can also view the call stack of your model inference under the Run Debug card. This allows you to see how long each Python function call took and where performance improvements can be made.

Note: The call stack is not shown for container models or when an error is thrown during inference.

Debug a direct model deployment with "Run Debug".

Comparison: Direct model deployments vs Modeling Objective live deployments

The available features of direct model deployments differ from those of Modeling Objective live deployments. Review the table below for more details.