The following documentation provides an example of how to train a scikit-learn binary classification model on the open source UCI ML Breast Cancer Wisconsin (Diagnostic) ↗ dataset in the Code Repositories application using the Model Training Template.
For a detailed walkthrough of the following steps, including how to author a model adapter and write Python transforms for model training, refer to our documentation on how to train a model in Code Repositories.
First, author a model adapter using the Model Training Template in Code Repositories.
The example logic below assumes the following:
- The model adapter returns the input columns along with prediction, probability_0, and probability_1, where:
  - prediction is 0 or 1, with 0 being no cancer detected and 1 being cancer detected.
  - probability_0 is the probability that cancer was not detected.
  - probability_1 is the probability that cancer was detected.
- The example was built with python 3.8.18, pandas 1.5.3, scikit-learn 1.3.2, and dill 0.3.7.
import palantir_models as pm
from palantir_models_serializers import *


class SklearnClassificationAdapter(pm.ModelAdapter):

    @pm.auto_serialize(
        model=DillSerializer()
    )
    def __init__(self, model):
        self.model = model

    @classmethod
    def api(cls):
        # The 30 feature columns of the Breast Cancer Wisconsin (Diagnostic) dataset,
        # with spaces replaced by underscores.
        columns = [
            'mean_radius', 'mean_texture', 'mean_perimeter', 'mean_area',
            'mean_smoothness', 'mean_compactness', 'mean_concavity',
            'mean_concave_points', 'mean_symmetry', 'mean_fractal_dimension',
            'radius_error', 'texture_error', 'perimeter_error', 'area_error',
            'smoothness_error', 'compactness_error', 'concavity_error',
            'concave_points_error', 'symmetry_error', 'fractal_dimension_error',
            'worst_radius', 'worst_texture', 'worst_perimeter', 'worst_area',
            'worst_smoothness', 'worst_compactness', 'worst_concavity',
            'worst_concave_points', 'worst_symmetry', 'worst_fractal_dimension'
        ]
        inputs = {"df_in": pm.Pandas(columns=columns)}
        outputs = {"df_out": pm.Pandas(columns=columns + [
            ("prediction", int),
            ("probability_0", float),
            ("probability_1", float)
        ])}
        return inputs, outputs

    def predict(self, df_in):
        X = df_in.copy()
        predictions = self.model.predict(X)
        probabilities = self.model.predict_proba(X)
        # Append the predicted class and one probability column per class.
        df_in['prediction'] = predictions
        for idx, label in enumerate(self.model.classes_):
            df_in[f"probability_{label}"] = probabilities[:, idx]
        return df_in
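For context, the core of the adapter's predict method is plain scikit-learn: the prediction column comes from predict, and the probability columns come from predict_proba, ordered by the fitted classifier's classes_ attribute. The short sketch below illustrates this with a classifier fitted directly on the dataset; it is illustrative only and is not part of the repository code.

# Illustrative only: mirrors how SklearnClassificationAdapter.predict derives the
# prediction and probability columns, using plain scikit-learn outside Foundry.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(as_frame=True, return_X_y=True)
X.columns = X.columns.str.replace(' ', '_')

clf = RandomForestClassifier(n_estimators=50, max_depth=3).fit(X, y)

df_out = X.copy()
df_out['prediction'] = clf.predict(X)
probabilities = clf.predict_proba(X)           # one column per class, ordered by clf.classes_
for idx, label in enumerate(clf.classes_):     # classes_ is [0, 1] for this dataset
    df_out[f"probability_{label}"] = probabilities[:, idx]
print(df_out[['prediction', 'probability_0', 'probability_1']].head())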
In the same repository, in model_training/model_training.py, author the model training logic.
This example uses the open source UCI ML Breast Cancer Wisconsin (Diagnostic) dataset ↗ provided in the scikit-learn library.
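Note that the raw scikit-learn feature names contain spaces (for example, mean radius), while the adapter API above expects underscore-separated names; this is why the training and inference transforms rename the columns. A quick, illustrative check outside Foundry:

# Illustrative only: the dataset ships with space-separated feature names,
# which the transforms rename to match the adapter's column names.
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(as_frame=True, return_X_y=True)
print(X.shape)               # (569, 30)
print(list(X.columns[:3]))   # ['mean radius', 'mean texture', 'mean perimeter']
X.columns = X.columns.str.replace(' ', '_')
print(list(X.columns[:3]))   # ['mean_radius', 'mean_texture', 'mean_perimeter']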
from transforms.api import transform
from palantir_models.transforms import ModelOutput
from main.model_adapters.adapter import SklearnClassificationAdapter

from sklearn.datasets import load_breast_cancer
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier


@transform(
    model_output=ModelOutput("/path/to/model_asset"),
)
def compute(model_output):
    X_train, y_train = load_breast_cancer(as_frame=True, return_X_y=True)
    # Rename the feature columns to match the adapter's API (spaces -> underscores).
    X_train.columns = X_train.columns.str.replace(' ', '_')
    columns = X_train.columns

    numeric_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler())
        ]
    )

    preprocessor = make_column_transformer(
        (numeric_transformer, columns),
        remainder="passthrough"
    )

    model = Pipeline(
        steps=[
            ("preprocessor", preprocessor),
            ("classifier", RandomForestClassifier(n_estimators=50, max_depth=3))
        ]
    )

    model.fit(X_train, y_train)

    # Wrap the fitted pipeline in the model adapter and publish it as a model asset.
    foundry_model = SklearnClassificationAdapter(model)
    model_output.publish(model_adapter=foundry_model)
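The transform above fits the pipeline on the full dataset before publishing. If you want a rough sense of model quality first, a hold-out evaluation of the same pipeline can be run locally with plain scikit-learn; the sketch below is illustrative, and the split ratio and metrics are arbitrary choices rather than part of the example repository.

# Illustrative only: a quick hold-out evaluation of the same pipeline outside Foundry.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

X, y = load_breast_cancer(as_frame=True, return_X_y=True)
X.columns = X.columns.str.replace(' ', '_')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())
])
preprocessor = make_column_transformer(
    (numeric_transformer, list(X_train.columns)),
    remainder="passthrough"
)
model = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", RandomForestClassifier(n_estimators=50, max_depth=3))
])
model.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("ROC AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))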
You can run inference with your model in a Python transform. For example, once your model has been trained, copy the inference logic below into the model_training/run_inference.py file and select Build.
from transforms.api import transform, Output
from palantir_models.transforms import ModelInput
from sklearn.datasets import load_breast_cancer


@transform(
    inference_output=Output("ri.foundry.main.dataset.5dd9907f-79bc-4ae9-a106-1fa87ff021c3"),
    model=ModelInput("ri.models.main.model.cfc11519-28be-4f3e-9176-9afe91ecf3e1"),
)
def compute(inference_output, model):
    X, y = load_breast_cancer(as_frame=True, return_X_y=True)
    # Rename the feature columns to match the adapter's API (spaces -> underscores).
    X.columns = X.columns.str.replace(' ', '_')
    inference_results = model.transform(X)
    inference_output.write_pandas(inference_results.df_out)
After training, you can submit this model to a modeling objective and launch a sandbox deployment to host it for live inference. Once the sandbox is launched and ready, you can perform live inference and connect the model to an operational application.
The example below shows input for the binary classification model using the single I/O endpoint:
[
{
"mean_radius": 15.09,
"mean_texture": 23.71,
"mean_perimeter": 92.65,
"mean_area": 944.07,
"mean_smoothness": 0.53,
"mean_compactness": 0.21,
"mean_concavity": 0.76,
"mean_concave_points": 0.39,
"mean_symmetry": 0.08,
"mean_fractal_dimension": 0.14,
"radius_error": 0.49,
"texture_error": 0.82,
"perimeter_error": 2.51,
"area_error": 17.22,
"smoothness_error": 0.07,
"compactness_error": 0.01,
"concavity_error": 0.05,
"concave_points_error": 0.05,
"symmetry_error": 0.01,
"fractal_dimension_error": 0.08,
"worst_radius": 12.95,
"worst_texture": 20.66,
"worst_perimeter": 185.41,
"worst_area": 624.87,
"worst_smoothness": 0.18,
"worst_compactness": 0.26,
"worst_concavity": 0.01,
"worst_concave_points": 0.05,
"worst_symmetry": 0.29,
"worst_fractal_dimension": 0.05
}
]
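If you want to generate a payload in this shape directly from the dataset, a small illustrative snippet (not part of the repository code) can serialize one row of the feature frame as a list of records keyed by the feature column names:

# Illustrative only: build a single-row payload shaped like the example above.
import json
from sklearn.datasets import load_breast_cancer

X, _ = load_breast_cancer(as_frame=True, return_X_y=True)
X.columns = X.columns.str.replace(' ', '_')

payload = X.head(1).round(2).to_dict(orient="records")
print(json.dumps(payload, indent=2))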