In distributed training, a model is trained across multiple computing resources working in parallel, enabling you to train on larger datasets and reduce training time. Foundry supports distributed model training in Spark environments.
Currently, only XGBoost Spark is directly supported for distributed model training in Foundry. Additional distributed training libraries may be supported in the future.
Follow the steps below to learn how to perform distributed model training in Foundry.
Before using distributed training libraries, you must configure your Spark environment with a Spark profile. To perform distributed model training, import and apply the KUBERNETES_OPEN_PORTS_ALL profile in your repository, as shown in the example code below:
```python
from transforms.api import configure, transform_df

@configure(profile=[
    "KUBERNETES_OPEN_PORTS_ALL",
])
@transform_df(...)
def compute(...):
    ...
```
To reiterate: applying the KUBERNETES_OPEN_PORTS_ALL profile is mandatory for distributed training.
XGBoost provides a seamless PySpark integration for distributed training via the SparkXGBClassifier ↗.
An example of a basic XGBoost Spark setup is shown below:
```python
from transforms.api import configure, transform
from xgboost.spark import SparkXGBClassifier

@configure(profile=["KUBERNETES_OPEN_PORTS_ALL"])
@transform(...)
def compute(...):
    xgb = SparkXGBClassifier(
        features_col=<your_feature_col_name>,
        # other parameters as needed
    )
    model = xgb.fit(<your_training_dataframe>)
```
For additional configuration details, refer to the XGBoost Spark Documentation ↗.
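The fitted model behaves like a standard Spark ML model, so it can score new data with transform(). Below is a minimal sketch, assuming a hypothetical scoring_df DataFrame that contains the same feature column used during training:

```python
# Hypothetical example: scoring_df is a Spark DataFrame with the same
# feature column the model was trained on.
predictions = model.transform(scoring_df)

# The output typically includes "prediction", "rawPrediction", and
# "probability" columns alongside the input columns.
predictions.select("prediction", "probability").show()
```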
To leverage GPUs for distributed model training, follow the steps below:

1. Add the EXECUTOR_GPU_ENABLED Spark profile to your transform.
2. Set the SparkXGBClassifier device parameter to 'gpu' or 'cuda', depending on your GPU setup.

Example:
```python
from transforms.api import configure, transform_df
from xgboost.spark import SparkXGBClassifier

@configure(profile=[
    "KUBERNETES_OPEN_PORTS_ALL",
    "EXECUTOR_GPU_ENABLED",
])
@transform_df(...)
def compute(...):
    xgb = SparkXGBClassifier(
        ...,
        device='gpu'  # options: 'cpu', 'gpu', 'cuda'
    )
    model = xgb.fit(...)
```
Refer to the SparkXGBClassifier documentation ↗ for more information on GPU configuration.
Optionally, you may want to perform hyperparameter optimization. We recommend using Optuna ↗ for hyperparameter optimization in Transforms; a sketch is shown below.
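As a minimal sketch (not an official Foundry pattern), the example below wires Optuna around SparkXGBClassifier. The column names ("features", "label"), search ranges, and evaluator are illustrative assumptions; adapt them to your data and task:

```python
import optuna
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from xgboost.spark import SparkXGBClassifier


def tune(train_df, validation_df, n_trials=20):
    # Assumes a binary "label" column; swap in the evaluator that matches your task.
    evaluator = BinaryClassificationEvaluator(labelCol="label")

    def objective(trial):
        # Illustrative search space; each trial fits a full distributed model.
        xgb = SparkXGBClassifier(
            features_col="features",
            label_col="label",
            max_depth=trial.suggest_int("max_depth", 3, 10),
            learning_rate=trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
            n_estimators=trial.suggest_int("n_estimators", 50, 500),
        )
        model = xgb.fit(train_df)
        # Optuna maximizes the metric returned here.
        return evaluator.evaluate(model.transform(validation_df))

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params
```

Because each trial trains a full distributed model, keep n_trials modest or narrow the search space when working with large datasets.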