Build settings

This page describes build settings in Pipeline Builder that can be used to adjust the performance of your batch and streaming pipelines.

You can edit the Build settings of your pipeline by selecting the settings icon next to Deploy in the top right of your screen.

Screenshot of the "Build settings" dropdown menu.

Batch pipeline

Batch compute profiles

The following batch compute profiles are available to select in Build settings:

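| Profile | Driver cores | Driver memory | Dynamic min executors | Dynamic max executors | Executor cores | Executor memory | Executor off-heap memory |
|---|---|---|---|---|---|---|---|
| Extra Small | 1 | 4GB | N/A | N/A | N/A | N/A | N/A |
| Small | 1 | 2GB | 1 | 2 | 1 | 3GB | N/A |
| Medium | 1 | 6GB | 2 | 16 | 2 | 6GB | N/A |
| Large | 1 | 13GB | 2 | 32 | 2 | 6GB | N/A |
| Extra Large | 1 | 27GB | 2 | 128 | 2 | 6GB | N/A |
| Natively Accelerated Small | 1 | 2GB | 1 | 2 | 1 | 600MB | 2400MB |
| Natively Accelerated Medium | 1 | 6GB | 2 | 16 | 2 | 1200MB | 4800MB |
| Natively Accelerated Large | 1 | 13GB | 2 | 32 | 2 | 1200MB | 4800MB |
| Natively Accelerated Extra Large | 1 | 27GB | 2 | 128 | 2 | 1200MB | 4800MB |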
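
As a quick check on the table above, the following sketch adds the on-heap and off-heap executor memory of each natively accelerated profile and reports the off-heap share. The memory values are copied from the table; the dictionary and function names are illustrative only and are not part of Pipeline Builder.

```python
# Executor memory in MB for the natively accelerated profiles,
# copied from the batch compute profiles table.
NATIVE_PROFILES = {
    "Natively Accelerated Small": {"on_heap_mb": 600, "off_heap_mb": 2400},
    "Natively Accelerated Medium": {"on_heap_mb": 1200, "off_heap_mb": 4800},
    "Natively Accelerated Large": {"on_heap_mb": 1200, "off_heap_mb": 4800},
    "Natively Accelerated Extra Large": {"on_heap_mb": 1200, "off_heap_mb": 4800},
}


def memory_footprint(profile):
    """Return (total executor memory in MB, off-heap share of that total)."""
    total_mb = profile["on_heap_mb"] + profile["off_heap_mb"]
    return total_mb, profile["off_heap_mb"] / total_mb


for name, profile in NATIVE_PROFILES.items():
    total_mb, off_heap_share = memory_footprint(profile)
    print(f"{name}: {total_mb} MB total, {off_heap_share:.0%} off-heap")
```

Treating 1GB as roughly 1000MB, the totals (3000 MB for Small, 6000 MB for the others) line up with the 3GB and 6GB executor memory of the default profiles, and every preconfigured native profile places 80% of executor memory off-heap.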

Native acceleration

You can improve the performance of batch pipelines in Pipeline Builder by enabling native acceleration with Velox.

Read more about native acceleration in Foundry.

Enable native acceleration

You can edit the build settings of your pipeline by selecting the settings icon next to Deploy. The settings for native acceleration include preconfigured profiles for small, medium, and large compute sizes. Each one matches the total memory footprint of the default profile of the same size (native acceleration has no local mode). These preconfigured profiles are recommended if you are running a pipeline with native acceleration for the first time.

Screenshot of the Build settings dropdown

There is also a natively accelerated profile with advanced configuration, allowing you to fully specify the on-heap and off-heap memory ratios, as well as all other resource and compute affecting configurations for the build.

Screenshot of the Build settings dropdown

In most cases, selecting a preconfigured native acceleration profile is enough to speed up your pipelines. If you encounter out-of-memory errors (OOMs) or performance regressions that do not occur in the non-accelerated build, the memory configuration is likely suboptimal. Switching to the advanced profile and reducing the percentage of memory allocated off-heap often resolves the issue. If problems persist, the pipeline is probably not well suited to native acceleration, and you should continue using the default profiles.
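
To illustrate what reducing the off-heap percentage means in practice, here is a minimal sketch that splits a fixed executor memory budget between on-heap and off-heap memory. The function name and the idea of expressing the split as a single fraction are assumptions made for illustration; the advanced profile in Pipeline Builder may expose these values differently.

```python
def split_executor_memory(total_mb, off_heap_fraction):
    """Split a total executor memory budget into (on_heap_mb, off_heap_mb)."""
    if not 0.0 <= off_heap_fraction <= 1.0:
        raise ValueError("off_heap_fraction must be between 0 and 1")
    off_heap_mb = round(total_mb * off_heap_fraction)
    return total_mb - off_heap_mb, off_heap_mb


# Same split as the preconfigured medium profile: 6000 MB total, 80% off-heap.
print(split_executor_memory(6000, 0.8))

# Hypothetical adjustment when the JVM side runs out of memory:
# keep the total budget but lower the off-heap share to 60%.
print(split_executor_memory(6000, 0.6))
```

The first call yields 1200 MB on-heap and 4800 MB off-heap, and the second yields 2400 MB on-heap and 3600 MB off-heap. The total stays the same, so the change only rebalances memory rather than requesting more resources.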

Memory configuration considerations for native acceleration

Running Spark with native acceleration in Foundry requires a slightly different configuration from standard batch pipelines. Spark supports performing some operations with off-heap memory, which is memory that is not managed by the JVM. Using it avoids garbage collection (GC) overhead and can lead to better performance. By default, off-heap memory is not enabled in Foundry, because it can introduce additional maintenance costs for pipelines. Native acceleration requires off-heap memory, since DataFrames modified by Velox must be off-heap to be accessible to the native process.

Foundry still requires sufficient on-heap memory for everything except Velox data transformations (for instance, orchestration, scheduling, and build management code still run in the JVM), but ideally most work is performed off-heap. The additional maintenance cost of native acceleration therefore lies in balancing on-heap and off-heap memory. Pipeline Builder offers managed profiles to assist with this, but custom configuration may still be necessary.
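
For readers who want to relate the profile columns to Spark itself, the sketch below maps a profile onto standard open-source Spark properties, including `spark.memory.offHeap.enabled` and `spark.memory.offHeap.size`. The property names come from Spark's own configuration. How Pipeline Builder applies a profile internally is not described on this page, so the mapping and the helper name `profile_to_spark_conf` are assumptions for illustration only.

```python
def profile_to_spark_conf(driver_cores, driver_memory, min_executors,
                          max_executors, executor_cores, executor_memory,
                          off_heap_memory=None):
    """Build a dict of Spark properties from the columns of a compute profile.

    Assumed mapping for illustration; not Pipeline Builder's actual implementation.
    """
    conf = {
        "spark.driver.cores": str(driver_cores),
        "spark.driver.memory": driver_memory,
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": executor_memory,  # on-heap (JVM) memory
    }
    if off_heap_memory is not None:
        # Off-heap memory must be switched on explicitly and given a size.
        conf["spark.memory.offHeap.enabled"] = "true"
        conf["spark.memory.offHeap.size"] = off_heap_memory
    return conf


# Values taken from the Natively Accelerated Medium row of the profile table.
native_medium = profile_to_spark_conf(1, "6g", 2, 16, 2, "1200m", "4800m")
for key, value in native_medium.items():
    print(f"{key}={value}")
```

Passing no `off_heap_memory` leaves the two off-heap properties out entirely, which corresponds to the default behavior described above where off-heap memory is not enabled.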

Streaming pipeline

Streaming compute profiles

The following streaming compute profiles are available to select in Build settings:

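| Profile | Job Manager memory | Parallelism | Task Manager memory |
|---|---|---|---|
| Extra Extra Small | 1GB | 1 | 1GB |
| Extra Small | 1GB | 1 | 1GB |
| Small | 1GB | 2 | 4GB |
| Medium | 1GB | 3 | 6GB |
| Large | 2GB | 4 | 8GB |
| XLarge | 2GB | 8 | 12GB |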