Build settings

This page describes build settings in Pipeline Builder that can be used to adjust the performance of your batch and streaming pipelines.

You can edit the Build settings of your pipeline by selecting the settings icon next to Deploy in the top right of your screen.

Screenshot of the "Build settings" dropdown menu.

Batch pipeline

Batch compute profiles

The following batch compute profiles are available to select in Build settings:

| Profile | Driver cores | Driver memory | Dynamic min executors | Dynamic max executors | Executor cores | Executor memory | Executor off-heap memory |
|---|---|---|---|---|---|---|---|
| Extra Small | 1 | 4GB | N/A | N/A | N/A | N/A | N/A |
| Small | 1 | 2GB | 1 | 2 | 1 | 3GB | N/A |
| Medium | 1 | 6GB | 2 | 16 | 2 | 6GB | N/A |
| Large | 1 | 13GB | 2 | 32 | 2 | 6GB | N/A |
| Extra Large | 1 | 27GB | 2 | 128 | 2 | 6GB | N/A |
| Natively Accelerated Small | 1 | 2GB | 1 | 2 | 1 | 600MB | 2400MB |
| Natively Accelerated Medium | 1 | 6GB | 2 | 16 | 2 | 1200MB | 4800MB |
| Natively Accelerated Large | 1 | 13GB | 2 | 32 | 2 | 1200MB | 4800MB |
| Natively Accelerated Extra Large | 1 | 27GB | 2 | 128 | 2 | 1200MB | 4800MB |

Native acceleration

You can improve performance by enabling native acceleration of batch pipelines in Pipeline Builder with Velox ↗.

Native acceleration is a technique that leverages low-level hardware optimizations to improve the performance of batch jobs. These performance gains are achieved by shifting compute from Java Virtual Machine (JVM) languages to native languages, such as C++, which are compiled down to machine code and run directly on the hardware of the machine. By using platform-specific features, native acceleration aims to significantly reduce the time needed to process large-scale data workloads, speeding up job execution and improving resource utilization.

Enable native acceleration

The Build settings menu contains preconfigured profiles for natively accelerated builds at small, medium, large, and extra large compute sizes. These align with the default profiles of the same names based on total memory footprint (there is no natively accelerated equivalent of the Extra Small profile, which runs without executors in local mode). These preconfigured profiles are recommended if you are running a pipeline with native acceleration for the first time.
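
For example, using the rows from the table above, each natively accelerated profile splits the executor memory of its default counterpart into a 20/80 on-heap/off-heap ratio while keeping the total footprint the same:

Natively Accelerated Medium: 1200MB on-heap + 4800MB off-heap = 6000MB
Medium:                      6GB on-heap                      = 6000MB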

Screenshot of the Build settings dropdown

There is also a natively accelerated profile with advanced configuration, which lets you fully specify the on-heap and off-heap memory ratio, as well as every other resource and compute configuration for the build.

Screenshot of the Build settings dropdown

Most of the time, selecting a preconfigured native acceleration profile should be enough to speed up your pipelines. If you encounter out-of-memory (OOM) errors or performance regressions that do not occur in the non-natively accelerated build, the memory configuration is likely suboptimal. Switching to the advanced profile and reducing the percentage of memory allocated off-heap often resolves the issue; a sketch of this adjustment follows. If problems persist, the pipeline is likely not well-suited to native acceleration, and you should continue using the default run profiles.
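
As a concrete illustration of that advice, the advanced profile's memory controls correspond to standard Spark settings. Below is a minimal PySpark sketch, assuming direct control over the session (in Pipeline Builder you would instead change the equivalent fields in the advanced profile), that shifts the Natively Accelerated Medium split toward on-heap while keeping the 6GB total constant:

from pyspark.sql import SparkSession

# Illustrative only: Pipeline Builder's advanced profile exposes these knobs in its UI.
# Shift memory back on-heap while keeping the total executor footprint at 6000MB.
spark = (
    SparkSession.builder
    .config("spark.executor.memory", "2400m")        # on-heap (was 1200m)
    .config("spark.memory.offHeap.enabled", "true")  # required for native acceleration
    .config("spark.memory.offHeap.size", "3600m")    # off-heap (was 4800m)
    .getOrCreate()
)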

Build analysis

You can conduct basic analysis of a natively accelerated build in the Spark Details page. Under the Query Plan tab, select Physical Plan; you will see something like the following:

== Physical Plan ==
AdaptiveSparkPlan
+- == Final Plan ==
   Execute InsertIntoHadoopFsRelationCommand
   +- WriteFiles
      +- CollectMetrics
         +- VeloxColumnarToRowExec
            +- ^ ProjectExecTransformer
               +- ^ InputIteratorTransformer
                  +- ^ InputAdapter
                     +- ^ RowToVeloxColumnar
                        +- ^ HashAggregate
                           +- ^ VeloxColumnarToRowExec
                              +- ^ AQEShuffleRead
                                 +- ^ ShuffleQueryStage
                                    +- ColumnarExchange
                                       +- ^ ProjectExecTransformer
                                          +- ^ RowToVeloxColumnar
                                             +- * ColumnarToRow
                                                +- BatchScan parquet

While broadly similar to a conventional Spark query plan, this plan has a few key differences. Instead of a ProjectExec node, there is a ProjectExecTransformer, meaning the operation will be executed natively in the Velox query engine. All offloaded nodes of the query plan are marked with the ^ symbol in the tree. Blocks of native execution are sandwiched between RowToVeloxColumnar and VeloxColumnarToRowExec nodes, which are responsible for converting Spark datasets to Arrow DataFrames and vice versa. This serialization and deserialization has a significant cost.
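
If you want to inspect a plan outside the Spark Details page, any Spark DataFrame can print its own. A minimal PySpark sketch (the Velox-specific nodes appear only when the session itself is natively accelerated):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = (
    spark.range(1000)
    .withColumn("bucket", F.col("id") % 10)
    .groupBy("bucket")
    .count()
)
# Prints the physical plan; on a natively accelerated session, offloaded
# operators appear as ...Transformer nodes marked with ^.
df.explain(mode="formatted")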

There are generally two patterns that indicate poor native acceleration performance (a rough way to check for both is sketched after this list):

  • A small percentage of nodes executed natively, as indicated by the ^ symbol.
  • A large number of RowToVeloxColumnar and VeloxColumnarToRowExec nodes, resulting in high serialization overhead.
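
One rough way to check for both patterns is to count node types in the plan text. A minimal PySpark sketch (the string matching is a heuristic of ours, not an official API):

import contextlib
import io

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(1000).withColumn("bucket", F.col("id") % 10).groupBy("bucket").count()

# Capture the plan that explain() prints to stdout.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    df.explain(mode="formatted")
plan = buf.getvalue()

# Heuristic: many offloaded ("Transformer") nodes and few row/columnar
# transitions suggest the build is benefiting from native acceleration.
offloaded = plan.count("Transformer")
transitions = plan.count("RowToVeloxColumnar") + plan.count("VeloxColumnarToRowExec")
print(f"offloaded nodes: {offloaded}, row/columnar transitions: {transitions}")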

This analysis can be helpful if performance is not as expected. Small changes to pipelines can have a large impact on the amount of compute that is offloaded. Features like checkpoints can be used to manually group together chunks of a build that can all be executed natively.

Implementation and architecture of native acceleration

Foundry’s implementation of native acceleration is built upon the Apache Gluten ↗ project. Foundry native acceleration leverages the Velox ↗ query engine to accelerate Spark jobs at runtime. Velox is written in C++ and is designed explicitly with database acceleration in mind ↗, providing a developer API to run operator-level operations on Arrow DataFrames ↗. Gluten provides the necessary "glue" to bind the Spark runtime with Velox.

In this setup, a pipeline first generates a Spark query plan as in a normal build (one without native acceleration). Additional optimization rules are then applied to the plan to identify whether parts of the query can be run with Velox. This decision is based on whether Velox has an equivalent implementation and whether a mapping for that implementation exists in Gluten. The query can be offloaded at the operator level; operators correspond roughly to SQL operations like SELECT, FILTER, or JOIN. Any part of the query plan that can be offloaded is marked at this stage.
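
The offload decision can be pictured with a toy model. This is a conceptual sketch only, not Gluten's actual planner code; the operator set and the mark_offloadable helper are hypothetical:

from dataclasses import dataclass

# Toy model of the offload decision: an operator runs natively only if Velox
# implements it AND Gluten provides a mapping for that implementation.
SUPPORTED_BY_VELOX_AND_GLUTEN = {"Project", "Filter", "HashAggregate"}

@dataclass
class PlanNode:
    op: str
    offloaded: bool = False

def mark_offloadable(plan: list[PlanNode]) -> list[PlanNode]:
    """Mark each operator for native execution if a Velox mapping exists."""
    for node in plan:
        node.offloaded = node.op in SUPPORTED_BY_VELOX_AND_GLUTEN
    return plan

plan = [PlanNode("BatchScan"), PlanNode("Filter"), PlanNode("PythonUDF")]
for node in mark_offloadable(plan):
    print(node.op, "-> Velox" if node.offloaded else "-> JVM")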

Once the planning step is complete, the query is executed through the normal Spark engine. This means all task scheduling, executor orchestration, and lifecycle management proceed as normal. The difference comes when an executor reaches part of the query plan that has been marked for native execution. If this occurs, instead of calling the default implementations in Spark, the Velox implementations are invoked.

This architecture is particularly advantageous because it supports queries in which not all computations can be done with Velox, since the offload decision is made at the operator level rather than for the entire plan. The number of supported operators is constantly growing, but user-authored code such as UDFs can never be offloaded because no native implementation exists for it.
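
As a minimal PySpark illustration of this distinction (the column name and UDF are made up):

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a",), ("b",)], ["letter"])

# Built-in functions have Velox implementations and Gluten mappings, so this
# projection can be offloaded and run natively.
offloadable = df.select(F.upper("letter"))

# A Python UDF has no native implementation, so this stage falls back to the
# JVM (and Python worker), with row/columnar conversions around it.
shout = F.udf(lambda s: s + "!", StringType())
not_offloadable = df.select(shout("letter"))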

Why is native acceleration faster?

Spark is written in Scala, a JVM language, and contains many optimizations such as code generation ↗ to improve its performance. Further, the JVM itself contains optimizations such as the C2 Compiler ↗ that aim to take advantage of as many platform-specific features as possible. However, native languages such as C++ continue to offer better performance for three basic reasons:

  • Compile-time optimizations: While Java and Scala are compiled into bytecode, which is then executed by the JVM, native languages like C++ are compiled directly into machine code. This allows the C++ compiler to perform extensive optimizations at compile-time that significantly reduce runtime overhead. In contrast, JVM languages rely on Just-In-Time (JIT) compilation, which occurs during execution and may not achieve the same level of optimization because it has to balance the time spent on compilation with the need to start running quickly.
  • No garbage collection (GC): In C++, memory management is handled manually, which eliminates the overhead associated with garbage collection (GC). In JVM languages, the GC process can introduce unpredictable pauses and overhead that can impact performance, especially in memory-intensive applications.
  • Direct hardware access and availability of vectorization APIs: C++ provides direct access to hardware features and low-level system resources, enabling developers to leverage platform-specific optimizations and vectorization APIs such as SSE, AVX, and other SIMD (Single Instruction Multiple Data) instructions. This direct access allows for fine-tuned performance optimizations that are not as easily achievable in JVM languages, where the abstraction layer may prevent the same level of hardware interaction.

Memory configuration considerations for native acceleration

Running Spark with native acceleration in Foundry requires a slightly different configuration from normal batch pipelines. Spark supports performing some operations with off-heap memory ↗: memory that is not managed by the JVM, which avoids GC overhead and can improve performance. By default, we do not enable off-heap memory in Foundry, as doing so can introduce additional maintenance costs for pipelines. For native acceleration, however, enabling off-heap memory is necessary, since DataFrames modified by Velox must live off-heap to be accessible to the native process. Foundry still requires sufficient on-heap memory for everything except Velox data transformations (orchestration, scheduling, and build management code, for instance, still run in the JVM), but ideally most work will now be performed off-heap. Configuring a pipeline to use native acceleration therefore introduces additional maintenance costs in balancing on-heap and off-heap memory. Pipeline Builder offers managed profiles to assist with this, but custom configuration may still be necessary.
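
For reference, the open-source Gluten documentation enables this setup on a plain Spark session with configuration along the following lines. Foundry manages all of this for you when you select a natively accelerated profile, and the plugin class name varies across Gluten versions, so treat this as a sketch rather than exact configuration:

from pyspark.sql import SparkSession

# Sketch of open-source Gluten enablement; Foundry profiles handle this for you.
spark = (
    SparkSession.builder
    # Plugin class name varies by Gluten version.
    .config("spark.plugins", "org.apache.gluten.GlutenPlugin")
    # Off-heap memory must be enabled so Velox can access the data natively.
    .config("spark.memory.offHeap.enabled", "true")
    .config("spark.memory.offHeap.size", "4800m")
    .getOrCreate()
)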

Streaming pipeline

Streaming compute profiles

The following streaming compute profiles are available to select in Build settings:

| Profile | Job Manager memory | Parallelism | Task Manager memory |
|---|---|---|---|
| Extra Extra Small | 1GB | 1 | 1GB |
| Extra Small | 1GB | 1 | 1GB |
| Small | 1GB | 2 | 4GB |
| Medium | 1GB | 3 | 6GB |
| Large | 2GB | 4 | 8GB |
| XLarge | 2GB | 8 | 12GB |