You can improve Spark's performance by enabling native acceleration with Velox ↗.
Native acceleration is a technique that leverages low-level hardware optimizations to improve the performance of batch jobs. These gains are achieved by shifting compute from Java Virtual Machine (JVM) languages to native languages, such as C++, which compile to machine code and run directly on the host hardware. By exploiting platform-specific features, native acceleration can significantly reduce the time and resources needed to process large-scale data workloads, speeding up job execution and improving resource utilization.
Native acceleration is available for Python transforms and Pipeline Builder.
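For Python transforms, native acceleration is typically switched on through a Spark profile. Below is a minimal sketch, assuming a profile named NATIVE_ACCELERATION_ENABLED has been made available on your enrollment; the profile name and dataset paths are illustrative, not confirmed by this document.

from transforms.api import Input, Output, configure, transform_df

# Hypothetical profile name; your enrollment may expose a different one.
@configure(profile=["NATIVE_ACCELERATION_ENABLED"])
@transform_df(
    Output("/examples/accelerated_output"),  # illustrative paths
    raw_input=Input("/examples/raw_input"),
)
def compute(raw_input):
    # Built-in DataFrame operations like this filter are eligible for
    # native execution in Velox.
    return raw_input.filter(raw_input["value"] > 0)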
You can conduct basic analysis of a natively accelerated build on the Spark Details page. Under the Query Plan tab, select Physical Plan; you will see something like the following:
== Physical Plan ==
AdaptiveSparkPlan
+- == Final Plan ==
   Execute InsertIntoHadoopFsRelationCommand
   +- WriteFiles
      +- CollectMetrics
         +- VeloxColumnarToRowExec
            +- ^ ProjectExecTransformer
               +- ^ InputIteratorTransformer
                  +- ^ InputAdapter
                     +- ^ RowToVeloxColumnar
                        +- ^ HashAggregate
                           +- ^ VeloxColumnarToRowExec
                              +- ^ AQEShuffleRead
                                 +- ^ ShuffleQueryStage
                                    +- ColumnarExchange
                                       +- ^ ProjectExecTransformer
                                          +- ^ RowToVeloxColumnar
                                             +- * ColumnarToRow
                                                +- BatchScan parquet
While broadly similar to a conventional Spark query plan, this plan has a few key differences. Instead of a ProjectExec node, there is a ProjectExecTransformer, meaning that the operation will be executed natively in the Velox query engine. All offloaded nodes in the query plan are marked with the ^ symbol in the tree. Blocks of native execution are sandwiched between RowToVeloxColumnar and VeloxColumnarToRowExec nodes, which are responsible for converting Spark datasets to Arrow DataFrames and vice versa. This serialization/deserialization has a significant cost.
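You can also print the physical plan directly from a Python transform to check which operators were offloaded. The following is a minimal sketch using standard PySpark APIs; the column names are illustrative.

from pyspark.sql import functions as F

def inspect_plan(df):
    # Aggregations built from native Spark expressions are good offload
    # candidates; look for the corresponding *Transformer nodes in the plan.
    result = (
        df.filter(F.col("amount") > 0)            # hypothetical column
          .groupBy("region")                      # hypothetical column
          .agg(F.sum("amount").alias("total"))
    )
    # Prints the parsed, analyzed, optimized, and physical plans; under
    # native acceleration, offloaded operators appear as ^-marked nodes.
    result.explain(mode="extended")
    return result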
There are generally two patterns which indicate poor native acceleration performance:

- Few nodes marked with the ^ symbol, meaning little of the query plan is being offloaded.
- Frequent pairs of RowToVeloxColumnar and VeloxColumnarToRowExec nodes, resulting in high serialization overheads.

This analysis can be helpful if performance is not as expected. Small changes to pipelines can have a large impact on the amount of compute that is offloaded. Features like checkpoints can be used to manually group together chunks of a build that can all be executed natively, as in the sketch below.
Foundry’s implementation of native acceleration is built upon the Apache Gluten ↗ project. Foundry native acceleration leverages the Velox ↗ query engine to accelerate Spark jobs at runtime. Velox is written in C++ and is designed explicitly with database acceleration in mind ↗, providing a developer API to run operator-level operations on Arrow DataFrames ↗. Gluten provides the necessary "glue" to bind the Spark runtime with Velox.
In this setup, a pipeline first generates a Spark query plan as in a normal build (one without native acceleration). Additional optimization rules are then applied to the plan to identify whether parts of the query can be run with Velox. This decision is based on whether Velox has an equivalent implementation and whether a mapping for that implementation exists in Gluten. The query can be offloaded at the operator level, corresponding roughly to SQL statements like SELECT, FILTER, or JOIN. Any part of the query plan that can be offloaded is marked at this stage.
Once the planning step is complete, the query is executed through the normal Spark engine. This means all task scheduling, executor orchestration, and lifecycle management proceed as normal. The difference comes when an executor reaches part of the query plan that has been marked for native execution. If this occurs, instead of calling the default implementations in Spark, the Velox implementations are invoked.
This architecture is particularly advantageous because the offload decision is made at the operator level rather than for the entire plan: queries are supported even when not all of their computations can be done with Velox. The number of supported operators is constantly growing, but user-authored code such as UDFs can never be offloaded, as no native implementation exists.
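To make the distinction concrete, here is a sketch contrasting an offloadable built-in expression with a semantically identical Python UDF that cannot be offloaded; the column names are illustrative.

from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType

def taxed_total(df):
    # Built-in expression: Gluten can map this projection to a Velox
    # operator, so it can run natively (a ^ ProjectExecTransformer node).
    return df.withColumn("total", F.col("price") * 1.2)

def taxed_total_udf(df):
    # Same logic expressed as a Python UDF: no native implementation
    # exists, so this projection stays on the default Spark path.
    apply_tax = F.udf(lambda p: p * 1.2 if p is not None else None, DoubleType())
    return df.withColumn("total", apply_tax(F.col("price")))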
View the full list of supported operators and expressions ↗
Spark is written in Scala, a JVM language, and contains many optimizations such as code generation ↗ to improve its performance. Further, the JVM itself contains optimizations such as the C2 Compiler ↗ that aim to take advantage of as many platform-specific features as possible. However, native languages such as C++ continue to offer better performance for three basic reasons:

- They run without garbage collection, avoiding GC pauses and bookkeeping overhead.
- They give the programmer direct control over memory allocation and layout, enabling cache-friendly data structures.
- They are compiled ahead of time for the target hardware and can use explicit vectorized (SIMD) instructions.
Running Spark with native acceleration in Foundry requires a slightly different configuration from normal batch pipelines. Spark supports performing some operations with off-heap memory ↗. Off-heap memory is memory that is not managed by the JVM, cutting out GC overhead and leading to better performance. By default, we do not enable off-heap memory in Foundry, as doing so can introduce additional maintenance costs for pipelines. Enabling off-heap memory is necessary for native acceleration since DataFrames modified by Velox must be off-heap to be accessible by the native process. Foundry still requires sufficient on-heap memory for everything except Velox data transformations (for instance, orchestration, scheduling, and build management code still run in the JVM), but ideally most work will now be performed off-heap. Configuring a pipeline to use native acceleration introduces additional maintenance costs in balancing on-heap and off-heap memory.
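For reference, the underlying Spark settings look roughly like the following. This is a hedged sketch: Foundry manages these values through its own configuration profiles, the off-heap size is workload-dependent, and the Gluten plugin class name varies by Gluten version.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Velox operates on off-heap Arrow buffers, so off-heap memory must be
    # enabled and sized explicitly.
    .config("spark.memory.offHeap.enabled", "true")
    .config("spark.memory.offHeap.size", "4g")  # illustrative size
    # Gluten hooks into Spark as a plugin; this class name is taken from
    # recent upstream Gluten releases and may differ in your distribution.
    .config("spark.plugins", "org.apache.gluten.GlutenPlugin")
    .getOrCreate()
)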