You can choose to read your input dataset as a snapshot or incrementally, depending on your use case.
Snapshot computation performs transforms over the entire input, not just newly added data. The output dataset is fully replaced by the latest pipeline output every build.
Best used when:
- The input dataset changes through transactions other than APPEND transactions.
- The input dataset changes through SNAPSHOT transactions, so incrementally reading the input is not possible.
- The transform logic needs the full input, even if the input only changes through APPEND transactions.
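If you are working in code rather than Pipeline Builder, the following is a minimal sketch of the same snapshot behavior using the Foundry Python transforms API; the dataset paths and column name are hypothetical placeholders, and the snippet only illustrates that every build reads the whole input and replaces the output.

```python
from transforms.api import transform_df, Input, Output

# Snapshot-style transform: the entire input is read on every build,
# and the output dataset is fully replaced by the result.
@transform_df(
    Output("/Project/example/clean_transactions"),      # hypothetical output path
    source=Input("/Project/example/raw_transactions"),  # hypothetical input path
)
def compute(source):
    # The filter is recomputed over the full input on each build.
    return source.filter(source["amount"] > 0)
```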
Incremental computation performs transforms only on new data that has been appended to the selected input since the last build. This can reduce compute costs, but it comes with important restrictions.
A pipeline will only run with incremental computation if the selected input dataset changes through APPEND or UPDATE transactions that do not modify existing files. Marking a snapshot input as incremental will have no effect.
Best used when:
- The input dataset changes through APPEND transactions or additive UPDATE transactions.
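As an illustration of the same idea in code, the sketch below uses the incremental decorator from the Foundry Python transforms API; the paths and column name are hypothetical, and the logic simply appends the newly added rows to the output.

```python
from transforms.api import incremental, transform, Input, Output

# Incremental transform: when the input has only received APPEND (or additive
# UPDATE) transactions since the last build, only the newly added rows are read,
# and the result is appended to the existing output.
@incremental()
@transform(
    output=Output("/Project/example/clean_transactions"),   # hypothetical output path
    source=Input("/Project/example/raw_transactions"),      # hypothetical input path
)
def compute(source, output):
    new_rows = source.dataframe()  # only unprocessed rows when running incrementally
    output.write_dataframe(new_rows.filter(new_rows["amount"] > 0))
```

If the input instead receives a SNAPSHOT transaction, the build falls back to snapshot behavior: the full input is re-read and the output is replaced.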
This section outlines restrictions that may apply to your workflow. Review them before setting up incremental computation to ensure a correct implementation.
For more information, see an example of incremental computation in Pipeline Builder.