Class IncrementalUtils
java.lang.Object
com.palantir.transforms.excel.utils.IncrementalUtils
Functions that are convenient when using the transforms-excel-parser library in incremental pipelines.
Method Summary
Modifier and Type: static void
Method: writeAppendingIfPossible(com.palantir.transforms.lang.java.api.FilesModificationType inputFileModificationType, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> newData, com.palantir.transforms.lang.java.api.FoundryOutput output)
Description: Write new data to the output, appending if possible.
Method Details
writeAppendingIfPossible
public static void writeAppendingIfPossible(com.palantir.transforms.lang.java.api.FilesModificationType inputFileModificationType, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> newData, com.palantir.transforms.lang.java.api.FoundryOutput output)
Write new data to the output, appending if possible.
This function will perform a merge-and-replace SNAPSHOT transaction if the new data has columns that are not present in the existing output.
This function will perform an APPEND transaction as long as the columns in the new data are a subset of the columns in the existing output, even if existing files were modified in the input dataset (i.e., the input file modification type is UPDATED). To deduplicate downstream as necessary, use the _file_path column together with the _file_modified_timestamp column, which is created by the TransformsExcelParser.includeFileModifiedTimestamp() option.
Parameters:
inputFileModificationType - The incremental modification type of the input dataset. If NEW_VIEW, this function will perform a SNAPSHOT write, ignoring the existing data in the output.
newData - Data to write.
output - Destination to write the new data to.
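The rules described above for choosing a transaction type can be summarized in a small, self-contained model. This is only a sketch of the documented behavior, not the library's implementation: the class WriteModeSketch, the enums InputModification and WriteMode, and the method choose are all hypothetical names, and column sets are modeled as plain sets of strings instead of Spark schemas. How the real method treats an output with no existing data is not stated here, so the sketch does not try to model that case.

```java
import java.util.Set;

/**
 * Illustrative model of the documented decision rule.
 * Every name in this class is hypothetical; it does not call the library.
 */
final class WriteModeSketch {

    /** Hypothetical stand-in for FilesModificationType. OTHER is a catch-all for values not named in this page. */
    enum InputModification { NEW_VIEW, UPDATED, OTHER }

    /** Hypothetical labels for the three documented outcomes. */
    enum WriteMode { SNAPSHOT_IGNORING_EXISTING, SNAPSHOT_MERGE_AND_REPLACE, APPEND }

    static WriteMode choose(InputModification modification,
                            Set<String> newColumns,
                            Set<String> existingOutputColumns) {
        // NEW_VIEW: a SNAPSHOT write that ignores the existing output data.
        if (modification == InputModification.NEW_VIEW) {
            return WriteMode.SNAPSHOT_IGNORING_EXISTING;
        }
        // New data has a column the existing output lacks: merge-and-replace SNAPSHOT.
        if (!existingOutputColumns.containsAll(newColumns)) {
            return WriteMode.SNAPSHOT_MERGE_AND_REPLACE;
        }
        // New columns are a subset of the existing ones: APPEND, even for UPDATED inputs.
        return WriteMode.APPEND;
    }

    public static void main(String[] args) {
        Set<String> existing = Set.of("_file_path", "id", "amount");
        // Subset of existing columns with modified input files: prints APPEND
        System.out.println(choose(InputModification.UPDATED, Set.of("_file_path", "id"), existing));
        // A column that the output does not have yet: prints SNAPSHOT_MERGE_AND_REPLACE
        System.out.println(choose(InputModification.UPDATED, Set.of("id", "currency"), existing));
        // New view of the input: prints SNAPSHOT_IGNORING_EXISTING
        System.out.println(choose(InputModification.NEW_VIEW, Set.of("id"), existing));
    }
}
```

The point to notice is the last branch: an UPDATED input still results in an APPEND when the columns fit, which is why rows from an older version of a file can remain in the output.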
-
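Because an APPEND can leave rows from an earlier version of a modified file in the output, one plausible downstream deduplication is to keep, for each _file_path, only the rows carrying the latest _file_modified_timestamp. The sketch below shows that logic on in-memory rows in plain Java (16 or later, for records). The record ParsedRow, its payload field, and the method keepLatestPerFile are hypothetical, and the timestamp is modeled as epoch milliseconds as an assumption; in a real pipeline the same rule would be applied to the Spark dataset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: keeps the rows that belong to the newest version of each file. */
final class LatestFileVersionSketch {

    /** Hypothetical row shape: the two metadata columns plus one payload value. */
    record ParsedRow(String filePath, long fileModifiedTimestamp, String payload) {}

    static List<ParsedRow> keepLatestPerFile(List<ParsedRow> rows) {
        // First pass: find the latest modification timestamp seen for each file path.
        Map<String, Long> latestByPath = new HashMap<>();
        for (ParsedRow row : rows) {
            latestByPath.merge(row.filePath(), row.fileModifiedTimestamp(), Long::max);
        }
        // Second pass: keep only rows that come from that latest version, in input order.
        List<ParsedRow> kept = new ArrayList<>();
        for (ParsedRow row : rows) {
            if (row.fileModifiedTimestamp() == latestByPath.get(row.filePath())) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ParsedRow> rows = List.of(
                new ParsedRow("a.xlsx", 100L, "old version of a"),
                new ParsedRow("b.xlsx", 150L, "only version of b"),
                new ParsedRow("a.xlsx", 200L, "new version of a"));
        // The row from the older version of a.xlsx is dropped; the other two remain.
        keepLatestPerFile(rows).forEach(System.out::println);
    }
}
```

Matching on the maximum timestamp, instead of keeping a single row per path, preserves every row parsed from the newest version of a file, which matters when one file yields many rows.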