Configuration reference

Warning

This section describes advanced manual settings that can leave your SDDI pipeline in a broken state if applied incorrectly. Always verify changes on a branch before deploying to production.

SDDI's pipeline is generated by a fully automated code repository. Cockpit is the default place to interact with these configurations, but you may need to manually amend the configuration files to set advanced parameters or to configure non-standard source types.

To review the steps involved, read about pipeline generation.

Configuration is performed in two main files, both located in the transforms-bellhop/src/config/ folder:

SourceConfig.yaml

The following is a notional example of a fully-defined SourceConfig file.

```yaml
sourceName: MY_SOURCE
sourceRid: ri.magritte..source.abcdefgh-1234-5678-910a-zyxwvut
sapContext:
  type: direct
rawFolderStructure:
  raw: /HyperAuto/source/raw
  dataDictionary: /HyperAuto/source/metadata
cleaningLibraries:
  - convert_all_columns_to_clean_types
deploymentSemanticVersion: 2
metadataSparkProfiles:
  - DRIVER_MEMORY_MEDIUM
languageKey: 'E'
tables:
  - tableName: ABCD
    datasetTransformsConfig:
      datasetName: ABCD
      deduplicationComparisonColumns: []
      batchUnionComponents: []
      tableCleaningLibraries: []
  - tableName: WXYZ
    datasetTransformsConfig:
      datasetName: WXYZ
      deduplicationComparisonColumns:
        - /PALANTIR/TIMESTAMP
        - /PALANTIR/ROWNO
      batchUnionComponents:
        - WXYZ_historical
        - WXYZ_incremental
      tableCleaningLibraries:
        - parse_timestamp_column
      sparkProfiles:
        profiles:
          - EXECUTOR_MEMORY_MEDIUM
          - NUM_EXECUTORS_4
```

Parameters description

| Parameter | Description |
| --- | --- |
| sourceName | The name to identify a source system. Used to prefix primary and foreign keys. |
| sourceRid | The RID of the source attached to this SDDI instance. |
| sapContext | (Optional) Details of the SAP context. |
| rawFolderStructure | Defines the folders in which the raw data and metadata reside. |
| cleaningLibraries | List of cleaning libraries to apply to all tables. |
| deduplicationConfig | (Optional, default: None) Config used to specify which columns to use for the deduplication logic. |
| metadataSparkProfiles | (Optional, default: None) List of Spark profiles to apply to metadata generation. |
| languageKey | (Optional, default: 'E') Language to use in enrichments. |
| deploymentSemanticVersion | (Optional, default: 0) Semantic version of the pipeline; incrementing it will force a snapshot. |
| tables | List of tables from that source to be processed by SDDI. |

sapContext

(Optional) Details of the SAP context. SAP Explorer will use this to pre-select the context. Each context will need to have its own SourceConfig file.
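For instance, a SourceConfig targeting a direct SAP context would carry the same block shown in the example above:

```yaml
sapContext:
  type: direct   # one SourceConfig file per SAP context
```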

rawFolderStructure

Defines the folders in which the raw data and metadata reside.

Fields:

  • raw: Path of the folder where raw tables are ingested.
  • dataDictionary: (Optional, default: raw) Path of the folder where metadata tables are ingested.
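
A minimal sketch, reusing the paths from the example above; omitting dataDictionary makes metadata default to the raw path:

```yaml
rawFolderStructure:
  raw: /HyperAuto/source/raw                  # raw tables are ingested here
  dataDictionary: /HyperAuto/source/metadata  # optional; defaults to the raw path
```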

cleaningLibraries

List of cleaning libraries to apply to all tables. Cleaning functions are defined in transforms-bellhop/src/software_defined_integrations/transforms/cleaned/function_libraries.

Adding or removing a function requires incrementing the deploymentSemanticVersion.
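
For example, adding a second library (the first function name is taken from the example above; the second is hypothetical) would also require a version bump:

```yaml
cleaningLibraries:
  - convert_all_columns_to_clean_types
  - trim_whitespace_in_string_columns   # hypothetical function name
deploymentSemanticVersion: 3            # incremented to force a snapshot
```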

deduplicationConfig

(Optional, default: None) Config used to specify which columns to use for the deduplication logic. The configuration defined here applies to all tables.

Fields:

  • comparisonColumns: Columns whose maximum value determines which row is kept for each primary key.
  • changeModeColumn: (Optional) If specified, rows with the value D in this column will be deleted.
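
A sketch combining both fields; the column names below are hypothetical:

```yaml
deduplicationConfig:
  comparisonColumns:
    - /PALANTIR/TIMESTAMP   # row with the max timestamp wins per primary key
    - /PALANTIR/ROWNO       # tie-breaker when timestamps are equal
  changeModeColumn: /PALANTIR/CHANGEMODE  # rows with value D are deleted
```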

deploymentSemanticVersion

(Optional, default: 0) Semantic version of the pipeline; incrementing it will force a snapshot.

See Incremental Transforms for the effects of deploymentSemanticVersion on incremental and snapshot transforms.
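
Since the default is 0, the first forced snapshot of a pipeline would be configured as:

```yaml
deploymentSemanticVersion: 1   # was 0 (default); any increment forces a snapshot
```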

metadataSparkProfiles

(Optional, default: None) List of Spark profiles to apply to metadata dataset generation (objects, fields, links and diffs).

Be sure the profiles are added to the repository before referencing them here.
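
A sketch using the profile from the example above; the profile must already exist in the repository:

```yaml
metadataSparkProfiles:
  - DRIVER_MEMORY_MEDIUM   # applied to objects, fields, links, and diffs generation
```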

tables

List of tables from defined source to be processed by SDDI.

Fields:

  • tableName: Name of the table in metadata.
  • datasetTransformsConfig
    • datasetName: Foundry dataset name of the raw data.
    • deduplicationComparisonColumns: Table-specific config used to deduplicate data and specify which columns to use for the deduplication logic. Applied after the global deduplication fields.
    • changeModeColumn: (Optional) If specified, rows with the value D in this column will be deleted. Overrides the global change mode column.
    • batchUnionComponents: List of input dataset names that should be unioned before the cleaning step.
    • sparkProfiles: (Optional) Spark profiles to apply at different stages of the transforms.
      • profiles: Spark profiles; see details for adding them to the repository.
      • stages: (Optional, default: None) Transform stages the profiles should be applied to. Value should be in [CLEANED, DERIVED, ENRICHED, FINAL, RENAMED, RENAMED_CHANGELOG]. If None, profiles are applied at all stages.
    • tableCleaningLibraries: List of cleaning libraries to apply to this table. Cleaning functions are defined in transforms-bellhop/src/software_defined_integrations/transforms/cleaned/function_libraries. Adding or removing a function will require you to increment the deploymentSemanticVersion.
    • enforceUniquePrimaryKeys: (Optional, default: False) If True and deduplicationComparisonColumns are defined, guarantees that only one record per primary key is kept at the deduplication stage. This may result in non-deterministic behavior.
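
The example at the top of this section does not exercise every optional field. A sketch of a single table entry combining changeModeColumn, stages, and enforceUniquePrimaryKeys (column and profile names are hypothetical):

```yaml
tables:
  - tableName: WXYZ
    datasetTransformsConfig:
      datasetName: WXYZ
      deduplicationComparisonColumns:
        - /PALANTIR/TIMESTAMP
      changeModeColumn: /PALANTIR/CHANGEMODE  # overrides the global change mode column
      batchUnionComponents: []
      tableCleaningLibraries: []
      enforceUniquePrimaryKeys: True          # keep exactly one record per primary key
      sparkProfiles:
        profiles:
          - EXECUTOR_MEMORY_MEDIUM
        stages:                               # restrict profiles to these stages only
          - CLEANED
          - ENRICHED
```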

PipelineConfig.yaml

The following is a notional example of a fully-defined PipelineConfig file.

```yaml
sourceName: HyperAuto
sourceType: SAP_ERP
sourceConfigFileNames:
  - SourceConfig.yaml
outputFolder: /HyperAuto/source/output
workflows:
  my_workflow:
    variables:
      - name: my_variable_name
        value: my_variable_value
    enrichments:
      - my_enrichment_name
tables:
  ABCD:
    displayName: Header Table
    types:
      - OBJECT
  WXYZ:
    displayName: Item Table
    types:
      - OBJECT
      - METADATA
disableForeignKeyGeneration: False
disableEnrichedStage: False
disableRenamedStage: False
```

Parameters description

| Parameter | Description |
| --- | --- |
| projectName | Project name. Serves as a prefix to Ontology objects. |
| sourceType | Type of source supported by SDDI. Must be one of [SAP_ERP, SALESFORCE, ORACLE_NETSUITE]. |
| sourceConfigFileNames | List of SourceConfig filenames to include in the pipeline. |
| outputFolder | Defines the folder in which output datasets will be written. |
| workflows | List of workflows to deploy, with configurations. |
| tables | List of tables processed in this SDDI pipeline. |
| disableEnrichedStage | (Optional, default: False) If enabled, no enriched datasets will be produced. Use with caution, as enabling it may break workflows. |
| disableRenamedStage | (Optional, default: False) If enabled, no renamed_changelog datasets will be produced. Use with caution, as enabling it may break workflows. |
| disableForeignKeyGeneration | If enabled, no foreign key columns will be produced. Use with caution, as enabling it may break workflows. |

tables

List of tables processed in this SDDI pipeline:

  • displayName: Human-readable name of the table. The output dataset name is constructed in the form displayName (technicalName).
  • types: List of data types this table represents (can be many).
    • OBJECT: Master data table that constitutes an object in the ontology.
    • METADATA: Metadata table that contains information on objects and is used to construct primary keys.
    • CUSTOMIZATION: Enrichment table that is joined to master data tables at enriched step of SDDI pipeline.
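
Putting the three types together, a tables block with a hypothetical CUSTOMIZATION table might look like:

```yaml
tables:
  ABCD:
    displayName: Header Table
    types:
      - OBJECT
  T005T:                       # hypothetical enrichment table
    displayName: Country Names
    types:
      - CUSTOMIZATION          # joined to master data at the enriched step
```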