HyperAuto V1 FAQ

General usage tips & guidance

Can I debug and preview code in an SDDI repository?

Yes. In the SDDI repository, open the file /transforms-bellhop/src/software_defined_data_integrations/transforms/pipeline_builder.py and use the Preview button to select the transform you want to preview.

Can I configure a schedule that automatically picks up newly ingested tables?

An SDDI repository produces a dataset called BUILD that is connected to all final datasets produced by the repository. To guarantee that all newly ingested tables get built, create a new Full Build schedule (including upstream datasets) with this BUILD dataset as the target. The smart scheduler will only initiate builds for the parts of the pipeline where the raw data has been refreshed.

One of my tables / derived_element is failing due to MODULE_UNREACHABLE. What should I do?

MODULE_UNREACHABLE is often a sign that DRIVER_MEMORY in your Spark environment is insufficient. You can apply Spark profiles to selected tables in your SourceConfig.yaml file; see the configuration reference for details. Remember to import the assigned profile into your repository configuration first.
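As an illustrative sketch only: the key names below (`tables`, `sparkProfile`, and the profile name) are assumptions, not confirmed SDDI syntax, so check the configuration reference for the exact SourceConfig.yaml schema. A per-table profile override might look roughly like:

```yaml
# Hypothetical SourceConfig.yaml fragment -- key names are assumptions;
# consult the configuration reference for the real schema.
tables:
  LARGE_TABLE:
    # Assign a Spark profile with more driver memory to this table only.
    # The profile must already be imported into the repository config.
    sparkProfile: DRIVER_MEMORY_LARGE
```

The idea is to scope the larger profile to the failing tables rather than raising memory for the whole pipeline.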

I added table <TABLE_NAME> to my pipeline, but the pipeline build fails with AssertionError: 0 instances of <TABLE_NAME> found in 'objects' metadata table

Make sure you rerun the metadata datasets (objects, links, fields, and diffs) after new tables are ingested and added to your SDDI pipeline.

Do I need to increase the semantic version if I add new tables to Bellhop config files?

No. Adding new tables to Bellhop config files does not require increasing the semantic version. However, you will need to rebuild the metadata datasets (objects, links, fields, and diffs).

Can I disable some of the intermediate stages generated by an SDDI repository?

Yes. The foreign key generation, enrichment, and renaming stages can be disabled using parameters in the PipelineConfig file. You must increment the deploymentSemanticVersion for the changes to take effect.

Disabling any of these steps changes the resulting data schema and may break downstream consumers of the data.
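As a sketch only: the flag names below are placeholders, not confirmed PipelineConfig parameters (only deploymentSemanticVersion appears in this FAQ), so consult the PipelineConfig documentation for the real names. Disabling stages together with the required version bump might look roughly like:

```yaml
# Hypothetical PipelineConfig fragment -- flag names are placeholders;
# check the PipelineConfig reference for the actual parameters.
deploymentSemanticVersion: 2     # incremented so the stage changes take effect
disableForeignKeyGeneration: true
disableEnrichmentStage: true
disableRenamingStage: false      # renaming stage left enabled
```

Whatever the exact flags, the version increment and the stage toggles must ship together, since changes do not take effect without the bump.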