31 - Destructive Backing Dataset Changes: Part 1

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

A backing dataset change is "destructive" when it changes the schema of columns already registered to and indexed in a Phonograph table. By default, Phonograph does not automatically accept schema changes, so you'll need to handle them manually by updating the property mapping and the Phonograph table registration. In this task, you'll learn to handle the deletion of a column in the backing dataset or a change to its type (e.g., from double to integer). Suppose that, expectedly or not, the rule_name property currently part of your flight alerts dataset has been removed in an upstream data source. We'll begin by simulating this removal in your backing dataset and then address the resulting failure in the next task.
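To make the definition concrete, here is a minimal sketch in plain Python (it does not use any Foundry or Phonograph API) that compares a previously registered schema with a new backing dataset schema and reports the two kinds of destructive change described above. The function name `find_destructive_changes`, the dictionary representation of a schema, and every column other than rule_name are assumptions made for illustration only.

```python
def find_destructive_changes(registered, incoming):
    """Compare two {column_name: type_name} mappings.

    Returns (removed, retyped):
      removed - registered columns that are missing from the incoming schema
      retyped - registered columns whose type differs, as {col: (old, new)}
    """
    removed = sorted(col for col in registered if col not in incoming)
    retyped = {
        col: (registered[col], incoming[col])
        for col in registered
        if col in incoming and registered[col] != incoming[col]
    }
    return removed, retyped


# Hypothetical schemas; only rule_name comes from the task text.
registered_schema = {"alert_id": "string", "rule_name": "string", "score": "double"}
incoming_schema = {"alert_id": "string", "score": "integer", "new_col": "string"}

removed, retyped = find_destructive_changes(registered_schema, incoming_schema)
print(removed)  # ['rule_name']
print(retyped)  # {'score': ('double', 'integer')}
```

In this sketch, a newly added column such as new_col is not flagged, because it does not alter any column that was already registered.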

🔨 Task Instructions

  1. Open the flight_alerts.py file in your ontology_flight_alerts_logic repository.

  2. Update your return statement to also drop the rule_name column:

    return source_df.drop('rule_id', 'rule_name')

  3. Preview, commit, and build your code using best practices.

  4. Once your dataset build completes, open the output flight_alerts dataset and go to the Syncs section of the Details tab. The sync may still be running, but once it completes, it will report a failure.

  5. Click the History button in the red failure block. This will take you to an ordered list of the history of this sync.

  6. Click the top failed sync item in the list. You're now looking at the sync details, including the expandable Phonograph schema mismatch error message.

  7. Expand the error message by clicking the > next to the word Details. Note the final line in the error:

    foundryColumnsInPhonographTableSchemaMissingFromFoundrySchema=[rule_name]

    The registered Phonograph table was expecting a rule_name column (because it was previously synced) but didn't find one in the backing dataset.

  8. Open your flight alerts object type in OMA.

  9. Click the Datasources menu item in the left-hand panel and scroll down to the Phonograph block. You can also see the failed index status here in OMA (clicking on the Failed sync link will take you to the error in the Job Tracker application that we saw earlier).
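
If you ever need to pull the missing column names out of the error line noted in step 7 (for example, when scanning copied error text), a small helper along these lines could do it. This is only a sketch in plain Python: the function name `missing_columns` is hypothetical, and the assumption that several missing columns would appear comma-separated inside the brackets is not confirmed by this task, which shows a single column.

```python
import re

# Key copied from the final line of the sync error shown in step 7.
ERROR_KEY = "foundryColumnsInPhonographTableSchemaMissingFromFoundrySchema"


def missing_columns(error_text):
    """Return the column names listed after ERROR_KEY, or [] if absent.

    Assumption: multiple columns, if present, are comma-separated.
    """
    match = re.search(re.escape(ERROR_KEY) + r"=\[([^\]]*)\]", error_text)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


line = "foundryColumnsInPhonographTableSchemaMissingFromFoundrySchema=[rule_name]"
print(missing_columns(line))  # ['rule_name']
```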