
11 - Ontology Data Transforms

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

In your new pipeline artifact, you'll use three input datasets to generate three outputs, each prepared to back an Ontology object type or link type.

  1. flight_alerts_clean: This will back our flight alert object type, but first we want to remove the category column, since it's not needed in any anticipated workflows (and reducing the amount of data to be synchronized to the Ontology storage service also reduces computation load).
  2. passengers_clean: We determined this dataset requires no updates at this point, so we'll use Pipeline Builder to simply pass it through to an output.
  3. passenger_flight_alert_clean: There is a many-to-many relationship between passengers and flight alerts. Just as with many-to-many joins in a relational database, a join table is needed to back many-to-many link types in the Ontology. We'll therefore also need to prepare this dataset, which is already a part of our pipeline (and which we'll assume also needs no further preparation).
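
To make the join-table idea concrete, here is a minimal sketch in plain Python. It is not Pipeline Builder or Ontology code, and the column names (`passenger_id`, `alert_id`, `name`, `status`) and sample rows are hypothetical, chosen only to show how one row per passenger/alert pair lets you traverse the relationship in both directions.

```python
# Illustrative only: the columns and rows below are hypothetical,
# not taken from the course datasets.
passengers = [
    {"passenger_id": "P1", "name": "Ada"},
    {"passenger_id": "P2", "name": "Grace"},
]
flight_alerts = [
    {"alert_id": "A1", "status": "open"},
    {"alert_id": "A2", "status": "closed"},
]

# The join table holds one row per (passenger, alert) pair.
passenger_flight_alerts = [
    {"passenger_id": "P1", "alert_id": "A1"},
    {"passenger_id": "P1", "alert_id": "A2"},
    {"passenger_id": "P2", "alert_id": "A1"},
]


def alerts_for_passenger(passenger_id):
    """Return the alert rows linked to one passenger via the join table."""
    linked = {
        row["alert_id"]
        for row in passenger_flight_alerts
        if row["passenger_id"] == passenger_id
    }
    return [alert for alert in flight_alerts if alert["alert_id"] in linked]


def passengers_for_alert(alert_id):
    """Return the passenger rows linked to one alert via the join table."""
    linked = {
        row["passenger_id"]
        for row in passenger_flight_alerts
        if row["alert_id"] == alert_id
    }
    return [p for p in passengers if p["passenger_id"] in linked]
```

Neither the passenger rows nor the alert rows carry a reference to the other side; the relationship lives entirely in the join table, which is why that dataset needs its own output.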

🔨 Task Instructions

  1. Import the three datasets mentioned above.
  2. Create output datasets for passengers_clean and passenger_flight_alert_clean, named passengers and passenger_flight_alerts respectively.
  3. For flight_alerts_clean, add a transform that removes the category column.
  4. Create an output dataset called flight_alerts from that transform.
  5. Deploy your pipeline.
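
In Pipeline Builder these steps are performed in the graphical interface rather than in code. As a rough mental model only, the data flow of the steps above can be sketched in plain Python. The helper functions `drop_columns` and `pass_through`, the sample rows, and every column other than `category` are hypothetical and do not correspond to any Foundry API.

```python
# Hypothetical sample rows; the source only tells us that
# flight_alerts_clean has a `category` column.
flight_alerts_clean = [
    {"alert_id": "A1", "category": "delay", "status": "open"},
    {"alert_id": "A2", "category": "weather", "status": "closed"},
]
passengers_clean = [{"passenger_id": "P1", "name": "Ada"}]
passenger_flight_alert_clean = [{"passenger_id": "P1", "alert_id": "A1"}]


def drop_columns(rows, columns):
    """Return copies of the rows without the named columns."""
    return [
        {key: value for key, value in row.items() if key not in columns}
        for row in rows
    ]


def pass_through(rows):
    """Return copies of the rows unchanged, like an input wired straight to an output."""
    return [dict(row) for row in rows]


# Output names match the ones requested in the task instructions.
outputs = {
    "flight_alerts": drop_columns(flight_alerts_clean, {"category"}),
    "passengers": pass_through(passengers_clean),
    "passenger_flight_alerts": pass_through(passenger_flight_alert_clean),
}
```

Only flight_alerts differs from its input; the other two outputs are unchanged copies, which matches the decision that those datasets need no further preparation.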