8. [Builder] Ontology Data Pipelines9. Checking Your Backing Datasets

9 - Checking your Backing Datasets

This content is also available at learn.palantir.com ↗ and is presented here for accessibility purposes.

📖 Task Introduction

We want to create and link two object types—flight alerts and passengers—with the eventual goal of creating an alert inbox application that enables an analyst to take action, potentially including reaching out to affected passengers. With that end in mind, let's review your data pipelines, checking them against best practices and determining whether there's anything we can do to further prepare our flight alerts and passengers datasets to back Ontology object types.

🔨 Task Instructions

  1. Proceed to your personal /Temporary Training Artifacts/${yourName} folder.
  2. Right click on the /Data Engineering Tutorials folder and choose Explore data lineage from the fly-out menu.

In this pipeline you have two candidate datasets for creating Ontology objects in light of our desired outcomes:

  • passengers_clean
  • flight_alerts_clean

One question that immediately surfaces when considering our ontology model is whether flight alerts should include aggregated passenger data or simply access passenger data from flight alerts via a configured Ontology link type.

We might consider combining alert and passenger information if the data is a single piece of information rather than an aggregate. In this case, because there is a one-to-many relationship between alerts and passengers, the passenger data would be aggregated per alert. Passenger data is also not primarily supporting information about a flight alert. Conceptually, passengers and flight alerts are very distinct entities and have very different search semantics and use cases.

For these reasons, let's model them as separate object types connected via a link made possible by a shared key between the backing datasets.