10 - Building an Ontology Project

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

📖 Task Introduction

We left off in the Data Engineering learning path having created a Transform Project. Datasets that back Ontology object and link types should be output by a code repository into an Ontology Project. In this task, you'll set up an Ontology Project folder and repository and generate code to output prepared datasets. First, with your Data Lineage graph open, let's check the flight_alerts_clean and passengers_clean datasets to determine whether additional preparation is required.

🔨 Task Instructions

In your open Data Lineage graph, click on the flight_alerts_clean dataset node and open the Preview helper tab in the bottom left of the screen.
- Notice that column names are all written in snake_case, data is consistently formatted, and all columns are strings with the exception of the flight_date column.
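The naming check in this step is done by eye in the Preview helper. If you wanted to express the same rule in code, a minimal sketch might look like the following. The function name, the regular expression, and the column list are illustrative assumptions (only alert_display_name, flight_date, and rule_id are named in this task), and the pattern is just one reasonable reading of snake_case.

```python
import re

# One possible definition of snake_case (an assumption, not a platform rule):
# lowercase letters and digits, words separated by single underscores,
# starting with a letter.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def non_snake_case_columns(columns):
    """Return the column names that do not match the snake_case pattern."""
    return [name for name in columns if not SNAKE_CASE.fullmatch(name)]


# Hypothetical column list built from the names mentioned in this task.
columns = ["alert_display_name", "flight_date", "rule_id"]
print(non_snake_case_columns(columns))
```

An empty list means every column name passed the check.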
Confirm primary key uniqueness by clicking the "▾" next to the alert_display_name column in the Preview helper and choosing View stats. In the histogram of values, verify that no value appears more than once.
- Remember you also have a primary key data expectation check on this dataset that will fail the build in case the primary key is not unique.
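The histogram check above amounts to counting how often each primary key value occurs and confirming that no count exceeds one. As a rough illustration of that logic only (not of how the data expectation is implemented), here is a small plain-Python sketch. The helper name and the sample rows are made up for the example.

```python
from collections import Counter


def duplicated_keys(rows, key):
    """Return {value: count} for every key value that appears more than once."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Made-up sample rows, for illustration only.
rows = [
    {"alert_display_name": "Alert A", "flight_date": "2023-01-01"},
    {"alert_display_name": "Alert B", "flight_date": "2023-01-02"},
]

# An empty dict means the column is usable as a primary key for these rows.
print(duplicated_keys(rows, "alert_display_name"))
```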
Review the columns in the dataset and ask, "Does our flight alert object type really need all of these columns mapped as object properties to support all known workflows?"
- For now, let's assume that the rule_id column isn't needed and that there's no operational harm in removing it from this dataset.
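Removing a column is a simple projection: every row keeps all of its fields except the unwanted one. The sketch below shows the idea on plain Python dictionaries so that it runs anywhere. In the code repository itself you would more likely drop the column on a PySpark DataFrame (for example with its drop method), and the helper name and sample row here are assumptions for illustration.

```python
def drop_columns(rows, unwanted):
    """Return copies of the rows without the unwanted columns."""
    unwanted = set(unwanted)
    return [
        {name: value for name, value in row.items() if name not in unwanted}
        for row in rows
    ]


# Made-up sample row, for illustration only.
rows = [
    {"alert_display_name": "Alert A", "flight_date": "2023-01-01", "rule_id": "r1"},
]

trimmed = drop_columns(rows, ["rule_id"])
print(trimmed)
```

The original rows are left untouched, which mirrors how a transform reads one dataset and writes a separate output.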
Repeat these review steps for the passengers_clean dataset. It, too, follows naming and schema best practices, and we'll assume its columns map perfectly to the properties we need.
Let's create an Ontology Project folder and associated transform artifacts. Return to your ../Data Engineering Tutorials/ folder and create a new folder titled Ontology Project: Flight Alerts.
Adhering to best practices, create at least the following subfolders:
- /data
  - /transformed
  - /ontology
- /analysis
- /documentation
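In Foundry you create these folders through the file browser, not in code. Purely to make the intended nesting explicit, the sketch below builds the same layout on a local filesystem inside a temporary directory. The root name ontology_project_flight_alerts is a hypothetical local stand-in for the project folder, and the function name is an assumption.

```python
import tempfile
from pathlib import Path

# Subfolders from the list above; transformed and ontology sit under data.
SUBFOLDERS = ["data/transformed", "data/ontology", "analysis", "documentation"]


def create_project_layout(root):
    """Create the subfolders under root and return them as sorted relative paths."""
    root = Path(root)
    for sub in SUBFOLDERS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical local folder name standing in for the project folder.
    layout = create_project_layout(Path(tmp) / "ontology_project_flight_alerts")
    print(layout)
```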
Create a new Python code repository in your /Ontology Project: Flight Alerts folder called ontology_flight_alerts_logic.