This content is also available at learn.palantir.com and is presented here for accessibility purposes.
We left off in the Data Engineering learning path having created a Transform Project. Datasets that will be used to back Ontology object and link types should be output by a code repository into an Ontology Project.
In this task, you'll set up an Ontology project folder and repository and generate code to output prepared datasets. First, with your Data Lineage graph open, let's check in on the flight_alerts_clean and passengers_clean datasets to determine if additional preparation is required.
In your open Data Lineage graph, click on the flight_alerts_clean dataset node and open the Preview helper tab in the bottom left of the screen.
flight_date column.
Confirm primary key uniqueness by clicking the "▾" next to the alert_display_name column in the Preview helper and choosing View stats. In the histogram of values, verify that no value appears more than once.
Review the columns in the dataset and ask, "does our flight alert object type really need all of these columns mapped as object properties to support all known workflows?"
category column isn't needed and there's no operational harm in removing it from this dataset.
Repeat these review steps for the passengers_clean dataset. It, too, adheres to best naming and schema practices, and we'll assume its columns map perfectly to the properties we need.
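The two review checks above (primary key uniqueness and dropping an unneeded column) can also be expressed in code. The following is a minimal sketch in plain Python rather than an actual Foundry transform. The helper names duplicate_keys and drop_columns and the sample rows are hypothetical and are used only for illustration.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return the key values that appear more than once, with their counts."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


def drop_columns(rows, unwanted):
    """Return copies of the rows without the unwanted columns."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


# Hypothetical sample rows; the real flight_alerts_clean dataset has more
# columns and different values.
sample_rows = [
    {"alert_display_name": "alert-001", "flight_date": "2023-01-05", "category": "A"},
    {"alert_display_name": "alert-002", "flight_date": "2023-01-06", "category": "B"},
]

# An empty result means every alert_display_name value is unique.
duplicates = duplicate_keys(sample_rows, "alert_display_name")

# Remove the category column before the dataset backs an object type.
prepared = drop_columns(sample_rows, {"category"})
```

An empty dictionary from duplicate_keys corresponds to a View stats histogram in which no value appears more than once.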
Let's create an Ontology Project folder and associated transform artifacts. Return to your ../Data Engineering Tutorials/ folder and create a new folder titled Ontology Project: Flight Alerts.
Adhering to best practices, create at least the following subfolders:
/data
    /transformed
    /ontology
/analysis
/documentation
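As a quick illustration of the resulting layout, the sketch below builds the folder paths in plain Python. It assumes that /transformed and /ontology sit inside /data, which is inferred from the data/ontology path used for the pipeline in the next step. The names PROJECT, LAYOUT, and folder_paths are hypothetical. The folders themselves are created in the Foundry interface, so this code only lists paths and touches no file system.

```python
from pathlib import PurePosixPath

# Project folder name taken from the tutorial text.
PROJECT = PurePosixPath("Ontology Project: Flight Alerts")

# Assumed nesting: parent folder -> child folders.
LAYOUT = {
    "data": ["transformed", "ontology"],
    "analysis": [],
    "documentation": [],
}


def folder_paths(project, layout):
    """Return every folder path in the layout, parents before children."""
    paths = []
    for parent, children in layout.items():
        paths.append(project / parent)
        paths.extend(project / parent / child for child in children)
    return paths


for path in folder_paths(PROJECT, LAYOUT):
    print(path)
```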
Create a new batch pipeline with Pipeline Builder in your .../Ontology Project: Flight Alerts/data/ontology folder called ontology_flight_alerts_logic.