2 - Preparing Your Dataset

This content is also available at learn.palantir.com and is presented here for accessibility purposes.

An Ontology is a categorization of the world, and one of the elements of our notional world in this track is a Flight Alert. We'll take as given that these alerts are triggered and turned into “data” using logic elsewhere in the platform.

Before embarking on an Ontology development project like this one, you should know:

The workflow you aim to achieve with your object types and links, and the data architecture it requires.
Each object type is backed by a single dataset, and a dataset can back only one object type.
Cleaning and formatting should be done upstream in data transformations, not in the Ontology.
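
The one-to-one rule between object types and backing datasets can be checked on paper before you build anything. The following is a minimal sketch in plain Python, not a Foundry API: the function name `find_backing_conflicts`, the object type names, and the second dataset path are hypothetical and used only for illustration. It takes a planned mapping of object type to dataset and reports any dataset that would back more than one object type.

```python
def find_backing_conflicts(backing):
    """Return datasets that are planned to back more than one object type.

    `backing` maps an object type name to the path of its backing dataset.
    Using a dict already guarantees that each object type has a single dataset;
    this function checks the other direction of the rule.
    """
    object_types_by_dataset = {}
    for object_type, dataset in backing.items():
        object_types_by_dataset.setdefault(dataset, []).append(object_type)
    # Keep only datasets claimed by two or more object types.
    return {
        dataset: sorted(object_types)
        for dataset, object_types in object_types_by_dataset.items()
        if len(object_types) > 1
    }


# Hypothetical plan: two object types accidentally share one dataset.
plan = {
    "Flight Alert": "/sandbox/flight_alerts_copy",
    "Delay Alert": "/sandbox/flight_alerts_copy",
    "Aircraft": "/sandbox/aircraft",
}
conflicts = find_backing_conflicts(plan)
print(conflicts)
```

An empty result means the plan respects the rule; any entry in the result names a dataset that needs to be split or copied before it can back a second object type.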

The first step to creating an object type is finding or developing the right dataset. In this lesson, we're going to simply create a copy of an existing dataset in order to work comfortably without impacting other Foundry users' outputs. In real projects, you would typically develop a more complex pipeline if your desired dataset does not exist, potentially collaborating on the pipeline with data engineers and administrators of various source systems connected to Foundry.

🔨 Create a new datasource for your object type

If you've already created a personal sandbox folder for use during tutorials, navigate to that folder. If you have not, follow the steps from the Create a Sandbox Folder page from the Introduction to Palantir Foundry tutorial to create it now.
In your personal sandbox folder, open your Training Pipeline Simulator (yourName date) folder and create a new pipeline by selecting the New button and choosing the Pipeline option from the dropdown menu.
When asked about pipeline location, batch vs streaming preference, and other pipeline creation options, use the default values (do not change anything).
You should now see a welcome screen with an option to Add Foundry datasets. Select this option, then find the /Foundry Training and Resources/Example Data/Aviation Ontology/flight_alerts/ dataset. Select the + next to that dataset, then add the dataset using the button near the bottom-right corner of the window. This closes the window and shows the flight_alerts dataset in the central area of the Pipeline Builder UI.
Select "Add pipeline output" on the right sidebar, then choose the "Add" button alongside the "Dataset" option. The right sidebar should now contain column names autopopulated based on the flight_alerts dataset.
Rename your output dataset to flight_alerts_{yourname}_{date} using the field at the top of the right sidebar (above the columns and all other buttons, where the default name reads New dataset {date}).
To copy the data from the source dataset to the newly created output dataset, save by clicking the green arrow button at the top of the Pipeline Builder UI (depending on your screen resolution, the button may or may not be labeled "save"), and then click the blue hammer "deploy" button next to it. When that button opens a pop-up, confirm the deployment by clicking the green "deploy pipeline" button.
It may take up to a few minutes for the new dataset to be ready. You can monitor progress using the refresh wheel icon to the right of the deploy button: click it to view details, or wait for it to become a green check mark, which means the deployment was successful.
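
If you want a consistent value for the flight_alerts_{yourname}_{date} name used in the rename step, it can be generated rather than typed by hand. This is a minimal sketch in plain Python, not part of Pipeline Builder: the function name `output_dataset_name` is hypothetical, and the lowercase-with-underscores name and the YYYYMMDD date format are assumptions, since the tutorial does not prescribe a specific format.

```python
import re
from datetime import date


def output_dataset_name(your_name, on_date):
    """Build a name of the form flight_alerts_{yourname}_{date}.

    Assumptions (not specified by the tutorial): the name is lowercased with
    runs of non-alphanumeric characters replaced by underscores, and the date
    is formatted as YYYYMMDD.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", your_name.strip().lower()).strip("_")
    if not slug:
        raise ValueError("your_name must contain at least one letter or digit")
    return f"flight_alerts_{slug}_{on_date:%Y%m%d}"


# Example with a hypothetical user name and a fixed date.
example_name = output_dataset_name("Ada Lovelace", date(2024, 3, 5))
print(example_name)
```

Passing `date.today()` as the second argument produces a name for the current day; paste the result into the name field at the top of the right sidebar.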

You now have a unique dataset that is ready to back your Ontology object type.