In addition to transforming single datasets, Pipeline Builder allows you to bring datasets together with joins and unions.
A join combines two datasets that have at least one matching column. Depending on the type of join you configure, your join output can combine matching rows and exclude non-matching rows.
To join two datasets together, select the first dataset node in your graph and click Join.
The first selected dataset is the Left side dataset. Select another dataset node to be the Right side dataset. Click Start to configure the join.
In the join form, you can edit the join type, select match conditions, and preview the output table.
null
filled in columns for non-matching rows.city
column in the left Clean Facility Data
dataset is equal to the CITY
column in the right Facility Person
dataset.All data in the above and following examples was randomly generated and is non-representational.
You can decide to include specific columns in the join and add a prefix to the right table. Select Show advanced to expand the prefix and column fields, enter a prefix for the right table, and select the columns to include in the join. In the example below, we are keeping all columns from the left dataset and only including the STATE
and population
columns from the right dataset.
Once you finish configuring your join, click Apply to add the join to your workflow. You will see the join node connected to the two joined datasets in your graph. We named our new join Join person data
, and it is a direct output of the original Clean Facility Data
and Facility person
datasets.
Rename or edit the join by clicking the join node and selecting Edit.
Drag the white or gray circles on nodes to change connections and remove links on the graph.