2 - Adding and Referencing the Data Expectations Library and Modules
This content is also available at learn.palantir.com and is presented here for accessibility purposes.
📖 Task Introduction
To use the Data Expectations framework in your code, you must first confirm that the transforms-expectations library is installed in your repository and then import the relevant modules in your transform file.
🔨 Task Instructions
Open your flight_alerts_logic code repository in your /Datasource Project: Flight Alerts folder.
Create a new branch from Master named yourName/feature/primary_key_check.
Open the Libraries tab on the left of your repository and confirm that transforms-expectations appears in the installed list. If it does not, search for it and add it using the workflow you learned in the previous tutorial, Monitoring Data Pipeline Health.
Open your flight_alerts_clean.py transform file in the code editor.
Update your imports from transforms.api to include a new, case-sensitive item: Check.
On a new line below the transforms.api imports, add a new import statement: from transforms import expectations as E. Once complete, the import statements in this transform file should resemble the block below:
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output, Check
from transforms import expectations as E
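Because Check is case-sensitive and the alias E is what later code refers to, it can help to double-check the spelling of these imports. The following is a minimal sketch, not part of the tutorial itself: it uses Python's standard ast module to parse the import block as a string, so it runs without the Foundry libraries installed. The names SOURCE, imported_names, has_check, and has_expectations are hypothetical and exist only for this illustration.

```python
import ast

# The import block from the step above, held as a string so it can be
# parsed without the Foundry libraries being installed locally.
SOURCE = """\
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output, Check
from transforms import expectations as E
"""


def imported_names(source):
    """Map each module named in a `from ... import ...` statement to the
    set of (name, alias) pairs imported from it."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            pairs = found.setdefault(node.module, set())
            for alias in node.names:
                pairs.add((alias.name, alias.asname))
    return found


names = imported_names(SOURCE)

# `Check` must match exactly; `check` or `CHECK` would not satisfy this.
has_check = ("Check", None) in names.get("transforms.api", set())

# The expectations module must be imported under the alias `E`.
has_expectations = ("expectations", "E") in names.get("transforms", set())

print(has_check, has_expectations)
```

If either value prints as False, compare the spelling and capitalization of your imports against the block above.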
📚 Recommended Reading (~2 min read)
Read this introductory article to review the terms essential to using the Data Expectations framework successfully. There is no need to follow the links under the Learn More heading for now.