Once you finish describing your pipeline in Pipeline Builder and resolving schema errors, you are ready to deliver your pipeline.
A deploy updates the logic on your pipeline outputs, and a build executes that logic to materialize your changes.
Builds can be time and resource intensive, especially if your data scale is large or if you are reprocessing the entirety of your pipeline's inputs. For this reason and others, you might choose to deploy your pipeline without building. By choosing only to deploy, you can defer the cost of the build until building is necessary.
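For a concrete mental model of this distinction, the following is a minimal sketch in plain Python, not anything Pipeline Builder exposes, of why deploying and building are separate steps: a deploy swaps in new logic cheaply, while a build pays the compute cost of executing it. All names in the sketch are hypothetical.

```python
# Illustrative only: models the deploy/build split, not Foundry internals.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Output:
    name: str
    logic: Callable[[List[int]], List[int]]          # deployed transform logic
    materialized: List[int] = field(default_factory=list)

def deploy(output: Output, new_logic: Callable[[List[int]], List[int]]) -> None:
    """Cheap: only the logic attached to the output changes."""
    output.logic = new_logic

def build(output: Output, inputs: List[int]) -> None:
    """Potentially expensive: executes the deployed logic over the inputs."""
    output.materialized = output.logic(inputs)

out = Output("clean_orders", logic=lambda rows: rows)
deploy(out, lambda rows: [r * 2 for r in rows])  # logic updated, nothing computed yet
build(out, inputs=[1, 2, 3])                     # the cost is only paid here
print(out.materialized)                          # [2, 4, 6]
```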
If you want to deliver your first end-to-end pipeline and include all defined logic, select Deploy in the right of the top toolbar.
You can choose which outputs to build after your logic changes are deployed. Builds run per job group: you can build all outputs in any given job group, or build individual outputs that are ungrouped. Ontology type outputs must always be built, so any job group that contains an Ontology type output must be built.
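As a rough illustration of that rule, the sketch below (hypothetical names, not Pipeline Builder internals) shows how a set of requested job groups might be expanded so that every group containing an Ontology type output is always included in the build.

```python
# Illustrative only: selecting job groups to build, given that any group
# containing an Ontology type output must always be built.
from dataclasses import dataclass
from typing import List

@dataclass
class OutputNode:
    name: str
    is_ontology_type: bool

@dataclass
class JobGroup:
    name: str
    outputs: List[OutputNode]

def groups_to_build(groups: List[JobGroup], requested: List[str]) -> List[str]:
    """Return every requested group plus every group with an Ontology type output."""
    selected = set(requested)
    for group in groups:
        if any(o.is_ontology_type for o in group.outputs):
            selected.add(group.name)
    return sorted(selected)

groups = [
    JobGroup("orders", [OutputNode("clean_orders", False)]),
    JobGroup("ontology_sync", [OutputNode("Order object", True)]),
]
print(groups_to_build(groups, requested=["orders"]))  # ['ontology_sync', 'orders']
```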
After you successfully initiate a deployment, a blue banner will appear at the top of your graph. Select View to access the Build details view.
In the Build details view, you can find build information, progress metrics, and build schedule details.
Build info: Shows the status, total duration, and estimated duration of your pipeline. You can also view a variety of metadata, including the start and end times, initiating user, progress within a job list, and build ID.
Build progress: Displays details of the pipeline build over time as a Gantt chart.
Build schedule: Displays the name, frequency, status history, and last modified date of the pipeline build schedule.
Progress details: Toggle to see whether the build is starting, waiting in the Project's resource queue, initializing Spark application, running, or finishing.
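The progress states listed above can be read as a simple ordered sequence. The enum below is only an illustrative model of those states, not code from the product.

```python
# Illustrative only: the build progress states surfaced by the Progress details toggle.
from enum import Enum, auto

class BuildProgress(Enum):
    STARTING = auto()
    WAITING_IN_RESOURCE_QUEUE = auto()     # waiting in the Project's resource queue
    INITIALIZING_SPARK_APPLICATION = auto()
    RUNNING = auto()
    FINISHING = auto()

def describe(state: BuildProgress) -> str:
    return state.name.replace("_", " ").lower()

print(describe(BuildProgress.WAITING_IN_RESOURCE_QUEUE))  # "waiting in resource queue"
```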
You can edit the Build settings of your pipeline by selecting the settings icon next to Deploy and choosing from the available compute settings.
In Pipeline Builder, you can choose to save changes to your pipeline without initiating a deployment. This flexibility allows you to edit your workflow without committing logic changes to production.
After making a change to your workflow, select Save in the top toolbar.
If you click Propose first, the current state will be automatically saved.
If you only save your changes without deploying them, your pipeline logic will not update to the latest changes. You must deploy the pipeline to capture changes to transform logic.
You can also choose to start a build of your pipeline even when you navigate outside the pipeline graph. For instance, you can open a dataset preview by right-clicking on the output node and selecting Open. You can then initiate a build by clicking Build in the upper right corner of the interface.
The Build option outside the pipeline graph will not update the pipeline logic with any changes made since the last deployment. To update the logic and push it to your outputs, return to the pipeline graph and use Deploy.
If you are running a streaming pipeline, additional options will be available to you. Note that streaming pipelines are only available on some accounts. For more information, contact your Palantir representative.
If necessary, you can use Replay on deploy to instruct your pipeline to begin computation from a specific historical point in time.
In the Deploy window, choose the start time for data processing in your pipeline delivery: Start of input data or From a specified time (for example, 2 months ago).

Replaying your pipeline could lead to lengthy downtimes, possibly as long as multiple days. When you replay your pipeline, your stream history will be lost and all downstream pipeline consumers will be required to replay.
For more information on replays, refer to the documentation on breaking changes.
Stream redeployment refers to the process of resuming a streaming job from a previously saved checkpoint. When a streaming job is paused or stopped, a bookmark is created within the data, indicating the position up to which records have been read. Bookmarks, also called checkpoints, are also created periodically while a stream is running. This enables recovery in case the stream encounters a failure for any reason.
When the stream is restarted, it resumes processing from the most recent checkpoint. During redeployment, the existing output streams are preserved, and new data is appended to them.
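As a conceptual illustration (not Foundry internals), the sketch below models how a checkpointed stream resumes: the bookmark stores the last offset read, so a restarted job continues from that offset and appends to the existing output instead of reprocessing everything.

```python
# Illustrative only: checkpoint-based resume for a streaming job.
from typing import List

class StreamJob:
    def __init__(self) -> None:
        self.checkpoint_offset = 0      # bookmark: last record already read
        self.output: List[str] = []     # existing output stream is preserved

    def run(self, input_records: List[str]) -> None:
        # Resume from the checkpoint rather than the start of the input.
        for offset in range(self.checkpoint_offset, len(input_records)):
            self.output.append(input_records[offset].upper())
            self.checkpoint_offset = offset + 1   # persisted periodically in practice

records = ["a", "b", "c"]
job = StreamJob()
job.run(records)            # processes a, b, c
records += ["d", "e"]
job.run(records)            # redeploy/restart: only d and e are processed
print(job.output)           # ['A', 'B', 'C', 'D', 'E']
```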
Stream replay, on the other hand, entails generating a new view of the output stream. Establishing a new view on the dataset is treated as a new stream containing fresh data; however, data from a prior view can still be accessed. Several situations may require or benefit from a stream replay.
Be aware that replaying a pipeline may result in extended downtimes, which could last several days depending on the replay starting point. When you replay a pipeline, all data in the output stream is lost. If you wish to retain the data from the previous stream, you can direct the output to a new destination. However, if you intend to push records to the original output stream in the future, you will need to replay the pipeline.
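To make the contrast with redeployment concrete, here is a small, purely illustrative sketch of a replay: a new output view is computed from the chosen starting point, and the previous view is no longer the live stream even though its data can still be read. Function and variable names are assumptions, not the product's API.

```python
# Illustrative only: a replay builds a brand-new output view from the chosen start point.
from typing import List

def replay(input_records: List[str], start_offset: int) -> List[str]:
    """Produce a new output view from `start_offset` onward."""
    return [r.upper() for r in input_records[start_offset:]]

records = ["a", "b", "c", "d"]
old_view = [r.upper() for r in records]   # previous view: still readable, but superseded
new_view = replay(records, start_offset=2)
print(old_view)  # ['A', 'B', 'C', 'D']
print(new_view)  # ['C', 'D']  <- the live stream now starts fresh from the replay point
```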
To redeploy a stream, follow the same procedure used for the initial deployment; select Deploy in the Pipeline Builder interface.
To replay a stream, enable the additional setting to replay from either Start of input data or From a specified time.
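As a final illustration, the sketch below shows one way to think about the two start options: Start of input data keeps every input record, while From a specified time drops records older than the chosen timestamp before reprocessing. The record shape and function name are assumptions for the example, not the Pipeline Builder API.

```python
# Illustrative only: modeling the two replay start options as an input filter.
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Record = Tuple[datetime, str]  # (event time, payload)

def select_replay_input(records: List[Record], start_time: Optional[datetime]) -> List[Record]:
    """start_time=None models 'Start of input data'; otherwise 'From a specified time'."""
    if start_time is None:
        return list(records)
    return [r for r in records if r[0] >= start_time]

now = datetime(2024, 1, 1)
records = [(now - timedelta(days=d), f"event-{d}") for d in (90, 45, 10)]

print(len(select_replay_input(records, None)))                      # 3: start of input data
print(len(select_replay_input(records, now - timedelta(days=60))))  # 2: e.g. from 2 months ago
```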