In this tutorial, we will use Foundry Streaming and Pipeline Builder to create a simple pipeline with an output of a single dataset with information on sensor temperatures. You will learn how to create a stream in Foundry, push records into that stream, and transform them in Pipeline Builder.
First, we need to create a new stream.
On the Define page, select Normal for the throughput and define a basic schema with two columns: sensor_id: String and temperature: Double.
We are now ready to connect our stream. At this point, we could set up a streaming data ingestion task with a source. For this tutorial, we will instead manually push records to the stream with curl.
Select Test with a personal token and follow the on-screen prompts for generating a short-lived personal token.
Personal tokens should not be used for production pipelines. Production pipelines should use an OAuth token workflow.
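The on-screen prompts provide a ready-made curl command for your stream. As a rough illustration of what that command does, here is a hedged Python sketch that builds (but does not send) the equivalent HTTP request. The host, dataset RID, branch, and stream-proxy endpoint path below are placeholders and assumptions, not guaranteed to match your environment; always copy the exact URL from the on-screen prompt for your stream.

```python
import json
import urllib.request

# Hypothetical helper that builds the HTTP request curl would send.
# The endpoint path, host, and dataset RID are assumptions; use the exact
# URL shown in the "Test with a personal token" prompt for your stream.
def build_push_record_request(host, token, dataset_rid, record):
    url = f"https://{host}/stream-proxy/api/streams/{dataset_rid}/branches/master/jsonRecord"
    return urllib.request.Request(
        url,
        data=json.dumps({"value": record}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_push_record_request(
    "example.palantirfoundry.com",   # placeholder host
    "<personal-token>",              # short-lived token from the prompt
    "ri.foundry.main.dataset.1234",  # placeholder dataset RID
    {"sensor_id": "sensor_1", "temperature": 4.2},
)
# Calling urllib.request.urlopen(req) would perform the actual POST.
print(req.get_method(), req.full_url)
```

The record payload matches the schema defined earlier: one sensor_id string and one temperature double per record.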
Within seconds, you will see a record appear in the stream viewer on the page:
We have now ingested streaming data in real time. Let’s transform that data now.
This will create a pipeline for the input stream, displayed on a graph.
Selecting the input stream node will display a preview of the data. Note that the preview runs on a cold storage view of the stream; records from the stream will be delayed before they appear.
Click on the input stream node on the graph and select the Transform action (the blue T icon next to the input node).
This will open a list of all transforms currently supported for streams, based on the types of the columns in the stream. For this tutorial, we will convert all sensor_id values to uppercase, remove any surrounding whitespace, and keep only records with temperatures exceeding three degrees.

1. Select the uppercase transform, choose the sensor_id column, and click Apply.
2. Select the trim whitespace transform, choose the sensor_id column again, and click Apply.
3. Select the filter transform, choose the temperature column, set the filter to greater than 3, and click Apply.

If you save your changes without deploying them, your pipeline logic will not update to the latest changes. You must deploy the pipeline to capture changes to transform logic.
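The three steps above can be summarized in plain Python. This is only a sketch of the logic the Pipeline Builder boards apply; the actual transforms are configured in the UI, not written as code.

```python
# A minimal sketch of the pipeline's transform logic:
# uppercase sensor_id, trim its whitespace, and keep temperatures above 3.
def transform(records):
    out = []
    for record in records:
        sensor_id = record["sensor_id"].upper().strip()  # uppercase, then trim
        if record["temperature"] > 3:                    # filter: greater than 3
            out.append({"sensor_id": sensor_id, "temperature": record["temperature"]})
    return out

records = [
    {"sensor_id": " sensor_1 ", "temperature": 4.2},
    {"sensor_id": "sensor_2", "temperature": 1.0},
]
print(transform(records))  # → [{'sensor_id': 'SENSOR_1', 'temperature': 4.2}]
```

Note that the second record is dropped entirely because its temperature does not exceed the filter threshold.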
This will take you to the stream preview page with the output stream from your transform.
The streaming cluster takes about one minute to start, so you may not see records immediately. Once running, however, the cluster will process all new records in real time.
Now that you know how to create a simple streaming pipeline, learn more about managing streams by exploring how to debug a failing stream. For more advanced transform functionality, learn more about Pipeline Builder.