Streaming Pipelines: Overview

Streaming pipelines enable immediate, critical decisions based on real-time data. By processing data as a stream with dedicated compute, streaming pipelines can process records with very low latency. On average, streaming data is accessible in the Ontology and available for analysis in time series applications, such as Quiver or Foundry Rules, in under 15 seconds. To achieve this low latency, streams are built on compute that runs continuously, and they require different architecture and maintenance considerations compared to batch pipelines.

Best practices

When building out streaming pipelines, consider these factors:

  • Streams often power highly operational workflows and require careful planning around downtime, maintenance, and logic changes to ensure high uptime and availability.
  • Compute for streaming runs continuously. This can result in higher compute costs than a periodic batch job. As with batch pipelines, consider starting with the smallest profile available and adjusting it if the scale of your data requires it.
  • Streams operate on a per-row basis and have constraints on the maximum row size to ensure low-latency data transfers. This limit is 1 MB per individual row.
  • Streams using state (for example, windows or aggregations) require design consideration to ensure the state is not broken when the stream logic changes.
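As a quick sanity check against the per-row size limit above, you can measure a record's serialized size before sending it. This is a minimal sketch, assuming records are JSON-serializable dictionaries; the function and record names are illustrative, not part of any Foundry API:

```python
import json

MAX_ROW_BYTES = 1 * 1024 * 1024  # 1 MB per-row limit on streams


def row_size_ok(record: dict) -> bool:
    """Return True if the JSON-serialized record fits within the per-row limit."""
    return len(json.dumps(record).encode("utf-8")) <= MAX_ROW_BYTES


small = {"sensor_id": "a1", "value": 42.0}
large = {"sensor_id": "a1", "payload": "x" * (2 * 1024 * 1024)}  # ~2 MB of text

print(row_size_ok(small))  # True
print(row_size_ok(large))  # False
```

Rows that exceed the limit should be trimmed or split upstream rather than sent and rejected.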

Get started

To start using streaming pipelines in Foundry, review how to create a simple streaming pipeline, and learn about streaming transforms in Pipeline Builder. If you want to learn about connecting your data sources to Foundry, review how to push data into a stream, or how to set up a streaming sync.
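To illustrate the general shape of pushing a record into a stream over HTTP, here is a hedged Python sketch using only the standard library. The endpoint path, payload shape, `stream_rid`, and `branch` are assumptions for illustration only; consult the push-to-stream documentation for the actual API:

```python
import json
from urllib import request


def build_payload(record: dict) -> bytes:
    # Wrap a single row as a JSON request body (assumed shape, for illustration).
    return json.dumps({"value": record}).encode("utf-8")


def push_record(base_url: str, token: str, stream_rid: str, branch: str, record: dict) -> int:
    # Hypothetical endpoint path -- see the Foundry docs for the real route.
    url = f"{base_url}/stream-proxy/api/streams/{stream_rid}/branches/{branch}/records"
    req = request.Request(
        url,
        data=build_payload(record),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with request.urlopen(req) as resp:  # raises on non-2xx responses
        return resp.status
```

Because streams ingest row by row, each call sends one record; batching and retry behavior would be layered on top depending on your source system.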