Realtime audio

Audio is a modality for interacting with the rest of the platform through the Ontology. Context is pulled from the Ontology before and during a conversation, and a realtime model can listen, transcribe, and optionally speak back and make tool calls based on that context. The results are written back into the Ontology, where functions, pipelines, AIP Logic, Workshop, actions, and other Foundry capabilities are available.

Use cases include:

Dictation. A single speaker dictates decisions, observations, or notes. The system transcribes the audio, extracts entities, and writes them to the Ontology with the speaker's permissions.
Meetings. The system transcribes, diarizes, and structures multi-speaker audio into Ontology objects live.
Live call assistance. A human is on a call with a customer. A Foundry application listens, transcribes the conversation, surfaces relevant Ontology context to the Foundry user on the call, and writes outcomes back when the call ends.
Voice-controlled interfaces. A Foundry user issues commands by voice instead of navigating manually: querying the Ontology, triggering actions, and moving through workflows hands-free.

Recording and consent are your responsibility

Applications built with realtime audio capture, transcribe, or otherwise process human speech. Many jurisdictions require notifying participants that they are being recorded or transcribed before the recording starts, and some require explicit consent. This may apply whether the participant is the application user, a third party on the other end of a call, or anyone else whose voice is captured. Compliance with these requirements is the responsibility of your application's developer and the organization deploying it.

Before deploying a realtime audio workflow:

Verify the legal and regulatory requirements that apply in every jurisdiction where the application will be used.
Build appropriate notification and consent flows into the application before audio capture begins.
Document your consent posture and retention practices alongside the application.
Follow your organization's internal policies, compliance requirements, and legal obligations.

It is your responsibility to verify these requirements and obtain the necessary consents before deploying.
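
One way to enforce "consent before capture" in application code is a gate that refuses to start recording until every participant has been notified and has agreed. The following is a minimal sketch, not a Foundry or OSDK API: `CaptureSession`, `ConsentRecord`, and `ConsentRequiredError` are hypothetical names introduced for illustration. It assumes the strictest policy (consent from all participants). The rules that actually apply depend on your jurisdictions and should be confirmed with your legal team.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class ConsentRequiredError(RuntimeError):
    """Raised when capture is attempted without consent from every participant."""


@dataclass
class ConsentRecord:
    # Hypothetical record shape; keep whatever fields your retention policy requires.
    participant_id: str
    recorded_at: datetime
    consented: bool


@dataclass
class CaptureSession:
    # Hypothetical wrapper around an audio session; not a Foundry or OSDK type.
    participants: list[str]
    records: dict[str, ConsentRecord] = field(default_factory=dict)
    capturing: bool = False

    def record_consent(self, participant_id: str, consented: bool) -> None:
        """Store the participant's answer with a UTC timestamp for auditing."""
        if participant_id not in self.participants:
            raise ValueError(f"unknown participant: {participant_id}")
        self.records[participant_id] = ConsentRecord(
            participant_id=participant_id,
            recorded_at=datetime.now(timezone.utc),
            consented=consented,
        )

    def missing_consent(self) -> list[str]:
        """Return participants who have not answered or who declined."""
        return [
            p for p in self.participants
            if p not in self.records or not self.records[p].consented
        ]

    def start_capture(self) -> None:
        """Start capture only when no participant is missing consent."""
        missing = self.missing_consent()
        if missing:
            raise ConsentRequiredError(f"consent missing for: {', '.join(missing)}")
        self.capturing = True  # A real application would open the audio stream here.


# Usage: capture stays blocked until both parties have agreed.
session = CaptureSession(participants=["agent", "customer"])
session.record_consent("agent", True)
session.record_consent("customer", True)
session.start_capture()
```

Keeping the consent records on the session object also gives you something concrete to persist alongside the transcript when documenting your consent posture.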

Get started

To build your first voice-enabled application, follow the tutorial: Build a voice-enabled OSDK application.

Available models

For the list of supported realtime speech-to-speech and transcription models, see Available audio models.
