Audio is a modality for interacting with the rest of the platform through the Ontology. Context is pulled from the Ontology before and during a conversation, and a realtime model can listen, transcribe, and optionally speak back and make tool calls based on that context. The results are written back into the Ontology, where functions, pipelines, AIP Logic, Workshop, actions, and other Foundry capabilities can use them.
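The read, process, and write-back flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's API: `FakeOntology`, `FakeRealtimeModel`, and `run_session` are hypothetical in-memory stand-ins, and the "audio" is represented as plain text. A real application would use the OSDK and a supported realtime model instead.

```python
from dataclasses import dataclass, field


@dataclass
class FakeOntology:
    """Hypothetical in-memory stand-in for the Ontology (not a real API)."""
    objects: dict = field(default_factory=dict)

    def read(self, key: str) -> dict:
        return self.objects.get(key, {})

    def write(self, key: str, value: dict) -> None:
        self.objects[key] = value


class FakeRealtimeModel:
    """Hypothetical stand-in for a realtime model; audio chunks are text here."""

    def transcribe(self, audio_chunk: str) -> str:
        return audio_chunk.strip()

    def respond(self, transcript: str, context: dict) -> str:
        return f"Noted for {context.get('name', 'unknown')}: {transcript}"


def run_session(ontology, model, context_key, audio_chunks):
    # 1. Pull context from the Ontology before the conversation starts.
    context = ontology.read(context_key)
    turns = []
    for chunk in audio_chunks:
        # 2. The model transcribes each chunk and produces a reply from context.
        transcript = model.transcribe(chunk)
        reply = model.respond(transcript, context)
        turns.append({"transcript": transcript, "reply": reply})
    # 3. Write the results back so other capabilities can use them.
    ontology.write(f"{context_key}/session", {"turns": turns})
    return turns
```

The point of the sketch is the ordering: context is read first, the model works against that context, and the session output ends up as data that downstream tools can read.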
Use cases include:
Realtime audio applications capture, transcribe, or otherwise process human speech. Many jurisdictions require notifying participants that they are being recorded or transcribed before recording or transcription starts, and some require explicit consent. This may apply whether the participant is the application user, a third party on the other end of a call, or anyone else whose voice is captured. Compliance with these requirements is the responsibility of your application's developer and the organization deploying it.
Before deploying a realtime audio workflow:
It is your responsibility to verify these requirements and obtain the necessary consents before deploying.
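One way an application might enforce this in code is a gate that refuses to start capture until every participant has been notified or, where required, has consented. The sketch below is an assumption for illustration only: `ConsentStatus`, `Participant`, and `may_start_capture` are hypothetical names, not part of the platform. Deciding whether notice or explicit consent applies is a legal question that the code does not answer.

```python
from dataclasses import dataclass
from enum import Enum


class ConsentStatus(Enum):
    NOT_NOTIFIED = "not_notified"
    NOTIFIED = "notified"
    CONSENTED = "consented"


@dataclass
class Participant:
    """Hypothetical record of one person whose voice may be captured."""
    name: str
    status: ConsentStatus = ConsentStatus.NOT_NOTIFIED


def may_start_capture(participants, require_explicit_consent: bool) -> bool:
    """Return True only if every participant meets the configured bar."""
    if not participants:
        # Nobody has been verified, so refuse rather than assume.
        return False
    for p in participants:
        if p.status is ConsentStatus.NOT_NOTIFIED:
            return False
        if require_explicit_consent and p.status is not ConsentStatus.CONSENTED:
            return False
    return True
```

For example, with a consenting caller and a third party who has only been notified, `may_start_capture` allows capture when notice is sufficient and blocks it when explicit consent is required.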
To build your first voice-enabled application, follow the tutorial: Build a voice-enabled OSDK application.
For the list of supported realtime speech-to-speech and transcription models, see Available audio models.