API: Media

Functions are executed in an environment that has strict memory limits. Exceeding these memory limits can happen quickly when dealing with file data; we recommend only interacting with media under 20MB.

If an object has a media reference property, you can use Functions to interact with the associated media item. A media item exposes a number of methods for working with the underlying media, and a set of built-in operations lets you handle different kinds of media without external libraries. The following documentation explains what functionality is available and how to use it.

If you need any operations that don't currently exist out-of-the-box, you will likely need to use external libraries or write your own custom code. Learn more about adding NPM dependencies to Functions repositories.

Universal operations

Some operations are supported by all media types.

Read raw media data

To read the raw data from a media item, use the readAsync method on the media item. You can access the media item by selecting the media reference property on the object. The signature for the readAsync method is as follows:

readAsync(): Promise<Blob>;

Blob is a standard JavaScript type, representing a file-like object of immutable, raw data. As mentioned above, you can use this with libraries to interact with your media beyond the default functionality.
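
For example, here is a minimal sketch that reads the raw data and returns its size in bytes. It reuses the ArxivPaper object type from the document examples further down this page; any object type with a media reference property works the same way.

import { Function } from "@foundry/functions-api";
import { ArxivPaper } from "@foundry/ontology-api";

@Function()
public async mediaSizeInBytes(paper: ArxivPaper): Promise<number | undefined> {
    // readAsync returns the raw media data as a standard Blob.
    const blob = await paper.mediaReference?.readAsync();
    // Blob exposes standard properties such as size (in bytes).
    return blob?.size;
}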

Get media metadata

To get a media item's metadata, use the getMetadataAsync method on the media item. The signature for the getMetadataAsync method is as follows:

getMetadataAsync(): Promise<IMediaMetadata>;

Type guards

Several type guards are available, which will allow you to access functionality that is specific to certain media types. The following type guards can be used on media item metadata:

  • isAudioMetadata()
  • isDicomMetadata()
  • isDocumentMetadata()
  • isImageryMetadata()
  • isVideoMetadata()

As an example, you could use the imagery type guard to pull out image-specific metadata fields:

const metadata = await myObject.mediaReference?.getMetadataAsync();
if (isImageryMetadata(metadata)) {
    const imageWidth = metadata.dimensions?.width;
    // ...
}

You can also use type guards from the MediaItem namespace, which then give you access to additional methods on the type-specific media item. The type guards you can use here are:

  • MediaItem.isAudio()
  • MediaItem.isDocument()
  • MediaItem.isImagery()

Document-specific operations

Text extraction

To extract text from a document, you can use either the ocrAsync or the extractTextAsync method on the media item.

For machine-generated PDFs, it may be faster and/or more accurate to extract text embedded digitally in the PDF rather than using optical character recognition (OCR). The signature for the extractTextAsync method is as follows:

extractTextAsync(options: IDocumentExtractTextOptions): Promise<string[]>;

The following can optionally be provided as a TypeScript object:

  • startPage: The zero-indexed start page (inclusive).
  • endPage: The zero-indexed end page (exclusive).
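
For example, here is a minimal sketch that extracts the embedded text of the first page only. It mirrors the OCR example below and reuses the ArxivPaper object type; note that the MediaItem.isDocument type guard is still required to access document-specific methods.

import { Function, MediaItem } from "@foundry/functions-api";
import { ArxivPaper } from "@foundry/ontology-api";

@Function()
public async firstPageEmbeddedText(paper: ArxivPaper): Promise<string | undefined> {
    if (MediaItem.isDocument(paper.mediaReference!)) {
        // Pages are zero-indexed and endPage is exclusive, so this covers page 0 only.
        const pages = await paper.mediaReference.extractTextAsync({ startPage: 0, endPage: 1 });
        return pages[0];
    }
    return undefined;
}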

For non-machine-generated PDFs, such as scanned documents, it is best to use the OCR method to extract text. The signature for the ocrAsync method is as follows:

ocrAsync(options: IDocumentOcrOptions): Promise<string[]>;

The following can optionally be provided as a TypeScript object:

  • startPage: The zero-indexed start page (inclusive).
  • endPage: The zero-indexed end page (exclusive).
  • languages: A list of languages to recognize (can be empty).
  • scripts: A list of scripts to recognize (can be empty).
  • outputType: Specifies the output type as text or hocr.

Remember that you need to use type guards in order to access media-type-specific operations. Here's an example of using the MediaItem.isDocument type guard and then performing OCR text extraction:

import { Function, MediaItem } from "@foundry/functions-api";
import { ArxivPaper } from "@foundry/ontology-api";

@Function()
public async firstPageText(paper: ArxivPaper): Promise<string | undefined> {
    if (MediaItem.isDocument(paper.mediaReference!)) {
        const text = (await paper.mediaReference.ocrAsync({
            endPage: 1,
            languages: [],
            scripts: [],
            outputType: 'text'
        }))[0];
        return text;
    }
    return undefined;
}

Audio-specific operations

Transcription

Audio media items support transcription using the transcribeAsync method. The signature is as follows:

transcribeAsync(options: IAudioTranscriptionOptions): Promise<string>;

The following can optionally be passed in to specify how the transcription should run. The available options are:

  • language: The language to transcribe, passed using the TranscriptionLanguage enum.
  • performanceMode: Runs transcriptions in More Economical or More Performant mode, passed using the TranscriptionPerformanceMode enum.
  • outputFormat: Specifies the output format by passing an object whose type is either plainTextNoSegmentData (plain text) or pttml, a TTML-like format. If the type is plainTextNoSegmentData, the object also accepts a boolean addTimestamps parameter.

Here's an example of providing options for transcription:

import { Function, MediaItem, TranscriptionLanguage, TranscriptionPerformanceMode } from "@foundry/functions-api";
import { AudioFile } from "@foundry/ontology-api";

@Function()
public async transcribeAudioFile(file: AudioFile): Promise<string | undefined> {
    if (MediaItem.isAudio(file.mediaReference!)) {
        return await file.mediaReference.transcribeAsync({
            language: TranscriptionLanguage.ENGLISH,
            performanceMode: TranscriptionPerformanceMode.MORE_ECONOMICAL,
            outputFormat: { type: "plainTextNoSegmentData", addTimestamps: true }
        });
    }
    return undefined;
}