Functions execute in an environment with strict memory limits, and these limits can be exceeded quickly when working with file data; we recommend only interacting with media under 20MB.
If an object has a media reference property, you can use Functions to interact with the associated media item. A media item exposes methods for conveniently working with the underlying media, and a set of built-in operations covers common tasks for each media type with no need for external libraries. The following documentation explains what functionality is available and how to use it.
If you need an operation that doesn't currently exist out-of-the-box, you will likely need to use an external library or write your own custom code. Learn more about adding NPM dependencies to Functions repositories.
Some operations are supported by all media types.
To read the raw data from a media item, use the `readAsync` method on the media item. You can access the media item by selecting the media reference property on the object. The signature for the `readAsync` method is as follows:

```typescript
readAsync(): Promise<Blob>;
```
`Blob` is a standard JavaScript type representing a file-like object of immutable, raw data. As mentioned above, you can use it with external libraries to interact with your media beyond the default functionality.
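Because `readAsync` resolves to a standard `Blob`, the usual `Blob` members (`size`, `text()`, `arrayBuffer()`) are available for inspecting the data before handing it to a library. A minimal, Foundry-independent sketch (the blob contents here are stand-ins for real media data):

```typescript
// Inspect a Blob the way you might inspect the result of readAsync.
async function inspectBlob(data: Blob): Promise<string> {
    // size is in bytes; checking it early helps stay within the memory
    // limits mentioned at the top of this page.
    console.log(`size: ${data.size} bytes`);
    return data.text();
}

// Stand-in for the Blob a media item would return.
const blob = new Blob(["hello, media"], { type: "text/plain" });
inspectBlob(blob).then((text) => console.log(text));
```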
To get a media item's metadata, use the `getMetadataAsync` method on the media item. The signature for the `getMetadataAsync` method is as follows:

```typescript
getMetadataAsync(): Promise<IMediaMetadata>;
```
Several type guards are available that give you access to functionality specific to certain media types. The following type guards can be used on media item metadata:

- `isAudioMetadata()`
- `isDicomMetadata()`
- `isDocumentMetadata()`
- `isImageryMetadata()`
- `isVideoMetadata()`
As an example, you could use the imagery type guard to pull out image-specific metadata fields:

```typescript
const metadata = await myObject.mediaReference?.getMetadataAsync();
if (isImageryMetadata(metadata)) {
    const imageWidth = metadata.dimensions?.width;
    // ...
}
```
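These guards follow TypeScript's user-defined type guard pattern (a `value is Type` return type), which is why the compiler narrows `metadata` inside the `if` block. A self-contained sketch of the pattern with hypothetical metadata shapes (the real `IMediaMetadata` interfaces are richer than this):

```typescript
// Hypothetical, simplified metadata shapes for illustration only.
interface IImageryMetadata {
    kind: "imagery";
    dimensions?: { width: number; height: number };
}
interface IAudioMetadata {
    kind: "audio";
    durationSeconds: number;
}
type IMediaMetadata = IImageryMetadata | IAudioMetadata;

// A user-defined type guard: the `is` return type tells the compiler
// to narrow the union inside any branch where the guard returns true.
function isImageryMetadata(m: IMediaMetadata): m is IImageryMetadata {
    return m.kind === "imagery";
}

function describe(m: IMediaMetadata): string {
    if (isImageryMetadata(m)) {
        // Here m is IImageryMetadata, so dimensions is accessible.
        return `image ${m.dimensions?.width ?? "?"}px wide`;
    }
    return `audio, ${m.durationSeconds}s`;
}

console.log(describe({ kind: "imagery", dimensions: { width: 640, height: 480 } }));
```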
You can also use type guards on the `MediaItem` namespace, which gives you access to more methods on the type-specific media item. The type guards available here are:

- `MediaItem.isAudio()`
- `MediaItem.isDocument()`
- `MediaItem.isImagery()`
To extract text from a document, you can use either the `ocrAsync` or the `extractTextAsync` method on the media item.
For machine-generated PDFs, it may be faster and/or more accurate to extract the text embedded digitally in the PDF rather than using optical character recognition (OCR). The signature for text extraction is as follows:

```typescript
extractTextAsync(options: IDocumentExtractTextOptions): Promise<string[]>;
```
The following can optionally be provided as a TypeScript object:

- `startPage`: The zero-indexed start page (inclusive).
- `endPage`: The zero-indexed end page (exclusive).

For non-machine-generated PDFs, it is best to use the OCR method for extracting text.
```typescript
ocrAsync(options: IDocumentOcrOptions): Promise<string[]>;
```
The following can optionally be provided as a TypeScript object:

- `startPage`: The zero-indexed start page (inclusive).
- `endPage`: The zero-indexed end page (exclusive).
- `languages`: A list of languages to recognize (can be empty).
- `scripts`: A list of scripts to recognize (can be empty).
- `outputType`: Specifies the output type as `text` or `hocr`.

Remember that you need to use type guards in order to access media-type-specific operations. Here's an example of using the `MediaItem.isDocument` type guard to then perform OCR text extraction:
```typescript
import { Function, MediaItem } from "@foundry/functions-api";
import { ArxivPaper } from "@foundry/ontology-api";

@Function()
public async firstPageText(paper: ArxivPaper): Promise<string | undefined> {
    if (MediaItem.isDocument(paper.mediaReference!)) {
        const text = (await paper.mediaReference.ocrAsync({
            endPage: 1,
            languages: [],
            scripts: [],
            outputType: 'text',
        }))[0];
        return text;
    }
    return undefined;
}
```
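Both `extractTextAsync` and `ocrAsync` take the same zero-indexed, half-open page range: `startPage` is inclusive and `endPage` is exclusive, which matches `Array.prototype.slice` semantics. A stand-in illustration of which pages a given range selects (the page strings here are placeholders, not real extraction output):

```typescript
// Stand-in page text; a real call returns one string per page.
const pages = ["page 0 text", "page 1 text", "page 2 text"];

// startPage inclusive, endPage exclusive, both zero-indexed --
// exactly Array.prototype.slice semantics.
function selectPages(all: string[], startPage = 0, endPage = all.length): string[] {
    return all.slice(startPage, endPage);
}

console.log(selectPages(pages, 0, 1)); // only page 0, like { endPage: 1 } in the example above
console.log(selectPages(pages, 1, 3)); // pages 1 and 2
```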
Audio media items support transcription using the `transcribeAsync` method. The signature is as follows:

```typescript
transcribeAsync(options: IAudioTranscriptionOptions): Promise<string>;
```
The following options can optionally be passed in to specify how the transcription should run:

- `language`: The language to transcribe, passed using the `TranscriptionLanguage` enum.
- `performanceMode`: Runs the transcription in `More Economical` or `More Performant` mode, passed using the `TranscriptionPerformanceMode` enum.
- `outputFormat`: Specifies the output format by passing an object with type `plainTextNoSegmentData` (plain text) or `pttml` (a TTML-like format). If the type is `plainTextNoSegmentData`, the object also takes a boolean `addTimestamps` parameter.

Here's an example of providing options for transcription:
```typescript
import {
    Function,
    MediaItem,
    TranscriptionLanguage,
    TranscriptionPerformanceMode,
} from "@foundry/functions-api";
import { AudioFile } from "@foundry/ontology-api";

@Function()
public async transcribeAudioFile(file: AudioFile): Promise<string | undefined> {
    if (MediaItem.isAudio(file.mediaReference!)) {
        return await file.mediaReference.transcribeAsync({
            language: TranscriptionLanguage.ENGLISH,
            performanceMode: TranscriptionPerformanceMode.MORE_ECONOMICAL,
            outputFormat: { type: "plainTextNoSegmentData", addTimestamps: true },
        });
    }
    return undefined;
}
```