Extract text from PDF

Supported in: Batch

Extracts raw text from the pages in a PDF.

Expression categories: Media

Declared arguments

Media reference: The column containing media references to PDF files in a media set.
Expression<Media reference>
optional End page: The end of the page range (inclusive).
Expression<Integer>
optional Error handling: Determines the behavior of the pipeline for inputs that fail to process.
Enum<FAIL, NULL>
optional Start page: The start of the page range. Defaults to the first page if no value is provided.
Expression<Integer>

Output type: Array<String>

Example argument values:

Media reference	Start page	End page	Output
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}	1	2	[ first page, second page ]
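
The page-range behavior shown in this example can be sketched in Python. This is a minimal model under stated assumptions, not the expression's actual implementation: the function name extract_text_from_pdf and the load_pages callable are hypothetical, pages are treated as 1-indexed as the example row suggests, and the defaults used here for End page (last page) and Error handling (FAIL), along with NULL yielding a null output, are assumptions rather than documented behavior.

```python
from typing import Callable, List, Optional


def extract_text_from_pdf(
    load_pages: Callable[[], List[str]],
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
    error_handling: str = "FAIL",  # assumed default; the source does not state one
) -> Optional[List[str]]:
    """Model the page-range semantics: one output string per selected page."""
    if error_handling not in ("FAIL", "NULL"):
        raise ValueError("error_handling must be 'FAIL' or 'NULL'")

    try:
        # Hypothetical loader standing in for reading the PDF behind a
        # media reference; it returns the text of each page in order.
        pages = load_pages()
    except Exception:
        if error_handling == "NULL":
            return None  # assumption: NULL emits a null output for this input
        raise  # assumption: FAIL propagates the error

    # Start page defaults to the first page (stated in the argument list).
    first = 1 if start_page is None else start_page
    # Assumption: End page defaults to the last page when omitted.
    last = len(pages) if end_page is None else end_page
    if first < 1:
        raise ValueError("start_page must be 1 or greater")

    # Pages are 1-indexed and the end page is inclusive.
    return pages[first - 1 : last]


sample_pages = ["first page", "second page", "third page"]
result = extract_text_from_pdf(lambda: sample_pages, start_page=1, end_page=2)
print(result)  # ['first page', 'second page']
```

With Start page 1 and End page 2, the sketch returns the first two pages, matching the Output column above. Because the end page is inclusive, a range of 2 to 2 would return a single-element array.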