Extract text from PDF

Supported in: Batch

Extract raw text from pages in PDF files.

Expression categories: Media

Declared arguments

  • Media reference - The column containing media references to PDF files in a media set.
    Expression<Media reference>
  • optional End page - Page range end, inclusive. Defaults to the last page in the document. Supports negative indexing.
    Expression<Integer>
  • optional Start page - Page range start, inclusive. Defaults to the first page (1) in the document.
    Expression<Integer>

Output type: Array<String>
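
The page-range arguments can be illustrated with a short Python sketch. This is not the expression's implementation; it only models the documented semantics (1-based, inclusive start and end, defaults to the first and last page) on a list of already-extracted page strings. The function name `select_pages` is hypothetical, and the treatment of negative end values (-1 meaning the last page, -2 the one before it) is an assumption, since the description only states that negative indexing is supported.

```python
from typing import List, Optional


def select_pages(
    pages: List[str],
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
) -> List[str]:
    """Return the page texts in the inclusive, 1-based range [start_page, end_page]."""
    total = len(pages)
    # Documented defaults: first page (1) and last page of the document.
    start = 1 if start_page is None else start_page
    end = total if end_page is None else end_page
    # Assumption: a negative end counts back from the last page (-1 is the last page).
    if end < 0:
        end = total + 1 + end
    # Assumption: out-of-range values are clamped rather than raising an error.
    start = max(start, 1)
    end = min(end, total)
    # Convert the 1-based inclusive range to a 0-based half-open slice.
    return pages[start - 1:end]


pages = ["first page", "second page", "third page"]
print(select_pages(pages, start_page=1, end_page=2))  # ['first page', 'second page']
```

With both arguments omitted, the sketch returns every page, which matches the documented defaults.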

Examples

Example 1: Base case

Argument values:

  • Media reference: Media Reference
  • End page: End Page
  • Start page: Start Page

Media Reference | Start Page | End Page | Output
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | 1 | 2 | [ first page, second page ]