Extract text from PDF

Supported in: Batch

Extracts raw text from the pages in a PDF.

Expression categories: Media

Declared arguments

  • Media reference - The column containing media references to PDF files in a media set.
    Expression<Media reference>
  • optional End page - The end of the page range (inclusive).
    Expression<Integer>
  • optional Start page - The start of the page range (inclusive). Defaults to the first page if no value is provided.
    Expression<Integer>

Output type: Array<String>
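
The page-range arguments can be modeled with a short Python sketch. This is not the expression's implementation, only an illustration of the semantics described above. The function name `select_page_range` and the list-of-strings input are hypothetical. Pages are treated as 1-indexed, which is inferred from Example 1, and a missing end page is assumed to mean the last page, which this page does not state.

```python
from typing import List, Optional


def select_page_range(
    pages: List[str],
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
) -> List[str]:
    """Return the text of pages start_page..end_page (1-indexed, inclusive)."""
    # Documented behavior: a missing start page defaults to the first page.
    first = 1 if start_page is None else start_page
    # ASSUMPTION: a missing end page means the last page (not stated in the docs).
    last = len(pages) if end_page is None else end_page
    if first < 1 or last < first:
        raise ValueError("expected 1 <= start_page <= end_page")
    # ASSUMPTION: an end page past the last page is clamped by the slice.
    return pages[first - 1:last]


# Hypothetical per-page text standing in for an already-parsed PDF.
pdf_pages = ["first page", "second page", "third page"]
print(select_page_range(pdf_pages, start_page=1, end_page=2))
```

With a start page of 1 and an end page of 2, the sketch returns one string per page, matching the Array<String> output type.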

Examples

Example 1: Base case

Argument values:

  • Media reference: Media Reference
  • End page: End Page
  • Start page: Start Page

Media Reference | Start Page | End Page | Output
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | 1 | 2 | [ first page, second page ]