Extract document metadata

Supported in: Batch

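Extracts the selected metadata fields, such as author, title, and page count, from a PDF document in a media set and returns them as a struct.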

Expression categories: Media

Declared arguments

  • Media reference - The column containing media references to PDF files in a media set.
    Expression<Media reference>
  • Metadata to include - Select the metadata columns to include in the output.
    Set<Enum<Bytes, Document author, Document title, Page count>>

Output type: Struct

Examples

Example 1: Base case

Argument values:

  • Media reference: Media Reference
  • Metadata to include: [Document Author, Page Count, Document Title]

Media Reference:
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}

Output:
{
  author: Jane Doe,
  page_count: 23,
  title: Document Title,
}

Example 2: Base case

Argument values:

  • Media reference: Media Reference
  • Metadata to include: [Document Title]

Media Reference:
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}

Output:
{
  title: Who Framed Roger Rabbit - Final Script,
}

Example 3: Base case

Argument values:

  • Media reference: Media Reference
  • Metadata to include: [Document Author, Page Count]

Media Reference:
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}

Output:
{
  author: John Smith,
  page_count: 78,
}

Example 4: Null case

Argument values:

  • Media reference: Media Reference
  • Metadata to include: []

Media Reference:
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}

Output:
null
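
Sketch: selection behavior

The examples suggest that each selected option becomes one field of the output struct, and that an empty selection produces null. The Python sketch below only imitates that behavior for illustration; it is not the expression's implementation. The function name select_metadata and the dictionary-based document are assumptions, and the Bytes option is left out because no example shows its output field name.

```python
from typing import Optional

# Output field names as they appear in the examples. The "Bytes" option is
# omitted because no example shows the field name it produces.
FIELD_NAMES = {
    "Document author": "author",
    "Page count": "page_count",
    "Document title": "title",
}


def select_metadata(document: dict, include: set) -> Optional[dict]:
    """Return only the requested fields, or None when nothing is requested.

    Hypothetical helper: `document` stands in for metadata that has already
    been read from a PDF.
    """
    if not include:
        return None  # mirrors Example 4: an empty selection yields null
    # Sort the field names so the result has a stable order.
    fields = sorted(FIELD_NAMES[option] for option in include)
    return {name: document.get(name) for name in fields}


# Hypothetical metadata for one PDF, using the values from Example 1.
doc = {"author": "Jane Doe", "page_count": 23, "title": "Document Title"}

example_1 = select_metadata(doc, {"Document author", "Page count", "Document title"})
example_2 = select_metadata(doc, {"Document title"})
example_4 = select_metadata(doc, set())
```

Returning None for an empty selection, rather than an empty struct, follows the null case shown in Example 4.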