Extract document metadata

Supported in: Batch

Extract metadata fields from the document's media reference.

Expression categories: Media

Declared arguments

  • Document metadata information to include - Metadata fields to include in the output struct.
    Set<Enum<Bytes, Document Author, Document Title, Page Count>>
  • Media reference - The column containing media references to PDF files in a media set.
    Expression<Media reference>
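
The media reference values shown in the examples on this page are JSON objects carrying a mimeType and a nested mediaSetItem. As a rough illustration only, the following Python sketch reads one such value and pulls out its identifiers. It is not part of the expression itself; the helper name parse_media_reference and the output key names are hypothetical, and the JSON shape is assumed from the example values.

```python
import json


def parse_media_reference(raw: str) -> dict:
    """Hypothetical helper: extract identifiers from a media reference JSON string.

    Assumes the shape used in the examples on this page:
    {"mimeType": ..., "reference": {"type": ..., "mediaSetItem": {...}}}
    """
    ref = json.loads(raw)
    item = ref["reference"]["mediaSetItem"]
    return {
        "mime_type": ref["mimeType"],
        "media_set_rid": item["mediaSetRid"],
        "media_item_rid": item["mediaItemRid"],
    }


# Media reference value copied from the examples on this page.
raw = (
    '{"mimeType":"application/pdf","reference":{"type":"mediaSetItem",'
    '"mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1",'
    '"mediaItemRid":"ri.mio.test.media-item.1"}}}'
)
parsed = parse_media_reference(raw)

# The expression expects PDF files, so a caller might check the MIME type first.
is_pdf = parsed["mime_type"] == "application/pdf"
```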

Output type: Struct

Examples

Example 1: Base case

Argument values:

  • Document metadata information to include: [Document Author, Page Count, Document Title]
  • Media reference: Media Reference

Media Reference:
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}

Output:
{
  author: Jane Doe,
  page_count: 23,
  title: Document Title,
}

Example 2: Base case

Argument values:

  • Document metadata information to include: [Document Title]
  • Media reference: Media Reference

Media Reference:
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}

Output:
{
  title: Who Framed Roger Rabbit - Final Script,
}

Example 3: Base case

Argument values:

  • Document metadata information to include: [Document Author, Page Count]
  • Media reference: Media Reference

Media Reference:
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}

Output:
{
  author: John Smith,
  page_count: 78,
}

Example 4: Null case

Argument values:

  • Document metadata information to include: []
  • Media reference: Media Reference

Media Reference:
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}

Output:
null
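
The selection behavior shown in the four examples can be modeled with a short Python sketch. This is an illustration under stated assumptions, not the expression's actual implementation: the function name select_metadata and the all_metadata input dictionary are hypothetical, the option-to-key mapping is inferred from the example outputs, and the Bytes option is left out because no example shows its output key.

```python
from typing import Optional

# Option name -> output struct key, inferred from the examples on this page.
# "Bytes" is intentionally omitted: the examples do not show its output key.
FIELD_KEYS = {
    "Document Author": "author",
    "Page Count": "page_count",
    "Document Title": "title",
}


def select_metadata(all_metadata: dict, include: set) -> Optional[dict]:
    """Hypothetical model of the field selection seen in the examples.

    Returns None (null) when no fields are requested, as in Example 4;
    otherwise returns only the requested fields.
    """
    if not include:
        return None
    # Iterate over FIELD_KEYS so the output order is stable regardless of
    # the order in which options were requested. Unmapped options are ignored.
    return {
        key: all_metadata[key]
        for option, key in FIELD_KEYS.items()
        if option in include
    }


# Metadata values taken from Example 1.
doc = {"author": "Jane Doe", "page_count": 23, "title": "Document Title"}

full = select_metadata(doc, {"Document Author", "Page Count", "Document Title"})
title_only = select_metadata(doc, {"Document Title"})
nothing = select_metadata(doc, set())
```

In this sketch an empty selection yields None rather than an empty struct, mirroring the null output of Example 4.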