Extracts text from the pages in a PDF file using optical character recognition (OCR).
Expression categories: Media
Declared arguments
Languages to detect - Languages to detect in the input files. Set<Enum<Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Azerbaijani - Cyrilic, Basque, Belarusian, and more ...>>
Media reference - The column containing media references to PDF files in a media set. Expression<Media reference>
OCR output format - The format of each output string. The output is an array of strings, one entry per page of the PDF. Enum<Text, hOCR>
Scripts to detect - Scripts to detect in the input files. Set<Enum<Arabic, Armenian, Bengali, Canadian Aboriginal, Cherokee, Cyrillic, Devanagari, Ethiopic, Fraktur, Georgian, and more ...>>
(optional) End page - The end of the page range (inclusive). Negative indexing is supported. Expression<Integer>
(optional) Error handling - Determines the behavior of the pipeline for inputs that fail to process. Enum<FAIL, NULL>
(optional) Start page - The start of the page range. If no value is provided, it will default to the first page. Expression<Integer>
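
The page-range and error-handling arguments can be illustrated with a small Python sketch. This is not the expression's implementation. The function names (resolve_page_range, extract_pages) and the ocr_page callback are hypothetical. The sketch also assumes several things the description above does not state: pages are numbered from 0, a negative value counts back from the last page (-1 is the last page), an omitted end page means the last page, and NULL error handling yields a null result for the whole input.

```python
from typing import Callable, List, Optional


def resolve_page_range(num_pages: int,
                       start_page: Optional[int] = None,
                       end_page: Optional[int] = None) -> range:
    """Return zero-based page indices for an inclusive page range.

    Assumptions (not confirmed by the documentation): pages are numbered
    from 0, negative values count back from the last page (-1 is the last
    page), and a missing end page means the last page.
    """
    def normalize(value: int) -> int:
        # Map a negative index onto its position from the end.
        return value + num_pages if value < 0 else value

    start = 0 if start_page is None else normalize(start_page)
    end = num_pages - 1 if end_page is None else normalize(end_page)
    if not (0 <= start <= end < num_pages):
        raise ValueError(
            f"invalid page range {start_page}..{end_page} for {num_pages} pages"
        )
    return range(start, end + 1)  # the end page is inclusive


def extract_pages(pages: List[bytes],
                  ocr_page: Callable[[bytes], str],
                  start_page: Optional[int] = None,
                  end_page: Optional[int] = None,
                  error_handling: str = "FAIL") -> Optional[List[str]]:
    """Return one string per selected page, mirroring the array output.

    ocr_page is a hypothetical stand-in for the OCR engine. With
    error_handling="NULL" a failure yields None; with "FAIL" it propagates.
    """
    try:
        selected = resolve_page_range(len(pages), start_page, end_page)
        return [ocr_page(pages[i]) for i in selected]
    except Exception:
        if error_handling == "NULL":
            return None
        raise


# Usage: four fake "pages" and a fake OCR function that decodes bytes.
pages = [b"page-a", b"page-b", b"page-c", b"page-d"]
fake_ocr = lambda data: data.decode().upper()

second_to_last = extract_pages(pages, fake_ocr, start_page=1, end_page=-1)
out_of_range = extract_pages(pages, fake_ocr, start_page=10, error_handling="NULL")
```

Under these assumptions, second_to_last holds one string for each of the last three pages, and out_of_range is None because the invalid range is treated as a failed input rather than stopping the run.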