Extracts content from the specified document, while preserving the document's layout.
Expression categories: Media
Declared arguments
Languages to detect: Languages to detect in the input files. Set<Enum<Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Azerbaijani - Cyrilic, Basque, Belarusian, and more ...>>
Media reference: The PDF to extract content from. Expression<Media reference>
Output format: The desired format of the output. Choose between a simple text-based output and a structured output with all details, including the bounding boxes. Enum<Full extract, Text and tables>
End page (optional): The end of the page range (inclusive). If no value is provided, it defaults to the last page. Expression<Integer>
Error handling (optional): Determines the behavior of the pipeline for inputs that fail to process. Enum<FAIL, NULL>
Start page (optional): The start of the page range. If no value is provided, it defaults to the first page. Expression<Integer>
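The defaults for the optional arguments can be sketched in plain Python. This is an illustrative model only, not the expression's actual implementation: the helper names resolve_page_range and run_with_error_handling are hypothetical, pages are assumed to be 1-indexed, the range check is an assumption, and NULL is assumed to mean that a failed input yields a null result instead of stopping the pipeline.

```python
from typing import Callable, Optional, Tuple


def resolve_page_range(total_pages: int,
                       start_page: Optional[int] = None,
                       end_page: Optional[int] = None) -> Tuple[int, int]:
    """Return the inclusive (start, end) range after applying the documented defaults."""
    # Assumption: pages are 1-indexed, so the "first page" is page 1.
    start = 1 if start_page is None else start_page
    # Documented default: a missing end page means the last page.
    end = total_pages if end_page is None else end_page
    # Assumption: an out-of-bounds or reversed range is treated as an error.
    if not 1 <= start <= end <= total_pages:
        raise ValueError(f"invalid page range {start}-{end} for {total_pages} pages")
    return start, end


def run_with_error_handling(extract: Callable[[], str],
                            error_handling: str = "FAIL") -> Optional[str]:
    """Call a hypothetical extract function under FAIL or NULL semantics."""
    try:
        return extract()
    except Exception:
        if error_handling == "NULL":
            return None  # assumed meaning of NULL: emit a null for the failed input
        raise  # assumed meaning of FAIL: let the failure propagate
```

For a 10-page PDF, resolve_page_range(10, end_page=4) selects pages 1 through 4, and omitting both arguments selects the whole document.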