Use Palantir-provided language models within transforms

Prerequisites

To use Palantir-provided language models, AIP must first be enabled on your enrollment.

Palantir provides a set of language and embedding models which can be used within Python transforms. The models can be used through the palantir_models library. This library provides a set of FoundryInputParams that can be used with the transforms.api.transform decorator.

Repository setup

To add language model support to your transforms, open the library search panel on the left side of your Code Repository. Search for palantir_models and choose Add and install library within the Library tab. Repeat this process with language-model-service-api to add that library as well.

Your Code Repository will then resolve all dependencies and run checks again. Checks may take a moment to complete, after which you will be able to start using the library in your transforms.

Transform setup

Prerequisites

The palantir_models classes can only be used with the transforms.api.transform decorator.

In this example, we will use palantir_models.transforms.OpenAiGptChatLanguageModelInput. First, import OpenAiGptChatLanguageModelInput into your Python file; this class can then be used to create the transform. Then, follow the prompts to specify and import the model and dataset that you wish to use as input.

from transforms.api import transform, Input, Output
from palantir_models.transforms import OpenAiGptChatLanguageModelInput
from palantir_models.models import OpenAiGptChatLanguageModel


@transform(
    source_df=Input("/path/to/input/dataset"),
    model=OpenAiGptChatLanguageModelInput("ri.language-model-service..language-model.gpt-4_azure"),
    output=Output("/path/to/output/dataset"),
)
def compute_generic(ctx, source_df, model: OpenAiGptChatLanguageModel, output):
    ...

As you begin typing the resource identifier, a dropdown menu will automatically appear to indicate models available for use. You may choose your desired option from the dropdown.

Dropdown menu showing available LLMs for selection following a user's input of "ri.".

Use language models to generate completions

For this example, we will be using the language model to determine the sentiment for each review in the input dataset. The OpenAiGptChatLanguageModelInput provides an OpenAiGptChatLanguageModel to the transform at runtime which can then be used to generate completions for reviews.

from transforms.api import transform, Input, Output
from palantir_models.transforms import OpenAiGptChatLanguageModelInput
from palantir_models.models import OpenAiGptChatLanguageModel

from language_model_service_api.languagemodelservice_api_completion_v3 import GptChatCompletionRequest
from language_model_service_api.languagemodelservice_api import ChatMessage, ChatMessageRole


@transform(
    reviews=Input("/path/to/reviews/dataset"),
    model=OpenAiGptChatLanguageModelInput("ri.language-model-service..language-model.gpt-4_azure"),
    output=Output("/output/path"),
)
def compute_sentiment(ctx, reviews, model: OpenAiGptChatLanguageModel, output):
    def get_completions(review_content: str) -> str:
        system_prompt = "Take the following review and determine the sentiment of the review"
        request = GptChatCompletionRequest(
            [ChatMessage(ChatMessageRole.SYSTEM, system_prompt), ChatMessage(ChatMessageRole.USER, review_content)]
        )
        resp = model.create_chat_completion(request)
        return resp.choices[0].message.content

    reviews_df = reviews.pandas()
    reviews_df['sentiment'] = reviews_df['review_content'].apply(get_completions)
    out_df = ctx.spark_session.createDataFrame(reviews_df)
    return output.write_dataframe(out_df)

Embeddings

Along with generative language models, Palantir also provides an embedding model. The following example shows how we can use the palantir_models.transforms.GenericEmbeddingModelInput to calculate embeddings on the same reviews dataset. The GenericEmbeddingModelInput provides a GenericEmbeddingModel to the transform at runtime which can be used to calculate embeddings for each review. The embeddings are explicitly cast to floats because the ontology vector property requires this.

from transforms.api import transform, Input, Output
from language_model_service_api.languagemodelservice_api_embeddings_v3 import GenericEmbeddingsRequest
from palantir_models.models import GenericEmbeddingModel
from palantir_models.transforms import GenericEmbeddingModelInput
from pyspark.sql.types import ArrayType, FloatType


@transform(
    reviews=Input("/path/to/reviews/dataset"),
    embedding_model=GenericEmbeddingModelInput("ri.language-model-service..language-model.text-embedding-ada-002_azure"),
    output=Output("/path/to/embedding/output"),
)
def compute_embeddings(ctx, reviews, embedding_model: GenericEmbeddingModel, output):
    def internal_create_embeddings(val: str):
        return embedding_model.create_embeddings(GenericEmbeddingsRequest(inputs=[val])).embeddings[0]

    reviews_df = reviews.pandas()
    reviews_df['embedding'] = reviews_df['review_content'].apply(internal_create_embeddings)
    spark_df = ctx.spark_session.createDataFrame(reviews_df)
    out_df = spark_df.withColumn('embedding', spark_df['embedding'].cast(ArrayType(FloatType())))
    return output.write_dataframe(out_df)
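Since GenericEmbeddingsRequest accepts a list of inputs, embeddings can also be requested in batches rather than one row at a time, reducing the number of model calls. The sketch below could replace the internal_create_embeddings helper inside compute_embeddings above; the batch size and the assumption that the returned embeddings list is aligned with the inputs list are illustrative rather than guarantees.

def create_embeddings_batched(values, batch_size=50):
    # Request embeddings for several reviews per call instead of one per row.
    # batch_size is a placeholder; the model may enforce its own input limits.
    embeddings = []
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        response = embedding_model.create_embeddings(GenericEmbeddingsRequest(inputs=batch))
        embeddings.extend(response.embeddings)
    return embeddings

reviews_df['embedding'] = create_embeddings_batched(list(reviews_df['review_content']))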

Use vision language models to extract PDF document content

Vision LLM-based document extraction and parsing is one of the most prevalent workflows in Foundry. Vision LLMs can extract information with high accuracy from complex documents containing mixed content such as tables, figures, and charts. The following transform input types are available to streamline the process of building a transform pipeline for document extraction in Foundry:

  • VisionLLMDocumentsExtractorInput: Processes PDF media sets by taking each media item and splitting it into individual pages that are converted into images and sent to the visual language model. This option is recommended for cases where custom image processing is not necessary, and a solution that handles every step of the process is preferred.
  • VisionLLMDocumentPageExtractorInput: Processes individual pages of a PDF document. This option is recommended in cases where users want more flexibility and control over the extraction process. For example, users can apply custom image processing, or handle splitting PDF pages with custom logic.

These transform input types abstract away common logic to simplify the process of extracting PDF content into formatted Markdown strings.

If you have used transforms for document extraction, you may be familiar with transform code similar to the example below. This example iterates over every page of a PDF document in the media set, converting it to an image and sending the encoded image to a Vision LLM using LMS model request types.

import base64
import json

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
from transforms.api import Output, transform
from transforms.mediasets import MediaSetInput
from palantir_models.transforms import GenericVisionCompletionLanguageModelInput
from language_model_service_api.languagemodelservice_api_completion_v3 import GenericVisionCompletionRequest
from language_model_service_api.languagemodelservice_api import (
    ChatMessageRole,
    GenericMediaContent,
    GenericMessage,
    GenericMessageContent,
    MimeType,
)

prompt = """
You are an expert analyst with deep knowledge across various domains.
You will be provided a table taken from a document page and you must extract all of its rows in valid table Markdown.
Your output must only be the extracted valid table Markdown representation of the table shown.

Further notes:
If you don't find any tables on the page, you will return no output.
Make sure the JSON containing all of the items you created is valid, and return it.

Valid table Markdown:
"""

# Maximum number of completion tokens per page request; placeholder value, adjust as needed.
max_tokens = 4096


@transform(
    output=Output("ri.foundry.main.dataset.abc"),
    input=MediaSetInput("ri.mio.main.media-set.abc"),
    extractor=GenericVisionCompletionLanguageModelInput(
        "ri.language-model-service..language-model.anthropic-claude-3-7-sonnet"
    ),
)
def compute(ctx, input, output, extractor):
    def process_item(media_item_rid):
        responses = []
        metadata = input.get_media_item_metadata(media_item_rid).document
        for i in range(metadata.pages):
            image_bytes = input.transform_document_to_png(
                media_item_rid, page_number=i, height=2048
            ).read()
            image_str = base64.b64encode(image_bytes).decode("utf-8")
            completion_request = GenericVisionCompletionRequest(
                [GenericMessage(
                    contents=[
                        GenericMessageContent(
                            generic_media=GenericMediaContent(content=image_str, mime_type=MimeType.IMAGE_PNG)
                        ),
                        GenericMessageContent(text=prompt),
                    ],
                    role=ChatMessageRole.USER)
                ],
                max_tokens=max_tokens,
            )
            response = extractor.create_vision_completion(completion_request)
            responses.append({"extractionResult": response.completion, "page": i})
        return json.dumps(responses)

    process_item_udf = udf(process_item, StringType())
    media_references = input.list_media_items_by_path_with_media_reference(ctx)
    extracted_data = media_references.withColumn(
        "extracted_text_document_page", process_item_udf(media_references["mediaItemRid"])
    )
    output.write_dataframe(extracted_data)

With the VisionLLMDocumentsExtractorInput, the following steps are abstracted away to simplify the process of PDF content extraction:

  • Per-page PDF to image processing
  • Base64 image encoding
  • LMS vision completion request construction
  • Parsing of the GENERIC_CHAT_COMPLETION_RESPONSE
  • Writing a comprehensive prompt to instruct the LLM on extracting document components into Markdown

The example below uses a vision language model to extract content from PDF documents in a media set. The VisionLLMDocumentsExtractorInput provides a VisionLLMDocumentsExtractor to the transform at runtime. This extractor splits each PDF in the input media set into pages, converts each page to an image, and sends the images to the vision LLM, along with a prompt instructing the LLM to extract page contents into Markdown. The extracted output for each page is then stored in a row in the output dataset. We will refer to the image converted from the PDF page as the page image.

from transforms.api import Output, transform
from transforms.mediasets import MediaSetInput
from palantir_models.transforms import VisionLLMDocumentsExtractorInput


@transform(
    output=Output("ri.foundry.main.dataset.abc"),
    input=MediaSetInput("ri.mio.main.media-set.abc"),
    extractor=VisionLLMDocumentsExtractorInput(
        "ri.language-model-service..language-model.anthropic-claude-3-7-sonnet"),
)
def compute(ctx, input, output, extractor):
    extracted_data = extractor.create_extraction(input, with_ocr=False)
    output.write_dataframe(
        extracted_data,
        column_typeclasses={
            "mediaReference": [{"kind": "reference", "name": "media_reference"}]
        },
    )

Note the with_ocr parameter in the code above. If set to True, for each page of each media item, mio.transform_document_to_text_ocr will first be applied to extract text using OCR. Then, the page image, the extracted OCR text, and the prompt will all be sent to the vision LLM for extraction. This can improve extraction results, but will increase runtime due to additional OCR processing. If set to False, only the page image will be sent to the model.

Note that if you do not set a prompt explicitly, the default prompt differs slightly between with_ocr=True and with_ocr=False. When with_ocr is set to False, the prompt asks the model to extract content based solely on the page image; when set to True, the prompt instructs the model to use the OCR results as a reference.
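For example, enabling OCR assistance is a one-argument change. The minimal sketch below reuses extractor, input, and output from the VisionLLMDocumentsExtractorInput example above:

    # Each page image is sent together with its OCR text and the default OCR-aware prompt.
    extracted_data = extractor.create_extraction(input, with_ocr=True)
    output.write_dataframe(
        extracted_data,
        column_typeclasses={
            "mediaReference": [{"kind": "reference", "name": "media_reference"}]
        },
    )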

A row in the output dataset will be populated for each page of a PDF document in the input media set. Each row has five columns: mediaItemRid, mediaReference, pageNumber, status, and extractionResult.
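These columns can be processed like any other dataset columns. As an illustrative sketch (not part of the extractor API), the per-page Markdown could be recombined into a single string per document, either before writing the output above or in a downstream transform:

from pyspark.sql import functions as F

# Group the page-level extraction results by media item and concatenate them
# in page order into one Markdown string per document.
docs_df = (
    extracted_data
    .groupBy("mediaItemRid")
    .agg(F.sort_array(F.collect_list(F.struct("pageNumber", "extractionResult"))).alias("pages"))
    .withColumn("documentMarkdown", F.concat_ws("\n\n", F.col("pages.extractionResult")))
    .drop("pages")
)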

Customize the prompt

You can customize the prompt by passing it to the extractor like so:

my_prompt = "my magic prompt"
extracted_data = extractor.create_extraction(input, prompt=my_prompt)
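In practice, the prompt is usually more specific about the documents being processed. The prompt text below is purely illustrative:

# Illustrative domain-specific prompt; tailor it to your documents.
my_prompt = (
    "You are extracting the contents of a scanned invoice page. "
    "Return all text and tables on the page as well-formed Markdown, "
    "preserving table structure and reading order. Return only the Markdown."
)
extracted_data = extractor.create_extraction(input, prompt=my_prompt)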

Customize image processing specifications

By default, each page image is resized to a height of 2048 pixels, retaining the original aspect ratio of the page, before it is passed to the vision LLM. You can customize the image processing specifications as follows:

from palantir_models.models._document_extractors import ImageSpec
from transforms.mediasets import ResizingMode
from language_model_service_api.languagemodelservice_api import MimeType

image_spec = ImageSpec(resizing_mode=ResizingMode.RESIZING, width=2048, mime_type=MimeType.IMAGE_PNG)
extracted_data = extractor.create_extraction(input, image_spec=image_spec)

Use VisionLLMDocumentPageExtractorInput for further customization

The VisionLLMDocumentPageExtractorInput can be used on single pages, providing less abstraction and more flexibility for users to apply customized image processing, like rotation, sharpening, or contrast adjustments.

Below is an example of VisionLLMDocumentPageExtractorInput usage. It performs the same operation as the previous VisionLLMDocumentsExtractorInput example, but the extractor operates on a single page image at a time.

import json

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
from transforms.api import Output, transform
from transforms.mediasets import MediaSetInput
from palantir_models.transforms import VisionLLMDocumentPageExtractorInput


@transform(
    output=Output("ri.foundry.main.dataset.abc"),
    input=MediaSetInput("ri.mio.main.media-set.abc"),
    extractor=VisionLLMDocumentPageExtractorInput(
        "ri.language-model-service..language-model.anthropic-claude-3-7-sonnet"
    ),
)
def compute_document_page(ctx, input, output, extractor):
    def process_item(media_item_rid):
        responses = []
        metadata = input.get_media_item_metadata(media_item_rid).document
        for i in range(metadata.pages):
            image_bytes = input.transform_document_to_png(
                media_item_rid, page_number=i, height=2048
            ).read()
            ### Insert any custom image processing logic here
            response = extractor.create_extraction(image_bytes)
            responses.append({"extractionResult": response, "page": i})
        return json.dumps(responses)

    process_item_udf = udf(process_item, StringType())
    media_references = input.list_media_items_by_path_with_media_reference(ctx)
    extracted_data = media_references.withColumn(
        "extracted_text_document_page",
        process_item_udf(media_references["mediaItemRid"]),
    )
    output.write_dataframe(extracted_data)
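As an illustration of what the custom image processing placeholder could contain, the page image bytes can be preprocessed with an imaging library before extraction. The sketch below uses Pillow for rotation and contrast adjustment; adding Pillow as a repository dependency and the specific adjustments shown are assumptions for illustration only.

import io

from PIL import Image, ImageEnhance  # assumes Pillow has been added as a repository dependency


def preprocess_page(image_bytes: bytes) -> bytes:
    # Example preprocessing: correct orientation and boost contrast before extraction.
    image = Image.open(io.BytesIO(image_bytes))
    image = image.rotate(90, expand=True)               # illustrative rotation
    image = ImageEnhance.Contrast(image).enhance(1.5)   # illustrative contrast boost
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()

Inside process_item, the placeholder comment would then become image_bytes = preprocess_page(image_bytes), followed by the existing call to extractor.create_extraction(image_bytes).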