Security Auditing

Overview

Audit logs are the primary way for an auditor to understand what actions have been taken by Foundry users.

Audit logs in Foundry contain enough information to infer:

  • who performed an action
  • what the action was
  • when the action happened
  • where the action happened

In some cases, audit logs will contain contextual information about users including Personal Identifiable Information (PII), such as names and email addresses. As such, audit log contents should be considered sensitive and viewed only by persons with the necessary security qualifications.

Audit logs are typically ingested into a purpose-built security monitoring system (a "security information and event management", or SIEM, solution) owned by the customer.

This guide explains the process of extracting and using audit logs in two sections: audit delivery and the audit schema.

Best practice

Customers are strongly encouraged to capture and monitor their own audit logs via the mechanisms presented below. See Monitoring Security Audit Logs for additional guidance.

Audit delivery

Audit logs are delivered for downstream consumption via several mechanisms, depending on a customer's security infrastructure and SIEM requirements. Batches of audit logs produced by Foundry services are compiled, compressed, and moved to a log bucket (often S3, although environment-dependent) within 24 hours. From there, Foundry can deliver logs directly to customers for consumption via Audit Export to Foundry.

Audit Export to Foundry

Audit logs can be exported, per-organization, directly into a Foundry dataset. As part of configuration, an organization administrator chooses where within the Foundry file system this audit log dataset will be generated.

Once log data has landed in a dataset, a customer may choose to export the audit data to an external SIEM via Foundry's Data Connection application.

Export permissions

To export audit logs, a user will need the audit-export:orchestrate-v2 operation on the target organization(s). This can be granted via the Organization administrator role in Control Panel, under the Organization permissions tab. See Organization Permissions for more details.

Export setup

To set up Audit Export to Foundry:

  1. Navigate to the Foundry Control Panel.
  2. In the left-hand sidebar, select the relevant organization via the dropdown menu.
  3. In the left-hand sidebar, select Audit Logs under Organization Settings.
  4. Select Create dataset to generate an audit log dataset. Select a location in Foundry for this dataset. By default, audit log datasets will be marked with the Organization selected above. See Organization Markings.
  5. Optionally, configure a start date filter to limit this dataset to events that occur on or after a given date.
  6. Optionally, configure a retention policy to limit the amount of time logs are kept in this export dataset. Note that retention policies are based on the transaction timestamp when logs were added to the export dataset, not the timestamp of the log entries themselves. Deleting the relevant transactions may take up to seven days after they are marked for deletion. Here are two examples to illustrate how this works in practice:
    • Example one:
      • Start date: One year ago
      • Retention policy: 90 days
      • In this case, the export dataset will initially contain logs from the past year. 90 days after the creation of the dataset, the retention policy will take effect, and only logs added in the last 90 days will be retained, rather than the full year that was initially in the dataset. In practice, this means that the size of the dataset decreases significantly after 90 days when older logs are removed.
    • Example two:
      • Start date: 30 days ago
      • Retention policy: 90 days
      • In this scenario, the export dataset will begin with logs from the past 30 days. As new logs are added, the dataset will grow until it holds a rolling window of the most recent 90 days of data. Logs added in transactions older than 90 days will be removed according to the retention policy.

Audit Export to Foundry setup

Note that for larger stacks, builds in the first several hours may produce empty append transactions. This is expected behavior as the pipeline processes a backlog of audit logs.

Best practice

Due to the sensitivity of audit logs, it is highly recommended that the created dataset is restricted on a need-to-know basis and is only accessible by persons with necessary qualifications. Use markings to restrict your audit log dataset and to specify the set of platform administrators who can view potentially sensitive usage details like identifying information or search queries.

Disable an export

To disable an export, move the audit log dataset to the trash or to another Project.

Moving an audit log dataset will stop any further builds of that dataset. There is no way to restart these builds, even if the dataset is subsequently restored from the trash or moved back to the original Project.

Audit log updates

On build, audit log datasets follow a specific set of conditions to append new logs as they become available (subject to change):

  • Once audit logs are available in the log bucket, typically within 24 hours, an ingest runs every two hours to land those logs in a hidden intermediary dataset upstream of the export dataset.
  • New logs are then appended to the export dataset from the intermediary dataset once every 10 minutes. Each append pulls at most 10 GB of log data; appends of the full 10 GB are generally only needed when an audit log dataset is first created.
  • When no new log data is available, the schedule on the export dataset pauses for one hour before resuming.
  • In most cases, once an audit log dataset is fully up-to-date, jobs will continue to run every hour. Typically, one out of every three jobs will append new data to an audit log dataset; the others will be aborted with no additional content written.
  • The runtime of each audit log append is directly proportional to how many new logs are being appended to an audit log dataset.
  • Schedules controlling the builds of the export dataset are managed by audit-export and are hidden from the user.

Analyzing an audit log dataset

Audit log datasets can contain very high volumes of data, so we recommend filtering down this dataset using the time column before performing any aggregations or visualizations. For any filtering, we recommend using Pipeline Builder or Transforms as audit datasets may be too large to effectively analyze in Contour without filtering them first.
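For example, a minimal Python Transforms sketch along these lines could pre-filter the export dataset to a recent window before further analysis. The dataset paths below are placeholders, and the cast assumes the time column lands as an RFC3339 string; adjust both to match your environment.

```python
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output


@transform_df(
    Output("/Company/audit/recent_audit_logs"),      # placeholder output path
    logs=Input("/Company/audit/audit_log_dataset"),  # placeholder export dataset path
)
def compute(logs):
    # Keep only the last 30 days of events; filtering on the time column first
    # keeps downstream aggregations and visualizations manageable.
    cutoff = F.current_timestamp() - F.expr("INTERVAL 30 DAYS")
    return logs.filter(F.to_timestamp("time") >= cutoff)
```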

Audit schema

All logs that Palantir products produce are structured logs: they follow a specific schema that downstream systems can rely on.

Palantir audit logs are currently delivered in the audit.2 schema, also commonly referred to as "Audit V2". An updated schema, audit.3 (or "Audit V3"), is in development but is not yet generally available.

Within both the audit.2 and audit.3 schemas, audit logs may vary depending on the service that produces them. This is because each service reasons about a different domain and thus has different concerns that it needs to describe. This variance is more noticeable in audit.2, as explained below.

Service-specific information is primarily captured within the request_params and result_params fields. The contents of these fields will change shape depending on both the service doing the logging and the event being logged.
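As a purely hypothetical illustration of this variance (the parameter keys below are invented, not taken from any specific service), two audit.2-style events might carry very different shapes:

```python
# Hypothetical audit.2-style events; only "name", "request_params", and
# "result_params" are real schema fields. The keys inside the params are
# invented to show how shape varies by service and event.
file_upload_event = {
    "name": "PUT_FILE",
    "request_params": {"path": "/example/upload.csv"},  # hypothetical keys
    "result_params": {"bytesWritten": 2048},            # hypothetical keys
}

search_event = {
    "name": "SEARCH",                                    # invented event name
    "request_params": {"query": "quarterly figures"},   # hypothetical keys
    "result_params": {"resultCount": 12},               # hypothetical keys
}
```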

Audit categories

Audit logs can be thought of as a distilled record of all actions taken by users in the platform. This is often a compromise between verbosity and precision, where overly verbose logs may contain more information but be more difficult to reason about.

Palantir logs include a concept called audit categories to make logs easier to reason about with little service-specific knowledge.

With audit categories, audit logs are described as a union of auditable events. Audit categories are based on a set of core concepts, such as data versus metaData versus logic, and divided into categories that describe actions on those concepts, such as dataLoad (loading data from the system), metaDataCreate (creating a new piece of metadata that describes some data), and logicDelete (deleting some logic within the system, where the logic describes a transformation between two pieces of data).

Audit categories have also gone through a versioning change, from a looser form within audit.2 logs to a stricter and richer form within audit.3 logs. See below for more detail.

Refer to Audit log categories for a detailed list of available audit.2 and audit.3 categories.

Audit log attribution

Audit logs are written to a single log archive per environment. When audit logs are processed via the delivery pipeline, the User ID fields (uid and otherUids in the schema below) are extracted, and the users are mapped to their corresponding organizations.

An Audit Export orchestrated for a given organization is limited to audit logs attributed to that organization. Actions taken solely by service (non-human) users are not typically attributed to any organization, as these users are not organization members. The exception is service users for Third Party Applications that use Client Credentials Grants and are used only by the registering organization; these generate audit logs attributed to that organization.

Audit.2 logs

audit.2 logs have no inter-service guarantees about the shape of the request and response parameters. As such, reasoning about audit logs must typically be performed on a service-by-service basis.

audit.2 logs may include an audit category that can be useful for narrowing a search. However, this category does not contain further information, nor does it prescribe the rest of the contents of the audit log. Additionally, audit.2 logs are not guaranteed to contain an audit category. If present, categories will be included in either the _category or _categories field within request_params.
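As a sketch of one way to narrow a search on these fields, the helper below assumes the export dataset's request_params column lands as a JSON string; if it lands as a struct instead, plain column access (request_params._category) would replace the JSON extraction.

```python
from pyspark.sql import DataFrame
from pyspark.sql import functions as F


def with_audit2_category(logs: DataFrame) -> DataFrame:
    """Surface the audit.2 category, if any, as a top-level column."""
    # _category and _categories are optional; coalesce keeps whichever exists.
    return logs.withColumn(
        "category",
        F.coalesce(
            F.get_json_object("request_params", "$._category"),
            F.get_json_object("request_params", "$._categories"),
        ),
    )


# Example: narrow to events that mention the dataLoad category.
# data_loads = with_audit2_category(logs).filter(F.col("category").contains("dataLoad"))
```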

The schema of audit.2 log export datasets is provided below.

Field | Type | Description
filename | .log.gz | Name of the compressed file from the log archive
type | string | Specifies the audit schema version - "audit.2"
time | datetime | RFC3339Nano UTC datetime string, e.g. 2023-03-13T23:20:24.180Z
uid | optional<UserId> | User ID (if available); this is the most downstream caller
sid | optional<SessionId> | Session ID (if available)
token_id | optional<TokenId> | API token ID (if available)
ip | string | Best-effort identifier of the originating IP address
trace_id | optional<TraceId> | Zipkin trace ID (if available)
name | string | Name of the audit event, such as PUT_FILE
result | AuditResult | The result of the event (success, failure, etc.)
request_params | map<string, any> | The parameters known at method invocation time
result_params | map<string, any> | Information derived within a method, commonly parts of the return value
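As one illustrative triage query over this schema, non-successful events can be grouped by user and event name. The "SUCCESS" literal is an assumption about AuditResult values; check the values actually present in your dataset.

```python
from pyspark.sql import DataFrame
from pyspark.sql import functions as F


def summarize_failures(logs: DataFrame) -> DataFrame:
    """Count non-successful audit.2 events per user and event name."""
    return (
        logs.filter(F.col("result") != "SUCCESS")  # assumed AuditResult value
        .groupBy("uid", "name", "result")
        .count()
        .orderBy(F.desc("count"))
    )
```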

Audit.3 logs

Audit V3 is under development and is not yet generally available.

audit.3 logs establish stricter usage of audit categories to reduce the need to understand the particular service when reasoning about log contents. audit.3 logs are produced with the following guarantees in mind:

  1. Every audit category explicitly defines the values and items to which it applies - for example, dataLoad describes the precise resources that are loaded.
  2. Every log is produced strictly as a union of audit categories. This means that logs will not contain free-form data.
  3. Certain important information within an audit log is promoted to the top level of the audit.3 schema. For example, all named resources are present at the top level, as well as within the request and response parameters.

These guarantees mean that for any particular log it is possible to tell (1) what auditable event created it and (2) exactly what fields it contains. These guarantees are service-agnostic.
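To make these guarantees concrete, a hypothetical audit.3-style event for a single dataLoad category might look like the sketch below. The top-level field names come from the schema that follows; the event name, identifiers, and requestFields keys are invented for illustration.

```python
# Hypothetical audit.3-style event. Top-level field names follow the schema
# below; the values and the requestFields keys are illustrative only.
hypothetical_data_load = {
    "name": "GET_DATASET_FILES",         # invented event name
    "result": "SUCCESS",
    "categories": ["dataLoad"],          # the log is a union of these categories
    "entities": ["example-dataset-rid"], # invented resource identifier, repeated in requestFields
    "uid": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",  # invented user ID
    "requestFields": {"resources": ["example-dataset-rid"]},
    "resultFields": {},
}
```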

The audit.3 schema is provided below. This information is non-exhaustive and subject to change:

Field | Type | Description
environment | optional<string> | The environment that produced this log.
stack | optional<string> | The stack on which this log was generated.
service | optional<string> | The service that produced this log.
product | string | The product that produced this log.
productVersion | string | The version of the product that produced this log.
host | string | The host that produced this log.
producerType | AuditProducer | How this audit log was produced; for example, from a backend (SERVER) or frontend (CLIENT).
time | datetime | RFC3339Nano UTC datetime string, for example 2023-03-13T23:20:24.180Z.
name | string | The name of the audit event, such as PUT_FILE.
result | AuditResult | Indicates whether the request was successful or the type of failure; for example, ERROR or UNAUTHORIZED.
categories | set<string> | All audit categories produced by this audit event.
entities | list<any> | All entities (for example, resources) present in the request and response params of this log.
users | set<ContextualizedUser> | All users present in this audit log, contextualized. A ContextualizedUser has the fields uid: UserId, userName: optional<string>, firstName: optional<string>, lastName: optional<string>, groups: list<string>, and realm: optional<string>.
requestFields | map<string, any> | The parameters known at method invocation time. Entries in the request and response fields depend on the categories field defined above.
resultFields | map<string, any> | Information derived within a method, commonly parts of the return value.
origins | list<string> | All addresses attached to the request. This value can be spoofed.
sourceOrigin | optional<string> | The origin of the network request, with the value verified through the TCP stack.
origin | optional<string> | The best-effort identifier of the originating machine; for example, an IP address, a Kubernetes node identifier, or similar. This value can be spoofed.
orgId | optional<string> | The organization to which the uid belongs, if available.
userAgent | optional<string> | The user agent of the user that originated this log.
uid | optional<UserId> | The user ID, if available. This is the most downstream caller.
sid | optional<SessionId> | The session ID, if available.
eventId | uuid | The unique identifier for an auditable event. This can be used to group log lines that are part of the same event; for example, the same eventId will be logged in lines emitted at the start and end of a large binary response streamed to the consumer.
logEntryId | uuid | The unique identifier for this audit log line, not repeated across any other log line in the system. Note that some log lines may be duplicated during ingestion into Foundry, so there may be several rows with the same logEntryId; such rows are duplicates and can be ignored.
sequenceId | uuid | A best-effort ordering field for events that share the same eventId.
traceId | optional<TraceId> | The Zipkin trace ID, if available.
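Following the notes on logEntryId, eventId, and sequenceId above, a small sketch like the one below could deduplicate and order an audit.3 export before analysis, assuming logs is the export dataset loaded as a DataFrame (for example, via the Transforms pattern shown earlier).

```python
from pyspark.sql import DataFrame


def dedupe_and_order(logs: DataFrame) -> DataFrame:
    """Drop ingest duplicates and group log lines belonging to the same event."""
    # Rows sharing a logEntryId are duplicates and can be ignored; within an
    # event, sequenceId provides a best-effort ordering.
    return logs.dropDuplicates(["logEntryId"]).orderBy("eventId", "sequenceId")
```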