Monitoring at scale

Monitoring at scale introduces new capabilities that make monitoring Foundry resources less time-intensive.

If you are already using check groups, think of this as an additional option for monitoring your resources. It will not replace any workflows or check groups you have already set up.

Terms and definitions

  • Metric: Resources emit metrics (or logs). Monitors are created on top of these metrics to define a user’s performance standards for a given resource.
  • Resource: A “thing” in Foundry that can be monitored, including datasets, agents, schedules, objects, and link types.
  • Scope: A scope is the boundary around the set of resources on which your thresholds are set. A resource can be monitored on different scope types:
    • Single: The monitor is only applied to that specific resource.
    • Project: The monitor is applied to all resources of the specified type in one or more Projects.
  • Monitoring rule: A threshold or set of thresholds placed on the metrics of a resource within a given scope. A monitoring rule contains:
    • Resource type
    • Metric threshold tolerances
    • Severity level assignment
  • Monitoring view: A collection of monitoring rules that a group of subscribers care about.
  • Subscriber: A user subscribed to a monitoring view.
  • Alerts: Notifications sent to subscribers. Each alert is assigned a severity of low, medium, or high.
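
The terms above can be read as a small data model. The following is a minimal sketch for illustration only, not Foundry's actual schema: every class name, field name, identifier, and metric name here is hypothetical, and thresholds are assumed to be simple upper bounds.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class ScopeType(Enum):
    SINGLE = "single"    # the monitor applies to one specific resource
    PROJECT = "project"  # the monitor applies to all resources of a type in one or more Projects


@dataclass(frozen=True)
class Scope:
    scope_type: ScopeType
    # One resource identifier for SINGLE, or Project identifiers for PROJECT (hypothetical IDs).
    targets: tuple[str, ...]


@dataclass(frozen=True)
class Threshold:
    metric: str         # hypothetical metric name
    max_value: float    # assumed tolerance: values above this breach the threshold
    severity: Severity  # severity assigned to the resulting alert


@dataclass
class MonitoringRule:
    resource_type: str
    scope: Scope
    thresholds: list[Threshold]


@dataclass
class MonitoringView:
    name: str
    rules: list[MonitoringRule] = field(default_factory=list)
    subscribers: set[str] = field(default_factory=set)


# A view holding one Project-scoped rule and one subscriber.
view = MonitoringView(name="Pipeline on-call")
view.rules.append(
    MonitoringRule(
        resource_type="schedule",
        scope=Scope(ScopeType.PROJECT, ("project-x", "project-y")),
        thresholds=[Threshold("consecutive_failures", 2, Severity.HIGH)],
    )
)
view.subscribers.add("user-a")
```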

Start monitoring resources

You can monitor resources in two ways:

  • Upgrade an existing check group to a monitoring view
  • Create a new monitoring view

Upgrade an existing check group to a monitoring view

To upgrade an existing check group, open your check group in the Data Health application. In the top banner, select Upgrade to Monitoring View.

You can create a new monitoring view or move all the checks to an existing monitoring view.

  • Monitoring views are filesystem resources. If you are creating a new monitoring view, be sure to store it in a Project accessible to potential subscribers.
  • After upgrading your check group, checks will continue to be supported exactly as they are now. There are no changes to email digest, alerting, subscriptions, or any other workflow related to health checks.
  • Each check group can be linked to only one monitoring view, and vice versa. You can therefore upgrade only one check group into a given existing monitoring view; if no suitable monitoring view exists, create a new one.

Create a new monitoring view

To create a new monitoring view, open the Monitoring View tab in the top right corner of the Data Health application and create the view from there.

Create monitoring rules

To create a monitoring rule, navigate to the Manage monitors tab. First, select the resource type you want to monitor. Depending on the resource type, you can either monitor one specific resource (single scope) or monitor all resources of that type across one or more Projects (Project scope).

You must have Viewer permission on the resources to monitor them. To receive alerts triggered by monitoring rules, you must have Viewer permission on the resources and the monitoring view.
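
The two permission requirements can be expressed as simple checks. This is a sketch under stated assumptions, not Foundry's permission model: the `User` class, the identifiers, and the idea of a flat set of viewable items are all hypothetical stand-ins for "has Viewer permission on".

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    # Identifiers of resources and monitoring views on which the user has
    # Viewer permission (hypothetical IDs).
    viewable: set[str] = field(default_factory=set)


def can_monitor(user: User, resource_id: str) -> bool:
    """Monitoring a resource requires Viewer permission on that resource."""
    return resource_id in user.viewable


def can_receive_alert(user: User, resource_id: str, view_id: str) -> bool:
    """Receiving an alert requires Viewer permission on both the resource and the view."""
    return resource_id in user.viewable and view_id in user.viewable


alice = User("alice", {"agent-1", "view-1"})
bob = User("bob", {"agent-1"})  # can view the resource, but not the monitoring view
```

In this sketch, `bob` could set up monitoring on `agent-1` but would not receive alerts from `view-1` until he is also granted Viewer permission on that monitoring view.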

Configure monitors

Monitors are set on the metrics a resource emits. As you set up your monitors, we suggest certain configurations based on Foundry’s standards for health. However, you can change the suggested values or choose to monitor only certain metrics. You can also set the severity of the alert that is raised when a monitor fails. Currently, there are three severity levels: low, medium, and high.
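
To illustrate how thresholds and severities combine, the following minimal sketch evaluates observed metric values against a set of monitors and reports the highest severity among those that fail. It assumes a monitor fails when the observed value exceeds its threshold; the metric names, the `Monitor` class, and the `evaluate` function are hypothetical and do not reflect how Foundry evaluates monitors internally.

```python
from dataclasses import dataclass
from typing import Optional

# Ordering used to pick the most severe failing monitor.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Monitor:
    metric: str       # hypothetical metric name
    threshold: float  # assumed: the monitor fails when the observed value exceeds this
    severity: str     # "low", "medium", or "high"


def evaluate(monitors: list[Monitor], observed: dict[str, float]) -> Optional[str]:
    """Return the highest severity among failing monitors, or None if all pass."""
    failing = [
        m.severity
        for m in monitors
        if m.metric in observed and observed[m.metric] > m.threshold
    ]
    if not failing:
        return None
    return max(failing, key=SEVERITY_ORDER.__getitem__)


monitors = [
    Monitor("consecutive_failures", 1, "medium"),
    Monitor("consecutive_failures", 3, "high"),
    Monitor("duration_minutes", 60, "low"),
]

mild = evaluate(monitors, {"consecutive_failures": 2, "duration_minutes": 45})
severe = evaluate(monitors, {"consecutive_failures": 5, "duration_minutes": 90})
healthy = evaluate(monitors, {"consecutive_failures": 0, "duration_minutes": 10})
```

Here `mild` is `"medium"`, `severe` is `"high"`, and `healthy` is `None`.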

Edit monitors

You can edit your monitors by selecting from the list of monitors and choosing Edit on the side panel that appears.

Subscribe to alerts

To subscribe to alerts, navigate to the Manage subscriptions tab, where all subscribed users are listed. You can add users and user groups, and configure their alerts based on severity. When a monitoring rule triggers an alert, users subscribed to the monitoring view containing that rule are notified via email and Foundry notifications. Note that you must have Viewer permission on the resources and the monitoring view to receive alerts.
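
Severity-based subscriptions can be pictured as a filter over subscribers. The sketch below assumes each subscriber opts in to an explicit set of severities; the actual configuration options in the Manage subscriptions tab may differ, and the `Subscription` class, `recipients` function, and subscriber names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    subscriber: str        # a user or user group (hypothetical names)
    severities: frozenset  # severities this subscriber has opted in to


def recipients(subscriptions: list[Subscription], alert_severity: str) -> list[str]:
    """Return the subscribers who should be notified for an alert of this severity."""
    return sorted(
        s.subscriber for s in subscriptions if alert_severity in s.severities
    )


subscriptions = [
    Subscription("user-a", frozenset({"high"})),
    Subscription("group-oncall", frozenset({"low", "medium", "high"})),
]

medium_recipients = recipients(subscriptions, "medium")
high_recipients = recipients(subscriptions, "high")
```

With these subscriptions, a medium alert reaches only `group-oncall`, while a high alert reaches both `group-oncall` and `user-a`.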

Integrate with external systems

You can send alerts to external systems such as PagerDuty or Slack with built-in integrations or by using a webhook to hit arbitrary REST endpoints. Learn more about sending alerts to external systems.
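
As a rough illustration of what "using a webhook to hit a REST endpoint" amounts to, the sketch below builds an HTTP POST request carrying an alert as JSON. The URL and every payload field are invented for this example; the real payload and configuration are defined by the webhook setup in Foundry, not by this code.

```python
import json
import urllib.request


def build_webhook_request(url: str, alert: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request describing an alert."""
    body = json.dumps(alert).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and payload shape, for illustration only.
request = build_webhook_request(
    "https://example.com/hooks/alerts",
    {"monitoringView": "Pipeline on-call", "severity": "high", "resource": "agent-1"},
)

# Sending is deliberately left out of this sketch; with a real endpoint it would be:
# urllib.request.urlopen(request)
```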

FAQ

What resources can be monitored?

You can monitor the following:

Resource type               Supported scope
Agent                       Single, Project
Object type                 Single
Link type                   Single
Schedule                    Single, Project
Streaming datasets          Single, Project
Live deployments            Project
Time series syncs           Single
Geotemporal observations    Single
Automations                 Single, Project
Dataset (coming soon)       Project

A reference can be found here
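
If you track planned monitoring rules outside Foundry, the table above can be encoded as a lookup to sanity-check a resource type and scope combination. This is only a convenience sketch: the `SUPPORTED_SCOPES` mapping and `is_supported` function are hypothetical names, and the "Dataset (coming soon)" row is left out because it is not yet available.

```python
# Supported scopes per resource type, transcribed from the table above.
# "Dataset (coming soon)" is intentionally omitted.
SUPPORTED_SCOPES = {
    "Agent": {"Single", "Project"},
    "Object type": {"Single"},
    "Link type": {"Single"},
    "Schedule": {"Single", "Project"},
    "Streaming datasets": {"Single", "Project"},
    "Live deployments": {"Project"},
    "Time series syncs": {"Single"},
    "Geotemporal observations": {"Single"},
    "Automations": {"Single", "Project"},
}


def is_supported(resource_type: str, scope: str) -> bool:
    """Return True if the table lists this scope for this resource type."""
    return scope in SUPPORTED_SCOPES.get(resource_type, set())
```

For example, `is_supported("Agent", "Project")` is true, while `is_supported("Live deployments", "Single")` is false.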

Do all health checks now exist as monitoring rules?

Not all health checks exist as monitoring rules, but the most important health checks have analogous monitoring rules. We recommend using a combination of monitoring rules and health checks in a linked check group. To summarize coverage from monitoring views and health checks:

  1. Resources that can only be monitored with monitoring views: data connection agents, objects and links in Object Storage V2 (OSv2), streaming datasets, and live deployments of models.
  2. Dataset-level checks that only exist as health checks: content, freshness, and schema checks; data expectations; and OSv1 (Phonograph) and foundry-sync checks.
  3. Monitoring rules that replace functionality from health checks: consecutive schedule failures (replacing schedule status checks) and schedule duration monitors.

For the most comprehensive coverage, we suggest linking your monitoring view to a check group that consists of health checks not currently available in monitoring views.

Why use monitors over health checks?

Monitors cover an entire scope rather than a single resource. This means that when an additional resource is added to that scope, it is automatically covered by the rule. For example, a monitoring rule that is set up to monitor all agents in a Project will also monitor any further agents added into that Project at a later time.
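
The difference is that a Project-scoped rule is evaluated against whatever the scope contains at the time, rather than against a fixed list of resources. The following minimal sketch shows that behavior; the `Resource` and `ProjectRule` classes and all identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    rid: str            # hypothetical resource identifier
    resource_type: str
    project: str


@dataclass(frozen=True)
class ProjectRule:
    resource_type: str
    projects: frozenset  # Projects that make up the rule's scope

    def covers(self, resource: Resource) -> bool:
        """A resource is covered if it has the rule's type and lives in a scoped Project."""
        return (
            resource.resource_type == self.resource_type
            and resource.project in self.projects
        )


def covered(rule: ProjectRule, resources: list[Resource]) -> list[str]:
    """List the identifiers of all resources the rule currently covers."""
    return [r.rid for r in resources if rule.covers(r)]


rule = ProjectRule("agent", frozenset({"project-x"}))
resources = [Resource("agent-1", "agent", "project-x")]
before = covered(rule, resources)

# A new agent is added to the Project later; the rule itself is unchanged.
resources.append(Resource("agent-2", "agent", "project-x"))
after = covered(rule, resources)
```

The rule is never edited, yet `after` includes `agent-2` because membership is derived from the scope rather than stored on the rule.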

When should I create a new monitoring view instead of adding new rules to an existing one?

A good practice is to think of a single monitoring view the same way you would think of a check group. One monitoring view should correspond to a set of users who care about the monitors in that view. If a specific set of users [a, b, c] cares about specific Projects [x, y, z], create a single monitoring view with all the resources in those Projects. If a specific set of users cares only about monitoring agents, create a single monitoring view that monitors all agents in all Projects.

What permissions are required for a monitoring view?

Since a monitoring view is a filesystem resource, a user needs permission on the Project or folder in which the view is saved. To receive alerts or set up monitoring rules on a resource, the user also needs access to the Project resources they wish to monitor. Even if a user with all necessary permissions subscribes another user or group to a monitoring view, those new subscribers will NOT receive alerts on any resources unless they have explicit access permissions on that monitoring view.