View and understand a check group

Data Health incidents often span multiple interconnected resources. Check groups allow you to monitor, troubleshoot, and track related checks. Use the check group functionality to:

  1. Quickly evaluate the health of your data
  2. Identify clusters of failures and determine their root cause
  3. Gather context on failing resources and past events
  4. Take action on multiple checks and communicate your remediation steps

To start, open the relevant check group from the Data Health check group tab. Then click on the “Actions” menu and choose “Troubleshoot checks”.

[Image: check groups overview]

The check group diagnosis page consists of four main sections:

  1. Check group details
  2. Failure spotlight
  3. List of checks
  4. Context panel

Check group details

The check group details are located in the interface header and contain information such as the check group description and the status of the checks in the group.

The check status diagram shows the number of checks in the group by severity. Snoozed checks appear as faded, striped bars.

Failure spotlight

The failure spotlight shows clusters of failing checks that share a common property, such as check type, resource, or data source. A cluster can point to a single root cause, so it is often a good place to begin troubleshooting. Clicking a cluster highlights the relevant checks in the check list below.
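The clustering idea can be illustrated with a minimal sketch. This is not how Data Health implements the failure spotlight; the `FailingCheck` record, its field names, the sample values, and the `min_size` threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FailingCheck:
    # Hypothetical record; field names are illustrative, not a Foundry schema.
    name: str
    check_type: str
    resource: str
    data_source: str


def spotlight_clusters(checks, properties=("check_type", "resource", "data_source"), min_size=2):
    """Group failing checks that share a value for any one property."""
    clusters = defaultdict(list)
    for check in checks:
        for prop in properties:
            clusters[(prop, getattr(check, prop))].append(check.name)
    # Keep only clusters with at least min_size checks, largest first.
    shared = {key: names for key, names in clusters.items() if len(names) >= min_size}
    return sorted(shared.items(), key=lambda item: len(item[1]), reverse=True)


# Illustrative data only.
checks = [
    FailingCheck("orders_freshness", "Time since last updated", "orders", "sales_db"),
    FailingCheck("orders_status", "Build status", "orders", "sales_db"),
    FailingCheck("customers_status", "Build status", "customers", "sales_db"),
    FailingCheck("events_schema", "Schema", "events", "clickstream"),
]

for (prop, value), names in spotlight_clusters(checks):
    print(f"{prop}={value}: {names}")
```

In this sample, the largest cluster is the three checks that share the `sales_db` data source, which would suggest looking at that source first.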

List of checks

The list of checks shows all currently failing checks within the check group.

Grouping

The “Group by” dropdown allows you to group the list of checks by different categories:

  • Build - This strategy will group checks that relate to the same dataset build.
  • Snoozed together - This will group together all checks that were snoozed at the same time and for the same reason.
  • Related - This grouping strategy will attempt to group together all the failing checks in the list, using all the available grouping strategies (starting with checks snoozed together, and then by build).
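One way to read the “Related” strategy is as a fallback chain: try “Snoozed together” first, then “Build”. The sketch below follows that reading; the `Check` record, the `snooze_batch` and `build_id` fields, and the "ungrouped" bucket are hypothetical and do not reflect the product's internal model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Check:
    name: str
    build_id: Optional[str] = None      # hypothetical: build the check relates to
    snooze_batch: Optional[str] = None  # hypothetical: ID shared by checks snoozed together


def group_related(checks):
    """Group by 'snoozed together' first, then fall back to 'build'."""
    groups = defaultdict(list)
    for check in checks:
        if check.snooze_batch is not None:
            key = ("snoozed together", check.snooze_batch)
        elif check.build_id is not None:
            key = ("build", check.build_id)
        else:
            # Assumption: checks matching no strategy stay on their own.
            key = ("ungrouped", check.name)
        groups[key].append(check.name)
    return dict(groups)


# Illustrative data only.
checks = [
    Check("orders_status", build_id="build-1", snooze_batch="snooze-1"),
    Check("orders_schema", build_id="build-1"),
    Check("customers_status", build_id="build-1"),
    Check("events_freshness", snooze_batch="snooze-1"),
    Check("standalone_check"),
]

groups = group_related(checks)
for (strategy, key), names in groups.items():
    print(f"{strategy} [{key}]: {names}")
```

Note that `orders_status` lands in the snooze group even though it also has a build, because the snooze strategy is tried first.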

View options

View options allow you to set common filters on your list of checks:

  • Hide snoozed - This option will hide all datasets with currently-snoozed notifications.
  • Hide finished duration checks - This option will hide all “Build duration” check reports for builds that were finished after the check report had been triggered.
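The two view options behave like simple filters over the list. A minimal sketch, assuming a hypothetical `CheckReport` record with a `snoozed` flag and a `build_finished` flag (neither is a documented Foundry field):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckReport:
    name: str
    check_type: str
    snoozed: bool = False
    build_finished: bool = False  # hypothetical: build finished after the report was triggered


def apply_view_options(reports, hide_snoozed=False, hide_finished_duration=False):
    """Return the reports that remain visible under the chosen view options."""
    visible = []
    for report in reports:
        if hide_snoozed and report.snoozed:
            continue
        if (
            hide_finished_duration
            and report.check_type == "Build duration"
            and report.build_finished
        ):
            continue
        visible.append(report)
    return visible


# Illustrative data only.
reports = [
    CheckReport("orders_duration", "Build duration", build_finished=True),
    CheckReport("customers_duration", "Build duration"),
    CheckReport("events_status", "Build status", snoozed=True),
    CheckReport("orders_schema", "Schema"),
]

visible = apply_view_options(reports, hide_snoozed=True, hide_finished_duration=True)
print([report.name for report in visible])
```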

The actions toolbar

By holding the Command/Control key, you can select multiple checks from the list of checks. Selecting multiple checks updates the context panel and opens the actions toolbar at the bottom of the list. Any action taken through the toolbar applies to all selected checks or their target resources.

[Image: actions toolbar]

Snoozing notifications in check groups

“Snoozing” notifications turns them off temporarily, which can reduce noise from known temporary failures and remediated incidents. You can snooze or re-enable notifications for a single check or for multiple selected checks.

Snoozing individual checks

Each check has a snooze button. The button appears gray when notifications are active and is colored when notifications are snoozed. Clicking the button on a snoozed check lets you “re-snooze” the check with a different snooze period.

An orange dot next to a gray (inactive) snooze button signals that the check was recently snoozed. This may indicate an ongoing incident or past remediation steps. Hover over the snooze button to view the check's recent snooze history.

[Image: snoozing checks dialog]

Snoozing multiple checks

You can snooze several checks at the same time by selecting multiple checks and using the Actions toolbar. This applies the same snooze time frame and reason to all selected checks. Similarly, you can un-snooze (re-enable) multiple checks together through the Actions toolbar.

Snoozing notifications through the Actions toolbar snoozes each of the selected checks individually, with the same time frame and reason. To re-enable notifications on all of those checks, you need to select them all and un-snooze them through the Actions toolbar.
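The practical consequence is that a bulk snooze is stored per check, so un-snoozing one check leaves the others snoozed. The sketch below models that behavior with a hypothetical `Check` record and helper functions; none of these names come from a Foundry API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Check:
    # Hypothetical record; snooze state is stored on each check individually.
    name: str
    snoozed_until: Optional[datetime] = None
    snooze_reason: Optional[str] = None


def snooze_selected(selected, duration, reason, now):
    """Apply the same snooze period and reason to every selected check."""
    until = now + duration
    for check in selected:
        check.snoozed_until = until
        check.snooze_reason = reason


def unsnooze_selected(selected):
    """Re-enable notifications, but only for the checks that are selected."""
    for check in selected:
        check.snoozed_until = None
        check.snooze_reason = None


# Illustrative data only.
now = datetime(2024, 1, 1, 9, 0)
checks = [Check("orders_status"), Check("orders_schema"), Check("customers_status")]

snooze_selected(checks, timedelta(hours=4), "Upstream outage, fix in progress", now)
unsnooze_selected(checks[:1])  # only the first check is re-enabled

for check in checks:
    print(check.name, check.snoozed_until, check.snooze_reason)
```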

When you group your list of checks by “Snoozed together”, all checks that were snoozed together appear in a single group, with the snooze reason in the group header. Note that snoozed checks are hidden by default; remove the filter if you want to review them.

Context panel

The context panel provides relevant information on the check group, the selected checks, and the target resources monitored by the check group. You can use the context panel to gain insights on past and ongoing incidents, as well as to launch useful Foundry applications to troubleshoot and resolve issues. The context panel includes the following functionality:

Comments

In the Comments tab, you can leave messages related to incidents, troubleshooting steps, and resolutions. The messages you post in the Comments section will be viewable by any user with permissions to view the check group.

Selecting checks in the list affects comments in two ways:

  1. Posting a comment will reference the target resources of the selected checks.
  2. The list of comments will highlight all existing comments that reference the target resources.
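The two effects above can be sketched as follows. The `Comment` record and both helper functions are hypothetical, and the rule that a comment is highlighted when it references at least one selected target resource is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    # Hypothetical record: a comment and the target resources it references.
    text: str
    resources: frozenset


def post_comment(text, selected_resources):
    """A new comment references the target resources of the selected checks."""
    return Comment(text, frozenset(selected_resources))


def highlighted(comments, selected_resources):
    """Comments that reference at least one selected target resource."""
    selected = set(selected_resources)
    return [comment for comment in comments if comment.resources & selected]


# Illustrative data only.
comments = [
    post_comment("Restarted the upstream sync", ["orders"]),
    post_comment("Schema change rolled back", ["events"]),
    post_comment("Investigating both datasets", ["orders", "customers"]),
]

hits = highlighted(comments, ["orders"])
print([comment.text for comment in hits])
```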

Issues

Foundry Issues is a helpful tool for managing, tracking, and communicating ongoing incidents. The Issues panel will show you all open issues on resources monitored by the check group. By selecting checks, you can filter the list to show only issues reported on the target resources of the selected checks.

Tip

Check the Issues tab before you start troubleshooting a check. Related issues can help you find the root cause, and you may find that someone else is already handling the problem.

Schedules

The Schedules panel shows all the schedules affecting the target resources in the check list. You can use it to gather basic information on relevant schedules, open the schedule metrics and configuration app, and run a schedule if that is part of your remediation steps. Note that a dataset can sometimes be related to more than one schedule.

Source information

The source panel shows information on the originating source, which is the Foundry resource used to generate the failing resource (for example, a Fusion spreadsheet, code repository, or data connection source).

When the dataset is created by a code repository, this panel shows the commit history of the failing dataset (that is, the dataset that caused the failure, rather than the target dataset on which the check ran). The commit history can help you discover recent code changes that may have caused the failure.

[Image: context panel commits view]