Both one-time and recurring sensitive data scans are configured through the same workflow. To begin, navigate to the Sensitive Data Scanner application and select Create new sensitive data scan. This opens an overview page that explains the steps for creating a sensitive data scan.
You can explicitly include datasets or folders to scan by selecting Add resource under Included datasets and folders. If you add folders, all datasets within those folders will be scanned, unless they have been explicitly excluded. You must include at least one folder (including spaces/projects) or dataset in this section in order to proceed.
Similarly, you can explicitly exclude datasets or folders from the scan by selecting Add resource under Excluded datasets and folders. If you add folders, all datasets within those folders will be excluded from the scan unless they have been explicitly included. You are not required to exclude any resources (this section can be left empty).
In the rare case where a resource is both included and excluded, the most specific inclusion or exclusion will take precedence. For example, a dataset could be included, but it might be located in an excluded folder. In this case, the dataset (included) is more specific than the parent folder (excluded), so the dataset will be scanned.
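The precedence rule above can be illustrated with a minimal sketch. It assumes, purely for illustration, that resources are identified by POSIX-style paths such as `/Space/Project/folder/dataset`; the function `is_scanned` is hypothetical and is not part of any Foundry API. It checks the dataset itself first and then each parent folder from nearest to farthest, so the most specific inclusion or exclusion decides the outcome.

```python
from pathlib import PurePosixPath


def is_scanned(dataset: str, included: set[str], excluded: set[str]) -> bool:
    """Hypothetical model of include/exclude precedence (not a Foundry API)."""
    path = PurePosixPath(dataset)
    # Walk from the dataset up through its ancestor folders, so the first
    # rule found is the most specific one.
    for candidate in (path, *path.parents):
        key = str(candidate)
        in_included, in_excluded = key in included, key in excluded
        if in_included and in_excluded:
            # This sketch does not model a resource listed in both sections.
            raise ValueError(f"{key} is both included and excluded")
        if in_included:
            return True
        if in_excluded:
            return False
    # Nothing in the dataset's ancestry was included, so it is not scanned.
    return False
```

For example, with `/Finance/Reports` excluded and `/Finance/Reports/customers` included, the `customers` dataset is scanned while a sibling dataset such as `/Finance/Reports/invoices` is not.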
The following scan strategy options allow you to further refine the behavior of your sensitive data scan.
Sensitive Data Scanner allows you to specify which datasets to scan based on certain dataset attributes. There are two options:
In the example below, Sensitive Data Scanner will scan only source datasets within the selected folder(s).
Additionally, you have the option to configure the number of rows to be scanned.
In the Scan Schedule section, you can choose one of two options for when the scan runs: a one-time scan or a recurring scan.
Similar to including and excluding datasets and folders explicitly, you can include and exclude datasets based on the Markings applied to them. This is an advanced feature that is generally used to exclude datasets that are already protected. For example, in the screenshot below, datasets marked with PII (Personally Identifiable Information) are not scanned, because the PII marking may already have been applied after a match in a prior sensitive data scan.
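The marking-based exclusion described above can be sketched as a simple filter. This is a hypothetical model, not a Foundry API: the `Dataset` class and `filter_by_markings` function are illustrative names, and the sketch assumes that a dataset is skipped if it carries any of the excluded markings. Inclusion by marking is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    # Hypothetical stand-in for a dataset and the Markings applied to it.
    path: str
    markings: set[str] = field(default_factory=set)


def filter_by_markings(datasets: list[Dataset], excluded_markings: set[str]) -> list[Dataset]:
    # Assumption: a dataset is dropped if it carries ANY excluded marking.
    return [d for d in datasets if not (d.markings & excluded_markings)]
```

With `excluded_markings={"PII"}`, a dataset already marked `PII` from a prior scan is filtered out, while unmarked datasets and datasets with unrelated markings remain in scope.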
The first steps in creating a sensitive data scan are to select the match conditions to look for and then the match actions that Sensitive Data Scanner should perform when a match is found.
When choosing your match conditions, consider what sensitive data you want to look for and what match conditions are already available for your space. If the desired type(s) of sensitive data do not have a corresponding match condition, you can create a new match condition.
When choosing your match actions, consider the appropriate response to detecting your sensitive data: you can apply Markings, which place access controls on datasets that contain sensitive data, or create issues, which inform a specified set of users that sensitive data was found. You can also choose not to apply a match action at all.
If the appropriate match action does not exist in your space, you can create one. See Create match actions for more details.
If your sensitive data scan involves a large number of datasets, we recommend testing the match conditions before proceeding further. Misconfigured match conditions may generate false positives, leading to unwanted issues or markings on datasets. To test match conditions, select No Match Actions for your scan. Once the scan has finished and you have verified that the match condition matches the expected data format, you can apply additional match actions from the scan's overview page.
Review Applying additional match actions for more details.
In the final stage of creating a sensitive data scan, you can review the match conditions, match actions, and resources that you selected for the scan.
At this step, Sensitive Data Scanner will also compute the datasets required for the scan based on the resource filters you chose when tuning your scan.
If you chose a one-time scan schedule, you can trigger the scan by selecting Run One-Time Scan.
If you chose a recurring scan schedule, you will also be able to add a name and description for the scan. You can then save the scan by selecting Save as Recurring Scan.
For inactive scans that were created within the past seven days, you can select additional match actions to apply to the sensitive data discovered by the scan.
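The eligibility rule above (the scan must be inactive and created within the past seven days) can be expressed as a small check. The helper `can_modify_match_actions` is hypothetical and only mirrors the documented rule; whether a scan created exactly seven days ago still qualifies is an assumption (this sketch treats the boundary as inclusive).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Documented window for applying (or reversing) match actions on a scan.
ELIGIBILITY_WINDOW = timedelta(days=7)


def can_modify_match_actions(
    is_active: bool, created_at: datetime, now: Optional[datetime] = None
) -> bool:
    # Hypothetical helper: the scan must be inactive and created within the
    # past seven days. Assumption: the seven-day boundary is inclusive.
    now = now or datetime.now(timezone.utc)
    return not is_active and now - created_at <= ELIGIBILITY_WINDOW
```

The same check applies to reversing match actions, which uses the identical seven-day rule for inactive scans.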
For recurring scans, additional match actions will only apply to matches that were identified up until the point at which the additional match action was selected. If the scan is reactivated later, any future sensitive data detected by the recurring scan will not automatically have the previously selected additional match actions applied.
View the status of the application of additional match actions at the bottom of a scan's overview page.
For inactive scans that were created within the past seven days, you can reverse match actions that were previously applied to the sensitive data discovered by the scan. For Create issues match actions, this deletes the issues created by the action. For Apply markings match actions, this removes the markings applied by the action.
For recurring scans, only the results of actions performed up until the reversal are reversed. If the scan is later reactivated, the recurring scan will still apply a match action that was configured in the initial scan setup to any sensitive data it discovers in the future, even though that action's earlier results were reversed. To stop the recurring scan from performing certain match actions when it is made active again, edit the recurring scan to remove those match actions before reactivating it.
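The distinction between reversing past results and changing the scan's configuration can be illustrated with a minimal in-memory model. `RecurringScan`, `run`, and `reverse` are hypothetical names, not a Foundry API, and each run is simplified to applying every configured action once rather than once per match.

```python
from dataclasses import dataclass, field


@dataclass
class RecurringScan:
    # Hypothetical model: the actions configured in the scan setup, and the
    # results recorded so far as (run number, action) pairs.
    configured_actions: set[str]
    applied: list[tuple[int, str]] = field(default_factory=list)

    def run(self, run_id: int) -> None:
        # Each run applies every action that is still in the configuration.
        for action in sorted(self.configured_actions):
            self.applied.append((run_id, action))

    def reverse(self, action: str) -> None:
        # Reversal removes results produced so far; it does NOT change the
        # configuration, so later runs will apply the action again.
        self.applied = [(r, a) for r, a in self.applied if a != action]


scan = RecurringScan({"create_issues", "apply_markings"})
scan.run(1)
scan.reverse("apply_markings")
scan.run(2)  # "apply_markings" is applied again because it is still configured
# To stop this, remove the action from the configuration before reactivating.
scan.configured_actions.discard("apply_markings")
```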
View the status of the reversal of match actions at the bottom of a scan's overview page.