Create and use data profile scans

Dataplex Universal Catalog lets you identify common statistical characteristics (such as common values, data distribution, and null counts) of the columns in your BigQuery tables. This information helps you understand and analyze your data more effectively.

For more information about Dataplex Universal Catalog data profile scans, see About data profiling.

Before you begin

Enable the Dataplex API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the API
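
Alternatively, you can enable the API from the command line. This assumes the gcloud CLI is installed and your active project is already set:

gcloud services enable dataplex.googleapis.com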

Required roles

To get the permissions that you need to create and manage data profile scans, ask your administrator to grant you the following IAM roles on your resources, such as the project or table:

  • To create, run, update, and delete data profile scans: Dataplex DataScan Editor (roles/dataplex.dataScanEditor) role on the project containing the data scan.
  • To allow Dataplex Universal Catalog to run data profile scans against BigQuery data, grant the following roles to the Dataplex Universal Catalog service account: BigQuery Job User (roles/bigquery.jobUser) role on the project running the scan; BigQuery Data Viewer (roles/bigquery.dataViewer) role on the tables being scanned.
  • To run data profile scans for BigQuery external tables that use Cloud Storage data: grant the Dataplex Universal Catalog service account the Storage Object Viewer (roles/storage.objectViewer) and Storage Legacy Bucket Reader (roles/storage.legacyBucketReader) roles on the Cloud Storage bucket.
  • To view data profile scan results, jobs, and history: Dataplex DataScan Viewer (roles/dataplex.dataScanViewer) role on the project containing the data scan.
  • To export data profile scan results to a BigQuery table: BigQuery Data Editor (roles/bigquery.dataEditor) role on the table.
  • To publish data profile scan results to Dataplex Universal Catalog: Dataplex Catalog Editor (roles/dataplex.catalogEditor) role on the @bigquery entry group.
  • To view published data profile scan results in BigQuery on the Data profile tab: BigQuery Data Viewer (roles/bigquery.dataViewer) role on the table.

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.
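
For example, a minimal gcloud sketch that grants the Dataplex DataScan Editor role on a project; USER_EMAIL is a placeholder for the principal you're granting access to:

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:USER_EMAIL" \
    --role="roles/dataplex.dataScanEditor"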

Required permissions

If you use custom roles, you need to grant the following IAM permissions:

  • To create, run, update, and delete data profile scans:
    • dataplex.datascans.create on project—Create a DataScan
    • dataplex.datascans.update on data scan—Update the description of a DataScan
    • dataplex.datascans.delete on data scan—Delete a DataScan
    • dataplex.datascans.run on data scan—Run a DataScan
    • dataplex.datascans.get on data scan—View DataScan details excluding results
    • dataplex.datascans.list on project—List DataScans
    • dataplex.dataScanJobs.get on data scan job—Read DataScan job resources
    • dataplex.dataScanJobs.list on data scan—List DataScan job resources of a data scan
  • To allow Dataplex Universal Catalog to run data profile scans against BigQuery data:
    • bigquery.jobs.create on project—Run jobs
    • bigquery.tables.get on table—Get table metadata
    • bigquery.tables.getData on table—Get table data
  • To run data profile scans for BigQuery external tables that use Cloud Storage data:
    • storage.buckets.get on bucket—Read bucket metadata
    • storage.objects.get on object—Read object data
  • To view data profile scan results, jobs, and history:
    • dataplex.datascans.getData on data scan—View DataScan details including results
    • dataplex.datascans.list on project—List DataScans
    • dataplex.dataScanJobs.get on data scan job—Read DataScan job resources
    • dataplex.dataScanJobs.list on data scan—List DataScan job resources of a data scan
  • To export data profile scan results to a BigQuery table:
    • bigquery.tables.create on dataset—Create tables
    • bigquery.tables.updateData on table—Write data to tables
  • To publish data profile scan results to Dataplex Universal Catalog:
    • dataplex.entryGroups.useDataProfileAspect on entry group—Allows Dataplex Universal Catalog data profile scans to save their results to Dataplex Universal Catalog
    • Additionally, you need one of the following permissions:
      • bigquery.tables.update on table—Update table metadata
      • dataplex.entries.update on entry—Update entries
  • To view published data profile results for a table in BigQuery or Dataplex Universal Catalog:
    • bigquery.tables.get on table—Get table metadata
    • bigquery.tables.getData on table—Get table data

If a table uses BigQuery row-level security, then Dataplex Universal Catalog can only scan rows visible to the Dataplex Universal Catalog service account. To allow Dataplex Universal Catalog to scan all rows, add its service account to a row filter where the predicate is TRUE.
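
As a sketch, the following BigQuery DDL (run here through the bq CLI) creates such a policy; the policy name dataplex_full_access is an illustrative placeholder, and if the table already has row access policies you can instead add the service account to an existing policy whose filter is TRUE:

bq query --use_legacy_sql=false '
CREATE ROW ACCESS POLICY dataplex_full_access
ON `PROJECT_ID.DATASET_ID.TABLE_ID`
GRANT TO ("serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com")
FILTER USING (TRUE)'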

If a table uses BigQuery column-level security, then Dataplex Universal Catalog requires access to scan protected columns. To grant access, give the Dataplex Universal Catalog service account the Data Catalog Fine-Grained Reader (roles/datacatalog.fineGrainedReader) role on all policy tags used in the table. The user creating or updating a data scan also needs permissions on protected columns.

Grant roles to the Dataplex Universal Catalog service account

To run data profile scans, Dataplex Universal Catalog uses a service account that requires permissions to run BigQuery jobs and read BigQuery table data. To grant the required roles, follow these steps:

  1. Get the Dataplex Universal Catalog service account email address. If you haven't created a data profile or data quality scan in this project before, run the following gcloud command to generate the service identity:

    gcloud beta services identity create --service=dataplex.googleapis.com
    

    The command returns the service account email, which has the following format: service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com.

    If the service account already exists, you can find its email by viewing principals with the Dataplex name on the IAM page in the Google Cloud console.

  2. Grant the service account the BigQuery Job User (roles/bigquery.jobUser) role on your project. This role lets the service account run BigQuery jobs for the scan.

    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com" \
        --role="roles/bigquery.jobUser"
    

    Replace the following:

    • PROJECT_ID: your Google Cloud project ID.
    • service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com: the email of the Dataplex Universal Catalog service account.
  3. Grant the service account the BigQuery Data Viewer (roles/bigquery.dataViewer) role for each table that you want to profile. This role grants read-only access to the tables.

    bq add-iam-policy-binding \
        --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com" \
        --role="roles/bigquery.dataViewer" \
        PROJECT_ID:DATASET_ID.TABLE_ID
    

    Replace the following:

    • PROJECT_ID: the ID of the project containing the table.
    • DATASET_ID: the ID of the dataset containing the table.
    • TABLE_ID: the ID of the table to profile.
    • service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com: the email of the Dataplex Universal Catalog service account.
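
To verify the project-level binding, you can list the roles held by the service agent. A minimal sketch:

gcloud projects get-iam-policy PROJECT_ID \
    --flatten="bindings[].members" \
    --filter="bindings.members:gcp-sa-dataplex.iam.gserviceaccount.com" \
    --format="table(bindings.role)"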

Create a data profile scan

Console

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.

    Go to Data profiling & quality

  2. Click Create data profile scan.

  3. Optional: Enter a Display name.

  4. Enter an ID. See the Resource naming conventions.

  5. Optional: Enter a Description.

  6. In the Table field, click Browse. Choose the table to scan, and then click Select.

    For tables in multi-region datasets, choose a region in which to create the data scan.

    To browse the tables organized within Dataplex Universal Catalog lakes, click Browse within Dataplex Lakes.

  7. In the Scope field, choose Incremental or Entire data.

    • If you choose Incremental data, in the Timestamp column field, select a column of type DATE or TIMESTAMP from your BigQuery table that increases as new records are added, and that can be used to identify new records. For tables partitioned on a column of type DATE or TIMESTAMP, we recommend using the partition column as the timestamp field.
  8. Optional: To filter your data, do any of the following:

    • To filter by rows, select the Filter rows checkbox. Enter a valid SQL expression that can be used in a WHERE clause in GoogleSQL syntax. For example: col1 >= 0.

      The filter can be a combination of SQL conditions over multiple columns. For example: col1 >= 0 AND col2 < 10.

    • To filter by columns, select the Filter columns checkbox.

      • To include columns in the profile scan, in the Include columns field, click Browse. Select the columns to include, and then click Select.

      • To exclude columns from the profile scan, in the Exclude columns field, click Browse. Select the columns to exclude, and then click Select.

  9. To apply sampling to your data profile scan, in the Sampling size list, select a sampling percentage. Choose a percentage value that ranges between 0.0% and 100.0% with up to 3 decimal digits.

    • For larger datasets, choose a lower sampling percentage. For example, for a 1 PB table, if you enter a value between 0.1% and 1.0%, the scan samples between 1 TB and 10 TB of data.

    • There must be at least 100 records in the sampled data to return a result.

    • For incremental data scans, the data profile scan applies sampling to the latest increment.

  10. Optional: Publish the data profile scan results in the BigQuery and Dataplex Universal Catalog pages in the Google Cloud console for the source table. Select the Publish results to BigQuery and Dataplex Catalog checkbox.

    You can view the latest scan results in the Data profile tab in the BigQuery and Dataplex Universal Catalog pages for the source table. To enable users to access the published scan results, see the Grant access to data profile scan results section of this document.

    The publishing option might not be available in the following cases:

    • You don't have the required permissions on the table.
    • Another data profile scan is already set to publish results.
  11. In the Schedule section, choose one of the following options:

    • Repeat: Run the data profile scan on a schedule: hourly, daily, weekly, monthly, or custom. Specify how often the scan should run and at what time. If you choose custom, use cron format to specify the schedule.

    • On-demand: Run the data profile scan on demand.

    • One-time: Run the data profile scan once now, and remove the scan after the time-to-live period.

    • Time to live: The time-to-live value defines the duration a data profile scan remains active after execution. A data profile scan without a specified time-to-live is automatically removed after 24 hours. The time-to-live can range from 0 seconds (immediate deletion) to 365 days.

  12. Click Continue.

  13. Optional: Export the scan results to a BigQuery standard table. In the Export scan results to BigQuery table section, do the following:

    1. In the Select BigQuery dataset field, click Browse. Select a BigQuery dataset to store the data profile scan results.

    2. In the BigQuery table field, specify the table to store the data profile scan results. If you're using an existing table, make sure that it is compatible with the export table schema. If the specified table doesn't exist, Dataplex Universal Catalog creates it for you.

  14. Optional: Add labels. Labels are key-value pairs that let you group related objects together or with other Google Cloud resources.

  15. To create the scan, click Create.

    If you set the schedule to on-demand, you can also run the scan now by clicking Run scan.

gcloud

To create a data profile scan, use the gcloud dataplex datascans create data-profile command.

If the source data is organized in a Dataplex Universal Catalog lake, include the --data-source-entity flag:

gcloud dataplex datascans create data-profile DATASCAN \
--location=LOCATION \
--data-source-entity=DATA_SOURCE_ENTITY

If the source data isn't organized in a Dataplex Universal Catalog lake, include the --data-source-resource flag:

gcloud dataplex datascans create data-profile DATASCAN \
--location=LOCATION \
--data-source-resource=DATA_SOURCE_RESOURCE

Replace the following variables:

  • DATASCAN: The name of the data profile scan.
  • LOCATION: The Google Cloud region in which to create the data profile scan.
  • DATA_SOURCE_ENTITY: The Dataplex Universal Catalog entity that contains the data for the data profile scan. For example, projects/test-project/locations/test-location/lakes/test-lake/zones/test-zone/entities/test-entity.
  • DATA_SOURCE_RESOURCE: The name of the resource that contains the data for the data profile scan. For example, //bigquery.googleapis.com/projects/test-project/datasets/test-dataset/tables/test-table.
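
For example, using the sample resource name from above, a complete call looks like the following:

gcloud dataplex datascans create data-profile test-datascan \
--location=us-central1 \
--data-source-resource="//bigquery.googleapis.com/projects/test-project/datasets/test-dataset/tables/test-table"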

C#

Before trying this sample, follow the C# setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog C# API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dataplex.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for CreateDataScan</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataScanRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        CreateDataScanRequest request = new CreateDataScanRequest
        {
            ParentAsLocationName = LocationName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            DataScan = new DataScan(),
            DataScanId = "",
            ValidateOnly = false,
        };
        // Make the request
        Operation<DataScan, OperationMetadata> response = dataScanServiceClient.CreateDataScan(request);

        // Poll until the returned long-running operation is complete
        Operation<DataScan, OperationMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataScan result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataScan, OperationMetadata> retrievedResponse = dataScanServiceClient.PollOnceCreateDataScan(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataScan retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Go API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.CreateDataScanRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#CreateDataScanRequest.
	}
	op, err := c.CreateDataScan(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

Before trying this sample, follow the Java setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Java API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.dataplex.v1.CreateDataScanRequest;
import com.google.cloud.dataplex.v1.DataScan;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.LocationName;

public class SyncCreateDataScan {

  public static void main(String[] args) throws Exception {
    syncCreateDataScan();
  }

  public static void syncCreateDataScan() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      CreateDataScanRequest request =
          CreateDataScanRequest.newBuilder()
              .setParent(LocationName.of("[PROJECT]", "[LOCATION]").toString())
              .setDataScan(DataScan.newBuilder().build())
              .setDataScanId("dataScanId1260787906")
              .setValidateOnly(true)
              .build();
      DataScan response = dataScanServiceClient.createDataScanAsync(request).get();
    }
  }
}

Python

Before trying this sample, follow the Python setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Python API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_create_data_scan():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    data_scan = dataplex_v1.DataScan()
    # Configure the scan as a data profile scan and point it at a Dataplex
    # entity. (You can set data_scan.data.resource to a BigQuery table
    # resource name instead of an entity.)
    data_scan.data_profile_spec = dataplex_v1.DataProfileSpec()
    data_scan.data.entity = "entity_value"

    request = dataplex_v1.CreateDataScanRequest(
        parent="parent_value",
        data_scan=data_scan,
        data_scan_id="data_scan_id_value",
    )

    # Make the request
    operation = client.create_data_scan(request=request)

    print("Waiting for operation to complete...")

    response = operation.result()

    # Handle the response
    print(response)

Ruby

Before trying this sample, follow the Ruby setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Ruby API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

require "google/cloud/dataplex/v1"

##
# Snippet for the create_data_scan call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#create_data_scan.
#
def create_data_scan
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::CreateDataScanRequest.new

  # Call the create_data_scan method.
  result = client.create_data_scan request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

REST

To create a data profile scan, use the dataScans.create method.
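
As a minimal sketch with curl, assuming Application Default Credentials and an empty profile spec; replace the uppercase placeholders with your values:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataScans?dataScanId=DATASCAN_ID" \
-d '{
  "data": {
    "resource": "//bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID"
  },
  "dataProfileSpec": {}
}'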

Export table schema

If you want to export the data profile scan results to an existing BigQuery table, make sure that it is compatible with the following table schema:

| Column name | Column data type | Sub field name (if applicable) | Sub field data type | Mode | Example |
| --- | --- | --- | --- | --- | --- |
| data_profile_scan | struct/record | resource_name | string | nullable | //dataplex.googleapis.com/projects/test-project/locations/europe-west2/datascans/test-datascan |
| | | project_id | string | nullable | test-project |
| | | location | string | nullable | us-central1 |
| | | data_scan_id | string | nullable | test-datascan |
| data_source | struct/record | resource_name | string | nullable | Entity case: //dataplex.googleapis.com/projects/test-project/locations/europe-west2/lakes/test-lake/zones/test-zone/entities/test-entity. Table case: //bigquery.googleapis.com/projects/test-project/datasets/test-dataset/tables/test-table |
| | | dataplex_entity_project_id | string | nullable | test-project |
| | | dataplex_entity_project_number | integer | nullable | 123456789012 |
| | | dataplex_lake_id | string | nullable | test-lake (valid only if source is entity) |
| | | dataplex_zone_id | string | nullable | test-zone (valid only if source is entity) |
| | | dataplex_entity_id | string | nullable | test-entity (valid only if source is entity) |
| | | table_project_id | string | nullable | dataplex-table |
| | | table_project_number | int64 | nullable | 345678901234 |
| | | dataset_id | string | nullable | test-dataset (valid only if source is table) |
| | | table_id | string | nullable | test-table (valid only if source is table) |
| data_profile_job_id | string | | | nullable | caeba234-cfde-4fca-9e5b-fe02a9812e38 |
| data_profile_job_configuration | json | trigger | string | nullable | ondemand/schedule |
| | | incremental | boolean | nullable | true/false |
| | | sampling_percent | float | nullable | 20.0 (indicates 20%; valid range 0-100) |
| | | row_filter | string | nullable | col1 >= 0 AND col2 < 10 |
| | | column_filter | json | nullable | {"include_fields":["col1","col2"], "exclude_fields":["col3"]} |
| job_labels | json | | | nullable | {"key1":value1} |
| job_start_time | timestamp | | | nullable | 2023-01-01 00:00:00 UTC |
| job_end_time | timestamp | | | nullable | 2023-01-01 00:00:00 UTC |
| job_rows_scanned | integer | | | nullable | 7500 |
| column_name | string | | | nullable | column-1 |
| column_type | string | | | nullable | string |
| column_mode | string | | | nullable | repeated |
| percent_null | float | | | nullable | 20.0 (indicates 20%; valid range 0.0-100.0) |
| percent_unique | float | | | nullable | 92.5 (valid range 0.0-100.0) |
| min_string_length | integer | | | nullable | 4 (valid only if column type is string) |
| max_string_length | integer | | | nullable | 10 (valid only if column type is string) |
| average_string_length | float | | | nullable | 7.2 (valid only if column type is string) |
| min_value | float | | | nullable | (valid only if column type is numeric: integer/float) |
| max_value | float | | | nullable | (valid only if column type is numeric: integer/float) |
| average_value | float | | | nullable | (valid only if column type is numeric: integer/float) |
| standard_deviation | float | | | nullable | (valid only if column type is numeric: integer/float) |
| quartile_lower | integer | | | nullable | (valid only if column type is numeric: integer/float) |
| quartile_median | integer | | | nullable | (valid only if column type is numeric: integer/float) |
| quartile_upper | integer | | | nullable | (valid only if column type is numeric: integer/float) |
| top_n | struct/record, repeated | value | string | nullable | "4009" |
| | | count | integer | nullable | 20 |
| | | percent | float | nullable | 10 (indicates 10%) |

Export table setup

When you export scan results to a BigQuery table, follow these guidelines:

  • For the field resultsTable, use the format: //bigquery.googleapis.com/projects/{project-id}/datasets/{dataset-id}/tables/{table-id}.
  • Use a BigQuery standard table.
  • If the table doesn't exist when the scan is created or updated, Dataplex Universal Catalog creates the table for you.
  • By default, the table is partitioned on the job_start_time column daily.
  • If you want the table to be partitioned with a different configuration, or if you don't want partitioning, create the table with the required schema and configuration in advance, and then provide the pre-created table as the results table.
  • Make sure the results table is in the same location as the source table.
  • If VPC-SC is configured on the project, then the results table must be in the same VPC-SC perimeter as the source table.
  • If the results table is modified while a scan is running, the currently running job exports to the previous results table; the change takes effect from the next scan job.
  • Don't modify the table schema. If you need customized columns, create a view on top of the table.
  • To reduce costs, set an expiration on the partition based on your use case. For more information, see how to set the partition expiration.
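
For example, a 60-day partition expiration (5,184,000 seconds) can be set on an existing results table with the bq CLI; the table identifier here is a placeholder:

bq update --time_partitioning_expiration 5184000 PROJECT_ID:DATASET_ID.TABLE_ID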

Create multiple data profile scans

You can configure data profile scans for multiple tables in a BigQuery dataset at the same time by using the Google Cloud console.

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.

    Go to Data profiling & quality

  2. Click Create data profile scan.

  3. Select the Multiple data profile scans option.

  4. Enter an ID prefix. Dataplex Universal Catalog automatically generates scan IDs by using the provided prefix and unique suffixes.

  5. Enter a Description for all of the data profile scans.

  6. In the Dataset field, click Browse. Select a dataset to pick tables from. Click Select.

  7. If the dataset is multi-regional, select a Region in which to create the data profile scans.

  8. Configure the common settings for the scans:

    1. In the Scope field, choose Incremental or Entire data.

    2. To apply sampling to the data profile scans, in the Sampling size list, select a sampling percentage.

      Choose a percentage value between 0.0% and 100.0% with up to 3 decimal digits.

    3. Optional: Publish the data profile scan results in the BigQuery and Dataplex Universal Catalog pages in the Google Cloud console for the source table. Select the Publish results to BigQuery and Dataplex Catalog checkbox.

      You can view the latest scan results in the Data profile tab in the BigQuery and Dataplex Universal Catalog pages for the source table. To enable users to access the published scan results, see the Grant access to data profile scan results section of this document.

    4. In the Schedule section, choose one of the following options:

      • Repeat: Run the data profile scans on a schedule: hourly, daily, weekly, monthly, or custom. Specify how often the scans should run and at what time. If you choose custom, use cron format to specify the schedule.

      • On-demand: Run the data profile scans on demand.

  9. Click Continue.

  10. In the Choose tables field, click Browse. Choose one or more tables to scan, and then click Select.

  11. Click Continue.

  12. Optional: Export the scan results to a BigQuery standard table. In the Export scan results to BigQuery table section, do the following:

    1. In the Select BigQuery dataset field, click Browse. Select a BigQuery dataset to store the data profile scan results.

    2. In the BigQuery table field, specify the table to store the data profile scan results. If you're using an existing table, make sure that it is compatible with the export table schema. If the specified table doesn't exist, Dataplex Universal Catalog creates it for you.

      Dataplex Universal Catalog uses the same results table for all of the data profile scans.

  13. Optional: Add labels. Labels are key-value pairs that let you group related objects together or with other Google Cloud resources.

  14. To create the scans, click Create.

    If you set the schedule to on-demand, you can also run the scans now by clicking Run scan.

Run a data profile scan

Console

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.

    Go to Data profiling & quality

  2. Click the data profile scan to run.

  3. Click Run now.
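
gcloud

You can also trigger a run with the gcloud CLI. A minimal sketch, assuming the scan already exists in the given location:

gcloud dataplex datascans run DATASCAN \
--location=LOCATION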