Manage pipelines

This document describes how to manage BigQuery pipelines, including how to schedule and delete pipelines.

This document also describes how to view and manage pipeline metadata in Dataplex.

Pipelines are powered by Dataform.

Before you begin

  1. Create a BigQuery pipeline.
  2. To manage pipeline metadata in Dataplex, ensure that the Dataplex API is enabled in your Google Cloud project.

Required roles

To get the permissions that you need to manage pipelines, ask your administrator to grant you the following IAM roles:

  • To delete pipelines: Dataform Admin (roles/dataform.admin) on the pipeline
  • To view and run pipelines: Dataform Viewer (roles/dataform.viewer) on the project

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

To manage pipeline metadata in Dataplex, ensure that you have the required Dataplex roles.

For more information about Dataform IAM, see Control access with IAM.

View all pipelines

To view a list of all pipelines in your project, do the following:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, expand Pipelines.

View past manual runs

To view past manual runs of a selected pipeline, follow these steps:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, expand your project and the Pipelines folder, and then select a pipeline.

  3. Click Executions.

  4. Optional: To refresh the list of past runs, click Refresh.

Configure alerts for failed pipeline runs

Each pipeline has a corresponding Dataform repository ID, and each BigQuery pipeline run is logged in Cloud Logging under that repository ID. You can use Cloud Monitoring to observe trends in the Cloud Logging logs for BigQuery pipeline runs and to receive notifications when conditions that you define are met.

To receive alerts when a BigQuery pipeline run fails, you can create a log-based alerting policy for the corresponding Dataform repository ID. For instructions, see Configure alerts for failed workflow invocations.
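The linked instructions build the alerting policy from a Logging filter scoped to one repository. As a hedged sketch only (verify the exact resource type and label names against your own pipeline's log entries), such a filter might look like the following, where my-repository-id stands in for the Dataform repository ID that you find in the steps below:

```
resource.type="dataform.googleapis.com/Repository"
resource.labels.repository_id="my-repository-id"
severity>=ERROR
```

An alerting policy built on a filter like this notifies you only about failed runs of that one pipeline; drop the repository_id line to match failures across all pipelines in the project.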

To find the Dataform repository ID of your pipeline, do the following:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, expand your project and the Pipelines folder, and then select a pipeline.

  3. Click Settings.

    The Dataform repository ID of your pipeline is displayed at the bottom of the Settings tab.
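If you prefer to look up repository IDs programmatically, one option is the Dataform REST API's repositories list method. This is a hedged sketch rather than a documented procedure: the v1beta1 endpoint shape is an assumption you should verify against the Dataform API reference, and PROJECT_ID and LOCATION are placeholders.

```shell
# Placeholders: substitute your own project ID and pipeline location.
PROJECT_ID="my-project"
LOCATION="us-central1"

# Assumed Dataform REST endpoint for listing repositories; each pipeline
# corresponds to one Dataform repository.
ENDPOINT="https://dataform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/repositories"

# Requires an authenticated gcloud session; uncomment to call the API:
# curl -H "Authorization: Bearer $(gcloud auth print-access-token)" "${ENDPOINT}"
echo "${ENDPOINT}"
```

The response lists repositories in that location; the last path segment of each repository's name field is its repository ID.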

Delete a pipeline

To permanently delete a pipeline, follow these steps:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, expand your project and the Pipelines folder. Find the pipeline that you want to delete.

  3. Click View actions next to the pipeline, and then click Delete.

  4. To confirm, click Delete.

Manage metadata in Dataplex

Dataplex lets you store and manage metadata for pipelines. Pipelines are available in Dataplex by default, without additional configuration.

You can use Dataplex to manage pipelines in all pipeline locations. Managing pipelines in Dataplex is subject to Dataplex quotas and limits and Dataplex pricing.

Dataplex automatically retrieves the following metadata from pipelines:

  • Data asset name
  • Data asset parent
  • Data asset location
  • Data asset type
  • Corresponding Google Cloud project

Dataplex logs pipelines as entries with the following entry values:

System entry group
The system entry group for pipelines is @dataform. To view details of pipeline entries in Dataplex, you need to view the @dataform system entry group. For instructions about how to view a list of all entries in an entry group, see View details of an entry group in the Dataplex documentation.
System entry type
The system entry type for pipelines is dataform-code-asset. To view details of pipelines, you need to view the dataform-code-asset system entry type, filter the results with an aspect-based filter, and set the type field inside the dataform-code-asset aspect to WORKFLOW. Then, select an entry of the selected pipeline. For instructions about how to view details of a selected entry type, see View details of an entry type in the Dataplex documentation. For instructions about how to view details of a selected entry, see View details of an entry in the Dataplex documentation.
System aspect type
The system aspect type for pipelines is dataform-code-asset. To provide additional context for pipelines in Dataplex by annotating pipeline entries with aspects, view the dataform-code-asset aspect type, filter the results with an aspect-based filter, and set the type field inside the dataform-code-asset aspect to WORKFLOW. For instructions about how to annotate entries with aspects, see Manage aspects and enrich metadata in the Dataplex documentation.
Type
The type for pipelines is WORKFLOW. This type lets you filter pipelines in the dataform-code-asset system entry type and the dataform-code-asset aspect type by using the aspect:dataplex-types.global.dataform-code-asset.type=WORKFLOW query in an aspect-based filter.

For instructions about how to search for assets in Dataplex, see Search for data assets in Dataplex in the Dataplex documentation.
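The same aspect-based query can also be sent programmatically. The sketch below is a hedged example built on the Dataplex Catalog searchEntries REST method; the endpoint shape and the use of the global location are assumptions you should check against the Dataplex API reference, and PROJECT_ID is a placeholder.

```shell
# Placeholder project; the aspect query is the one described above.
PROJECT_ID="my-project"
QUERY='aspect:dataplex-types.global.dataform-code-asset.type=WORKFLOW'

# Assumed searchEntries endpoint; Dataplex search is served from the
# global location.
ENDPOINT="https://dataplex.googleapis.com/v1/projects/${PROJECT_ID}/locations/global:searchEntries"

# Requires an authenticated gcloud session; uncomment to call the API:
# curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   "${ENDPOINT}" -d "{\"query\": \"${QUERY}\"}"
echo "${ENDPOINT}"
```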

What's next