Manage pipelines
This document describes how to manage BigQuery pipelines, including how to view past runs, configure alerts for failed runs, and delete pipelines.
This document also describes how to view and manage pipeline metadata in Dataplex.
Pipelines are powered by Dataform.
Before you begin
- Create a BigQuery pipeline.
- To manage pipeline metadata in Dataplex, ensure that the Dataplex API is enabled in your Google Cloud project.
Required roles
To get the permissions that you need to manage pipelines, ask your administrator to grant you the following IAM roles:
- To delete pipelines: Dataform Admin (roles/dataform.admin) on the pipeline
- To view and run pipelines: Dataform Viewer (roles/dataform.viewer) on the project
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
To manage pipeline metadata in Dataplex, ensure that you have the required Dataplex roles.
For more information about Dataform IAM, see Control access with IAM.
View all pipelines
To view a list of all pipelines in your project, do the following:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand Pipelines.
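Because each pipeline is backed by a Dataform repository, you can also list pipelines programmatically. The following is a minimal sketch that assumes the google-cloud-dataform client library is installed and that your pipelines live in the us-central1 region; the project ID and region shown are placeholders:

```python
def repository_parent(project_id: str, location: str) -> str:
    """Build the parent resource path used by the Dataform API."""
    return f"projects/{project_id}/locations/{location}"


def list_pipeline_repositories(project_id: str, location: str) -> list[str]:
    # Requires: pip install google-cloud-dataform, plus application
    # default credentials with the Dataform Viewer role.
    from google.cloud import dataform_v1beta1

    client = dataform_v1beta1.DataformClient()
    parent = repository_parent(project_id, location)
    # Each BigQuery pipeline corresponds to one Dataform repository.
    return [repo.name for repo in client.list_repositories(parent=parent)]


if __name__ == "__main__":
    print(repository_parent("my-project", "us-central1"))
    # → projects/my-project/locations/us-central1
```

The console remains the supported surface for pipelines; this sketch only surfaces the underlying Dataform repositories.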
View past manual runs
To view past manual runs of a selected pipeline, follow these steps:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand your project and the Pipelines folder, and then select a pipeline.
Click Executions.
Optional: To refresh the list of past runs, click Refresh.
Configure alerts for failed pipeline runs
Each pipeline has a corresponding Dataform repository ID. Each BigQuery pipeline run is logged in Cloud Logging using the corresponding Dataform repository ID. You can use Cloud Monitoring to observe trends in Cloud Logging logs for BigQuery pipeline runs and to notify you when conditions you describe occur.
To receive alerts when a BigQuery pipeline run fails, you can create a log-based alerting policy for the corresponding Dataform repository ID. For instructions, see Configure alerts for failed workflow invocations.
To find the Dataform repository ID of your pipeline, do the following:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand your project and the Pipelines folder, and then select a pipeline.
Click Settings.
The Dataform repository ID of your pipeline is displayed at the bottom of the Settings tab.
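Once you know the repository ID, you can build a Cloud Logging filter that matches failed runs of that pipeline. The following is a hedged sketch: the resource type, label name, and severity condition are assumptions, so check the "Configure alerts for failed workflow invocations" instructions for the exact fields that Dataform logs:

```python
def failed_run_log_filter(repository_id: str) -> str:
    """Build a Cloud Logging filter for failed runs of one pipeline.

    The resource type and label name below are assumptions; consult
    "Configure alerts for failed workflow invocations" for the exact
    fields that Dataform writes to Cloud Logging.
    """
    lines = [
        'resource.type="dataform.googleapis.com/Repository"',
        f'resource.labels.repository_id="{repository_id}"',
        "severity>=ERROR",
    ]
    return "\n".join(lines)


print(failed_run_log_filter("my-pipeline-repo"))
```

You would paste the resulting filter into the query pane when creating a log-based alerting policy in Cloud Monitoring.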
Delete a pipeline
To permanently delete a pipeline, follow these steps:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand your project and the Pipelines folder. Find the pipeline that you want to delete.
Click View actions next to the pipeline, and then click Delete.
In the confirmation dialog, click Delete.
Manage metadata in Dataplex
Dataplex lets you store and manage metadata for pipelines. Pipelines are available in Dataplex by default, without additional configuration.
You can use Dataplex to manage pipelines in all pipeline locations. Managing pipelines in Dataplex is subject to Dataplex quotas and limits and Dataplex pricing.
Dataplex automatically retrieves the following metadata from pipelines:
- Data asset name
- Data asset parent
- Data asset location
- Data asset type
- Corresponding Google Cloud project
Dataplex logs pipelines as entries with the following entry values:
- System entry group: The system entry group for pipelines is @dataform. To view details of pipeline entries in Dataplex, view the @dataform system entry group. For instructions about how to view a list of all entries in an entry group, see View details of an entry group in the Dataplex documentation.
- System entry type: The system entry type for pipelines is dataform-code-asset. To view details of pipelines, view the dataform-code-asset system entry type, filter the results with an aspect-based filter, and set the type field inside the dataform-code-asset aspect to WORKFLOW. Then, select an entry of the selected pipeline. For instructions about how to view details of a selected entry type, see View details of an entry type in the Dataplex documentation. For instructions about how to view details of a selected entry, see View details of an entry in the Dataplex documentation.
- System aspect type: The system aspect type for pipelines is dataform-code-asset. To provide additional context to pipelines in Dataplex by annotating pipeline entries with aspects, view the dataform-code-asset aspect type, filter the results with an aspect-based filter, and set the type field inside the dataform-code-asset aspect to WORKFLOW. For instructions about how to annotate entries with aspects, see Manage aspects and enrich metadata in the Dataplex documentation.
- Type: The type for pipelines is WORKFLOW. This type lets you filter pipelines in the dataform-code-asset system entry type and the dataform-code-asset aspect type by using the aspect:dataplex-types.global.dataform-code-asset.type=WORKFLOW query in an aspect-based filter.
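Entered in the Dataplex search box, the aspect-based filter for pipelines looks like this:

```
aspect:dataplex-types.global.dataform-code-asset.type=WORKFLOW
```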
For instructions about how to search for assets in Dataplex, see Search for data assets in Dataplex in the Dataplex documentation.
What's next
- Learn more about BigQuery pipelines.
- Learn how to create pipelines.
- Learn how to schedule pipelines.