This page lists and describes all metrics that are gathered in data profiles.
There are four types of data profiles: project data profiles, table data profiles, column data profiles, and file store data profiles.
Project data profiles
Each project data profile has the following fields. The values for these fields are aggregated based on the resources profiled within the project.
Insights
Project data profiles provide the following insights:
- Data risk
- Level of risk associated with the data at its current state. For more information, see Sensitivity and data risk levels.
- Sensitivity
- Score indicating the sensitivity level for this project. For more information, see Sensitivity and data risk levels.
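Project-level values are aggregated from the resources profiled within the project, but this page doesn't state the aggregation rule. The following Python sketch assumes, purely for illustration, that a project takes the highest level found among its profiled resources. The Level enum and the aggregate_level function are hypothetical and aren't part of any Sensitive Data Protection API.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordering of sensitivity or data risk levels (illustrative only)."""
    UNKNOWN = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


def aggregate_level(resource_levels: list[Level]) -> Level:
    """Assumed rule: the project gets the highest level among its profiled resources.

    Returns UNKNOWN when no resources were profiled.
    """
    return max(resource_levels, default=Level.UNKNOWN)


# Example: one low-risk table and one high-risk file store.
print(aggregate_level([Level.LOW, Level.HIGH]).name)  # HIGH
```

If the service uses a different rule, only the body of aggregate_level would change; the ordered enum simply makes "highest level" a well-defined comparison.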
Metadata
Project data profiles provide the following metadata:
- Last profile generated
- Date and time the profile was last generated.
- Project ID
- ID of the project that was profiled.
- Resource name
- Fully qualified name of the data profile.
- Status
- Icon that indicates the status of the profiling operation.
Table data profiles
Each table data profile has the following fields:
Insights
Table data profiles provide the following insights:
- Data risk
- Level of risk associated with the data at its current state. For more information, see Sensitivity and data risk levels.
- Sensitivity
- Score indicating the sensitivity level for this table. For more information, see Sensitivity and data risk levels.
Metadata
Table data profiles provide the following metadata:
- Database
- The database containing the table that was profiled. This field applies only to Cloud SQL discovery.
- Dataset ID
- ID of the dataset that contains this table.
- Encryption
- Whether encryption for this table is managed by Google or by your organization.
- Expiration time
- Optional. The time when this table expires.
- Failed column count
- The number of columns skipped in this table because of an error.
- Inspect config snapshot
- Snapshot of the inspection template that was used when the profile was generated. For more information, see Data profile snapshots.
- Instance
- The instance containing the table that was profiled. This field applies only to Cloud SQL discovery.
- Last profile generated
- Date and time the profile was last generated.
- Latest update in BigQuery
- Date and time this table was last modified.
- Project ID
- ID of the project that contains this table.
- Public
Whether this table is available to all users or restricted to certain users.
- Resource labels
Labels that the table had at the time the profile was generated.
- Resource tags
Tags that the table had at the time the profile was generated.
- Resource name
Fully qualified name of the data profile.
- Row count
Number of rows in this table when the profile was generated.
- Scanned column count
The number of columns profiled in this table.
- Service account
Number of service accounts with IAM permissions to access this table.
- Status
Indicates whether the profile was generated successfully.
- Table ID
ID of this table.
- Table creation time
Date and time the table was created.
- Table size
The size of this table when the profile was generated.
- Type
The type of discovery performed.
Column data profiles
Each column data profile has the following fields:
Insights
Column data profiles provide the following insights:
- Data risk
- Level of risk associated with the data at its current state. For more information, see Sensitivity and data risk levels.
- Sensitivity
- Score indicating the sensitivity level for this column. For more information, see Sensitivity and data risk levels.
- Predicted infoType
If a single built-in or custom infoType clearly predominates over others in the column, Sensitive Data Protection sets this field to that infoType. Otherwise, this field has no value.
To view a list of all infoTypes detected in the column, refer to the Other infoTypes field.
Sensitive Data Protection scans for only the infoTypes that you specified in the inspection template. Thus, only those infoTypes can appear in the Predicted infoType field. For example, if the column has email addresses, but you didn't include the EMAIL_ADDRESS infoType detector in your inspection template, then this field doesn't contain EMAIL_ADDRESS.
If the column data predominantly matches several closely related infoTypes that belong to the same general category, Sensitive Data Protection sets this field to the more general infoType. For example, if the column predominantly has a mix of PASSPORT, AUSTRALIA_PASSPORT, and CANADA_PASSPORT infoTypes, the Predicted infoType field is set to PASSPORT. The Other infoTypes field shows the more specific infoTypes and their estimated prevalence.
- Other infoTypes
InfoTypes detected in the column that don't have a strong enough signal to be considered that column's predicted infoType. In this document, see Predicted infoType.
For data profiles generated after October 13, 2022, each infoType listed in this field has an estimated prevalence. The estimated prevalence is an approximate percentage of non-null rows in which the infoType was detected.
For example, suppose you have a column that has the following metrics:
- Predicted infoType: FDA_CODE
- Other infoTypes: PERSON_NAME (2%), STREET_ADDRESS (1%)
In this example, there is a strong indication that the column contains FDA codes. Sensitive Data Protection also determined that approximately 2% of non-null rows in the column might contain person names and 1% might contain street addresses.
Sensitive Data Protection scans for only the infoTypes that you specified in the inspection template. Thus, only those infoTypes can appear in the Other infoTypes field. For example, if the column has email addresses, but you didn't include the EMAIL_ADDRESS infoType detector in your inspection template, then this field doesn't contain EMAIL_ADDRESS.
- Estimated null proportion
Approximate proportion of null values in this column, categorized as high, medium, low, or very low. This value is high if a large proportion of entries in this column is null.
- Estimated uniqueness
An estimate of how much of the data in this column is unique, categorized as high, medium, or low. A high uniqueness level suggests that the column contains distinct values. A high presence of unique values can indicate that the column contains identifiers.
A low uniqueness level suggests that the column contains many common values such as enums or boolean values.
If Sensitive Data Protection determines that there aren't enough rows in the table for it to calculate this metric, this value is blank.
- Free text score
The probability that this column contains freeform text. A value close to 1 indicates the column is likely to contain freeform or natural-language text. Possible values range from 0 through 1.
A high free text score can increase a column's data risk and sensitivity levels.
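To make the Other infoTypes prevalence and the null and uniqueness estimates more concrete, the following Python sketch computes simplified versions of them from a column's values and the infoTypes detected in each row. It is an illustration under stated assumptions, not how Sensitive Data Protection computes these metrics: the service reports the null proportion and uniqueness as categories rather than raw ratios, doesn't document its thresholds, and groups closely related infoTypes under a more general one. The column_metrics function, its input format, and the 0.5 predominance threshold are hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical threshold: the service only says the predicted infoType must
# "clearly predominate"; 0.5 is an assumption chosen for this sketch.
PREDOMINANCE_THRESHOLD = 0.5


def column_metrics(values: list[Optional[str]], findings: list[set[str]]) -> dict:
    """Compute illustrative column metrics.

    values:   cell values for the column; None represents a null.
    findings: for each row, the set of infoTypes detected in that row.
    """
    non_null_rows = [i for i, v in enumerate(values) if v is not None]
    non_null = len(non_null_rows)

    # Estimated prevalence: fraction of non-null rows in which each infoType was detected.
    counts = Counter(t for i in non_null_rows for t in findings[i])
    prevalence = {t: c / non_null for t, c in counts.items()}

    # Predicted infoType: the most prevalent infoType, only if it clearly predominates.
    predicted = None
    if prevalence:
        top, share = max(prevalence.items(), key=lambda item: item[1])
        if share >= PREDOMINANCE_THRESHOLD:
            predicted = top

    return {
        "predicted_infotype": predicted,
        # Everything else detected in the column, with its estimated prevalence.
        "other_infotypes": {t: p for t, p in prevalence.items() if t != predicted},
        # Raw ratios; the service reports these as categories instead.
        "null_proportion": (len(values) - non_null) / len(values) if values else 0.0,
        "uniqueness": len({values[i] for i in non_null_rows}) / non_null if non_null else 0.0,
    }


values = ["code-1", "code-2", None, "code-3"]
findings = [{"FDA_CODE"}, {"FDA_CODE", "PERSON_NAME"}, set(), {"FDA_CODE"}]
result = column_metrics(values, findings)
print(result["predicted_infotype"])  # FDA_CODE
print(result["null_proportion"])  # 0.25
```

In this sample, FDA_CODE appears in all three non-null rows and becomes the predicted infoType, while PERSON_NAME appears in one of three non-null rows and is reported only as an other infoType, mirroring the FDA_CODE example above.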
Metadata
Column data profiles provide the following metadata:
- Database
- The database containing the table column that was profiled. This field applies only to Cloud SQL discovery.
- Data type
- The data type of the contents of this column.
- Dataset ID
- ID of the dataset that contains this table column.
- Field ID
- Name of the column.
- Instance
- The instance containing the table column that was profiled. This field applies only to Cloud SQL discovery.
- Instance location
- Location of the instance containing the table column that was profiled. This field applies only to Cloud SQL discovery.
- Last profile generated
- Date and time the profile was last generated.
- Policy tags
- Indicates if a policy tag is applied to the column. For information on best practices for using policy tags, see Using policy tags in BigQuery.
- Project ID
- ID of the project that contains this table column.
- Resource name
- Fully qualified name of the data profile.
- Status
- Icon that indicates the status of the profiling operation.
- Table ID
- ID of the table that contains this column.
File store data profiles
Sensitive Data Protection uses the term file store to refer to a file storage bucket.
Each file store data profile has the following fields.
Insights
File store data profiles provide the following insights:
- Data risk
- Level of risk associated with the data at its current state. For more information, see Sensitivity and data risk levels.
- File clusters
- Provides a summary for each file cluster that was detected when this file store was profiled. For more information about each summary, see File cluster summaries on this page.
- Sensitivity
- Score indicating the sensitivity level for this file store. For more information, see Sensitivity and data risk levels.
Metadata
File store data profiles provide the following metadata:
- Data storage locations
If you profiled a dual-region Cloud Storage bucket, then this field lists the two regions.
If you profiled an Amazon S3 bucket, then this value is the AWS region where the bucket is stored.
- Encryption
Whether encryption for this file store is managed by Google or by your organization.
- File store type
The type of data that was profiled: Cloud Storage data or Amazon S3 data.
- File store path
The name of the file store.
- Inspect config snapshot
Snapshot of the inspection template that was used when the profile was generated. For more information, see Data profile snapshots.
- Location type
Type of location where the file store is stored: region, dual-region, or multi-region.
- Profile first created
Date and time the profile was created for the first time.
- Profile last generated
Date and time the profile was last generated.
- Parent ID
The resource that owns the data that was profiled.
- If the data profile is for a Google Cloud resource, then this is the ID of the project that contains the data.
- If the data profile is for an Amazon S3 bucket, then this is the ID of the AWS account that contains the bucket.
- Public
Whether this file store is available to all users or restricted to certain users.
- Resource labels
Labels that the file store had at the time the profile was generated.
- Resource tags
Tags that the file store had at the time the profile was generated.
- Resource location
Region or multi-region that contains the file store.
If you profiled a dual-region Cloud Storage bucket, then this value depends on whether the bucket is stored in a predefined dual region:
- For predefined dual regions, Sensitive Data Protection sets this value to the predefined dual-region name.
- For standard dual regions, Sensitive Data Protection sets this value to the multi-region that contains both regions. For information about how regions map to multi-regions, see Dual regions.
- Resource name
Fully qualified name of the data profile.
- Status
Indicates whether the profile was generated successfully.
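The Resource location rule for dual-region Cloud Storage buckets can be sketched in Python as follows. This is an illustration under simplifying assumptions: the resource_location function is hypothetical, the predefined dual-region table lists only three well-known pairs (NAM4, EUR4, and ASIA1), and inferring a region's multi-region from its name prefix is a shortcut that doesn't cover every region. For the authoritative mapping, see Dual regions.

```python
# Illustrative subset of predefined dual regions; see the Cloud Storage
# bucket locations documentation for the complete, current list.
PREDEFINED_DUAL_REGIONS = {
    frozenset({"us-central1", "us-east1"}): "NAM4",
    frozenset({"europe-north1", "europe-west4"}): "EUR4",
    frozenset({"asia-northeast1", "asia-northeast2"}): "ASIA1",
}

# Simplified assumption: infer the containing multi-region from the region name prefix.
MULTI_REGION_BY_PREFIX = {"us-": "US", "europe-": "EU", "asia-": "ASIA"}


def multi_region_for(region: str) -> str:
    """Map one region to its multi-region using the simplified prefix table."""
    for prefix, multi in MULTI_REGION_BY_PREFIX.items():
        if region.startswith(prefix):
            return multi
    raise ValueError(f"no multi-region mapping for {region!r} in this sketch")


def resource_location(regions: set[str]) -> str:
    """Return the Resource location value for a dual-region bucket (sketch)."""
    key = frozenset(r.lower() for r in regions)
    if key in PREDEFINED_DUAL_REGIONS:
        # Predefined dual region: report the predefined dual-region name.
        return PREDEFINED_DUAL_REGIONS[key]
    # Other dual regions: report the multi-region that contains both regions.
    multi_regions = {multi_region_for(r) for r in key}
    if len(key) != 2 or len(multi_regions) != 1:
        raise ValueError(f"cannot map {sorted(key)} to a single multi-region")
    return multi_regions.pop()


print(resource_location({"us-central1", "us-east1"}))  # NAM4
print(resource_location({"us-east1", "us-west1"}))  # US
```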
File cluster summaries
When a file store data profile is generated, the files are grouped into file clusters. Sensitive Data Protection provides a summary for each file cluster.
Each file cluster summary has the following fields:
- Data risk
- Level of risk associated with the data in this file cluster. For more information, see Sensitivity and data risk levels.
- Errors
- Any errors detected when the file store data profile was generated.
- File extensions scanned
- List of file types detected and scanned to generate the file store data profile.
- File extensions seen
- List of file types detected but not necessarily scanned.
- InfoTypes
- List of built-in and custom infoTypes that were detected in this file cluster.
- Sensitivity
- Score indicating the sensitivity level for this file cluster. For more information, see Sensitivity and data risk levels.
- Type
- Indicates the category of files in this cluster. For more information about all supported file clusters, see File clusters.
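Because File extensions scanned lists file types that were detected and scanned, while File extensions seen lists file types that were detected but not necessarily scanned, the difference between the two shows which file types in a cluster were found but left unscanned. A minimal Python sketch, assuming each field is available as a plain list of extension strings; the unscanned_extensions function, the input format, and the normalization step are hypothetical:

```python
def unscanned_extensions(seen: list[str], scanned: list[str]) -> list[str]:
    """Return extensions that were seen but not scanned, normalized and sorted.

    Normalization (trim, lowercase, drop a leading dot) is an assumption of this sketch.
    """
    def normalize(ext: str) -> str:
        return ext.strip().lower().lstrip(".")

    return sorted({normalize(e) for e in seen} - {normalize(e) for e in scanned})


print(unscanned_extensions([".CSV", ".txt", ".mp4"], ["csv", "txt"]))  # ['mp4']
```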