A regular expression (regex) custom infoType detector allows you to create your
own detectors that enable Sensitive Data Protection to detect matches based
on a regex pattern. For example, suppose that you had medical record numbers in
the form ###-#-#####. You could define a regex pattern such as the following:
[0-9]{3}-[0-9]{1}-[0-9]{5}
Sensitive Data Protection would then match items like the following:
012-4-56789
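As a quick local sanity check, the pattern above can be tried with Python's built-in re module before it is sent to the service. This is only a sketch: the helper name find_mrns and the sample sentences are made up for illustration, and the service's regex engine may differ from Python's in edge cases, so treat the result as an approximation of what Sensitive Data Protection would match.

```python
import re

# Pattern from the text: three digits, one digit, five digits, separated by hyphens.
MRN_PATTERN = re.compile(r"[0-9]{3}-[0-9]{1}-[0-9]{5}")


def find_mrns(text: str) -> list[str]:
    """Return every substring of `text` that matches the MRN pattern."""
    # The pattern has no capture groups, so findall returns the full matches.
    return MRN_PATTERN.findall(text)


# Hypothetical sample inputs, not taken from the original page.
print(find_mrns("Record 012-4-56789 was updated."))
print(find_mrns("Phone 555-0100 is not an MRN."))
```

The first call finds the medical record number; the second finds nothing because the text never contains the full three-one-five digit grouping.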
Anatomy of a regex custom infoType detector
As summarized in API Overview, to create a custom regex infoType detector, you define a CustomInfoType object that contains the following:
- The name you want to give the custom infoType detector, within an InfoType object.
- An optional Likelihood value. If you omit this field, regex matches return a default likelihood of VERY_LIKELY. If you notice a regex custom infoType detector returning too many false positives, try reducing the base likelihood and using detection rules to boost the likelihood using contextual information. To learn more, see Customizing finding likelihood.
- Optional DetectionRules, or hotword rules. These rules adjust the likelihood of findings within a given proximity of specified hotwords. Learn more about hotword rules in Customizing finding likelihood.
- An optional SensitivityScore value. If you omit this field, matches to the regular expression return a default sensitivity level of HIGH. Sensitivity scores are used in data profiles. When profiling your data, Sensitive Data Protection uses the sensitivity scores of the infoTypes to calculate the sensitivity level.
- A Regex object consisting of a single pattern defining the regular expression.
As a JSON object, a regex custom infoType detector that includes all optional components looks like this:
{
  "customInfoTypes":[
    {
      "infoType":{
        "name":"CUSTOM_INFOTYPE_NAME"
      },
      "likelihood":"LIKELIHOOD_LEVEL",
      "detectionRules":[
        {
          "hotwordRule":{
            HOTWORD_RULE
          }
        }
      ],
      "sensitivityScore":{
        "score":"SENSITIVITY_SCORE"
      },
      "regex":{
        "pattern":"REGULAR_EXPRESSION_PATTERN"
      }
    }
  ],
  ...
}
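The structure above can also be assembled programmatically. The following is a minimal sketch that builds the same JSON shape as a plain Python dictionary, leaving out optional fields that are not set. The helper name build_regex_info_type and its parameters are hypothetical and not part of any client library; the contents of each hotword rule are passed through untouched because this page does not describe them.

```python
import json
from typing import Any, Optional


def build_regex_info_type(
    name: str,
    pattern: str,
    likelihood: Optional[str] = None,
    sensitivity_score: Optional[str] = None,
    hotword_rules: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Assemble a CustomInfoType dict; optional fields are omitted when unset."""
    info_type: dict[str, Any] = {
        "infoType": {"name": name},
        "regex": {"pattern": pattern},
    }
    if likelihood is not None:
        info_type["likelihood"] = likelihood
    if sensitivity_score is not None:
        info_type["sensitivityScore"] = {"score": sensitivity_score}
    if hotword_rules:
        # Each rule body is wrapped as {"hotwordRule": ...}, as in the template.
        info_type["detectionRules"] = [
            {"hotwordRule": rule} for rule in hotword_rules
        ]
    return info_type


# Only the name, pattern, and likelihood are set here.
config = {
    "customInfoTypes": [
        build_regex_info_type(
            "C_MRN", "[0-9]{3}-[0-9]{1}-[0-9]{5}", likelihood="POSSIBLE"
        )
    ]
}
print(json.dumps(config, indent=2))
```

Omitting a field rather than sending an empty value lets the service apply the defaults described in the list above.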
Regex example: Match medical record numbers
The following JSON snippet and code samples show a regular expression custom
infoType detector that instructs Sensitive Data Protection to match a medical
record number (MRN) in the input text "Patient's MRN 444-5-22222" and assign
each match a likelihood of POSSIBLE.
C#, Go, Java, Node.js, PHP, Python
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
REST
See the JSON quickstart for more information about using the DLP API with JSON.
JSON Input:
POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:inspect?key={YOUR_API_KEY}
{
  "item":{
    "value":"Patients MRN 444-5-22222"
  },
  "inspectConfig":{
    "customInfoTypes":[
      {
        "infoType":{
          "name":"C_MRN"
        },
        "regex":{
          "pattern":"[1-9]{3}-[1-9]{1}-[1-9]{5}"
        },
        "likelihood":"POSSIBLE"
      }
    ]
  }
}
JSON Output:
{
  "result":{
    "findings":[
      {
        "infoType":{
          "name":"C_MRN"
        },
        "likelihood":"POSSIBLE",
        "location":{
          "byteRange":{
            "start":"13",
            "end":"24"
          },
          "codepointRange":{
            "start":"13",
            "end":"24"
          }
        },
        "createTime":"2018-11-30T01:29:37.799Z"
      }
    ]
  }
}
The output shows that, using the custom infoType detector named C_MRN and its
custom regex, Sensitive Data Protection correctly identified the medical record
number and assigned it a likelihood of POSSIBLE, as specified.
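To see where the byteRange and codepointRange values in the output come from, the match can be reproduced locally with Python's re module. This is a rough sketch rather than a description of how the service computes locations: the helper name locate is hypothetical, and the byte offsets assume the text is encoded as UTF-8.

```python
import re

# Input text and pattern copied from the JSON request above.
text = "Patients MRN 444-5-22222"
pattern = re.compile(r"[1-9]{3}-[1-9]{1}-[1-9]{5}")


def locate(text, pattern):
    """Return (match, codepoint_range, byte_range) tuples, assuming UTF-8 bytes."""
    results = []
    for m in pattern.finditer(text):
        # Byte offsets are measured on the UTF-8 encoding of the preceding text.
        byte_start = len(text[: m.start()].encode("utf-8"))
        byte_end = byte_start + len(m.group().encode("utf-8"))
        results.append((m.group(), (m.start(), m.end()), (byte_start, byte_end)))
    return results


print(locate(text, pattern))
# [('444-5-22222', (13, 24), (13, 24))]
```

Both ranges are identical here because the input is plain ASCII; they would diverge if multi-byte characters preceded the match. Note also that the request's pattern uses [1-9] rather than [0-9], so a value containing a zero, such as 012-4-56789, would not match it.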
Customizing match likelihood builds on this example to include context words.