The ML.PROCESS_DOCUMENT function
This document describes the ML.PROCESS_DOCUMENT function, which lets you process unstructured documents from an object table by using the Document AI API.
Syntax
    ML.PROCESS_DOCUMENT(
      MODEL `project_id.dataset.model_name`,
      TABLE `project_id.dataset.object_table`
    )
Arguments
ML.PROCESS_DOCUMENT takes the following arguments:
- project_id: Your project ID.
- dataset: The BigQuery dataset that contains the model.
- model_name: The name of a remote model with a REMOTE_SERVICE_TYPE of CLOUD_AI_DOCUMENT_V1.
- object_table: The name of the object table that contains URIs of the documents.
  The documents in the object table must be of a supported type. An error is returned for any row that contains a document of an unsupported type.
Output
ML.PROCESS_DOCUMENT returns the following columns:
- ml_process_document_result: a JSON value that contains the entities returned by the Document AI API.
- ml_process_document_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
- The fields returned by the processor specified in the model.
- The object table columns.
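As an illustration of working with these output columns, the following sketch flattens the entities in ml_process_document_result into one row per entity and keeps only rows that were processed successfully. It is a minimal example under stated assumptions, not a definitive query: the model and table names are the placeholders used in the Example section, uri is assumed to be available as an object table column, and the type, mentionText, and confidence fields are assumed to be present in each entity that the processor returns.

```sql
-- Sketch only: model and table names are placeholders, and the entity
-- field names (type, mentionText, confidence) are assumptions about the
-- processor output.
WITH processed AS (
  SELECT *
  FROM ML.PROCESS_DOCUMENT(
    MODEL `myproject.mydataset.invoice_parser`,
    TABLE `myproject.mydataset.documents`
  )
)
SELECT
  p.uri,
  JSON_VALUE(entity, '$.type') AS entity_type,
  JSON_VALUE(entity, '$.mentionText') AS mention_text,
  SAFE_CAST(JSON_VALUE(entity, '$.confidence') AS FLOAT64) AS confidence
FROM
  processed AS p,
  -- One output row per element of the "entities" array.
  UNNEST(JSON_QUERY_ARRAY(p.ml_process_document_result, '$.entities')) AS entity
-- An empty status means the row was processed successfully.
WHERE COALESCE(p.ml_process_document_status, '') = '';
```

The COALESCE call treats both an empty string and NULL as success, because the description above says only that the status is empty for successful rows.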
Quotas
See Cloud AI service functions quotas and limits.
Known issues
Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:
A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>
This issue occurs because BigQuery query jobs finish successfully even if the function fails for some of the rows. The function fails when the volume of API calls to the remote endpoint exceeds the quota limits for that service. This issue occurs most often when you are running multiple parallel batch queries. BigQuery retries these calls, but if the retries fail, the resource exhausted error message is returned.
To iterate through inference calls until all rows are successfully processed, you can use the BigQuery remote inference SQL scripts or the BigQuery remote inference pipeline Dataform package.
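Before reaching for those tools, it can help to see how many rows were affected. The following sketch stores the function output in a table and then lists the rows whose status begins with the retryable error message shown above. The processed_documents table name is hypothetical, the model and object table names are the placeholders from the Example section, and uri is assumed to be available as an object table column.

```sql
-- Sketch only: `processed_documents` is a hypothetical results table.
CREATE OR REPLACE TABLE `myproject.mydataset.processed_documents` AS
SELECT
  uri,
  ml_process_document_result,
  ml_process_document_status
FROM ML.PROCESS_DOCUMENT(
  MODEL `myproject.mydataset.invoice_parser`,
  TABLE `myproject.mydataset.documents`
);

-- List the documents that failed with the retryable error.
SELECT
  uri,
  ml_process_document_status
FROM `myproject.mydataset.processed_documents`
WHERE ml_process_document_status LIKE 'A retryable error occurred:%';
```

Storing the results first means the check does not call the Document AI API a second time, and the list of failed URIs shows which documents still need processing.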
Locations
ML.PROCESS_DOCUMENT must run in the same region as the remote model that the function references. You can only create models based on Document AI in the US and EU multi-regions.
Limitations
The function can't process documents with more than 15 pages. Any row that contains such a file returns an error.
Example
The following example uses the invoice parser to process the documents represented by the documents table.
Create the model:
    # Create model
    CREATE OR REPLACE MODEL `myproject.mydataset.invoice_parser`
      REMOTE WITH CONNECTION `myproject.myregion.myconnection`
      OPTIONS (
        remote_service_type = 'cloud_ai_document_v1',
        document_processor = 'projects/project_number/locations/processor_location/processors/processor_id/processorVersions/version_id'
      );
Process the documents:
    SELECT *
    FROM ML.PROCESS_DOCUMENT(
      MODEL `myproject.mydataset.invoice_parser`,
      TABLE `myproject.mydataset.documents`
    );
The result is similar to the following:
| ml_process_document_result | ml_process_document_status | invoice_type | currency | ... |
|---|---|---|---|---|
| {"entities":[{"confidence":1,"id":"0","mentionText":"10 105,93 10,59","pageAnchor":{"pageRefs":[{"boundingPoly":{"normalizedVertices":[{"x":0.40452111,"y":0.67199326},{"x":0.74776918,"y":0.67199326},{"x":0.74776918,"y":0.68208581},{"x":0.40452111,"y":0.68208581}]}}]},"properties":[{"confidence":0.66... | | | USD | ... |
What's next
- Get step-by-step instructions on how to process documents using the ML.PROCESS_DOCUMENT function.
- To learn more about model inference, including other functions that you can use to analyze BigQuery data, see Model inference overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.