
Commit 70015b7

authored Apr 22, 2024
docs: set index_cols in read_gbq as a best practice (#624)
1 parent d924ec2 commit 70015b7

1 file changed, +11 -7 lines changed

third_party/bigframes_vendored/pandas/io/gbq.py

+11 -7
@@ -27,13 +27,17 @@ def read_gbq(
 ):
     """Loads a DataFrame from BigQuery.
 
-    BigQuery tables are an unordered, unindexed data source. By default,
-    the DataFrame will have an arbitrary index and ordering.
-
-    Set the `index_col` argument to one or more columns to choose an
-    index. The resulting DataFrame is sorted by the index columns. For the
-    best performance, ensure the index columns don't contain duplicate
-    values.
+    BigQuery tables are an unordered, unindexed data source. To add support
+    for pandas compatibility, the following indexing options are supported:
+
+    * (Default behavior) Add an arbitrary sequential index and ordering
+      using an analytic windowed operation that prevents filtering
+      push down.
+    * (Recommended) Set the ``index_col`` argument to one or more columns.
+      Unique values for the row labels are recommended. Duplicate labels
+      are possible, but note that joins on a non-unique index can duplicate
+      rows and operations like ``cumsum()`` that window across a non-unique
+      index can have some nondeterminism.
 
     .. note::
         By default, even SQL query inputs with an ORDER BY clause create a
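
The docstring's caution about non-unique index labels can be illustrated with plain pandas, without a BigQuery connection. The following is a minimal sketch, not code from this commit: the in-memory frames stand in for tables, and `load_with_index` is a hypothetical helper that only wraps the `index_col` argument named in the diff. It is defined but never called, since calling it assumes the `bigframes` package and valid credentials.

```python
import pandas as pd


def load_with_index(table_id: str, index_col: str):
    """Hypothetical helper: load a table with an explicit index column.

    Assumes bigframes is installed and BigQuery credentials are configured;
    it is not called anywhere in this sketch.
    """
    import bigframes.pandas as bpd  # lazy import so the demo below runs without it

    return bpd.read_gbq(table_id, index_col=index_col)


# In-memory stand-ins for two tables indexed by a non-unique label.
dup_index = pd.Index(["x", "x", "y"], name="key")
left_dup = pd.DataFrame({"a": [1, 2, 3]}, index=dup_index)
right_dup = pd.DataFrame({"b": [10, 20, 30]}, index=dup_index)

# Each "x" row on the left matches both "x" rows on the right,
# so the join has more rows than either input.
joined_dup = left_dup.join(right_dup)

# The same values with unique labels join one-to-one.
uniq_index = pd.Index(["x1", "x2", "y"], name="key")
left_uniq = pd.DataFrame({"a": [1, 2, 3]}, index=uniq_index)
right_uniq = pd.DataFrame({"b": [10, 20, 30]}, index=uniq_index)
joined_uniq = left_uniq.join(right_uniq)

print(len(left_dup), len(joined_dup), len(joined_uniq))
```

The row count grows only in the non-unique case, which is why the new docstring recommends unique values for the columns passed as `index_col`.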
