-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: raise NoDefaultIndexError
from read_gbq
on clustered/partitioned tables with no index_col
or filters
set
#631
feat: raise NoDefaultIndexError
from read_gbq
on clustered/partitioned tables with no index_col
or filters
set
#631
Conversation
…rtitioned-default-index-error
…ed-default-index-error
NoDefaultIndexError
from read_gbq
on clustered/partitioned tables with no index_col
set. Set force=True
to allow default sequential indexNoDefaultIndexError
from read_gbq
on clustered/partitioned tables with no index_col
set. Set bigframes.options.compute.allow_default_index_on_clustered_partitioned_tables = True
to allow default sequential index
…or-partitioned-default-index-error
This ensures the cached `primary_keys` is more likely to be correct, in case the user called ALTER TABLE after we originally cached the snapshot time.
…tered-or-partitioned-default-index-error
NoDefaultIndexError
from read_gbq
on clustered/partitioned tables with no index_col
set. Set bigframes.options.compute.allow_default_index_on_clustered_partitioned_tables = True
to allow default sequential indexNoDefaultIndexError
from read_gbq
on clustered/partitioned tables with no index_col
set
…or-partitioned-default-index-error
…ve to separate module add todos
NoDefaultIndexError
from read_gbq
on clustered/partitioned tables with no index_col
setNoDefaultIndexError
from read_gbq
on clustered/partitioned tables with no index_col
or filters
set
**New in bigframes version 1.4.0**: Support | ||
:class:`bigframes.enums.TypeKind` to override default index | ||
behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypeKind or DefaultIndexKind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DefaultIndexKind. Good catch. I had tried a few names for this but the refactor option didn't catch docstrings I think.
rows via pandas-like outer join behavior. Operations like | ||
``cumsum()`` that window across a non-unique index can have some | ||
unpredictability due to ambiguous ordering. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The part about non-determinism I don't think is correct? My understanding is if the index is non-unique, we fall back to hidden hashes to ensure total ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropped this.
@@ -25,6 +27,8 @@ | |||
"BigQueryOptions", | |||
"get_global_session", | |||
"close_session", | |||
"enums", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is enums an intuitive module, or would a domain-related term be better, eg indexing.IndexType
or directly putting the enum in the main module, bigframes.pandas.IndexType
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to find some guidance on this, but Python community doesn't seem particularly prescriptive about module names.
PEP-8 has this to say:
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability.
https://peps.python.org/pep-0008/#package-and-module-names
Google Python style guide has a bit more to say:
Place related classes and top-level functions together in a module. Unlike Java, there is no need to limit yourself to one class per module.
Use CapWords for class names, but lower_with_under.py for module names.
https://google.github.io/styleguide/pyguide.html#3162-naming-conventions
I tried a few of these options out locally (bigframes.indexes.DefaultIndexKind and bigframes.pandas.DefaultIndexKind), but it feels strange to have something not really mimicking pandas in the pandas sub-package and bigframes.indexes.DefaultIndexKind would imply that we should move the Index and MultiIndex classes there, which is kinda the opposite of what we want to do.
The other option we could try is bigframes.pandas.core.indexes
, but in pandas "core" is how they signify that an API is private an not to be relied on.
IMO, determining if classes are "related" by type for the basic types (e.g. exceptions, enums, ...) will be less effort for us long-term than having to figure out which public package to place these things if it doesn't fit in an existing API.
@@ -107,11 +117,18 @@ def read_gbq( | |||
`project.dataset.tablename` or `dataset.tablename`. | |||
Can also take wildcard table name, such as `project.dataset.table_prefix*`. | |||
In tha case, will read all the matched table as one DataFrame. | |||
index_col (Iterable[str] or str): | |||
index_col (Iterable[str], str, bigframes.enums.IndexKind): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DefaultIndexKind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Done.
Looks like |
…or-partitioned-default-index-error
…ned-default-index-error' into b335727141-clustered-or-partitioned-default-index-error
Done! Looks like from https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html |
…or-partitioned-default-index-error
Please review #636 first, as this PR builds on that.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes internal issue 335727141
🦕