Comparing changes

base repository: googleapis/python-bigquery-dataframes
base: v0.19.2
head repository: googleapis/python-bigquery-dataframes
compare: v0.20.0
  • 15 commits
  • 48 files changed
  • 7 contributors

Commits on Jan 23, 2024

  1. feat: update cut to work without labels = False and show intervals as dict (#335)
    
    * test ver.
    
    * add test and adjustment
    
    * update test and docstring.
    
    * remove unused import.
    
    * update code examples.
    
    * Code formatted.
    
    * Update error and unittest.
    
    * Update labels selections.
    Genesis929 authored Jan 23, 2024
    4ff53db
  2. fix: Series iteration correctly returns values instead of index (#339)

    * fix: Series iteration correctly returns values instead of index
    
    * Update iter docstring
    TrevorBergeron authored Jan 23, 2024
    2c6af9b
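
The fix in #339 concerns what iterating over a Series yields. The `bigframes/series.py` diff is not shown in this excerpt, so the following is only a toy sketch of the behavior difference, using a hypothetical `ToySeries` class rather than the bigframes implementation. A mapping-like container iterates over its keys by default, whereas pandas-style Series iteration is expected to yield values.

```python
from typing import Any, Dict, Iterator


class ToySeries:
    """Hypothetical stand-in for a labeled, one-dimensional container."""

    def __init__(self, data: Dict[Any, Any]) -> None:
        # Dict keys play the role of the index; dict values are the data.
        self._data = dict(data)

    def __iter__(self) -> Iterator[Any]:
        # Yield values. Iterating the underlying dict directly would
        # yield the index labels instead, which is the bug pattern.
        return iter(self._data.values())

    def keys(self) -> Iterator[Any]:
        # Index labels stay available through an explicit accessor.
        return iter(self._data.keys())


s = ToySeries({"a": 10, "b": 20, "c": 30})
values = list(s)
labels = list(s.keys())
```

Keeping label access behind an explicit `keys()` call makes the default iteration unambiguous for callers who write `for x in series`.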

Commits on Jan 24, 2024

  1. 75dc9e6
  2. chore: Script to inspect and clean up stale GCFs (#331)

    Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
    - [ ] Make sure to open an issue as a [bug/issue](https://togithub.com/googleapis/python-bigquery-dataframes/issues/new/choose) before writing your code!  That way we can discuss the change, evaluate designs, and agree on the general idea
    - [ ] Ensure the tests and linter pass
    - [ ] Code coverage does not decrease (if any source code was changed)
    - [ ] Appropriate docs were updated (if necessary)
    
    Fixes internal issue 319307783 🦕
    
    ### Usage:
    ```bash
    $ python scripts/manage_cloud_functions.py --help
    usage: manage_cloud_functions.py [-h] -p PROJECT_ID [-r REGIONS] {summary,cleanup} ...
    
    Manage cloud functions created to serve bigframes remote functions.
    
    options:
      -h, --help            show this help message and exit
      -p PROJECT_ID, --project-id PROJECT_ID
                            GCP project-id.
      -r REGIONS, --regions REGIONS
                            Cloud functions region(s). If multiple regions, Specify comma separated (e.g. region1,region2)
    
    subcommands:
      {summary,cleanup}
        summary             BigFrames cloud functions summary.
        cleanup             BigFrames cloud functions clean up.
    
    
    $ python scripts/manage_cloud_functions.py summary --help
    usage: manage_cloud_functions.py summary [-h]
    
    Show the bigframes cloud functions summary.
    
    options:
      -h, --help  show this help message and exit
    
    
    $ python scripts/manage_cloud_functions.py cleanup --help
    usage: manage_cloud_functions.py cleanup [-h] [-n NUMBER]
    
    Delete the stale bigframes cloud functions.
    
    options:
      -h, --help            show this help message and exit
      -n NUMBER, --number NUMBER
                            Number of stale (more than a day old) cloud functions to clean up.
    ```
    
    ### Example:
    ```bash
    $ python scripts/manage_cloud_functions.py -p bigframes-dev summary
    us-central1: Total=1412, Recent=86, OlderThanADay=1326
    europe-west4: Total=270, Recent=24, OlderThanADay=246
    southamerica-west1: Total=269, Recent=23, OlderThanADay=246
    europe-west1: Total=262, Recent=23, OlderThanADay=239
    asia-southeast1: Total=260, Recent=18, OlderThanADay=242
    us-east1: Total=1, Recent=0, OlderThanADay=1
    
    $ python scripts/manage_cloud_functions.py -p bigframes-dev -r us-central1,europe-west4 summary
    us-central1: Total=1412, Recent=85, OlderThanADay=1327
    europe-west4: Total=270, Recent=24, OlderThanADay=246
    
    $ python scripts/manage_cloud_functions.py -p bigframes-dev -r us-central1,europe-west4 cleanup -n 2
    [us-central1]: deleted [1] projects/bigframes-dev/locations/us-central1/functions/bigframes-597cc02ef5ce0525e4f51697b5a83b6c-3pfpu6gu last updated on 2024-01-08 21:47:58.503628+00:00
    [us-central1]: deleted [2] projects/bigframes-dev/locations/us-central1/functions/bigframes-68f796a13666bb3bfe354dd1adaeef71 last updated on 2024-01-09 21:52:49.620259+00:00
    [europe-west4]: deleted [1] projects/bigframes-dev/locations/europe-west4/functions/bigframes-558d0ca6649537a9e45896faf08b0a7a last updated on 2024-01-12 21:15:04.379198+00:00
    [europe-west4]: deleted [2] projects/bigframes-dev/locations/europe-west4/functions/bigframes-4b7705561ec336ed80722a8e6e56ac41 last updated on 2024-01-08 05:34:59.331828+00:00
    
    $ python scripts/manage_cloud_functions.py -p bigframes-dev -r us-central1,europe-west4 summary
    us-central1: Total=1410, Recent=85, OlderThanADay=1325
    europe-west4: Total=269, Recent=25, OlderThanADay=244
    ```
    shobsi authored Jan 24, 2024
    47c3285
  3. feat: add ARIMA_EVALUATE options in forecasting models (#336)

    * feat: add ARIMA_EVALUATE options in forecasting models
    
    * feat: add summary method
    
    * fix minor errors
    
    * fix failed tests
    
    * address comments
    ashleyxuu authored Jan 24, 2024
    73e997b
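
The help text quoted in #331 above implies an argparse layout with global `-p`/`-r` options and `summary`/`cleanup` subcommands. The sketch below reproduces only that command-line surface. It is an assumption about how such a parser could be built, not the contents of `scripts/manage_cloud_functions.py`; the `build_parser` name, the `dest="command"` choice, and the comma-splitting `type` are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="manage_cloud_functions.py",
        description="Manage cloud functions created to serve bigframes remote functions.",
    )
    parser.add_argument("-p", "--project-id", required=True, help="GCP project-id.")
    parser.add_argument(
        "-r",
        "--regions",
        # Assumption: split the comma-separated value into a list at parse time.
        type=lambda value: value.split(","),
        help="Cloud functions region(s), comma separated (e.g. region1,region2).",
    )

    subparsers = parser.add_subparsers(
        title="subcommands", dest="command", required=True
    )
    subparsers.add_parser("summary", help="BigFrames cloud functions summary.")
    cleanup = subparsers.add_parser(
        "cleanup", help="BigFrames cloud functions clean up."
    )
    cleanup.add_argument(
        "-n",
        "--number",
        type=int,
        help="Number of stale (more than a day old) cloud functions to clean up.",
    )
    return parser


# Mirrors the third invocation in the example session above.
args = build_parser().parse_args(
    ["-p", "bigframes-dev", "-r", "us-central1,europe-west4", "cleanup", "-n", "2"]
)
```

Placing `-p` and `-r` on the top-level parser rather than on each subparser matches the quoted usage line, where they appear before the subcommand name.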

Commits on Jan 25, 2024

  1. refactor: add output type annotations to scalar ops (#338)

    * refactor: add output type annotations to scalar ops
    
    * use same expression type annotation everywhere
    
    * pr comments
    TrevorBergeron authored Jan 25, 2024
    d88c562
  2. feat: add DataFrame.peek() as an efficient alternative to head() results preview (#318)
    
    * feat: add efficient peek dataframe preview
    
    * add force parameter to peek to cache full dataframe
    
    * add df.peek docstring
    
    * set peek to default force=False
    
    * update peek docstring and error type
    
    ---------
    
    Co-authored-by: Tim Swast <swast@google.com>
    TrevorBergeron and tswast authored Jan 25, 2024
    9c34d83

Commits on Jan 26, 2024

  1. feat: Improve error message for drive based BQ table reads (#344)

    * feat: Improve error message for drive based BQ table reads
    
    * move exception handling deeper to apply to read_gbq*
    
    * add unit tests
    shobsi authored Jan 26, 2024
    0794788
  2. fix: change default connection name in getting_started.ipynb (#347)

    Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
    - [x] Make sure to open an issue as a [bug/issue](https://togithub.com/googleapis/python-bigquery-dataframes/issues/new/choose) before writing your code!  That way we can discuss the change, evaluate designs, and agree on the general idea
    - [x] Ensure the tests and linter pass
    - [x] Code coverage does not decrease (if any source code was changed)
    - [x] Appropriate docs were updated (if necessary)
    
    Fixes #346
    
    The existing notebook references an incorrect default connection name. This PR corrects the name so that users can more easily clean up after they use the [getting started notebook](https://togithub.com/googleapis/python-bigquery-dataframes/blob/main/notebooks/getting_started/getting_started_bq_dataframes.ipynb).
    shanecglass authored Jan 26, 2024
    677f014
  3. feat: Add Index constructor, repr, copy, get_level_values, to_series (#334)
    
    * feat: Add Index constructor, copy, get_level_values, to_series
    
    fix mypy error
    
    * fix constructor bug
    
    * fix error with index name mutation
    
    * refactor index to make mutation clearer
    
    * fix index bugs
    
    * give index custom repr
    
    ---------
    
    Co-authored-by: Huan Chen <142538604+Genesis929@users.noreply.github.com>
    TrevorBergeron and Genesis929 authored Jan 26, 2024
    e5d054e

Commits on Jan 29, 2024

  1. refactor: Split aggregate ops from implementation (#354)

    TrevorBergeron authored Jan 29, 2024
    99ed6c3
  2. chore: pin pytest version (#358)

    tswast authored Jan 29, 2024
    6795ed2

Commits on Jan 30, 2024

  1. chore: ensure colab sample notebooks are tested (#351)

    * chore: ensure colab sample notebooks are tested
    
    * make restore from backup robust to when the backup doesn't exist
    
    * fix path to notebook params scripts
    
    * exclude notebooks that need parameters other than project_id
    
    * add missing dependencies
    
    * notebook testing fixes
    
    * add sleep to avoid some bucket flakiness
    
    * Revert "add sleep to avoid some bucket flakiness"
    
    This reverts commit dfee838.
    
    * exclude bq_dataframes_llm_code_generation sample
    tswast authored Jan 30, 2024
    5aad3a1
  2. docs: add code samples for Series.{between, cumprod} (#353)

    Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
    - [ ] Make sure to open an issue as a [bug/issue](https://togithub.com/googleapis/python-bigquery-dataframes/issues/new/choose) before writing your code!  That way we can discuss the change, evaluate designs, and agree on the general idea
    - [ ] Ensure the tests and linter pass
    - [ ] Code coverage does not decrease (if any source code was changed)
    - [x] Appropriate docs were updated (if necessary)
         - [x] `Series.between()`: https://screenshot.googleplex.com/BhHpZsL7S9d3FsG
         - [x] `Series.cumprod()`: https://screenshot.googleplex.com/7o7gDNwJZEWst84
    
    ashleyxuu authored Jan 30, 2024
    09a52fd
  3. chore(main): release 0.20.0 (#342)

    Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
    release-please[bot] authored Jan 30, 2024
    18efb83
Showing with 2,074 additions and 593 deletions.
  1. +22 −0 CHANGELOG.md
  2. +2 −2 bigframes/core/__init__.py
  3. +12 −4 bigframes/core/blocks.py
  4. +3 −3 bigframes/core/compile/__init__.py
  5. +413 −0 bigframes/core/compile/aggregate_compiler.py
  6. 0 bigframes/core/compile/analytic_compiler.py
  7. +16 −4 bigframes/core/compile/compiled.py
  8. +21 −17 bigframes/core/compile/compiler.py
  9. +34 −9 bigframes/core/expression.py
  10. +1 −1 bigframes/core/indexers.py
  11. +146 −37 bigframes/core/indexes/index.py
  12. +55 −0 bigframes/core/nodes.py
  13. +6 −3 bigframes/core/reshape/__init__.py
  14. +52 −10 bigframes/dataframe.py
  15. +62 −3 bigframes/dtypes.py
  16. +12 −38 bigframes/functions/remote_function.py
  17. +7 −0 bigframes/ml/core.py
  18. +25 −0 bigframes/ml/forecasting.py
  19. +6 −0 bigframes/ml/sql.py
  20. +163 −73 bigframes/operations/__init__.py
  21. +5 −306 bigframes/operations/aggregations.py
  22. +31 −11 bigframes/operations/base.py
  23. +80 −0 bigframes/operations/type.py
  24. +8 −10 bigframes/series.py
  25. +30 −6 bigframes/session/__init__.py
  26. +1 −1 bigframes/version.py
  27. +13 −4 notebooks/getting_started/getting_started_bq_dataframes.ipynb
  28. +68 −25 noxfile.py
  29. +195 −0 scripts/manage_cloud_functions.py
  30. +65 −0 scripts/notebooks_fill_params.py
  31. +35 −0 scripts/notebooks_restore_from_backup.py
  32. +33 −2 tests/system/large/ml/test_forecasting.py
  33. +40 −0 tests/system/small/ml/test_forecasting.py
  34. +31 −0 tests/system/small/test_dataframe.py
  35. +72 −0 tests/system/small/test_index.py
  36. +24 −1 tests/system/small/test_pandas.py
  37. +44 −0 tests/system/small/test_series.py
  38. +49 −0 tests/unit/core/test_expression.py
  39. +13 −0 tests/unit/ml/test_sql.py
  40. +29 −1 tests/unit/session/test_session.py
  41. +1 −3 tests/unit/test_dtypes.py
  42. +4 −1 tests/unit/test_pandas.py
  43. +6 −3 tests/unit/test_remote_function.py
  44. +5 −2 third_party/bigframes_vendored/pandas/core/frame.py
  45. +6 −6 third_party/bigframes_vendored/pandas/core/generic.py
  46. +51 −0 third_party/bigframes_vendored/pandas/core/indexes/base.py
  47. +12 −5 third_party/bigframes_vendored/pandas/core/reshape/tile.py
  48. +65 −2 third_party/bigframes_vendored/pandas/core/series.py
22 changes: 22 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,28 @@

[1]: https://pypi.org/project/bigframes/#history

## [0.20.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v0.19.2...v0.20.0) (2024-01-30)


### Features

* Add `DataFrame.peek()` as an efficient alternative to `head()` results preview ([#318](https://github.com/googleapis/python-bigquery-dataframes/issues/318)) ([9c34d83](https://github.com/googleapis/python-bigquery-dataframes/commit/9c34d834e83ca5514bee723ebb9a7ad1ad50e88d))
* Add ARIMA_EVALUATE options in forecasting models ([#336](https://github.com/googleapis/python-bigquery-dataframes/issues/336)) ([73e997b](https://github.com/googleapis/python-bigquery-dataframes/commit/73e997b3e80f844a8120b52ed2ece8b046cf4ca9))
* Add Index constructor, repr, copy, get_level_values, to_series ([#334](https://github.com/googleapis/python-bigquery-dataframes/issues/334)) ([e5d054e](https://github.com/googleapis/python-bigquery-dataframes/commit/e5d054e93a05f5c504e8db57b954c07d33e5f5b9))
* Improve error message for drive based BQ table reads ([#344](https://github.com/googleapis/python-bigquery-dataframes/issues/344)) ([0794788](https://github.com/googleapis/python-bigquery-dataframes/commit/0794788a2d232d795d803cd0c5b3f7d51c562cf1))
* Update cut to work without labels = False and show intervals as dict ([#335](https://github.com/googleapis/python-bigquery-dataframes/issues/335)) ([4ff53db](https://github.com/googleapis/python-bigquery-dataframes/commit/4ff53db48133b817bec5f123b634690244a610d3))


### Bug Fixes

* Change default connection name in getting_started.ipynb ([#347](https://github.com/googleapis/python-bigquery-dataframes/issues/347)) ([677f014](https://github.com/googleapis/python-bigquery-dataframes/commit/677f0146acf19def88fddbeb0527a078458948ae))
* Series iteration correctly returns values instead of index ([#339](https://github.com/googleapis/python-bigquery-dataframes/issues/339)) ([2c6af9b](https://github.com/googleapis/python-bigquery-dataframes/commit/2c6af9ba8b362dae39a6e082cdc816c955c73517))


### Documentation

* Add code samples for `Series.{between, cumprod}` ([#353](https://github.com/googleapis/python-bigquery-dataframes/issues/353)) ([09a52fd](https://github.com/googleapis/python-bigquery-dataframes/commit/09a52fda19cde8efa6b20731d5b8e21f50b18a9a))

## [0.19.2](https://github.com/googleapis/python-bigquery-dataframes/compare/v0.19.1...v0.19.2) (2024-01-22)


4 changes: 2 additions & 2 deletions bigframes/core/__init__.py
@@ -106,10 +106,10 @@ def get_column_type(self, key: str) -> bigframes.dtypes.Dtype:
         return self._compile_ordered().get_column_type(key)

     def _compile_ordered(self) -> compiling.OrderedIR:
-        return compiling.compile_ordered(self.node)
+        return compiling.compile_ordered_ir(self.node)

     def _compile_unordered(self) -> compiling.UnorderedIR:
-        return compiling.compile_unordered(self.node)
+        return compiling.compile_unordered_ir(self.node)

     def row_count(self) -> ArrayValue:
         """Get number of rows in ArrayValue as a single-entry ArrayValue."""
16 changes: 12 additions & 4 deletions bigframes/core/blocks.py
@@ -287,15 +287,14 @@ def reset_index(self, drop: bool = True) -> Block:
             A new Block because dropping index columns can break references
             from Index classes that point to this block.
         """
-        block = self
         new_index_col_id = guid.generate_guid()
         expr = self._expr.promote_offsets(new_index_col_id)
         if drop:
             # Even though the index might be part of the ordering, keep that
             # ordering expression as reset_index shouldn't change the row
             # order.
             expr = expr.drop_columns(self.index_columns)
-            block = Block(
+            return Block(
                 expr,
                 index_columns=[new_index_col_id],
                 column_labels=self.column_labels,
@@ -321,13 +320,12 @@ def reset_index(self, drop: bool = True) -> Block:
                 # See: https://pandas.pydata.org/docs/reference/api/pandas.Index.insert.html
                 column_labels_modified = column_labels_modified.insert(level, label)

-            block = Block(
+            return Block(
                 expr,
                 index_columns=[new_index_col_id],
                 column_labels=column_labels_modified,
                 index_labels=[None],
             )
-        return block

     def set_index(
         self,
@@ -432,8 +430,18 @@ def to_pandas(
                 downsampling=sampling, ordered=ordered
             )
         )
+        df.set_axis(self.column_labels, axis=1, copy=False)
         return df, query_job

+    def try_peek(self, n: int = 20) -> typing.Optional[pd.DataFrame]:
+        if self.expr.node.peekable:
+            iterator, _ = self.session._peek(self.expr, n)
+            df = self._to_dataframe(iterator)
+            self._copy_index_to_pandas(df)
+            return df
+        else:
+            return None
+
     def to_pandas_batches(self):
         """Download results one message at a time."""
         dtypes = dict(zip(self.index_columns, self.index_dtypes))
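
`try_peek` above returns `None` when the expression is not peekable, which leaves the caller to decide what happens next. The commit notes for #318 mention a `force` parameter that caches the full dataframe and defaults to `False`, but how `DataFrame.peek()` combines the two is not visible in this excerpt. The following is a rough sketch of one plausible arrangement; `ToyBlock`, `ToyFrame`, `cache()`, and the choice of `ValueError` are hypothetical stand-ins, not the bigframes code.

```python
from typing import List, Optional


class ToyBlock:
    """Hypothetical block whose rows may or may not be cheap to preview."""

    def __init__(self, rows: List[int], peekable: bool) -> None:
        self.rows = rows
        self.peekable = peekable

    def try_peek(self, n: int = 20) -> Optional[List[int]]:
        # Same shape as the diff: rows if peekable, otherwise None.
        return self.rows[:n] if self.peekable else None

    def cache(self) -> "ToyBlock":
        # Assumption: materializing the result makes later previews cheap.
        return ToyBlock(self.rows, peekable=True)


class ToyFrame:
    def __init__(self, block: ToyBlock) -> None:
        self._block = block

    def peek(self, n: int = 5, *, force: bool = False) -> List[int]:
        result = self._block.try_peek(n)
        if result is None:
            if not force:
                # Assumption: the real error type and message may differ.
                raise ValueError("Cannot peek efficiently; pass force=True.")
            # Pay the full computation cost once, then preview the cached data.
            self._block = self._block.cache()
            result = self._block.try_peek(n)
            assert result is not None
        return result


cheap = ToyFrame(ToyBlock([1, 2, 3], peekable=True)).peek(2)
forced = ToyFrame(ToyBlock([4, 5, 6], peekable=False)).peek(2, force=True)
```

Returning `Optional` from the low-level method keeps the "is this cheap?" decision separate from the policy of whether to fall back to a full computation.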
6 changes: 3 additions & 3 deletions bigframes/core/compile/__init__.py
@@ -13,11 +13,11 @@
 # limitations under the License.

 from bigframes.core.compile.compiled import OrderedIR, UnorderedIR
-from bigframes.core.compile.compiler import compile_ordered, compile_unordered
+from bigframes.core.compile.compiler import compile_ordered_ir, compile_unordered_ir

 __all__ = [
-    "compile_ordered",
-    "compile_unordered",
+    "compile_ordered_ir",
+    "compile_unordered_ir",
     "OrderedIR",
     "UnorderedIR",
 ]