Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: bigframes.options and bigframes.option_context now uses thread-local variables to prevent context managers in separate threads from affecting each other #652

Merged
merged 12 commits into from
May 6, 2024

Conversation

tswast
Copy link
Collaborator

@tswast tswast commented May 1, 2024

In our tests, this allows us to actually test things like bf.option_context("display.repr_mode", "deferred"): without always having some other test change the display mode and break the test.

Fixes internal issue 308657813

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

Page not found · GitHub · GitHub
Skip to content
404 “This is not the web page you are looking for”
…ead-local variables

to prevent context managers in separate threads from affecting each other

In our tests, this allows us to actually test things like
`bf.option_context("display.repr_mode", "deferred"):` without always
having some other test change the display mode and break the test.

Fixes internal issue 308657813
@tswast tswast requested review from a team as code owners May 1, 2024 21:22
@tswast tswast requested a review from junyazhang May 1, 2024 21:22
@product-auto-label product-auto-label bot added the size: l Pull request size is large. label May 1, 2024
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. label May 1, 2024
self._compute_options = compute_options.ComputeOptions()
self._local.bigquery_options = None

def _init_biquery_thread_local(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: bigquery

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed in 4fa1a92

@tswast tswast requested a review from GarrettWu May 1, 2024 21:47
@tswast
Copy link
Collaborator Author

tswast commented May 2, 2024

e2e failure of test_model_register appears to be unrelated.

@tswast
Copy link
Collaborator Author

tswast commented May 2, 2024

Cover failed.

tests/system/small/test_progress_bar.py 72 1 28 1 98% 144

Looks like I now have some dead code in that test. Will investigate.

@milkshakeiii
Copy link
Contributor

Messaged offline. @tswast will fix the failing test, and then I will review.

@tswast
Copy link
Collaborator Author

tswast commented May 2, 2024

test_query_job_dry_run is still failing. Will investigate more tomorrow. Maybe I'm misunderstanding how thread locals work...

@tswast
Copy link
Collaborator Author

tswast commented May 3, 2024

I'm able to reproduce locally. The thread local variable is working how I expect, but the test is making an assumption that the DataFrame doesn't have a query_job populated. This assumption is violated when the DataFrame is used across multiple tests.

I'll update the test.

________________________________________________________________________________ test_query_job_dry_run ________________________________________________________________________________
[gw2] linux -- Python 3.11.1 /usr/local/google/home/swast/envs/bigframes/bin/python3.11

penguins_df_default_index =                                       species     island  culmen_length_mm  \
0           Gentoo penguin (Pygoscelis p...  <NA>         <NA>    <NA>  
24             20.7              191.0       3900.0  FEMALE  
...

[344 rows x 7 columns]

    def test_query_job_dry_run(penguins_df_default_index: bf.dataframe.DataFrame):
        expected = "Computation deferred. Computation will process"
    
        with bf.option_context("display.repr_mode", "deferred"):
            df_result = repr(penguins_df_default_index)
>           assert expected in df_result
E           AssertionError: assert 'Computation deferred. Computation will process' in 'Query Job Info\nJob url: https://console.cloud.google.com/bigquery?project=swast-scratch&j=bq:US:9a761424-b164-4c31-a...240503_session444e7b_fbabcb4962f841a8bb92ec2f27d17bb2\nSlot Time: a moment\nBytes Processed: 28.9 kB\nCache hit: False'

tests/system/small/test_progress_bar.py:142: AssertionError
--------------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------------
DisplayOptions(max_columns=20, max_rows=25, progress_bar='auto', repr_mode='deferred', max_info_columns=100, max_info_rows=200000, memory_usage=True)
deferred

Copy link
Contributor

@milkshakeiii milkshakeiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had one simplification suggestion, feel free to just close and re-request if you prefer the way it is now. Thanks!

# BigQuery options are special because they can only be set once per
# session, so we need an indicator as to whether we are using the
# thread-local session or the global session.
self._local.is_bigquery_thread_local = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly, would checking for self._local.bigquery_options is None be able to replace this boolean state? It seems a little more DRY to do it that way, if so.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it! Done in a7f3c96

# Already thread-local, so don't reset any options that have been set
# already. No locks needed since this only modifies thread-local
# variables.
if self._local.is_bigquery_thread_local:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e., this could be "if self._local.bigquery_options is not None"


@property
def bigquery(self) -> bigquery_options.BigQueryOptions:
"""Options to use with the BigQuery engine."""
if self._local.is_bigquery_thread_local:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too


This is set to True by using `option_context` with a "bigquery" option.
"""
return self._local.is_bigquery_thread_local
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also check none (and the comment would need to be tweaked.)

# sessions if we allow that.
if bigframes.options.is_bigquery_thread_local:
bigframes.close_session()
bigframes.options._local.is_bigquery_thread_local = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

close_session could set _local.bigquery_options to None

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Someone could call close_session() inside of a context manager. We actually have some tests that start and stop multiple sessions in this thread-local context. By using the options as the thread locality indicator, we need to make sure that object stays around.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, makes sense.

@tswast tswast requested a review from milkshakeiii May 3, 2024 18:15
Copy link
Contributor

@milkshakeiii milkshakeiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@tswast tswast merged commit 651fd7d into main May 6, 2024
16 checks passed
@tswast tswast deleted the b308657813-thread-local-option_context branch May 6, 2024 01:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants