Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataflowTemplatedJobStartOperator fix overwriting of location with default value, when a region is provided. #31082

Merged

Conversation

VVildVVolf
Copy link
Contributor

Fix overwriting of location with default value, when a region is provided.

TL/DR

When the DataflowTemplateOperator is being migrated from Airflow 1 to Airflow 2, if region property is setup with no location passing - Fallback anyway throws a message:

The mutually exclusive parameter location and region key in variables parameter are both present. Please remove one.

New unit tests of operator might explain better - they are not going to pass without the fix.

Details

The DataflowTemplatedJobStartOperator has a parameter location with non empty default value. This operator is using the DataflowHook object. DataflowHook makes a check: if location and region are setup together at the same time, than throws the exception (like above). Since DataflowTemplatedJobStartOperator's location has a default value, it will always pass non-zero value to the DataflowHook.

At the same time DataflowHook has the same default value for its location parameter, so DataflowTemplatedJobStartOperator can safely pass None value (Special unit test is also added, to make sure DataflowHook will use default value when location and region are not provided at the same time).

Notices

The same fallback _fallback_to_location_from_variables is applied to 4 methods (at the moment of writing):

  1. start_template_dataflow (fixed case)
  2. start_java_dataflow (found only definition and unit test references)
  3. start_python_dataflow (found only definition and unit test references)
  4. is_job_dataflow_running (referenced by BeamRunJavaPipelineOperator, DataflowTemplatedJobStartOperator and DataflowCreateJavaJobOperator with explicitly no passing location and region at the same time)

So, since other places are not affected, the change can be pretty small and focused on DataflowTemplatedJobStartOperator only.

P.S. Looks like some other people had the same problem, for example: https://stackoverflow.com/questions/70254851/dataflowtemplateoperator-job-failing-on-cloud-composer-after-upgrading-to-airflo


@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels May 5, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented May 5, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

location: str = DEFAULT_DATAFLOW_LOCATION,
location: str | None = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a breaking change or a bug fix?

Copy link
Member

@potiuk potiuk May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not breaking as the default location is assigned in hook when it is passed as null and there is no region defined (as far as I understand)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same concern, I think this needs to be tested

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is bug fix. The additional tests are provided - the case is valid but will be broken without the fix. The default value is not necessary on the operator level, since it is being passed to the hook anyway (and it has default).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also additional level of protection can be provided at _fallback_variable_parameter method, but looks redundant since the default value is caught from the DataflowHook.start_template_dataflow definition itself. Please let me know if it is useful - I can publish the PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was for clarification of this change in the release notes.
I have no objection either way

@VVildVVolf VVildVVolf requested a review from eladkal May 5, 2023 22:48
@eladkal eladkal changed the title Fix overwriting of location with default value, when a region is provided. DataflowTemplatedJobStartOperator fix overwriting of location with default value, when a region is provided. May 8, 2023
@potiuk potiuk merged commit 810b5d4 into apache:main May 8, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented May 8, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants