-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataflowTemplatedJobStartOperator
fix overwriting of location with default value, when a region is provided.
#31082
DataflowTemplatedJobStartOperator
fix overwriting of location with default value, when a region is provided.
#31082
Conversation
when a region is provided.
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
location: str = DEFAULT_DATAFLOW_LOCATION, | ||
location: str | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a breaking change or a bug fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not breaking as the default location is assigned in hook when it is passed as null and there is no region defined (as far as I understand)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same concern, I think this needs to be tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is bug fix. The additional tests are provided - the case is valid but will be broken without the fix. The default value is not necessary on the operator level, since it is being passed to the hook anyway (and it has default).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also additional level of protection can be provided at _fallback_variable_parameter method, but looks redundant since the default value is caught from the DataflowHook.start_template_dataflow definition itself. Please let me know if it is useful - I can publish the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question was for clarification of this change in the release notes.
I have no objection either way
Co-authored-by: Pankaj Singh <[email protected]>
DataflowTemplatedJobStartOperator
fix overwriting of location with default value, when a region is provided.
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions. |
Fix overwriting of location with default value, when a region is provided.
TL/DR
When the
DataflowTemplateOperator
is being migrated from Airflow 1 to Airflow 2, ifregion
property is setup with nolocation
passing - Fallback anyway throws a message:The mutually exclusive parameter
location
andregion
key invariables
parameter are both present. Please remove one.New unit tests of operator might explain better - they are not going to pass without the fix.
Details
The
DataflowTemplatedJobStartOperator
has a parameterlocation
with non empty default value. This operator is using theDataflowHook
object.DataflowHook
makes a check: iflocation
andregion
are setup together at the same time, than throws the exception (like above). SinceDataflowTemplatedJobStartOperator
'slocation
has a default value, it will always pass non-zero value to theDataflowHook
.At the same time
DataflowHook
has the same default value for itslocation
parameter, soDataflowTemplatedJobStartOperator
can safely pass None value (Special unit test is also added, to make sureDataflowHook
will use default value whenlocation
andregion
are not provided at the same time).Notices
The same fallback
_fallback_to_location_from_variables
is applied to 4 methods (at the moment of writing):BeamRunJavaPipelineOperator
,DataflowTemplatedJobStartOperator
andDataflowCreateJavaJobOperator
with explicitly no passinglocation
andregion
at the same time)So, since other places are not affected, the change can be pretty small and focused on
DataflowTemplatedJobStartOperator
only.P.S. Looks like some other people had the same problem, for example: https://stackoverflow.com/questions/70254851/dataflowtemplateoperator-job-failing-on-cloud-composer-after-upgrading-to-airflo