Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

balancer/rls: Add picker metrics #7484

Merged
merged 5 commits into from
Aug 9, 2024
Merged

Conversation

zasweq
Copy link
Contributor

@zasweq zasweq commented Aug 6, 2024

This PR adds the implementation for RLS metrics.

RELEASE NOTES:

  • balancer/rls: Add picker Pick information metrics

@zasweq zasweq requested review from easwars and dfawley August 6, 2024 17:26
@zasweq zasweq added the Type: Feature New features or improvements in behavior label Aug 6, 2024
@zasweq zasweq added this to the 1.66 Release milestone Aug 6, 2024
Copy link

codecov bot commented Aug 6, 2024

Codecov Report

Attention: Patch coverage is 91.07143% with 5 lines in your changes missing coverage. Please review.

Project coverage is 81.50%. Comparing base (c8716e5) to head (b427c24).
Report is 8 commits behind head on master.

Files Patch % Lines
balancer/rls/picker.go 88.88% 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7484      +/-   ##
==========================================
- Coverage   81.58%   81.50%   -0.08%     
==========================================
  Files         357      357              
  Lines       27243    27294      +51     
==========================================
+ Hits        22227    22247      +20     
- Misses       3820     3831      +11     
- Partials     1196     1216      +20     
Files Coverage Δ
balancer/rls/balancer.go 85.76% <100.00%> (+0.14%) ⬆️
balancer/rls/picker.go 91.19% <88.88%> (+0.49%) ⬆️

... and 21 files with indirect coverage changes

balancer/rls/cache.go Outdated Show resolved Hide resolved
balancer/rls/balancer.go Outdated Show resolved Hide resolved
balancer/rls/balancer.go Outdated Show resolved Hide resolved
balancer/rls/balancer.go Outdated Show resolved Hide resolved
@easwars easwars assigned zasweq and unassigned easwars and dfawley Aug 7, 2024
@easwars
Copy link
Contributor

easwars commented Aug 7, 2024

Also, please add a more descriptive release note entry.

@zasweq zasweq changed the title balancer/rls: Add metrics balancer/rls: Add picker metrics Aug 8, 2024
@zasweq zasweq assigned easwars and dfawley and unassigned zasweq Aug 8, 2024
balancer/rls/picker.go Show resolved Hide resolved
balancer/rls/picker.go Outdated Show resolved Hide resolved
balancer/rls/picker.go Outdated Show resolved Hide resolved
@easwars easwars assigned zasweq and unassigned easwars and dfawley Aug 8, 2024
@zasweq zasweq assigned easwars and dfawley and unassigned zasweq Aug 8, 2024
balancer/rls/balancer.go Show resolved Hide resolved
Comment on lines 181 to 187
pr := "fail"
if _, ok := status.FromError(err); ok {
pr = "drop"
}
return res, func() {
targetPicksMetric.Record(p.metricsRecorder, 1, p.grpcTarget, p.rlsServerTarget, cpw.target, pr)
}, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about making a helper since this logic is used in two places now:

// errToPickResult is a helper function which converts the error value returned
// by Pick() to a string that represents the pick result.
func errToPickResult(err error) string {
	if err == nil {
		return "complete"
	}
	if errors.Is(err, balancer.ErrNoSubConnAvailable) {
		return "queue"
	}
	if _, ok := status.FromError(err); ok {
		return "drop"
	}
	return "fail"
}

And we need to handle the balancer.ErrNoSubConnAvailable case in your code, don't we? Otherwise, we will wrongly mark picks that a child policy wanted to be queued as "fail".

And if the decision is to not record a metric for queued picks, we need to make sure that we ignore that explicitly after we handle the balancer.ErrNoSubConnAvailable case, as in the helper I described above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh for some reason I thought the balancer.ErrNoSubConnAvailable was only possible to be emitted from this RLS Layer. I forgot this could emit from the child as well in these error cases. I will try your helper out.

balancer/rls/picker.go Outdated Show resolved Hide resolved
@easwars
Copy link
Contributor

easwars commented Aug 8, 2024

Also, how do you plan to test these changes?

@easwars easwars assigned zasweq and unassigned easwars and dfawley Aug 8, 2024
@zasweq zasweq assigned easwars and unassigned zasweq Aug 8, 2024
@zasweq
Copy link
Contributor Author

zasweq commented Aug 8, 2024

Thanks for the pass; great suggestions. I plan on testing this with unit tests for more specific sceanrios/counters, and e2e with RLS deployed as a top level balancer of channel, with a OpenTelemetry component configured. I can then verify OpenTelemetry eventually emits the expected metrics atoms for the RLS Balancer metrics.

balancer/rls/balancer.go Show resolved Hide resolved
Comment on lines 201 to 207
rf := func() {
targetPicksMetric.Record(p.metricsRecorder, 1, p.grpcTarget, p.rlsServerTarget, cpw.target, pr)
}
if pr == "queue" {
// Don't record metrics for queued Picks.
rf = func() {}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and down below, I would pull in the check for queued picks inside the callback:

rf := func() {
	if pr == "queue" {
		// Don't record metrics for queued Picks.
		return
	}
	targetPicksMetric.Record(p.metricsRecorder, 1, p.grpcTarget, p.rlsServerTarget, cpw.target, pr)
}

The above approach would mean that for completed and failed picks, we would incur the extra cost of running a conditional if pr == "queue" { ... }, and branch statements can be expensive with CPU pipelining. I'm not too concerned about it though. I prefer the way the code looks when you pull in it. But if you are @dfawley thinks the cost is not worth the ergonomics, I'm fine with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doug said he doesn't mind either, so I switched to this.

@easwars easwars removed their assignment Aug 8, 2024
@zasweq zasweq merged commit 7b9e012 into grpc:master Aug 9, 2024
13 checks passed
infovivek2020 pushed a commit to infovivek2020/grpc-go that referenced this pull request Aug 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants