-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize hash map operations in the query system #115747
base: master
Are you sure you want to change the base?
Conversation
These commits modify the If this was unintentional then you should revert the changes before this PR is merged. |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Optimize hash map operations in the query system This optimizes hash map operations in the query system by explicitly passing hashes and using more optimal operations. `find_or_find_insert_slot` in particular saves a hash table lookup over `entry`. It's not yet available in a safe API, but will be in rust-lang/hashbrown#466. <table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">1.6189s</td><td align="right">1.6129s</td><td align="right"> -0.37%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.2353s</td><td align="right">0.2337s</td><td align="right"> -0.67%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">0.9344s</td><td align="right">0.9289s</td><td align="right"> -0.59%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">1.4693s</td><td align="right">1.4652s</td><td align="right"> -0.28%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check</td><td align="right">5.6606s</td><td align="right">5.6439s</td><td align="right"> -0.30%</td></tr><tr><td>Total</td><td align="right">9.9185s</td><td align="right">9.8846s</td><td align="right"> -0.34%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9956s</td><td align="right"> -0.44%</td></tr></table> r? `@cjgillot`
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (d898dc6): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 630.615s -> 639.436s (1.40%) |
[TEST] Optimize hash map operations in the query system This is a variant of rust-lang#115747 without using the `hashbrown` crate to see if that change the bootstrap impact. r? `@cjgillot`
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This could use another perf run to see if using the sysroot |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Optimize hash map operations in the query system This optimizes hash map operations in the query system by explicitly passing hashes and using more optimal operations. `find_or_find_insert_slot` in particular saves a hash table lookup over `entry`. It's not yet available in a safe API, but will be in rust-lang/hashbrown#466. <table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">1.6189s</td><td align="right">1.6129s</td><td align="right"> -0.37%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.2353s</td><td align="right">0.2337s</td><td align="right"> -0.67%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">0.9344s</td><td align="right">0.9289s</td><td align="right"> -0.59%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">1.4693s</td><td align="right">1.4652s</td><td align="right"> -0.28%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check</td><td align="right">5.6606s</td><td align="right">5.6439s</td><td align="right"> -0.30%</td></tr><tr><td>Total</td><td align="right">9.9185s</td><td align="right">9.8846s</td><td align="right"> -0.34%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9956s</td><td align="right"> -0.44%</td></tr></table> r? `@cjgillot`
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (9652a29): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (primary -1.2%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary -1.4%, secondary -3.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 773.717s -> 774.106s (0.05%) |
Perfs looks good and there's no longer a bootstrap regression. |
@bors r+ |
[dependencies.hashbrown] | ||
version = "0.15.2" | ||
default-features = false | ||
features = ["nightly"] # for may_dangle | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why removing this? I understand, that now code relies on whatever hashbrown version\features pulled in, so actual improvements can be from that version\features combination, not from actual code changes, and makes perf fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ cargo tree -p hashbrown@0.15 -e features -i --depth 2
hashbrown v0.15.2
`-- indexmap v2.7.0
|-- object v0.36.7
|-- wasmparser v0.219.1
|-- wasmparser v0.223.0
`-- wit-component v0.223.0
|-- indexmap feature "default"
|-- indexmap feature "serde"
`-- indexmap feature "std"
|-- hashbrown feature "default-hasher"
| |-- object v0.36.7 (*)
| `-- wasmparser v0.223.0 (*)
|-- hashbrown feature "nightly"
| `-- rustc_data_structures v0.0.0
`-- hashbrown feature "serde"
`-- wasmparser feature "serde"
This was only user of nightly
feature. For comparison, 0.14 version pulls different set of features:
$ cargo tree -p hashbrown@0.14 -e features -i --depth 2
hashbrown v0.14.5
|-- hashbrown feature "ahash"
| `-- hashbrown feature "default"
|-- hashbrown feature "allocator-api2"
| `-- hashbrown feature "default" (*)
|-- hashbrown feature "default" (*)
`-- hashbrown feature "inline-more"
`-- hashbrown feature "default" (*)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It ensures the HashMap
and HashTable
shares the same code instances, which could help with compile time. It also avoids compiling an extra hashbrown copy. Not sure how it makes perf fragile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It ensures the HashMap and HashTable shares the same code instances, which could help with compile time.
Ok.
It also avoids compiling an extra hashbrown copy.
Isn't hashbrown 0.15 still pulled in via other crates?
Not sure how it makes perf fragile?
This relies on unspecified hashbrown, i.e. updating some other dependencies will in the future pull different hashbrown and affect this code (without knowing). Can you place back whatever working hashbrown configuration work now, so exact config will be tracked (yes, there is features merging, but still)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do a new perf run with the crates version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be nice to just add inline-more
to hashbrown 0.15 and see results independently (of this PR). For example, when next version of https://github.com/rust-lang/thorin will be released, this will kill remaining hashbrown 0.14 and merge inline-more
feature with others of 0.15.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Posted rust-lang/thorin#39
@bors r- |
Try using std's hashbrown copy Just testing if this affects performance. It may relate to rust-lang#138708, rust-lang#137701 and rust-lang#115747.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Optimize hash map operations in the query system This optimizes hash map operations in the query system by explicitly passing hashes and using more optimal operations. `find_or_find_insert_slot` in particular saves a hash table lookup over `entry`. It's not yet available in a safe API, but will be in rust-lang/hashbrown#466. <table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">1.6189s</td><td align="right">1.6129s</td><td align="right"> -0.37%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.2353s</td><td align="right">0.2337s</td><td align="right"> -0.67%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">0.9344s</td><td align="right">0.9289s</td><td align="right"> -0.59%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">1.4693s</td><td align="right">1.4652s</td><td align="right"> -0.28%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check</td><td align="right">5.6606s</td><td align="right">5.6439s</td><td align="right"> -0.30%</td></tr><tr><td>Total</td><td align="right">9.9185s</td><td align="right">9.8846s</td><td align="right"> -0.34%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9956s</td><td align="right"> -0.44%</td></tr></table> r? `@cjgillot`
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (f0ac6ea): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (primary 2.5%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (secondary 3.9%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 774.966s -> 774.408s (-0.07%) |
The cycle / wall-times are worse, but that may just be noise. We can probably just stick with the crates version. |
@rustbot ready |
This optimizes hash map operations in the query system by explicitly passing hashes and using more optimal operations.
find_or_find_insert_slot
in particular saves a hash table lookup overentry
. It's not yet available in a safe API, but will be in rust-lang/hashbrown#466.r? @cjgillot