Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid no-op unlink+link dances in incr comp #128320

Merged
merged 2 commits into from
Mar 22, 2025
Merged

Conversation

saethlin
Copy link
Member

@saethlin saethlin commented Jul 28, 2024

Incremental compilation scales quite poorly with the number of CGUs. This PR improves one reason for that.

The incr comp process hard-links all the files from an old session into a new one, then it runs the backend, which may just hard-link the new session files into the output directory. Then codegen hard-links all the output files back to the new session directory.

This PR (perhaps unimaginatively) fixes the silliness that ensues in the last step. The old link_or_copy implementation would be passed pairs of paths which are already the same inode, then it would blindly delete the destination and re-create the hard-link that it just deleted. This PR lets us skip both those operations. We don't skip the other two hard-links.

cargo +stage1 b && touch crates/core/main.rs && strace -cfw -elink,linkat,unlink,unlinkat cargo +stage1 b before and then after on ripgrep-13.0.0:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.56    0.024950          25       978       485 unlink
 34.38    0.016318          22       727           linkat
 13.06    0.006200          24       249           unlinkat
------ ----------- ----------- --------- --------- ----------------
100.00    0.047467          24      1954       485 total
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 42.83    0.014521          57       252           unlink
 38.41    0.013021          26       486           linkat
 18.77    0.006362          25       249           unlinkat
------ ----------- ----------- --------- --------- ----------------
100.00    0.033904          34       987           total

This reduces the number of hard-links that are causing perf troubles, noted in #64291 and #137560

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 28, 2024
@saethlin
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 28, 2024
@bors
Copy link
Contributor

bors commented Jul 28, 2024

⌛ Trying commit 223603c with merge 0a89cd2...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 28, 2024
Avoid no-op unlink+link dances in incr comp

Incremental compilation scales quite poorly with the number of CGUs. This PR improves one reason for that.

The incr comp process hard-links all the files from an old session into a new one, then it runs the backend, which may just hard-link the new session files into the output directory. Then codegen hard-links all the output files back to the new session directory.

This PR (perhaps unimaginatively) fixes the silliness that ensues in the last step. The old `link_or_copy` implementation would be passed pairs of paths which are already the same inode, then it would blindly delete the destination and re-create the hard-link that it just deleted. This PR lets us skip both those operations. We don't skip the other two hard-links.

`cargo +stage1 b && touch crates/core/main.rs && strace -cfw -elink,linkat,unlink,unlinkat cargo +stage1 b` before and then after on `ripgrep-13.0.0`:
```
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.56    0.024950          25       978       485 unlink
 34.38    0.016318          22       727           linkat
 13.06    0.006200          24       249           unlinkat
------ ----------- ----------- --------- --------- ----------------
100.00    0.047467          24      1954       485 total
```
```
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 42.83    0.014521          57       252           unlink
 38.41    0.013021          26       486           linkat
 18.77    0.006362          25       249           unlinkat
------ ----------- ----------- --------- --------- ----------------
100.00    0.033904          34       987           total
```

r? `@ghost`
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Jul 28, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 28, 2024
@saethlin
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Jul 28, 2024

⌛ Trying commit 26319ff with merge a712ff3...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 28, 2024
Avoid no-op unlink+link dances in incr comp

Incremental compilation scales quite poorly with the number of CGUs. This PR improves one reason for that.

The incr comp process hard-links all the files from an old session into a new one, then it runs the backend, which may just hard-link the new session files into the output directory. Then codegen hard-links all the output files back to the new session directory.

This PR (perhaps unimaginatively) fixes the silliness that ensues in the last step. The old `link_or_copy` implementation would be passed pairs of paths which are already the same inode, then it would blindly delete the destination and re-create the hard-link that it just deleted. This PR lets us skip both those operations. We don't skip the other two hard-links.

`cargo +stage1 b && touch crates/core/main.rs && strace -cfw -elink,linkat,unlink,unlinkat cargo +stage1 b` before and then after on `ripgrep-13.0.0`:
```
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.56    0.024950          25       978       485 unlink
 34.38    0.016318          22       727           linkat
 13.06    0.006200          24       249           unlinkat
------ ----------- ----------- --------- --------- ----------------
100.00    0.047467          24      1954       485 total
```
```
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 42.83    0.014521          57       252           unlink
 38.41    0.013021          26       486           linkat
 18.77    0.006362          25       249           unlinkat
------ ----------- ----------- --------- --------- ----------------
100.00    0.033904          34       987           total
```

r? `@ghost`
@bors
Copy link
Contributor

bors commented Jul 29, 2024

☀️ Try build successful - checks-actions
Build commit: a712ff3 (a712ff31daae44e2f087c73598a2922156aae429)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (a712ff3): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (primary 1.0%, secondary -3.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.0% [1.0%, 1.0%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-3.7% [-3.7%, -3.7%] 1
All ❌✅ (primary) 1.0% [1.0%, 1.0%] 1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 767.825s -> 768.215s (0.05%)
Artifact size: 331.40 MiB -> 331.36 MiB (-0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 29, 2024
@saethlin
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 22, 2024
@saethlin saethlin closed this Nov 22, 2024
@saethlin saethlin reopened this Nov 22, 2024
@saethlin
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Nov 22, 2024

⌛ Trying commit 6f5792c with merge 3b0f3ea...

@rustbot
Copy link
Collaborator

rustbot commented Feb 24, 2025

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

@saethlin
Copy link
Member Author

r? compiler

@the8472
Copy link
Member

the8472 commented Feb 25, 2025

Another option would be comparing the dev/ino (or the windows equivalents) of the source and destination file and skip the linking if they're already the same.

@saethlin
Copy link
Member Author

The implementation in this PR strictly reduces the amount we interact with the filesystem by using information that's already trivially available to the compiler. Apart from redesigning the incremental compilation CGU reuse system, I don't see any alternative to this optimization that can compete with it for efficiency.

It might be good to also not re-hardlink files that are already hardlinked, but on Linux I'm not sure that is an improvement.

@the8472
Copy link
Member

the8472 commented Feb 25, 2025

Yeah I was thinking of the linked issues that were about hardlink slowness on macos and windows.

@saethlin
Copy link
Member Author

Somebody drive-by reviewed this and made a suggestion that seems to work but now I don't even know how to respond to them.

Please. Do not delete your own comments.

@@ -487,6 +503,7 @@ fn reuse_workproduct_for_cgu(
bytecode: None,
assembly: None,
llvm_ir: None,
links_from_incr_cache: links_from_incr_cache.clone(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both CompiledModule should use a separate links_from_incr_cache.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh 🤦 I see now

@rust-log-analyzer

This comment has been minimized.

@bjorn3
Copy link
Member

bjorn3 commented Feb 27, 2025

cg_clif changes LGTM

@compiler-errors
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Mar 21, 2025

📌 Commit e21502c has been approved by compiler-errors

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 21, 2025
@bors
Copy link
Contributor

bors commented Mar 21, 2025

⌛ Testing commit e21502c with merge 48b36c9...

@bors
Copy link
Contributor

bors commented Mar 22, 2025

☀️ Test successful - checks-actions
Approved by: compiler-errors
Pushing 48b36c9 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 22, 2025
@bors bors merged commit 48b36c9 into rust-lang:master Mar 22, 2025
7 checks passed
@rustbot rustbot added this to the 1.87.0 milestone Mar 22, 2025
Copy link

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing be73c1f (parent) -> 48b36c9 (this PR)

Test differences

No test diffs found

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (48b36c9): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary -7.5%, secondary 3.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.9% [3.9%, 3.9%] 1
Improvements ✅
(primary)
-7.5% [-7.5%, -7.5%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -7.5% [-7.5%, -7.5%] 1

Cycles

Results (secondary 7.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
7.1% [3.1%, 9.7%] 3
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 775.146s -> 775.347s (0.03%)
Artifact size: 365.54 MiB -> 365.55 MiB (0.00%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants