Refactor git change detection in bootstrap #138591

Kobzol · 2025-03-17T07:58:14Z

While working on #138395, I finally found the courage to delve into the insides of git path change detection in bootstrap, which is used (amongst other things) to detect if we should rebuild or download [llvm|rustc|gcc]. I found it a bit hard to understand, and given that this code was historically quite fragile, I thought that it would be better to rebuild it from scratch.

The previous approach had a bunch of limitations:

It separated the computation of "are there local changes?" and "what upstream SHA should we use?" even though these two things are intertwined.
It used hacks to work around what happens on CI.
It had special cases for CI scattered throughout the codebase, rather than centralized in one place.
It wasn't documented enough and didn't have tests for the git behavior.

The current approach should hopefully resolve all of that. I implemented a single entrypoint called check_path_modifications (naming bikeshed pending; half of the time I spent on this PR went into thinking about names, as it's quite tricky here) that explicitly receives a mode of operation (in CI or outside CI). Based on that mode, it figures out the upstream SHA that we should use for downloading artifacts, and also whether there are any local changes. Users of this function can then use this unified output to implement download-ci-X and other functionality. Notably, this change detection no longer uses git merge-base, which makes it easier to use and doesn't require setting up remotes.
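A minimal sketch of how a caller might consume such a unified result, for illustration only and not the code from this PR. The names check_path_modifications, PathFreshness and MissingUpstream appear in this PR; the other variants, the `upstream` fields, `ArtifactPlan` and `plan_artifacts` are assumptions made up for this sketch.

```rust
/// Illustrative result type. Only `PathFreshness` and `MissingUpstream`
/// are named in this PR; the other variants and fields are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PathFreshness {
    /// The paths were not changed locally; `upstream` is the SHA to download for.
    LastModifiedUpstream { upstream: String },
    /// The paths differ from the given upstream commit.
    HasLocalModifications { upstream: String },
    /// No upstream commit could be found in the git history.
    MissingUpstream,
}

/// Hypothetical caller-side decision for a `download-ci-X = "if-unchanged"` style option.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ArtifactPlan {
    Download { sha: String },
    BuildLocally,
}

/// Turns the unified output into a single decision, so that callers do not
/// have to ask "any local changes?" and "which SHA?" separately.
pub fn plan_artifacts(freshness: &PathFreshness) -> ArtifactPlan {
    match freshness {
        PathFreshness::LastModifiedUpstream { upstream } => {
            ArtifactPlan::Download { sha: upstream.clone() }
        }
        // Assumption: with local changes or without a known upstream,
        // prebuilt artifacts cannot be trusted, so fall back to building.
        PathFreshness::HasLocalModifications { .. } | PathFreshness::MissingUpstream => {
            ArtifactPlan::BuildLocally
        }
    }
}
```

Because both answers travel in one value, a caller cannot pair a "no local changes" result with an unrelated SHA.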

I also added a bunch of integration tests that literally spawn a git repository on disk and then check that the function can deal with various situations (PR CI, auto/try CI, local builds).

After I built this inner layer, I used it for downloading GCC, LLVM and rustc. The latter two (and especially rustc) were using the last_modified_commit function before, but in all cases but one this function was actually only used to check if there are any local changes, which was IMO confusing. The LLVM handling would deserve a bit of refactoring, but that's a larger change that can be done as a follow-up.

I hope that the implementation is now clear and easy to understand, so that in combination with the tests we can have more confidence that it does what we want. I tried to include a lot of documentation in the code, so I won't be repeating the actual implementation details here. If there are any questions, I'll add the answers to the documentation too :)

The new approach explicitly supports three scenarios:

Running on PR CI, where we have one upstream bors parent commit and one PR merge commit made by GitHub.
Running on try/auto CI, where we have one upstream bors parent commit and one PR merge commit made by bors.
Running locally, where we assume that we have at least one upstream bors parent commit in our git history.

I removed the handling of upstreams on CI, as I think that it shouldn't be needed and I considered it to be a hack. However, it's possible that there are other use-cases that I haven't considered, so I want to ask around if people have other situations than the three use-cases described above. If there are other such use-cases, I would like to include them in the new centralized implementation and add them to the git test suite, rather than going back to the old ways :)

In particular, the code before relied on git merge-base, but I don't see why we can't just look up the most recent bors commit and assume that it is a merge commit that is also upstream? I might be running into Chesterton's Fence here :)
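The lookup described above can be sketched on an in-memory model of the first-parent history. This is an assumption-laden illustration rather than the PR's implementation: `Mode`, `Commit`, `closest_upstream` and the e-mail addresses are hypothetical, and the rule of skipping HEAD in CI is inferred from the three scenarios listed earlier, where the CI HEAD is itself a merge commit.

```rust
/// Hypothetical mode of operation (in CI or outside CI).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Ci,
    Local,
}

/// Simplified commit: just a SHA and the author's e-mail.
#[derive(Debug, Clone, Copy)]
pub struct Commit<'a> {
    pub sha: &'a str,
    pub author_email: &'a str,
}

/// Returns the SHA of the most recent commit authored by bors.
///
/// `first_parent_chain` is ordered newest first, starting at HEAD.
/// Assumption: in CI, HEAD is a merge commit (made by GitHub or by bors),
/// so the search starts at its parent; locally it starts at HEAD.
pub fn closest_upstream<'a>(
    first_parent_chain: &[Commit<'a>],
    bors_email: &str,
    mode: Mode,
) -> Option<&'a str> {
    let skip = match mode {
        Mode::Ci => 1,
        Mode::Local => 0,
    };
    first_parent_chain
        .iter()
        .skip(skip)
        .find(|commit| commit.author_email == bors_email)
        .map(|commit| commit.sha)
}
```

Skipping HEAD matters on try/auto CI: there the merge commit is authored by bors too, so a search starting at HEAD would pick the commit under test instead of its upstream parent.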

CC @pietroalbini To make sure that this won't break downstream users of Rust's CI.

Best reviewed commit by commit.

Companion PRs:

For testing beta: [do not merge] beta test for git change detection (#138591) #138597

r? @onur-ozkan

try-job: x86_64-gnu-aux

rustbot · 2025-03-17T07:58:21Z

This PR changes how GCC is built. Consider updating src/bootstrap/download-ci-gcc-stamp.

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

This PR changes how LLVM is built. Consider updating src/bootstrap/download-ci-llvm-stamp.

Some changes occurred in src/tools/compiletest

cc @jieyouxu

This PR modifies src/bootstrap/src/core/config.

If appropriate, please update CONFIG_CHANGE_HISTORY in src/bootstrap/src/utils/change_tracker.rs.

pietroalbini · 2025-03-17T09:58:05Z

LGTM on the Ferrocene side. There is nothing here that would break our downstream usage.

On the Rust side, I recommend opening this PR against stable and beta too, and running a full bors try on it. We had issues in past releases where changes to this code would unexpectedly break stable or beta CI, and I'd love for those to be caught before merging.

Kobzol · 2025-03-17T10:01:46Z

Yes, I planned to do that, it's a good idea. Actually, I can try that right away.

[do not merge] beta test for git change detection (rust-lang#138591) Opening to test CI/bootstrap changes. r? `@ghost` try-job: x86_64-gnu-stable try-job: x86_64-gnu try-job: x86_64-gnu-llvm-19-1 try-job: dist-x86_64-linux

[do not merge] beta test for git change detection (rust-lang#138591) Opening to test CI/bootstrap changes from rust-lang#138591. r? `@ghost` try-job: x86_64-gnu-stable try-job: x86_64-gnu try-job: x86_64-gnu-llvm-19-1 try-job: dist-x86_64-linux

onur-ozkan · 2025-03-18T06:30:55Z

The changes look good, but I am not sure if they will break the if-unchanged tests and logic in the following cases:

PR that is supposed to use ci-rustc and ci-llvm
PR that is not supposed to use ci-rustc and ci-llvm
Testing the above cases on both stable and beta PRs

I think it's safer to make sure these won't be a problem before merging this.

[do not merge] beta test for git change detection (rust-lang#138591) Opening to test CI/bootstrap changes from rust-lang#138591. r? `@ghost` try-job: x86_64-gnu-aux

src/build_helper/src/git.rs

Kobzol · 2025-03-18T11:02:31Z

@bors try

bors · 2025-03-18T11:03:44Z

⌛ Trying commit afe1f99 with merge 27ee8fc...

Kobzol · 2025-03-18T12:29:45Z

Did a bunch of follow-up clean-ups. Let me know if you want me to split this into multiple PRs! :)

src/bootstrap/src/utils/tests/git.rs

jieyouxu · 2025-03-24T06:52:47Z

I can co-review this, but not today (probably going to be tmrw or a bit later this week). Still looking at the stage 0 redesign PR.

RalfJung · 2025-03-24T07:10:55Z

Do we have a good idea for why CI needs to be special-cased?

I think part of the reason is that CI has a sparse checkout, which makes walking the git history largely meaningless. So what I think we should do there is just get the latest bors commit irrespective of paths. In fact I think that is already what is happening anyway, it's just more confusing since the code looks like it finds the last commit where the paths changed (but on CI, that's not what happens).

RalfJung

Ah, I think I found the comment where this is discussed, and that comment sounds good. :) I didn't check the code.

src/build_helper/src/git.rs

RalfJung · 2025-03-24T07:14:12Z

src/build_helper/src/git.rs

+/// were not modified upstream in the meantime. In that case we would be redownloading CI
+/// artifacts unnecessarily.
+///
+/// - In CI, we always fetch only a single parent merge commit, so we do not have access


Suggested change

-/// - In CI, we always fetch only a single parent merge commit, so we do not have access
+/// - In CI, we use a sparse checkout. We fetch only a single parent merge commit, so we do not have access

I'm not sure if we should call it a sparse checkout, tbh. Sparse checkout is a separate git thing that we are not using on CI. What you probably meant is a shallow clone, but that's also not accurate, because we actually check out the last two commits. I'm not sure if there's a common term for that :)

Ah, "shallow clone" is what I meant, yes. And I would say that is accurate, the clone is shallow. It has depth 1. That's not the same as depth 0, but it's still a shallow clone. "shallow" just means "not the entire history has been fetched", it doesn't mean "depth 0".

Suggested change

-/// - In CI, we always fetch only a single parent merge commit, so we do not have access
+/// - In CI, we use a shallow clone of depth 1, i.e., we fetch only a single parent commit (which will be the most recent bors merge commit) and do not have access

Ok, fair enough, "shallow" as a term is more general I suppose. This might be a bit confusing if someone takes a look at our CI fetch code, which confusingly uses depth 2 (because depth 0 here means "fetch everything"). But that's a small thing. Used your text to hopefully clarify the comment.

If the clone has depth 2 we should say that, "1" was just a guess on my side. :)

Actually, yes, the clone depth is indeed 2. I thought that shallow clone is --depth=0, but I remembered that wrong.

"2-depth checkout"

Kobzol · 2025-03-24T08:20:34Z

Rebased and pushed two changes based on review.

bors · 2025-03-24T11:38:03Z

☔ The latest upstream changes (presumably #138878) made this pull request unmergeable. Please resolve the merge conflicts.

Rollup merge of rust-lang#139015 - Kobzol:llvm-ci-test-fixes, r=onur-ozkan

Remove unneeded LLVM CI test assertions

The `download_ci_llvm` bootstrap test was checking implementation details of the LLVM CI download check, which isn't very useful. It was essentially testing "if function_that_checks_if_llvm_ci_is_available returns true, we enable CI LLVM", but the usage of the function was an implementation detail.

After rust-lang#138704, the inner implementation has changed, so the test now breaks if LLVM is updated. I don't think that it's very useful to test implementation details like this, without taking the outside git state into account. Ideally, we should mock the git state for the test, otherwise the test will randomly break when executed in environments which the test does not control (e.g. on CI when a LLVM change happens).

I only kept the part of the test that checks that LLVM CI isn't used when we specify `download-ci-llvm = false`, as that should hold under all conditions, CI/local, and all git states. I also kept the `if-unchanged` assertion, but only on CI, and as a temporary measure. After rust-lang#138591, we should have a proper way of mocking the git state to make the test robust, and make it test what we actually want.

Fixes [this](rust-lang#138784 (comment)).

r? `@ghost`

rustbot assigned onur-ozkan Mar 17, 2025

rustbot added A-compiletest A-testsuite S-waiting-on-review T-bootstrap T-infra labels Mar 17, 2025

rustbot added the T-release label Mar 17, 2025

Kobzol force-pushed the git-ci branch from 4b7c63b to 0f258ca Compare March 17, 2025 09:45

Kobzol mentioned this pull request Mar 17, 2025

[do not merge] beta test for git change detection (#138591) #138597

Draft

jieyouxu self-assigned this Mar 17, 2025

jieyouxu removed their assignment Mar 18, 2025

Kobzol force-pushed the git-ci branch from 0f258ca to afe1f99 Compare March 18, 2025 09:58

Kobzol force-pushed the git-ci branch from afe1f99 to 1fc0921 Compare March 18, 2025 12:29

Kobzol force-pushed the git-ci branch 2 times, most recently from 63be8ba to e1fe7f2 Compare March 19, 2025 10:07

onur-ozkan reviewed Mar 19, 2025

rustbot unassigned onur-ozkan Mar 24, 2025

RalfJung reviewed Mar 24, 2025

Kobzol added 12 commits March 24, 2025 09:09

Implement a new unified function for figuring out if a set of paths have been modified locally

fe57c19

Also adds several git tests to make sure that the behavior works in common cases (PR CI, auto CI, local usage).

Use check_path_modifications for detecting local GCC changes

65a2510

Use check_path_modifications for detecting local LLVM changes

d6231f7

Use check_path_modifications for detecting local rustc changes

50e4238

And get rid of `get_closest_merge_commit`.

Explicitly model missing upstream

916c995

It shouldn't really happen, but if it does, at least we will have an explicit record of it.

Unify usages of path modifications and log them in verbose mode

7de708b

Cache result of check_path_modifications

357a659

Remove setup-upstream-remote.sh and upstream handling.

aa4287c

It shouldn't be needed anymore.

Remove the add_dummy_commit.sh hack

4ac4b74

The new git tests should be enough to check this scenario. We should ideally not be creating dummy commits on CI.

Move freshness test to bootstrap

dfddc3b

Extend ci_rustc_if_unchanged tests

153dcb4

Return PathFreshness::MissingUpstream from `detect_[gcc|llvm]_freshness` functions

6303b05

Kobzol force-pushed the git-ci branch from 00e8611 to 4f88a4d Compare March 24, 2025 08:20

Kobzol force-pushed the git-ci branch 3 times, most recently from 95b28c8 to c52eefc Compare March 24, 2025 09:11

Kobzol added 2 commits March 24, 2025 11:14

Make git_dir required in several git functions

eba0e71

It was always called with `Some`, so no need to complicate it with `Option`.

Clarify comment

55caae4

Kobzol force-pushed the git-ci branch from c52eefc to 55caae4 Compare March 24, 2025 10:14

Kobzol mentioned this pull request Mar 27, 2025

Remove unneeded LLVM CI test assertions #139015

Merged
