Change one `FxHashMap` to `FxIndexMap` in librustdoc #138871

smoelius · 2025-03-24T00:12:43Z

This PR changes one FxHashMap to FxIndexMap in librustdoc's cache and adds a comment explaining why (i.e., to promote reproducibility across operating systems).

I have been trying to understand why the following command produces different results on Linux and MacOS when run on a project whose lib.rs contains only #![no_std]:

cargo rustdoc --target x86_64-unknown-linux-gnu -Zbuild-std -- -Z unstable-options --output-format=json

I obtained a partial answer in that elements were being read out of this FxHashMap in different orders, even when the elements were inserted in the same order:

rust/src/librustdoc/formats/cache.rs

Line 50 in aa8f0fd

pub(crate) external_paths: FxHashMap<DefId, (Vec<Symbol>, ItemType)>,

I opened an issue on the Hashbrown repo to see whether this was a bug. The response I got was that if one wants the elements to be read out in the same order, one should use something like FxIndexMap.

cc: @aDotInTheVoid

TODO

Change index field (Change one FxHashMap to FxIndexMap in librustdoc #138871 (comment))

rustbot · 2025-03-24T00:12:48Z

r? @notriddle

rustbot has assigned @notriddle.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

aDotInTheVoid · 2025-03-24T23:44:59Z

r? @aDotInTheVoid

Woah, massive congratz on tracking this down.

Is this enough to guarantee reproducible output between hosts? Relying on FxHashMap iteration order seems like something that'd be common across rustc/rustdoc that it'd get us in other places (e.g. it's used for the index field itself here:

rust/src/librustdoc/json/mod.rs

Line 147 in f8c27df

index: Rc::new(RefCell::new(FxHashMap::default())),

)

If reproducible output between hosts is a thing we want to guarantee (and you want to rely on), this defiantly needs a test. This should be #![no_core] so it doesn't change whenever core changes.

What's your use-case for reproducibility across operating systems?

Once again, thanks for taking all the time to debug this.

smoelius · 2025-03-25T00:34:54Z

Is this enough to guarantee reproducible output between hosts?

I honestly don't know. For the cases I've tried (a lib.rs with just #![no_std] and the Rust standard library), the answer seems to be "yes". But there could be code paths I haven't executed, and thus bugs I don't know about.

Relying on FxHashMap iteration order seems like something that'd be common across rustc/rustdoc that it'd get us in other places (e.g. it's used for the index field itself here:

rust/src/librustdoc/json/mod.rs

Line 147 in f8c27df

index: Rc::new(RefCell::new(FxHashMap::default())),

)

I did a quick search and I couldn't find anywhere that field was iterated over. In other words, it may not matter whether elements are read out of that structure in the same order or not.

To elaborate, here is where elements are read out of the external_paths field:

rust/src/librustdoc/json/mod.rs

Line 261 in aa8f0fd

.chain(&self.cache.external_paths)

If elements are read out in different orders on different operating systems, they will be assigned different ids.

I couldn't immediately find anything similar for index.

If reproducible output between hosts is a thing we want to guarantee (and you want to rely on), this defiantly needs a test.

Something like the following?

A simple, "expected" JSON file is stored in the repo.
Rustdoc is run and its result is compared to that JSON file.

This should be #![no_core] so it doesn't change whenever core changes.

Oooooh... So everything that appears in the JSON file comes from definitions in the lib.rs. Is that the idea?

What's your use-case for reproducibility across operating systems?

I have a library that generates wrappers for standard library functions. One of its correctness checks is to generate a JSON file with rustdoc and compare it to a file that was already generated. Currently, the test passes only when the JSON file is generated, and the test is run, on the same operating system. Merging this will allow that restriction to be lifted.

Once again, thanks for taking all the time to debug this.

🙏

aDotInTheVoid · 2025-03-25T20:19:36Z

I did a quick search and I couldn't find anywhere that field was iterated over.

Eventually that becomes part of the output here:

rust/src/librustdoc/json/mod.rs

Lines 252 to 256 in f8c27df

    
           let output_crate = types::Crate { 
        
               root: self.id_from_item_default(e.def_id().into()), 
        
               crate_version: self.cache.crate_version.clone(), 
        
               includes_private: self.cache.document_private, 
        
               index,

rust/src/librustdoc/json/mod.rs

Lines 300 to 301 in f8c27df

    
           self.serialize_and_write( 
        
               output_crate,

rust/src/librustdoc/json/mod.rs

Lines 109 to 117 in f8c27df

    
           fn serialize_and_write<T: Write>( 
        
               &self, 
        
               output_crate: types::Crate, 
        
               mut writer: BufWriter<T>, 
        
               path: &str, 
        
           ) -> Result<(), Error> { 
        
               self.sess().time("rustdoc_json_serialize_and_write", || { 
        
                   try_err!( 
        
                       serde_json::ser::to_writer(&mut writer, &output_crate).map_err(|e| e.to_string()),

so it'd get iterated over to convert to json by serde.

A simple, "expected" JSON file is stored in the repo.

Rustdoc is run and its result is compared to that JSON file.

Yeah, but that'd mean needing to update that checked-in file whenever anything about the json output changes, and that's a pretty high mainenence burden that i'd like to avoid if at all possible. I'm happy to pay the cost now to get a test in place, but I'd like to then be able to forget it exists until it catches actuall breakage (rather than a json format change that's still reproducible).

Oooooh... So everything that appears in the JSON file comes from definitions in the lib.rs. Is that the idea?

Almost. The test would probably need to be split across a few input files/crates given that this is external_paths which has items from other crates. But if we can avoid saying anything about the contents of the JSON (only that it's the same) than it's fine to have it change with core.

smoelius · 2025-03-25T21:38:14Z

Eventually that becomes part of the output here: ... so it'd get iterated over to convert to json by serde.

👍

Yeah, but that'd mean needing to update that checked-in file whenever anything about the json output changes, and that's a pretty high mainenence burden that i'd like to avoid if at all possible.

That makes sense to me.

I'm happy to pay the cost now to get a test in place, but I'd like to then be able to forget it exists until it catches actuall breakage (rather than a json format change that's still reproducible).

But then what sort of a test do you imagine?

Here is what I am struggling with. The test will run on some platform X. The test will produce some JSON and verify that it would be the same if it were produced on platform Y. The only way I can think to do that is to have a pre-generated JSON that the test compares against.

Sorry if I am being dense, but is there another way?

aDotInTheVoid · 2025-03-25T23:05:48Z

The only way I can think to do that is to have a pre-generated JSON that the test compares against.

Ideally we'd """""just""""" compare the json on one host with the json on another. But I don't think our CI lets us have results "cross over" between runners like that.

CC @jieyouxu, any ideas here?

aDotInTheVoid · 2025-03-25T23:06:00Z

@bors try @rust-timer queue

bors · 2025-03-25T23:07:12Z

⌛ Trying commit fb97d2f with merge 10b0e2d102a660fa773d90e67f8249911a5a205a...

…<try> Change one `FxHashMap` to `FxIndexMap` in librustdoc This PR changes one `FxHashMap` to `FxIndexMap` in librustdoc's cache and adds a comment explaining why (i.e., to promote reproducibility across operating systems). I have been [trying to understand](rust-lang/rustdoc-types#44) why the following command produces different results on Linux and MacOS when run on a project whose lib.rs contains only `#![no_std]`: ```sh cargo rustdoc --target x86_64-unknown-linux-gnu -Zbuild-std -- -Z unstable-options --output-format=json ``` I obtained a partial answer in that elements were being read out of this `FxHashMap` in different orders, even when the elements were inserted in the same order: https://github.com/rust-lang/rust/blob/aa8f0fd7163a2f23aa958faed30c9c2b77b934a5/src/librustdoc/formats/cache.rs#L50 I [opened an issue](rust-lang/hashbrown#612) on the Hashbrown repo to see whether this was a bug. The response I got was that if one wants the elements to be read out in the same order, one should use something like `FxIndexMap`. cc: `@aDotInTheVoid`

bors · 2025-03-26T01:15:17Z

☀️ Try build successful - checks-actions
Build commit: 10b0e2d (10b0e2d102a660fa773d90e67f8249911a5a205a)

rust-timer · 2025-03-26T03:03:32Z

Finished benchmarking commit (10b0e2d): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	0.1%	[0.1%, 0.1%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.1%	[0.1%, 0.1%]	1

Max RSS (memory usage)

Results (primary -2.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.0%	[-2.0%, -2.0%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-2.0%	[-2.0%, -2.0%]	1

Cycles

Results (primary -2.8%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.8%	[-2.8%, -2.8%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-2.8%	[-2.8%, -2.8%]	1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 777.618s -> 777.935s (0.04%)
Artifact size: 365.81 MiB -> 365.85 MiB (0.01%)

jieyouxu · 2025-03-26T06:48:27Z

I'll take a look at this in a bit.

jieyouxu · 2025-03-26T10:12:20Z

Relying on FxHashMap iteration order seems like something that'd be common across rustc/rustdoc that it'd get us in other places

And you would be right. Neither rustc nor rustdoc (nor anything really) may rely on FxHashMap's iteration order if determinism is important, because that's intentionally not-a-guarantee. rustc has an internal lint called rustc::potential_query_instability

rust/compiler/rustc_lint/src/internal.rs

Lines 78 to 89 in 6e8abb5

    
           declare_tool_lint! { 
        
               /// The `potential_query_instability` lint detects use of methods which can lead to 
        
               /// potential query instability, such as iterating over a `HashMap`. 
        
               /// 
        
               /// Due to the [incremental compilation](https://rustc-dev-guide.rust-lang.org/queries/incremental-compilation.html) model, 
        
               /// queries must return deterministic, stable results. `HashMap` iteration order can change 
        
               /// between compilations, and will introduce instability if query results expose the order. 
        
               pub rustc::POTENTIAL_QUERY_INSTABILITY, 
        
               Allow, 
        
               "require explicit opt-in when using potentially unstable methods or functions", 
        
               report_in_external_macro: true 
        
           }

which flags attempts to rely on iteration order of hash maps.

jieyouxu · 2025-03-26T10:17:17Z

The only way I can think to do that is to have a pre-generated JSON that the test compares against.

Ideally we'd """""just""""" compare the json on one host with the json on another. But I don't think our CI lets us have results "cross over" between runners like that.

That's not currently a thing that the CI jobs can support, unfortunately.

Also, are jsons between different hosts necessarily identical?

smoelius · 2025-03-26T10:47:10Z

Also, are jsons between different hosts necessarily identical?

To the best of my knowledge, if the JSON files differ, the programs that consume them can still function correctly.

More specifically, the differences we're talking about involve the assignment of ids, which are largely arbitrary anyway.

What's weird though is when you run rustdoc with something like --target x86_64-unknown-linux-gnu (see rust-lang/rustdoc-types#44). Right now, you may get slightly different JSON files depending on whether you run the command on Linux or MacOS, say. This comes from nuances in how Hashbrown is implemented on Linux and MacOS.

If we're willing to live with those minor differences, then no change to librustdoc is needed. But if it's "this should be fixed"-level weird, then something like this PR is needed.

jieyouxu · 2025-03-26T10:53:32Z

Yeah, that's what I'm alluding to, because different host platforms will necessarily have different implementation details. IMO, I don't think we can really test that the json is being consistent across different host platforms unless there's a platform-independent way to influence this specific external_paths content and iteration order.

smoelius · 2025-03-26T11:48:26Z

I ran into the potential_query_instability lint w.r.t. HashMap::keys while testing. I see that HashMap::iter also has the rustc_lint_query_instability attribute, so I am curious why it is not triggering in librustdoc/json/mod.rs:

rust/library/std/src/collections/hash/map.rs

Lines 545 to 549 in 65899c0

    
               #[rustc_lint_query_instability] 
        
               #[stable(feature = "rust1", since = "1.0.0")] 
        
               pub fn iter(&self) -> Iter<'_, K, V> { 
        
                   Iter { base: self.base.iter() } 
        
               }

I'll do some digging and reply. Also, I'll mark this PR as draft for now.

jieyouxu · 2025-03-26T11:49:48Z

@smoelius it's because those are rustc::* internal lints, i.e. they only work against compiler/* code. (Whereas the code you are modifying here is rustdoc code :D)

smoelius · 2025-03-26T11:51:31Z

@smoelius it's because those are rustc::* internal lints, i.e. they only work against compiler/* code. (Whereas the code you are modifying here is rustdoc code :D)

Oh. Is there a way to enable that lint for rustdoc code?

EDIT: If the answer's not obvious, I can dig.

jieyouxu · 2025-03-26T11:54:12Z

Possibly, but I would not recommend it. Those are intentionally rustc-internal lints because they lint rustc code and serve compiler needs, not rustdoc-internal lints.

Change one FxHashMap to FxIndexMap in librustdoc

fb97d2f

rustbot assigned notriddle Mar 24, 2025

rustbot added S-waiting-on-review T-rustdoc labels Mar 24, 2025

rustbot assigned aDotInTheVoid and unassigned notriddle Mar 24, 2025

aDotInTheVoid added A-rustdoc-json A-reproducibility labels Mar 24, 2025

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf label Mar 25, 2025

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf label Mar 26, 2025

jieyouxu self-assigned this Mar 26, 2025

smoelius marked this pull request as draft March 26, 2025 11:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change one `FxHashMap` to `FxIndexMap` in librustdoc #138871

Change one `FxHashMap` to `FxIndexMap` in librustdoc #138871

smoelius commented Mar 24, 2025 •

edited

Loading

rustbot commented Mar 24, 2025

aDotInTheVoid commented Mar 24, 2025

smoelius commented Mar 25, 2025

aDotInTheVoid commented Mar 25, 2025

smoelius commented Mar 25, 2025

aDotInTheVoid commented Mar 25, 2025

aDotInTheVoid commented Mar 25, 2025

This comment has been minimized.

bors commented Mar 25, 2025

bors commented Mar 26, 2025

This comment has been minimized.

rust-timer commented Mar 26, 2025

jieyouxu commented Mar 26, 2025

jieyouxu commented Mar 26, 2025 •

edited

Loading

jieyouxu commented Mar 26, 2025 •

edited

Loading

smoelius commented Mar 26, 2025

jieyouxu commented Mar 26, 2025 •

edited

Loading

smoelius commented Mar 26, 2025

jieyouxu commented Mar 26, 2025 •

edited

Loading

smoelius commented Mar 26, 2025 •

edited

Loading

jieyouxu commented Mar 26, 2025 •

edited

Loading

Change one FxHashMap to FxIndexMap in librustdoc #138871

Are you sure you want to change the base?

Change one FxHashMap to FxIndexMap in librustdoc #138871

Conversation

smoelius commented Mar 24, 2025 • edited Loading

rustbot commented Mar 24, 2025

aDotInTheVoid commented Mar 24, 2025

smoelius commented Mar 25, 2025

aDotInTheVoid commented Mar 25, 2025

smoelius commented Mar 25, 2025

aDotInTheVoid commented Mar 25, 2025

aDotInTheVoid commented Mar 25, 2025

This comment has been minimized.

bors commented Mar 25, 2025

bors commented Mar 26, 2025

This comment has been minimized.

rust-timer commented Mar 26, 2025

Overall result: ❌ regressions - no action needed

Instruction count

Max RSS (memory usage)

Cycles

Binary size

jieyouxu commented Mar 26, 2025

jieyouxu commented Mar 26, 2025 • edited Loading

jieyouxu commented Mar 26, 2025 • edited Loading

smoelius commented Mar 26, 2025

jieyouxu commented Mar 26, 2025 • edited Loading

smoelius commented Mar 26, 2025

jieyouxu commented Mar 26, 2025 • edited Loading

smoelius commented Mar 26, 2025 • edited Loading

jieyouxu commented Mar 26, 2025 • edited Loading

Change one `FxHashMap` to `FxIndexMap` in librustdoc #138871

Change one `FxHashMap` to `FxIndexMap` in librustdoc #138871

smoelius commented Mar 24, 2025 •

edited

Loading

jieyouxu commented Mar 26, 2025 •

edited

Loading

jieyouxu commented Mar 26, 2025 •

edited

Loading

jieyouxu commented Mar 26, 2025 •

edited

Loading

jieyouxu commented Mar 26, 2025 •

edited

Loading

smoelius commented Mar 26, 2025 •

edited

Loading

jieyouxu commented Mar 26, 2025 •

edited

Loading