Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for secure random data generation in std #130703

Open
2 of 4 tasks
joboet opened this issue Sep 22, 2024 · 131 comments
Open
2 of 4 tasks

Tracking Issue for secure random data generation in std #130703

joboet opened this issue Sep 22, 2024 · 131 comments
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC I-libs-api-nominated Nominated for discussion during a libs-api team meeting. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@joboet
Copy link
Member

joboet commented Sep 22, 2024

Feature gate: #![feature(random)]

This is a tracking issue for secure random data generation support in std.

Central to this feature are the Random and RandomSource traits inside core::random. The Random trait defines a method to create a new random value of the implementing type from random bytes generated by a RandomSource. std also exposes the platform's secure random number generator via the DefaultRandomSource type which can be conveniently access via the random::random function.

Public API

// core::random

pub trait RandomSource {
    fn fill_bytes(&mut self, bytes: &mut [u8]);
}

pub trait Random {
    fn random(source: &mut (impl RandomSource + ?Sized)) -> Self;
}

impl Random for bool { ... }
impl Random for /* all integer types */ { ... }

// std::random (additionally)

pub struct DefaultRandomSource;

impl RandomSource for DefaultRandomSource { ... }

pub fn random<T: Random>() -> T { ... }

Steps / History

Unresolved Questions

Footnotes

  1. https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html

@joboet joboet added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 22, 2024
@newpavlov
Copy link
Contributor

newpavlov commented Nov 29, 2024

Disclaimer: I am one of the getrandom developers.

I think it's important for RandomSource methods to properly return potential errors. Getting randomness is an IO operation and it may fail. In some context it's important to process such errors instead of panicking. The error may be either io::Error or something like getrandom::Error (i.e. a thin wrapper around NonZeroU32).

It may be worth to add the following methods to RandomSource:

  • fill_bytes which works with uninitialized buffers, e.g. based on BorrowedBuf. Yes, zeroization of buffers is usually a very small cost compared to a syscall, but it still goes against the zero-cost spirit.
  • Generation of u32 and u64. Some platforms support direct generation of such values (e.g. RDRAND, WASI, etc.). Going through fill_bytes will be a bit less efficient in such cases.
  • Methods for potentially "insecure" generation of random values, but which are less prone to blocking. The HashMap seeding is the most obvious use-case for this.

It's also not clear whether it's allowed to overwrite the default RandomSource supplied with std similarly to GlobalAlloc.

@joboet
Copy link
Member Author

joboet commented Nov 29, 2024

Would you consider rust-lang/libs-team#159 to be a better solution? That one used the Read trait to fulfil everything you mention.

@newpavlov
Copy link
Contributor

No, I don't think it's an appropriate solution. Firstly, it relies on io::Error, while IIUC you intend for this API to be available in core. Secondly, it does not provide methods for generation of u32 and u64. As I wrote, going through the byte interface is not always efficient. Finally, most of io::Read methods are not relevant here.

For the last point I guess we could define a separate DefaultInsecureRandomSource type.

@bstrie
Copy link
Contributor

bstrie commented Nov 29, 2024

Firstly, it relies on io::Error, while IIUC you intend for this API to be available in core.

I don't think this needs to be a blocker. IMO a lot of std::io should be moved into core--not the OS-specific implementations obviously, but all the cross-platform things like type definitions, same as what happened with core::net.

@wwylele
Copy link
Contributor

wwylele commented Nov 29, 2024

fn random(source: &mut (impl RandomSource + ?Sized)) -> Self;

Would it be better to make this explicit with generics? Like fn random<R: RandomSource + ?Sized>(source: &mut R) -> Self;. This gives user the ability to specify the type name when needed.

@ericlagergren
Copy link

I think it's important for RandomSource methods to properly return potential errors.

I (mostly) disagree. CSPRNGs should almost never fail. When they do, users are almost never not qualified to diagnose the problem.

For example: golang/go#66821

A compromise is something like this:

trait RandomSource {
    type Error;
    fn fill_bytes(...) {
        self.try_fill_bytes(...).unwrap();
    }
    fn try_fill_bytes(...) -> Result<..., Self::Error>
}

This allows most CSPRNGs to use Error = Infallible, but still has support for weird HSMs, etc.

@newpavlov
Copy link
Contributor

@ericlagergren
I've assumed this trait is for a "system" RNG, which will work together with a #[global_allocator]-like way to register implementation. I don't think that we need a general RNG trait in std/core as I wrote in this comment.

As for design of fallible RNG traits, see the new rand_core crate.

@dhardy
Copy link
Contributor

dhardy commented Nov 30, 2024

pub trait Random {
    fn random(source: &mut (impl RandomSource + ?Sized)) -> Self;
}

This trait (and the topic of random value generation) should be removed from this discussion entirely in my opinion, focussing only on "secure random data generation" as in the title. Why: because (1) provision of secure random data is an important topic by itself (with many users only wanting a byte slice and with methods like from_ne_bytes already providing safe conversion) and (2) because random value generation is a whole other topic (including uniform-ranged samples and much more).

Disclaimer: I am one of the rand developers. rand originally had a similar trait which got removed; the closest surviving equivalent is StandardUniform.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Nov 30, 2024

@newpavlov

It's also not clear whether it's allowed to overwrite the default RandomSource supplied with std similarly to GlobalAlloc.

Overriding the default source in an application that already has one from linking std seems questionable. It's not that I can't imagine any use case for it, but the established pattern for such overrides allows any crate in the dependency graph to do it (it's only an error if you link two such crates), instead of putting the leaf binary/cdylib/staticlib artifact in charge. As you articulated in the context of getrandom, that's a security risk for applications. So a RandomSource equivalent should probably be more restrictive in who can override it, but that's not the existing pattern. It also doesn't seem to fit with the proposed generalization of that pattern via "externally implementable functions" (rust-lang/rfcs#3632) -- if that RFC is accepted, any new API surface should use it instead of adding new one-off mechanisms.

If overriding the std source isn't supported, then it could work the same way as #[panic_handler]: you must supply an implementation if you don't link std, but if you do link std then supplying your own is an error. This would still be extremely useful. Currently, every crate that's (optionally) no_std and needs some randomness, most commonly for seeding Hashers, has to cobble together some sub-par ad-hoc solution to try and get some entropy from somewhere. There's a bunch of partial solutions that are better than nothing (const-random, taking addresses of global/local variables and praying that there's some ASLR, a global counter when atomics are available, cfg-gated access to target-specific sources like CPU cycle counters or x86 RDRAND) but:

  1. How well this works ends up highly platform-specific, in particular none of them work well for wasm32-unknown-unknown and wasm32v1-none targets .
  2. Applications that have access to a better source of entropy and (directly or transitively) use such libraries don't have a good way to enumerate them and make all of them use the better source.

This wouldn't be a problem if the entire ecosystem could agree to always delegate this problem to on one specific crate (version) with appropriate hooks, like getrandom, but evidently that's not happening. Putting this capability into core (or a new no_std sysroot crate, comparable to alloc) has a better chance of solving this coordination problem. Well, at least eventually, once everyone's MSRV has caught up.

Edit: almost forgot that even std::collections::Hash{Map,Set} depend on having a source of random seeds. A way to supply such a source without linking std could help with moving those types to alloc, although as #27242 (comment) points out, it's not backwards compatible to make such a source mandatory for no_std + alloc applications.

@newpavlov
Copy link
Contributor

@hanna-kruppe

Overriding the default source in an application that already has one from linking std seems questionable.

There is a number of reasons to allow overriding:

  1. An alternative interface may be more efficient than the default one (e.g. reading the RNDR register vs doing syscall)
  2. It may help reduce binary size and eliminate potentially problematic fallback paths (e.g. if you know that you do not need the file fallback on Linux)
  3. In some cases it's useful to eliminate non-deterministic inputs (testing, fuzzing)

So a RandomSource equivalent should probably be more restrictive in who can override it, but that's not the existing pattern.

Yes. How about following the getrandom path and allow override only when a special configuration flag is passed to the compiler?

Either way, overriding is probably can be left for later. I think we both agree that we need a way to expose "system" entropy source in std and a way to define this source for std-less targets.

It also doesn't seem to fit with the proposed generalization of that pattern via "externally implementable functions" -- if that RFC is accepted, any new API surface should use it instead of adding new one-off mechanisms.

I agree that ideally we need a unified approach for this kind of problem. I made a similar proposal once upon a time.

But I think it fits fine? Targets with std could implicitly use std_random_impl crate for "external implementation" of the getrandom-like functions and users will be able to override it in application crates if necessary.

How well this works ends up highly platform-specific, in particular none of them work well for wasm32-unknown-unknown and wasm32v1-none targets .

I believe that having std for wasm32-unknown-unknown was a big mistake in the first place and the wasm32v1-none target is a good step in the direction of amending it. So I hope we will not give too much attention to its special circumstances.

This wouldn't be a problem if the entire ecosystem could agree to always delegate this problem to on one specific crate (version) with appropriate hooks, like getrandom, but evidently that's not happening.

Well, it has happened, sort of. getrandom is reasonably popular in the ecosystem even after excluding rand users.

The problem is that std already effectively includes its variant of getrandom for HashMap seeding and people reasonably want to get access to that. And I think problem of getting "system" entropy is fundamental enough for having it in std (well, not in the std per se, let's say in the sysroot crate set).

A way to supply such a source without linking std could help with moving those types to alloc, although as #27242 (comment) points out, it's not backwards compatible to make such a source mandatory for no_std + alloc applications.

Can we add yet another sysroot crate for HashMap which will depend on both alloc and the hypothetical "system entropy" crate?

@hanna-kruppe
Copy link
Contributor

But I think it fits fine? Targets with std could implicitly use std_random_impl crate for "external implementation" of the getrandom-like functions and users will be able to override it in application crates if necessary.

The RFC (and the competing ones I've looked at) only supports a default implementation in the crate that "declares" the externally-implementable thing. If that crate isn't std, then an implementation from std would not count as "default" but conflict with any other definition. So we'd need another special carve-out for std (the very thing we'd want to avoid by adding a general language feature), or the language feature needs to become much more general to support overrideable default implementations from another source.

I believe that having std for wasm32-unknown-unknown was a big mistake in the first place and the wasm32v1-none target is a good step in the direction of amending it. So I hope we will not give too much attention to its special circumstances.

I was specifically talking about no_std libraries, for which the two targets are basically equivalent. Both don't have any source of entropy implied by the target tuple (instruction set, OS, env, etc.), and if you want to add one it'll have to involve whatever application-specific interface the wasm module has with its host.

Well, it has happened, sort of. getrandom is reasonably popular in the ecosystem even after excluding rand users.

Not to point any fingers but a counter example that's fresh on my mind because I looked at its code recently is foldhash. As another example, ahash only uses getrandom optionally (though it's on by default). If you're only using ahash indirectly through another library that disables the feature, then it's not gonna use getrandom unless you happen to notice this and add a direct dependency to enable the feature. In that case there is a solution, at least, but it's still not discoverable.

Can we add yet another sysroot crate for HashMap which will depend on both alloc and the hypothetical "system entropy" crate?

Possibly, but people may object to a proliferation of sysroot crates so let's hope there's a better solution.

@newpavlov
Copy link
Contributor

newpavlov commented Nov 30, 2024

or the language feature needs to become much more general to support overrideable default implementations from another source.

Yes, it's not as if RFC is a technical specification which must be followed word-by-word. There is a number of cases where the original RFC vision has somewhat changed during implementation stages. If anything, I would say it's an oversight/deficiency of the RFC to not cover cases like this.

foldhash

If a crate aims to minimize its number of dependencies as far as possible even at the cost of code quality and security, it obviously will not depend on getrandom, despite it being the de facto standard for getting system entropy. I think most people will agree that hacks like this have a strong smell. The same applies to fastrand (amusingly, it still uses getrandom for Web WASM) and other "partial solutions" listed by you. As you can notice, both hack their way into using std to get system entropy and either use a fixed seed or pile even more hacks when std is not available.

@hanna-kruppe
Copy link
Contributor

As I said, I have no intention of pointing fingers at any crates. They have to navigate tricky trade-offs and complexities due to Rust's standard library (as a whole, not just std) not yet providing any entropy access. Let's keep this issue focused on changing that.

@workingjubilee
Copy link
Member

The Random trait seems weakly motivated in terms of coupling it to RandomSource, as its design seems like it will be a much more hotly contested space, and it is (mostly) unrelated to RandomSource.

@theemathas
Copy link
Contributor

Note that @dhardy (maintainer of rand) wrote some criticism of this at rust-lang/libs-team#393 (comment)

@abgros
Copy link

abgros commented Feb 6, 2025

Could Random be implemented for arrays? That way, we could write something like:

let random_array: [u64; 100] = random();

which is a lot more convenient than using the fill_bytes() method.

@sorairolake
Copy link
Contributor

I think it would be useful to have a data type like rand::rngs::mock::StepRng to represent a source of randomness. This is useful for testing whether Random for an arbitrary data type T is implemented as expected.

@dhardy
Copy link
Contributor

dhardy commented Feb 8, 2025

I think it would be useful to have a data type like rand::rngs::mock::StepRng to represent a source of randomness. This is useful for testing whether Random for an arbitrary data type T is implemented as expected.

I disagree that it is useful in testing implementations of traits like Random, since there should not be any implied restrictions on such a distribution except that outputs are uniformly distributed, given that the outputs of the random source are truly random. Testing the distribution of output values requires statistical tools, e.g. the KS test as we have here.

@hanna-kruppe
Copy link
Contributor

As far as I know there’s no plan to make the RandomSource trait sealed, so other implementations can be provided by third party crates without having to debate whether it’s a good idea for std to include them.

@jstarks
Copy link

jstarks commented Feb 9, 2025

It seems like one might regret adding Random later--for some times (e.g., u32), there's a natural definition and implementation that produces a uniform distribution, but it happens to correspond to using RandomSource with something like zerocopy/bytemuck/the safe transmute effort. I.e., there's already a reasonable solution for these types.

For more complicated types (e.g., f32 or (T, U)), Random is under-defined. It is unclear what distribution is expected, and without implementations on tuples or a derive for enums/structs, there's really no "reference implementation" for third-party code to model.

It seems better to leave this out rather than leave it in this poorly specified state. Just stabilize RandomSource.

Although... even RandomSource is lacking, in that it doesn't support writing to an uninitialized buffer... Would it be better to wait until this problem is solved in Read (#78485) before committing to this new interface? Adding BorrowedBuf support later will be a headache.

@hanna-kruppe
Copy link
Contributor

Would it be better to wait until this problem is solved in Read (#78485) before committing to this new interface?

I really don't think so. The current interface works perfectly fine, it's just slightly suboptimal in some use cases (fewer than in the context of general byte-centric I/O). Providing only a Read::read_buf-style API would be strictly worse for many common use cases, in that it would be more complicated to use for no benefit. Usually you either want to generate one conveniently-sized integer at a time (probably should have a dedicated trait method for performance reasons), or you want to fill a small fixed-size buffer completely (e.g., cryptographic key material). Scenarios where the cost of buffer initialization is significant, e.g. preparing a large ephemeral buffer but only filling a small part of it before discarding it, are rare.

So I think it's pretty clear that the method for filling a &mut [u8] should exist even if io::Read had already arrived at a stable solution for reading into uninitialized buffers. Adding the latter after stabilizing the former does pose some challenges, but io::Read already has to solve similar challenges -- more of them, actually, because it also has to work well for several layers of I/O adapters, while PRNGs are rarely composed in a way that stresses the performance of data flowing through several layers of composition.

@Kixunil
Copy link
Contributor

Kixunil commented Feb 9, 2025

@hanna-kruppe

Perhaps something like this would work:

// Users don't need to concern themselves with this type.
pub strut OutBuf([MaybeUninit<u8>]);

impl OutBuf {
    // various methods to construct it and write to it, no method to read from it
}

// Users don't need to worry about implementing this, just use it
pub unsafe trait AsOutBuf {
    type Init: ?Sized;

    fn as_out_buf(&mut self) -> &mut OutBuf;
    // Only allowed if the entire buffer was overwritten.
    unsafe fn assume_init(&mut self) -> &mut Self::Init;
}

impl AsOutBuf for [u8] { type Init = [u8]; ... }
impl AsOutBuf for [MaybeUninit<u8>] { type Init = [u8]; ... }
impl<const N: usize> AsOutBuf for [u8; N] { type Init = [u8; N]; ... }
impl<const N: usize> AsOutBuf for [MaybeUninit<u8>; N] { type Init = [u8; N]; ... }

// Users can pass in any byte slice, byte array or their uninitialized counterparts
// The entire value is guaranteed to get overwritten, so if using the returned value is undesirable due to borrowing issues the users can simply use the original value safely, if it was an initialized type to begin with or they may just soundly call `assume_init` on it.
pub fn fill_random_bytes<T: AsOutBuf>(buf: &mut T) -> &mut T::Init { ... }

This way it already supports all reasonable cases and people are not forced into using uninitialized API. They can write let mut buf = [0; 32]; fill_random_bytes(&mut buf); exactly as if they were using the simple API or calling Read::read_exact.

@hanna-kruppe
Copy link
Contributor

That helps the simple call sites remain simple, but it has several downsides:

  1. The more complicated signature makes it less obvious that it can be used like that for simple cases, both for users who aren't reading the rustdoc page from start to finish, and also to the compiler's type inference in some cases.
  2. The reliance on generics means RandomSource is no longer dyn-compatible. This could be fixed by gating the AsOutBuf-using methods on Self: Sized and providing other methods that are dyn-compatible, but at that point you might as well provide two concrete methods for the initialized and possibly-uninitialized case.
  3. The new abstractions compete with Borrowed{Buf,Cursor} and other proposed APIs for "writing into a possibly-uninitialized buffer". std should minimize the number of subtly different and incompatible abstractions for the same problem space, especially if it's something third party libraries would want to use as vocabulary type/trait. Going in this direction would give more reasons to block stabilization of secure random data generation on other APIs that have been in limbo for a long time.
  4. Even if we ignore the previous point and focus only on the narrow use case of RandomSource, there's considerable design space to be explored there. For example, a downside of the specific approach you outlined is that it forces every RandomSource implementation to use unsafe internally while also requiring every caller who wants to use the MaybeUninit capabilities to use another bit of unsafe justified by "the implementation is correct" (and we'd have to make RandomSource unsafe to implement to make this sound). In contrast, the more complicated interface of Borrowed{Buf,Cursor} provides a bunch of useful functionality as safe APIs.

@newpavlov
Copy link
Contributor

newpavlov commented Feb 15, 2025

As noted above, panicking or returning an error on unsupported functionality in std is an unfortunate reality. Ideally, Rust should've introduced a set of PAL crates instead of a monolithic std, but alas...

The WASI/WASM case is yet another argument why we need to introduce an "insecure" randomness source. We can mandate that this source may use PRNG seeded with a fixed seed on platforms which do not provide proper system randomness.

@abgros
Copy link

abgros commented Mar 11, 2025

In conclusion and after reading the above discussion, I think std should just expose a function like

// in std::io

fn read_random(bytes: &mut [u8]);
and call it a day.

This seems okay for general use but I think it could actually be very dangerous.

If I'm given an uninitialized buffer:

let mut buf = [MaybeUninit::<u8>::uninit(); 100];

I might be tempted to convert it to a &mut [u8] and use it in your function like so:

let slice = unsafe {
	std::slice::from_raw_parts_mut(buf.as_mut_ptr() as *mut u8, buf.len())
};
read_random(slice);

Unfortunately, calling from_raw_parts_mut on an uninitialized slice is immediate undefined behaviour. In fact, there's no way around this besides initializing buf with some garbage data.

I would recommend changing the signature to:

fn read_random(bytes: &mut [MaybeUninit<u8>]);

and also adding a guarantee that if this function succeeds, all the bytes in the slice will have been initialized (i.e. calling assume_init() is always sound).

EDIT: upon further discussion this might be a good use case for BorrowedBuf?

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Mar 12, 2025

Requiring every caller to write a bit of unsafe is a non-starter, even if the soundness argument is formulaic. As for BorrowedBuf or other ways of safely abstracting over both initialized and uninit buffers, this was discussed at length earlier and has some downsides plus relatively weak motivation (compared to other I/O).

I don’t find your new argument very convincing: casting an uninitialized slice to &mut [T] is always unsound, as you say. On some level we have to trust users who choose to invoke unsafe to actually read the preconditions of the functions they’re calling. That’s not to say unsafe code should be unnecessarily difficult to write correctly or be full of “gotchas”. For example , it might be a good idea to also add an example to read_random’s documentation calling out this specific case (if you have an uninit buffer, you must initialize it first before casting it to &mut [u8] and passing it to read_random). But making the API unsafe to use or more complicated for the common uses seems like a bad trade off to reduce the impact of “careless” deployment of unsafe.

@bjoernager
Copy link
Contributor

bjoernager commented Mar 12, 2025

What if we make the read_random function generic?

zerocopy and bytemuck, for example, define the FromBytes and AnyBitPattern traits, respectively. Maybe we could also borrow from them?

// core::marker

// Possibly as a language item:
pub /* unsafe */ trait FromBytes { } // Or with some other name...

// std::io

pub fn read_random<T: FromBytes>(buf: &mut T);

We could then add an extra promise that the buffer referenced by buf will be fully initialised after a call to read_random (regardless of whether T supports uninitialised states or not).

(edit: I would personally prefer the name fill_random in this case, but this was just to build upon the previous suggestion).

@abgros
Copy link

abgros commented Mar 12, 2025

Requiring every caller to write a bit of unsafe is a non-starter, even if the soundness argument is formulaic. [...] making the API unsafe to use or more complicated for the common uses seems like a bad trade off [...]

To be fair, it seems like many people here are overtly hostile to the "common uses", e.g. see above

No, I expect a beginner to use a crate that provides a PRNG of some sort and sampling within a distribution using said PRNG.

But since every low-level OS API that I'm familiar with takes in a pointer and a length, it might be wise to have a lower-level function giving you direct access to it and avoiding any overhead or abstraction like:

pub unsafe fn read_bytes_raw(dst: *mut u8, len: usize);

and then also have higher-level APIs covering plenty of common stuff like generating numbers or vectors which don't require using unsafe.


What if we make the read_random function generic?

It seems like your proposed read_random() function is virtually identical to the exising std::random::random() which is generic over any type implementing Random.

@bjoernager
Copy link
Contributor

bjoernager commented Mar 12, 2025

It seems like your proposed read_random() function is virtually identical to the exising std::random::random() which is generic over any type implementing Random.

I would argue that it is different in that my proposal would randomise the raw representation of the buffer – regardless of how this effects its effective value (whereas the current random function seems to only support "raw" types, i.e. non-padded integers). The proposed trait would also fill a largely different need (although with notable overlap) than the existing Random trait, namely that it can be used with non-random sources.

@abgros
Copy link

abgros commented Mar 12, 2025

Well, random() will support arrays when #136732 lands, which in principle should be optimal as long as the compiler realizes that it can reuse an existing allocation. But your method is admittedly more powerful since it lets you use a &mut anywhere, like a subslice of a larger allocation. Probably having both options would be useful.

@joboet
Copy link
Member Author

joboet commented Mar 28, 2025

Hey, T-libs-api! How do you feel about reducing the scope of this feature to the following API:

In conclusion and after reading the above discussion, I think std should just expose a function like

// in std::io

fn read_random(bytes: &mut [u8]);

and call it a day. Everything else involves trade-offs between complexity, stability and security that conflict with the stability guarantees of the standard library. And quoting from stds crate documentation:

Besides basic data types, the standard library is largely concerned with abstracting over differences in common platforms

Providing only a getrandom-style platform abstraction fits this scope description very nicely.

@rustbot label +I-libs-api-nominated

@rustbot rustbot added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Mar 28, 2025
@newpavlov
Copy link
Contributor

newpavlov commented Mar 28, 2025

In my opinion, this function should not be defined in the std::io module and instead a new dedicated module (random?) should be created. Additionally, I am not sure about the read_random name. I think it should indicate that we read "system" entropy.

UPD: It seems we already have the random module.

@ChrisDenton
Copy link
Member

UPD: It seems we already have the random module

Many people were not happy with the idea of a strong entropy source sharing a module with high level random traits and (potentially) weaker random functions. The first may be used for cryptographic purposes but the later must not.

@newpavlov
Copy link
Contributor

Many people were not happy with the idea of a strong entropy source sharing a module with high level random traits and (potentially) weaker random functions.

It's a weird argument in my opinion. Where else should items related to RNG live if not in a module named random?

Applicability for cryptographic applications should be resolved by a CryptoRng-like trait. As for "high level random traits", as I wrote previously, I don't think we should have such traits in std in the first place. Instead they should be defined in third-party crates like rand. But arguably it's off-topic for this thread, let's focus solely on "secure random data generation".

@hanna-kruppe
Copy link
Contributor

The exact path (module, name) seems less important than the question raised about scope of the feature.

@newpavlov
Copy link
Contributor

I agree with the limited scope, i.e. starting with a relatively simple potentially panicking function with &mut [u8]-based API. But it may be worth to introduce two functions: read_sys_entropy and read_insecure_sys_entropy. The latter function would expose the code which is currently used by HashMap, while the former would need to be implemented separately on targets which support non-blocking entropy generation.

@joboet
Copy link
Member Author

joboet commented Mar 28, 2025

In my opinion, this function should not be defined in the std::io module and instead a new dedicated module (random?) should be created. Additionally, I am not sure about the read_random name. I think it should indicate that we read "system" entropy.

UPD: It seems we already have the random module.

I think that makes sense, but only if we want to add other randomness functions in the future (otherwise it'd be a module with just one function). As I'm arguing that read_random should be the only randomness function that std provides (well, okay, I'm not opposed to adding an insecure version), I think std::io makes more sense – just like with the newly added pipe support.

@joboet
Copy link
Member Author

joboet commented Mar 28, 2025

I agree with the limited scope, i.e. starting with a relatively simple potentially panicking function with &mut [u8]-based API. But it may be worth to introduce two functions: read_sys_entropy and read_insecure_sys_entropy. The latter function would expose the code which is currently used by HashMap, while the former would need to be implemented separately on targets which support non-blocking entropy generation.

I'm not at all opposed, but I'd prefer to defer that discussion.

@alex
Copy link
Member

alex commented Mar 28, 2025

It's a weird argument in my opinion. Where else should items related to RNG live if not in a module named random?

(👋, I maintain the Python cryptography library, in addition to the Rust stuff I do)

I think the specific anti-pattern that we want to avoid is something like random::secure_random() vs. random::random() or some other API distinction where users who don't have deep familiarity with CSPRNGs will find the non-CSPRNG to be the obvious API.

@newpavlov
Copy link
Contributor

I think that makes sense, but only if we want to add other randomness functions in the future (otherwise it'd be a module with just one function)

Yes, this is why it's important to decide whether we plan to add rand_core-like traits in future or not. Frankly, std::io is a weird place for a piece of functionality which arguably should have a separate sysroot crate.

I think the specific anti-pattern that we want to avoid is something like random::secure_random() vs. random::random() or some other API distinction where users who don't have deep familiarity with CSPRNGs will find the non-CSPRNG to be the obvious API.

Note that I intentionally used read_insecure_sys_entropy following the "syntactic salt" principle. Also, Rust is very different from Python. For example, in cryptographic libraries (e.g. RustCrypto) we heavily rely on the CryptoRng marker trait. Users simply are not able to accidentally pass a weak RNG into such APIs, their code will be rejected at compile time.

@alex
Copy link
Member

alex commented Mar 28, 2025

The different traits work great for APIs that take an RNG as an argument. But there's also plenty of usecases for things like "I would like 16 random bytes please", and for those it can be relatively easy to mix up a CSPRNG vs. a non-CSPRNG.

@jstarks
Copy link

jstarks commented Mar 28, 2025

I agree with the limited scope, i.e. starting with a relatively simple potentially panicking function with &mut [u8]-based API. But it may be worth to introduce two functions: read_sys_entropy and read_insecure_sys_entropy. The latter function would expose the code which is currently used by HashMap, while the former would need to be implemented separately on targets which support non-blocking entropy generation.

Keep it simple, leave out read_insecure_sys_entropy for now. It's highly specialized, its behavior is hard to specify, and it invites misuse.

You can always add it later if it really becomes important (i.e., to make it easier for non-std random-seeded data structures to be written to run on obscure platforms or early in boot on Linux--is this really that common a use case?).

@dhardy
Copy link
Contributor

dhardy commented Mar 28, 2025

The exact path (module, name) seems less important than the question raised about scope of the feature.

The scope is what I would like to see: an interface suitable for initializing a CSPRNG, rendering the getrandom crate largely obsolete (for most uses I think; some may make use of the u32 / u64 / fill_uninit interfaces or support for custom backends).

I agree with @newpavlov: std::random::read_sys_entropy is a good name (saying at a glance that this is an interface over an external random data source).

@newpavlov
Copy link
Contributor

Keep it simple, leave out read_insecure_sys_entropy for now. It's highly specialized, its behavior is hard to specify, and it invites misuse.

No one says it should be stabilized together with read_sys_entropy. The main argument for exposing it is that std uses it internally already.

And it's pretty easy to specify in my opinion: "reasonable" effort randomness source which may emit predictable random numbers in the worst case scenario, to the point of using PRNG seeded with a fixed seed (e.g. on WASM targets).

I would say we have a bigger issue with read_sys_entropy. Not all std targets provide a suitable API for it. WASM is the most prominent example, but there is also non-x86 Hermit and maybe some other targets.

@hanna-kruppe
Copy link
Contributor

I would say we have a bigger issue with read_sys_entropy. Not all std targets provide a suitable API for it. WASM is the most prominent example, but there is also non-x86 Hermit and maybe some other targets.

This was discussed before, see #130703 (comment) and surrounding comments -- it affects many other std APIs on various targets besides wasm and is generally handled by panicking or returning an error. This tracking issue is really getting far too long, even the people who have been involved in it the entire time are going in circles because it's become so unwieldy 🙃

@Kixunil
Copy link
Contributor

Kixunil commented Mar 28, 2025

endering the getrandom crate largely obsolete

I might be biased but no_std crates are quite frequent. It'd suck if they still had to use getrandom. (Though std is a good start.)

@newpavlov I don't get what's the deal with WASM. It's not OS/hardware, so it shouldn't even have std but specific platform like wasi does and that one also has some sort of randomness: https://github.com/WebAssembly/wasi-random

@ericlagergren
Copy link

This issue is for secure random data generation: reading bytes directly from the system's CSPRNG. Everything else should happen in a separate issue.

@newpavlov
Copy link
Contributor

newpavlov commented Mar 28, 2025

It's not OS/hardware, so it shouldn't even have std but specific platform like wasi does and that one also has some sort of randomness

I agree, but the unfortunate reality is that Rust does provide std for wasm32-unknown-unknown (I strongly believe it was a mistake, but alas) and will continue to do so because of the backward compatibility guarantees. Yes, we could provide an always panicking implementation of read_sys_entropy on this target, but considering that we don't have a clear path towards a #[global_allocator]-like ability to overwrite the system entropy source, usefulness of the introduced function will be limited. People would have to continue to rely on crates like getrandom for a more universal interface.

Imagine a function/method like KeyInit::generate_key which would be documented as "does not work on no_std targets, always panics on some std targets". Would you be happy with such API?

@hanna-kruppe
Copy link
Contributor

Libraries that don’t require std or want to work on targets with half-broken std are common, but it’s significantly less common for entire applications to be #[no_std] or use the exact same main function on e.g., Linux and wasm32-unknown-unknown. If the program entry point selects the entropy source and explicitly passes it through all libraries, a whole lot of programs written in Rust could still use the std API even if libraries have reasons to avoid it calling it directly. It’s different from the model offered by getrandom (globally accessible source of entropy even in no_std code) and more annoying to write code in this style, but there are also software engineering advantages to explicit dependency injection for such non-determinism. So while this may not be a drop in replacement for getrandom, the delta amounts to “provide a way for consumers to plug in their own global source on platforms where std doesn’t have one” — something std could provide as well in theory, but also doesn’t have to, especially since the best interface isn’t clear.

@briansmith
Copy link
Contributor

I agree, but the unfortunate reality is that Rust does provide std for wasm32-unknown-unknown (I strongly believe it was a mistake, but alas) and will continue to do so because of the backward compatibility guarantees. Yes, we could provide an always panicking implementation of read_sys_entropy on this target,

I think that is the best thing to do. The real issue is that we haven't deprecated the wasm32-unknown-unknown target yet; it is inevitable, but not everybody sees it yet. wasm32-unknown-unknown stuff shouldn't block progress on this.

@Kixunil
Copy link
Contributor

Kixunil commented Mar 28, 2025

will continue to do so because of the backward compatibility guarantees

I don't see any reason to not deprecate it at least and provide more sensible alternative. std already has a bunch of stuff deprecated and replaced with something better. I don't know what the specific use case for wasm32-unknown-unknown is but it seems that any other more specific target is just better when it come to std.

"does not work on no_std targets, always panics on some std targets". Would you be happy with such API?

Being unavailable on no_std and not having a stable RNG trait sucks to be honest, speaking from experience. But that's the current state of things anyway. Some sort of global registry similar to global allocator would indeed help. Anyway, if the second part was "panics on deprecated std targets that go replaced with something sensible years ago" I wouldn't mind.

it’s significantly less common for entire applications to be #[no_std]

Well, minimizing WASM module size by getting rid of std is nice. But that potentially has other challenges.

@bstrie
Copy link
Contributor

bstrie commented Mar 29, 2025

This issue is for secure random data generation: reading bytes directly from the system's CSPRNG. Everything else should happen in a separate issue.

Seconded. All I want is something like io::read_entropy as an alternative to the getrandom crate. Does the libs team need to accept an ACP first or should someone presumptuously make an issue so we can move discussion there?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC I-libs-api-nominated Nominated for discussion during a libs-api team meeting. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests