Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document that str, slices, (more?) can't safely straddle allocation boundaries #62765

Open
hanna-kruppe opened this issue Jul 17, 2019 · 8 comments
Labels
A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@hanna-kruppe
Copy link
Contributor

Slicing, indexing, and other safe operations on slices and strings pervasively use <*T>::offset and APIs built on top of it. These have the requirement that

Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object.

So if one allocates two pieces of memory and after proper checking miraculously finds they are directly adjacent, one can't safely construct a slice/str/etc. that spans both of these allocations. At least, one can't do very many things with the result it without causing UB from crossing the boundary between the allocations.

I couldn't find anything documenting this. It should be noted on the unsafe constructors (from_raw_parts etc.) at minimum. These already link to offset's documentation but only refer to its "no larger than isize::MAX" requirement, with no mention that the other requirements are also relevant.

cc oberien/str-concat#8
cc @rust-lang/wg-unsafe-code-guidelines

(Similar issues apply to references-to-arrays and field accesses in aggregates, but this is due to the compiler's codegen for language primitives rather than due to standard library code, so it should go into the UCG and I believe we're more or less covering that already.)

@jonas-schievink jonas-schievink added the A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools label Jul 17, 2019
@RalfJung
Copy link
Member

Agreed. I am pretty sure this already came up last year or before with another "slice concat" library that was found to be unsound, but I cannot find any trace of that right now.

@HeroicKatora
Copy link
Contributor

HeroicKatora commented Jul 19, 2019

Pretty sure you mean this one 😄

@RalfJung
Copy link
Member

@HeroicKatora I do, nice find!

@cramertj
Copy link
Member

I don't suppose there's any way to mark that offsetof from one of the addresses could also be used to refer to a separately allocated object, e.g. ptr::join(ptr_into_allocation_a, ptr_into_allocation_b)? I don't know of any LLVM-level operation to mirror this, but it seems pretty clearly useful.

@RalfJung
Copy link
Member

@cramertj In principle, you can implement a cross_object_offset as follows:

fn cross_object_offset<T>(ptr: *const T, n: usize) -> *const T {
  ((ptr as usize) + n*mem::size_of::<T>()) as *const T
}

However, there are some long-standing LLVM bugs in this area and with the LLVM model being as awfully imprecise as it is for UB, we can't really tell where the bug is.

The good news though is that it seems like C++ is moving towards "pointer provenance does not propagate via integers", which would basically force LLVM to implement the semantics we want for the above operation.

@hanna-kruppe
Copy link
Contributor Author

@RalfJung showed how to perform pointer arithmetic that crosses allocation boundaries. It's certainly possible to do this manually when handling raw pointers, but it's something that has to be done on every operation, you can't simply claim to LLVM that two separate allocations are actually the same. Therefore, fixing the safety issue that this issue is about would require us to implement all operations on slices/strings/etc. with cross_object_offset. This will most likely cause serious performance regressions, both due to the added flexibility making desirable optimizations illegal, and because analyses and optimizations are often tailored to getelementpointers even if they may after careful consideration turn out to be applicable to cross_object_offset-using code too.

@hanna-kruppe
Copy link
Contributor Author

#66111 did this for slices

@RalfJung
Copy link
Member

Well it did it in from_raw_parts. I wonder if the top-level module and/or primitive type documentation should also have a section on this? It doesn't talk about data layout at all so far.

I didn't do anything for str as there is no from_raw_parts for them.

@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Mar 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants