Commit 78529d9

authored Jul 15, 2024
Rollup merge of #124921 - RalfJung:offset-from-same-addr, r=oli-obk
offset_from: always allow pointers to point to the same address

This PR implements the last remaining part of the t-opsem consensus in rust-lang/unsafe-code-guidelines#472: always permit `offset_from` when both pointers have the same address, no matter how they are computed. This is required to achieve *provenance monotonicity*.

Tracking issue: #117945

### What is provenance monotonicity and why does it matter?

Provenance monotonicity is the property that adding arbitrary provenance to any no-provenance pointer must never make the program UB. More specifically, in the program state, data in memory is stored as a sequence of [abstract bytes](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#abstract-byte), where each byte can optionally carry provenance. When a pointer is stored in memory, all of the bytes it is stored in carry that provenance. Provenance monotonicity means: if we take some byte that does not have provenance, and give it some arbitrary provenance, then that cannot change program behavior or introduce UB into a UB-free program.

We care about provenance monotonicity because we want to allow the optimizer to remove provenance-stripping operations. Removing a provenance-stripping operation effectively means the program after the optimization has provenance where the program before the optimization did not -- since the provenance removal does not happen in the optimized program. In other words, the compiler transformation added provenance to previously provenance-free bytes. This is exactly what provenance monotonicity lets us do.

We care about removing provenance-stripping operations because `*ptr = *ptr` is, in general, (likely) a provenance-stripping operation. Specifically, consider `ptr: *mut usize` (or any integer type), and imagine the data at `*ptr` is actually a pointer (i.e., we are type-punning between pointers and integers). Then `*ptr` on the right-hand side evaluates to the data in memory *without* any provenance (because [integers do not have provenance](https://rust-lang.github.io/rfcs/3559-rust-has-provenance.html#integers-do-not-have-provenance)). Storing that back to `*ptr` means that the abstract bytes `ptr` points to are the same as before, except their provenance is now gone. This makes `*ptr = *ptr` a provenance-stripping operation. (Here we assume `*ptr` is fully initialized. If it is not initialized, evaluating `*ptr` to a value is UB, so removing `*ptr = *ptr` is trivially correct.)

### What does `offset_from` have to do with provenance monotonicity?

With `ptr = without_provenance(N)`, `ptr.offset_from(ptr)` is always well-defined and returns 0. By provenance monotonicity, I can now add provenance to the two arguments of `offset_from` and it must still be well-defined. Crucially, I can add *different* provenance to the two arguments, and it must still be well-defined. In other words, this must always be allowed: `ptr1.with_addr(N).offset_from(ptr2.with_addr(N))` (and it returns 0). But the current spec for `offset_from` says that the two pointers must either both be derived from an integer or both be derived from the same allocation, which is not in general true for arbitrary `ptr1`, `ptr2`.

To obtain provenance monotonicity, this PR hence changes the spec for `offset_from` to say that if both pointers have the same address, the function is always well-defined.

### What further consequences does this have?

It means the compiler can no longer transform `end2 = begin.offset(end.offset_from(begin))` into `end2 = end`. However, it can still be transformed into `end2 = begin.with_addr(end.addr())`, which later parts of the backend (when provenance has been erased) can trivially turn into `end2 = end`.

The only alternative I am aware of is a fundamentally different handling of zero-sized accesses, where a "no provenance" pointer is not allowed to do zero-sized accesses and instead we have a special provenance that indicates "may be used for zero-sized accesses (and nothing else)". `offset` and `offset_from` would then always be UB on a "no provenance" pointer, and permit zero-sized offsets on a "zero-sized provenance" pointer. This achieves provenance monotonicity. That is, however, a breaking change as it contradicts what we landed in #117329. It's also a whole bunch of extra UB, which doesn't seem worth it just to achieve that transformation.

### What about the backend?

LLVM currently doesn't have an intrinsic for pointer difference, so we cast to integer and subtract there anyway. That's never UB, so it is compatible with any relaxation we may want to apply. If LLVM gets a `ptrsub` in the future, then plausibly it will be consistent with `ptradd` and consider two equal pointers to be inbounds (see the comment on #124921).
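As a rough illustration of the same-address rule described in the commit message, the sketch below builds two pointers that have equal addresses but are derived from different allocations, mirroring the setup of the pre-change doc example. It assumes a toolchain that already includes this change; the helper name `same_address_distance` is made up for illustration and is not part of the PR.

```rust
/// Hypothetical helper (not part of the PR): computes the distance between
/// two pointers with the same address but different provenance.
fn same_address_distance() -> isize {
    let ptr1 = Box::into_raw(Box::new(0u8)) as *const u8;
    let ptr2 = Box::into_raw(Box::new(1u8)) as *const u8;
    let diff = (ptr2 as isize).wrapping_sub(ptr1 as isize);

    // Same address as `ptr2`, but derived from `ptr1`.
    let ptr2_other = ptr1.wrapping_offset(diff);
    assert_eq!(ptr2 as usize, ptr2_other as usize);

    // Assumption: the toolchain includes this change, so equal addresses
    // are enough to make this call well-defined.
    let distance = unsafe { ptr2_other.offset_from(ptr2) };

    // Free both boxes again so the sketch does not leak.
    unsafe {
        drop(Box::from_raw(ptr1 as *mut u8));
        drop(Box::from_raw(ptr2 as *mut u8));
    }
    distance
}

fn main() {
    println!("distance = {}", same_address_distance());
}
```

Under the rule introduced here, the call is expected to yield 0 regardless of how the two pointers were obtained.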
2 parents d3dd34a + f6c377c commit 78529d9

7 files changed, +67 -61 lines changed


‎compiler/rustc_const_eval/src/interpret/intrinsics.rs

+18 -15
@@ -20,7 +20,7 @@ use super::{
     err_inval, err_ub_custom, err_unsup_format, memory::MemoryKind, throw_inval, throw_ub_custom,
     throw_ub_format, util::ensure_monomorphic_enough, Allocation, CheckInAllocMsg, ConstAllocation,
     GlobalId, ImmTy, InterpCx, InterpResult, MPlaceTy, Machine, OpTy, Pointer, PointerArithmetic,
-    Scalar,
+    Provenance, Scalar,
 };

 use crate::fluent_generated as fluent;
@@ -259,25 +259,28 @@ impl<'tcx, M: Machine<'tcx>> InterpCx<'tcx, M> {
                         // This will always return 0.
                         (a, b)
                     }
-                    (Err(_), _) | (_, Err(_)) => {
-                        // We managed to find a valid allocation for one pointer, but not the other.
-                        // That means they are definitely not pointing to the same allocation.
+                    _ if M::Provenance::OFFSET_IS_ADDR && a.addr() == b.addr() => {
+                        // At least one of the pointers has provenance, but they also point to
+                        // the same address so it doesn't matter; this is fine. `(0, 0)` means
+                        // we pass all the checks below and return 0.
+                        (0, 0)
+                    }
+                    // From here onwards, the pointers are definitely for different addresses
+                    // (or we can't determine their absolute address).
+                    (Ok((a_alloc_id, a_offset, _)), Ok((b_alloc_id, b_offset, _)))
+                        if a_alloc_id == b_alloc_id =>
+                    {
+                        // Found allocation for both, and it's the same.
+                        // Use these offsets for distance calculation.
+                        (a_offset.bytes(), b_offset.bytes())
+                    }
+                    _ => {
+                        // Not into the same allocation -- this is UB.
                         throw_ub_custom!(
                             fluent::const_eval_offset_from_different_allocations,
                             name = intrinsic_name,
                         );
                     }
-                    (Ok((a_alloc_id, a_offset, _)), Ok((b_alloc_id, b_offset, _))) => {
-                        // Found allocation for both. They must be into the same allocation.
-                        if a_alloc_id != b_alloc_id {
-                            throw_ub_custom!(
-                                fluent::const_eval_offset_from_different_allocations,
-                                name = intrinsic_name,
-                            );
-                        }
-                        // Use these offsets for distance calculation.
-                        (a_offset.bytes(), b_offset.bytes())
-                    }
                 };

                 // Compute distance.
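The arm ordering in the hunk above can be sketched as a small standalone model. Everything here (`ModelPtr`, `ModelError`, `distance_operands`) is hypothetical and only mimics the shape of the match; it leaves out the real interpreter's `OFFSET_IS_ADDR` check, the later bounds checks, and the division by the element size.

```rust
/// Hypothetical stand-in for an interpreter pointer: an absolute address plus
/// optional provenance, given as (allocation id, offset into that allocation).
#[derive(Clone, Copy)]
struct ModelPtr {
    addr: u64,
    prov: Option<(u32, u64)>,
}

#[derive(Debug, PartialEq)]
enum ModelError {
    DifferentPointersWithoutProvenance,
    DifferentAllocations,
}

/// Picks the two numbers whose difference is the distance, in the same
/// order of cases as the match in the diff above.
fn distance_operands(a: ModelPtr, b: ModelPtr) -> Result<(u64, u64), ModelError> {
    match (a.prov, b.prov) {
        // Neither pointer has provenance: they must be equal.
        (None, None) => {
            if a.addr != b.addr {
                return Err(ModelError::DifferentPointersWithoutProvenance);
            }
            Ok((a.addr, b.addr))
        }
        // New case: equal addresses are fine whatever the provenance is.
        _ if a.addr == b.addr => Ok((0, 0)),
        // Both pointers point into the same allocation: use the offsets.
        (Some((a_id, a_off)), Some((b_id, b_off))) if a_id == b_id => Ok((a_off, b_off)),
        // Anything else is not into the same allocation.
        _ => Err(ModelError::DifferentAllocations),
    }
}

fn main() {
    let a = ModelPtr { addr: 0x1000, prov: Some((1, 0)) };
    let b = ModelPtr { addr: 0x1000, prov: Some((2, 8)) };
    println!("{:?}", distance_operands(a, b));
}
```

Placing the same-address arm before the allocation comparison is what lets pointers with unrelated provenance pass, while the final catch-all still rejects different addresses in different allocations.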

‎library/core/src/ptr/const_ptr.rs

+6 -6
@@ -604,9 +604,9 @@ impl<T: ?Sized> *const T {
     ///
     /// * `self` and `origin` must either
     ///
+    ///   * point to the same address, or
     ///   * both be *derived from* a pointer to the same [allocated object], and the memory range between
-    ///     the two pointers must be either empty or in bounds of that object. (See below for an example.)
-    ///   * or both be derived from an integer literal/constant, and point to the same address.
+    ///     the two pointers must be in bounds of that object. (See below for an example.)
     ///
     /// * The distance between the pointers, in bytes, must be an exact multiple
     ///   of the size of `T`.
@@ -653,14 +653,14 @@ impl<T: ?Sized> *const T {
     /// let ptr1 = Box::into_raw(Box::new(0u8)) as *const u8;
     /// let ptr2 = Box::into_raw(Box::new(1u8)) as *const u8;
     /// let diff = (ptr2 as isize).wrapping_sub(ptr1 as isize);
-    /// // Make ptr2_other an "alias" of ptr2, but derived from ptr1.
-    /// let ptr2_other = (ptr1 as *const u8).wrapping_offset(diff);
+    /// // Make ptr2_other an "alias" of ptr2.add(1), but derived from ptr1.
+    /// let ptr2_other = (ptr1 as *const u8).wrapping_offset(diff).wrapping_offset(1);
     /// assert_eq!(ptr2 as usize, ptr2_other as usize);
     /// // Since ptr2_other and ptr2 are derived from pointers to different objects,
     /// // computing their offset is undefined behavior, even though
-    /// // they point to the same address!
+    /// // they point to addresses that are in-bounds of the same object!
     /// unsafe {
-    ///     let zero = ptr2_other.offset_from(ptr2); // Undefined Behavior
+    ///     let one = ptr2_other.offset_from(ptr2); // Undefined Behavior! ⚠️
     /// }
     /// ```
     #[stable(feature = "ptr_offset_from", since = "1.47.0")]
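For contrast with the undefined-behavior example in the docs above, here is a minimal sketch of the case the second bullet permits: both pointers derived from the same allocated object, with the range between them in bounds. The helper name `distances_within_array` is an assumption made for illustration.

```rust
/// Hypothetical helper: element distances between pointers into one array.
fn distances_within_array() -> (isize, isize, isize) {
    let values = [10u16, 20, 30, 40];
    let base = values.as_ptr();
    // SAFETY: index 2 is in bounds, one-past-the-end is a valid position,
    // and all pointers are derived from the same array.
    unsafe {
        let third = base.add(2);
        let end = base.add(values.len());
        (
            third.offset_from(base), // forward distance, in elements
            end.offset_from(base),   // one-past-the-end is still in bounds
            base.offset_from(third), // the result is signed
        )
    }
}

fn main() {
    println!("{:?}", distances_within_array());
}
```

The distances are counted in units of `T` (here `u16`), not bytes, which is why the byte distance has to be an exact multiple of the size of `T`.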

‎library/core/src/ptr/mut_ptr.rs

+6 -6
@@ -829,9 +829,9 @@ impl<T: ?Sized> *mut T {
     ///
     /// * `self` and `origin` must either
     ///
+    ///   * point to the same address, or
     ///   * both be *derived from* a pointer to the same [allocated object], and the memory range between
-    ///     the two pointers must be either empty or in bounds of that object. (See below for an example.)
-    ///   * or both be derived from an integer literal/constant, and point to the same address.
+    ///     the two pointers must be in bounds of that object. (See below for an example.)
     ///
     /// * The distance between the pointers, in bytes, must be an exact multiple
     ///   of the size of `T`.
@@ -878,14 +878,14 @@ impl<T: ?Sized> *mut T {
     /// let ptr1 = Box::into_raw(Box::new(0u8));
     /// let ptr2 = Box::into_raw(Box::new(1u8));
     /// let diff = (ptr2 as isize).wrapping_sub(ptr1 as isize);
-    /// // Make ptr2_other an "alias" of ptr2, but derived from ptr1.
-    /// let ptr2_other = (ptr1 as *mut u8).wrapping_offset(diff);
+    /// // Make ptr2_other an "alias" of ptr2.add(1), but derived from ptr1.
+    /// let ptr2_other = (ptr1 as *mut u8).wrapping_offset(diff).wrapping_offset(1);
     /// assert_eq!(ptr2 as usize, ptr2_other as usize);
     /// // Since ptr2_other and ptr2 are derived from pointers to different objects,
     /// // computing their offset is undefined behavior, even though
-    /// // they point to the same address!
+    /// // they point to addresses that are in-bounds of the same object!
     /// unsafe {
-    ///     let zero = ptr2_other.offset_from(ptr2); // Undefined Behavior
+    ///     let one = ptr2_other.offset_from(ptr2); // Undefined Behavior! ⚠️
     /// }
     /// ```
     #[stable(feature = "ptr_offset_from", since = "1.47.0")]

‎library/core/src/ptr/non_null.rs

+7 -6
@@ -735,9 +735,9 @@ impl<T: ?Sized> NonNull<T> {
     ///
     /// * `self` and `origin` must either
     ///
+    ///   * point to the same address, or
     ///   * both be *derived from* a pointer to the same [allocated object], and the memory range between
-    ///     the two pointers must be either empty or in bounds of that object. (See below for an example.)
-    ///   * or both be derived from an integer literal/constant, and point to the same address.
+    ///     the two pointers must be in bounds of that object. (See below for an example.)
     ///
     /// * The distance between the pointers, in bytes, must be an exact multiple
     ///   of the size of `T`.
@@ -789,14 +789,15 @@ impl<T: ?Sized> NonNull<T> {
     /// let ptr1 = NonNull::new(Box::into_raw(Box::new(0u8))).unwrap();
     /// let ptr2 = NonNull::new(Box::into_raw(Box::new(1u8))).unwrap();
     /// let diff = (ptr2.addr().get() as isize).wrapping_sub(ptr1.addr().get() as isize);
-    /// // Make ptr2_other an "alias" of ptr2, but derived from ptr1.
-    /// let ptr2_other = NonNull::new(ptr1.as_ptr().wrapping_byte_offset(diff)).unwrap();
+    /// // Make ptr2_other an "alias" of ptr2.add(1), but derived from ptr1.
+    /// let diff_plus_1 = diff.wrapping_add(1);
+    /// let ptr2_other = NonNull::new(ptr1.as_ptr().wrapping_byte_offset(diff_plus_1)).unwrap();
     /// assert_eq!(ptr2.addr(), ptr2_other.addr());
     /// // Since ptr2_other and ptr2 are derived from pointers to different objects,
     /// // computing their offset is undefined behavior, even though
-    /// // they point to the same address!
+    /// // they point to addresses that are in-bounds of the same object!
     ///
-    /// let zero = unsafe { ptr2_other.offset_from(ptr2) }; // Undefined Behavior
+    /// let one = unsafe { ptr2_other.offset_from(ptr2) }; // Undefined Behavior! ⚠️
     /// ```
     #[inline]
     #[cfg_attr(miri, track_caller)] // even without panics, this helps for Miri backtraces

‎src/tools/miri/tests/pass/zero-sized-accesses-and-offsets.rs

-3
@@ -39,8 +39,6 @@ fn test_ptr(ptr: *mut ()) {
         // Distance.
         let ptr = ptr.cast::<i32>();
         ptr.offset_from(ptr);
-        /*
-        FIXME: this is disabled for now as these cases are not yet allowed.
         // Distance from other "bad" pointers that have the same address, but different provenance. Some
         // of this is library UB, but we don't want it to be language UB since that would violate
         // provenance monotonicity: if we allow computing the distance between two ptrs with no
@@ -54,6 +52,5 @@ fn test_ptr(ptr: *mut ()) {
         // - Distance from use-after-free pointer
         drop(b);
         ptr.offset_from(other_ptr.with_addr(ptr.addr()));
-        */
     }
 }

‎tests/ui/consts/offset_from_ub.rs

+19 -14
@@ -32,12 +32,6 @@ pub const NOT_MULTIPLE_OF_SIZE: isize = {
     //~| 1_isize cannot be divided by 2_isize without remainder
 };

-pub const OFFSET_FROM_NULL: isize = {
-    let ptr = 0 as *const u8;
-    // Null isn't special for zero-sized "accesses" (i.e., the range between the two pointers)
-    unsafe { ptr_offset_from(ptr, ptr) }
-};
-
 pub const DIFFERENT_INT: isize = { // offset_from with two different integers: like DIFFERENT_ALLOC
     let ptr1 = 8 as *const u8;
     let ptr2 = 16 as *const u8;
@@ -63,14 +57,6 @@ const OUT_OF_BOUNDS_2: isize = {
     //~| pointer to 10 bytes starting at offset 0 is out-of-bounds
 };

-const OUT_OF_BOUNDS_SAME: isize = {
-    let start_ptr = &4 as *const _ as *const u8;
-    let length = 10;
-    let end_ptr = (start_ptr).wrapping_add(length);
-    // Out-of-bounds is fine as long as the range between the pointers is empty.
-    unsafe { ptr_offset_from(end_ptr, end_ptr) }
-};
-
 pub const DIFFERENT_ALLOC_UNSIGNED: usize = {
     let uninit = std::mem::MaybeUninit::<Struct>::uninit();
     let base_ptr: *const Struct = &uninit as *const _ as *const Struct;
@@ -130,4 +116,23 @@ pub const OFFSET_VERY_FAR2: isize = {
     //~^ inside
 };

+// If the pointers are the same, OOB/null/UAF is fine.
+pub const OFFSET_FROM_NULL_SAME: isize = {
+    let ptr = 0 as *const u8;
+    unsafe { ptr_offset_from(ptr, ptr) }
+};
+const OUT_OF_BOUNDS_SAME: isize = {
+    let start_ptr = &4 as *const _ as *const u8;
+    let length = 10;
+    let end_ptr = (start_ptr).wrapping_add(length);
+    unsafe { ptr_offset_from(end_ptr, end_ptr) }
+};
+const UAF_SAME: isize = {
+    let uaf_ptr = {
+        let x = 0;
+        &x as *const i32
+    };
+    unsafe { ptr_offset_from(uaf_ptr, uaf_ptr) }
+};
+
 fn main() {}
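The three new `*_SAME` constants can be mirrored at run time. The sketch below is an approximation rather than the test itself: it uses the stable `offset_from` method instead of the `ptr_offset_from` intrinsic, runs outside const evaluation, and assumes a toolchain that includes this change. The function names are made up for illustration.

```rust
use std::ptr;

/// Null pointer compared with itself.
fn null_same() -> isize {
    let p = ptr::null::<u8>();
    unsafe { p.offset_from(p) }
}

/// Out-of-bounds pointer compared with itself.
fn out_of_bounds_same() -> isize {
    let value = 4u8;
    let start = &value as *const u8;
    // `wrapping_add` may leave the allocation without being UB by itself.
    let end = start.wrapping_add(10);
    unsafe { end.offset_from(end) }
}

/// Dangling (use-after-free) pointer compared with itself.
fn dangling_same() -> isize {
    let dangling = {
        let x = 0i32;
        &x as *const i32
    };
    // The pointer is never dereferenced, only compared with itself.
    unsafe { dangling.offset_from(dangling) }
}

fn main() {
    println!("{} {} {}", null_same(), out_of_bounds_same(), dangling_same());
}
```

In each case both arguments are the same pointer, so the same-address rule applies and the distance is 0.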

‎tests/ui/consts/offset_from_ub.stderr

+11 -11
@@ -24,55 +24,55 @@ LL | unsafe { ptr_offset_from(field_ptr, base_ptr as *const u16) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ exact_div: 1_isize cannot be divided by 2_isize without remainder

 error[E0080]: evaluation of constant value failed
-  --> $DIR/offset_from_ub.rs:44:14
+  --> $DIR/offset_from_ub.rs:38:14
    |
 LL |     unsafe { ptr_offset_from(ptr2, ptr1) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^ `ptr_offset_from` called on different pointers without provenance (i.e., without an associated allocation)

 error[E0080]: evaluation of constant value failed
-  --> $DIR/offset_from_ub.rs:53:14
+  --> $DIR/offset_from_ub.rs:47:14
    |
 LL |     unsafe { ptr_offset_from(end_ptr, start_ptr) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ out-of-bounds `offset_from`: ALLOC0 has size 4, so pointer to 10 bytes starting at offset 0 is out-of-bounds

 error[E0080]: evaluation of constant value failed
-  --> $DIR/offset_from_ub.rs:62:14
+  --> $DIR/offset_from_ub.rs:56:14
    |
 LL |     unsafe { ptr_offset_from(start_ptr, end_ptr) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ out-of-bounds `offset_from`: ALLOC1 has size 4, so pointer to 10 bytes starting at offset 0 is out-of-bounds

 error[E0080]: evaluation of constant value failed
-  --> $DIR/offset_from_ub.rs:79:14
+  --> $DIR/offset_from_ub.rs:65:14
    |
 LL |     unsafe { ptr_offset_from_unsigned(field_ptr, base_ptr) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `ptr_offset_from_unsigned` called on pointers into different allocations

 error[E0080]: evaluation of constant value failed
-  --> $DIR/offset_from_ub.rs:86:14
+  --> $DIR/offset_from_ub.rs:72:14
    |
 LL |     unsafe { ptr_offset_from(ptr2, ptr1) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^ `ptr_offset_from` called when first pointer is too far ahead of second

 error[E0080]: evaluation of constant value failed
-  --> $DIR/offset_from_ub.rs:92:14
+  --> $DIR/offset_from_ub.rs:78:14
    |
 LL |     unsafe { ptr_offset_from(ptr1, ptr2) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^ `ptr_offset_from` called when first pointer is too far before second

 error[E0080]: evaluation of constant value failed
-  --> $DIR/offset_from_ub.rs:100:14
+  --> $DIR/offset_from_ub.rs:86:14
    |
 LL |     unsafe { ptr_offset_from(ptr1, ptr2) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^ `ptr_offset_from` called when first pointer is too far before second

 error[E0080]: evaluation of constant value failed
-  --> $DIR/offset_from_ub.rs:107:14
+  --> $DIR/offset_from_ub.rs:93:14
    |
 LL |     unsafe { ptr_offset_from_unsigned(p, p.add(2) ) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `ptr_offset_from_unsigned` called when first pointer has smaller offset than second: 0 < 8

 error[E0080]: evaluation of constant value failed
-  --> $DIR/offset_from_ub.rs:114:14
+  --> $DIR/offset_from_ub.rs:100:14
    |
 LL |     unsafe { ptr_offset_from_unsigned(ptr2, ptr1) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `ptr_offset_from_unsigned` called when first pointer is too far ahead of second
@@ -85,7 +85,7 @@ error[E0080]: evaluation of constant value failed
 note: inside `std::ptr::const_ptr::<impl *const u8>::offset_from`
   --> $SRC_DIR/core/src/ptr/const_ptr.rs:LL:COL
 note: inside `OFFSET_VERY_FAR1`
-  --> $DIR/offset_from_ub.rs:123:14
+  --> $DIR/offset_from_ub.rs:109:14
    |
 LL |     unsafe { ptr2.offset_from(ptr1) }
    |              ^^^^^^^^^^^^^^^^^^^^^^
@@ -98,7 +98,7 @@ error[E0080]: evaluation of constant value failed
 note: inside `std::ptr::const_ptr::<impl *const u8>::offset_from`
   --> $SRC_DIR/core/src/ptr/const_ptr.rs:LL:COL
 note: inside `OFFSET_VERY_FAR2`
-  --> $DIR/offset_from_ub.rs:129:14
+  --> $DIR/offset_from_ub.rs:115:14
    |
 LL |     unsafe { ptr1.offset_from(ptr2.wrapping_offset(1)) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
