Auto merge of #37642 - nnethercote:no-HirVec-of-P, r=michaelwoerister
Change HirVec<P<T>> to HirVec<T> in hir::Expr.
This PR changes data structures like this:
```
[ ExprArray | 8 | P ]
|
v
[ P | P | P | P | P | P | P | P ]
|
v
[ ExprTup | 2 | P ]
|
v
[ P | P ]
|
v
[ Expr ]
```
to this:
```
[ ExprArray | 8 | P ]
|
v
[ [ ExprTup | 2 | P ] | ... ]
|
v
[ Expr | Expr ]
```
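To make the before/after concrete, here is a minimal standalone sketch of the two layouts, using plain `Box` and `u64` rather than the actual `P`/`Expr` types: one heap allocation per element versus a single boxed slice holding the elements inline.
```rust
fn main() {
    // Before: the outer slice holds pointers, and every element lives
    // in its own heap allocation.
    let boxed_per_element: Box<[Box<u64>]> =
        vec![Box::new(1u64), Box::new(2), Box::new(3)].into_boxed_slice();

    // After: a single allocation holds all the elements inline, with no
    // per-element pointer or allocation.
    let inline: Box<[u64]> = vec![1u64, 2, 3].into_boxed_slice();

    // Same logical contents, one fewer level of indirection per element.
    let a: u64 = boxed_per_element.iter().map(|b| **b).sum();
    let b: u64 = inline.iter().copied().sum();
    assert_eq!(a, b);
}
```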
I thought this would be a win for #36799, and on a cut-down version of that workload this reduces the peak heap size (as measured by Massif) from 885 MiB to 875 MiB. However, the peak RSS as measured by `-Ztime-passes` and by `/usr/bin/time` increases by about 30 MiB.
I'm not sure why. Just look at the picture above -- the second data structure clearly takes up less space than the first. My best idea relates to unused elements in the slices. `HirVec<Expr>` is a typedef for `P<[Expr]>`. If there were any unused excess elements, I could see how memory usage would increase, because each excess element is larger in a `HirVec<Expr>` than in a `HirVec<P<Expr>>`. But AIUI there are no such excess elements, and Massif's measurements corroborate that.
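For a sense of why any unused slot would now be costlier, here is an illustrative sketch. `FakeExpr` is a made-up stand-in (the real `hir::Expr` is considerably larger): an unused slot in a `P<[Expr]>` costs a whole `Expr`, whereas in a `P<[P<Expr>]>` it only costs a pointer.
```rust
use std::mem::size_of;

// Hypothetical stand-in for hir::Expr; the real type is larger, which is
// exactly why unused inline slots would be expensive.
struct FakeExpr {
    _payload: [u64; 8], // 64 bytes, purely illustrative
}

fn main() {
    // Cost of one unused slot when elements are stored inline:
    println!("inline slot: {} bytes", size_of::<FakeExpr>());
    // Cost of one unused slot when elements sit behind a pointer:
    println!("boxed slot:  {} bytes", size_of::<Box<FakeExpr>>());
}
```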
However, the two main creation points for these data structures are these lines from `lower_expr`:
```rust
ExprKind::Vec(ref exprs) => {
    hir::ExprArray(exprs.iter().map(|x| self.lower_expr(x)).collect())
}
ExprKind::Tup(ref elts) => {
    hir::ExprTup(elts.iter().map(|x| self.lower_expr(x)).collect())
}
```
I suspect what is happening is that temporary excess elements are created within the `collect` calls. The workload from #36799 has many 2-tuples and 3-tuples, and when a `Vec` gets doubled it goes from a capacity of 1 to 4, which would lead to excess elements. Though, having said that, `Vec::with_capacity` doesn't create excess elements AFAICT, so I'm not sure; what goes on inside `collect` is complex.
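As a quick check of that suspicion, here is a standalone experiment (not rustc code): a mapped slice iterator reports an exact size hint, so `collect` can allocate the final length up front rather than growing by doubling from a capacity of 1.
```rust
// Does collecting a mapped slice iterator over-allocate? Print the
// resulting Vec's capacity next to its length to find out.
fn main() {
    let elts = [10u32, 20]; // stand-in for the elements of a 2-tuple
    let lowered: Vec<u64> = elts.iter().map(|&x| u64::from(x) + 1).collect();
    println!("len = {}, capacity = {}", lowered.len(), lowered.capacity());
    // With the std versions I've tried this prints len = 2, capacity = 2,
    // i.e. no excess capacity from this particular collect.
}
```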
Anyway, in its current form this PR is a memory consumption regression and so not worth landing, but I figured I'd post it in case anyone has additional insight.