Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
266 changes: 266 additions & 0 deletions doc/src/challenges/0031-enriching-mir-consts.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,266 @@
# Challenge 31: Enriching MIR constants

- **Status:** *Open*
- **Solution:** *Option field to point to the PR that solved this challenge.*
- **Tracking Issue:** *https://github.com/model-checking/verify-rust-std/issues/594*
- **Start date:** *TBD*
- **End date:** *TBD*
- **Reward:** *TBD*

-------------------


## Goal

Investigate, and where feasible implement, changes to the way MIR constants
are represented and exposed, on both the internal (`mir::Const`, `ty::Const`)
and public (`MirConst`, `TyConst`) sides, so that verification tools can
reason about Rust constants in a form that matches their semantic role rather
than being forced through whichever byte-level representation the current
access path happens to hand them. Two distinct gaps are in scope:

1. **Preserve `Unevaluated` variants.** In rustc today, monomorphisation
and constant evaluation are mutually entangled in the same flow, a
uroboros: substitution feeds evaluation with the concrete types and
arguments it needs, while evaluation feeds substitution back with values
for const generic parameters and feeds the mono collector with the
cross-item provenance through which further reachable items are
discovered. The pipeline that delivers monomorphised MIR to a consumer
therefore consumes the evaluated values rather than propagating the
`Unevaluated` form through. A tool that wants to reason about a symbolic
constant reference at a high level has to either fall back to
pre-monomorphisation MIR, or reverse the evaluation step by hand to
reconstruct high-level values (as
[KMIR](https://github.com/runtimeverification/mir-semantics/) and
[mir-json](https://github.com/GaloisInc/mir-json) do today).
2. **Enrich Public MIR evaluated representation.** Bridge structured forms
(`ValTree`) on the public side for use of high-level data, instead of
collapsing every evaluated constant into `Allocation`. This gap is
acknowledged upstream by comment
["FIXME: These should be a valtree"](https://github.com/rust-lang/rust/blob/36ba2c7712052d731a7082d0eba5ed3d9d56c133/compiler/rustc_public/src/ty.rs#L149).

This is an infrastructure challenge: it does not verify a specific standard
library module, but it enables a class of verification techniques that today
must either reason over layout-dependent byte blobs or reverse the compiler's
evaluation steps at the tool level.

## Motivation

Both the internal MIR surface (`mir::Const`, `ty::Const`) and the public
`rustc_public` surface (`MirConst`, `TyConst`) already model unevaluated
constants through their `Unevaluated` variants. What does not exist, in
practice, is a path by which a tool can obtain a monomorphised MIR body that
still contains those unevaluated forms. The access routes that deliver
monomorphised MIR to consumers run constant evaluation as part of the same
flow, and constants are lowered to `ConstValue` before becoming accesible
again.

Evidence that this matters to tool authors:
- [mir-json](https://github.com/GaloisInc/mir-json) (the MIR exporter behind
`crux-mir` / Crucible) ships a hand-rolled post-evaluation
[MIR decoder](https://github.com/GaloisInc/mir-json/blob/2fc80d73eaede1e394a4ec6970fb5ca9e33efd73/src/analyz/ty_json.rs#L1240-L1464).
While the problem is solved for `mir-json`, without an upstream solution it
is likely that any other project that has the same requirements will attempt
to re-implement this work.
- [mir-semantics](https://github.com/runtimeverification/mir-semantics/)
implements a similar solution decoding constant `Allocation`s Public MIR
[during parsing](https://github.com/runtimeverification/mir-semantics/blob/master/kmir/src/kmir/decoding.py)
of the exported Stable MIR JSON.
- Open project-stable-mir issues
[#96](https://github.com/rust-lang/project-stable-mir/issues/96) and
[#101](https://github.com/rust-lang/project-stable-mir/issues/101) are
concrete symptoms of the brittleness. Improvements to the constant
representations or access patterns should be solutions to both.

The objective is to make a high-level constant form (`UnevaluatedConst`
variants, or an equivalent structured representation that preserves
cross-item reachability without committing to a particular byte layout)
available in MIR bodies that are otherwise monomorphised and evaluated,
without harming downstream codegen performance. Which of several candidate
implementations is the right one is itself part of the engagement.

## Description

There are two intertwined gaps, corresponding to the two bullets in the Goal:
the MIR pipeline couples monomorphisation with constant evaluation, and the
public constant surface collapses several distinct internal shapes into a
single byte-level form. Each is described in turn below.

**Monomorphisation and constant evaluation.** Inside
[collect_and_partition_mono_items](https://github.com/rust-lang/rust/blob/37d85e592f9ae5f20f7d9a9f99785246fa7298da/compiler/rustc_monomorphize/src/partitioning.rs#L1132-L1237),
constant evaluation is reached on the mono path through three sites in
`compiler/rustc_monomorphize/src/collector.rs`:

- [Const::eval(...)](https://github.com/rust-lang/rust/blob/37d85e592f9ae5f20f7d9a9f99785246fa7298da/compiler/rustc_monomorphize/src/collector.rs#L707-L722)
from `MirUsedCollector::eval_constant`, hit on every MIR const operand
encountered during a body walk.
- [tcx.eval_static_initializer](https://github.com/rust-lang/rust/blob/37d85e592f9ae5f20f7d9a9f99785246fa7298da/compiler/rustc_monomorphize/src/collector.rs#L440-L444)
run once per `MonoItem::Static` root.
- [tcx.const_eval_poly](https://github.com/rust-lang/rust/blob/37d85e592f9ae5f20f7d9a9f99785246fa7298da/compiler/rustc_monomorphize/src/collector.rs#L1580-L1582)
from `RootCollector::process_item`, run on top-level const items under
`MonoItemCollectionStrategy::Eager`. In the default Lazy mode this site is
inert; consts there are reached through their use sites via first method.

By the time a downstream tool sees the resulting MIR,
`ConstantKind::Unevaluated` and `ty::ConstKind::Unevaluated` have been
consumed and replaced with thier evaluated representations.

The design space admits several candidate approaches:

1. **Skip "`evaluate_const`" on the monomorphisation path.** Do not run
constant evaluation as part of producing the MIR that downstream tools
receive; tools that need evaluation call "`evaluate_const`" themselves.
This is reasonable for any constant whose value is `ValTree`-shaped
(pointer-free data: integers, bools, chars, etc.), where the evaluated form
carries no information beyond the structured value itself. It is *not*
(currently) reasonable for pointer-bearing constants. The cross-item
reachability the collector relies on (function pointers in const tables,
vtables in `&dyn` consts, etc.) lives in the provenance field of the
evaluated `Allocation`. Skipping evaluation there silently drops those edges
from the discovered item set.
2. **Evaluate, then lift back to a high-level form as a compiler query.**
Continue evaluating constants, but expose the result through a query that
lifts an evaluated form into a structured tree shaped like an extended
`ValTree`. This tree form would encode `GlobalAlloc::Function`, `Static`,
and `VTable` references as named edges rather than as opaque provenance
bytes. `mir-json` already does this in
[`try_render_opty`](https://github.com/GaloisInc/mir-json/blob/2fc80d73eaede1e394a4ec6970fb5ca9e33efd73/src/analyz/ty_json.rs#L1240-L1464);
upstreaming a query of that shape gives every downstream tool the
same lifting without each having to re-derive it, preserves
cross-item reachability, and abandons commitment to a particular byte
layout.
3. **Process both pre- and post-evaluation MIR and merge.** Produce both
shapes, keep the monomorphised, optimised body, and replace the
evaluated constant leaves with their unevaluated counterparts where
the type admits it.
4. **Something else.** An approach not yet enumerated, for example an
opt-in callback positioned between optimisation and constant lowering.

The constraint these approaches must respect is that on the
monomorphisation path, evaluation is what surfaces cross-item
reachability via allocation provenance. Approach 1 therefore cannot
stand alone for any program that uses pointer-bearing constants. To
make it cover those cases, the constant types themselves would have to grow
a high level representation for refernces (for example, extending `ValTree` to
carry `Function`, `Static`, and `VTable` references as named branches), at
which point it converges with Approach 2's lift. Approaches 2 and 3 execute
constant evaluation fully and so retain the allocation information; their
feasibility turns on landing the lifting or the merge without regressing codegen
quality and performance.

Each candidate may also turn out to be infeasible for reasons not
visible from outside the compiler (evaluation-order dependencies inside
later passes, query-graph cycles, performance regressions); the
engagement starts with the candidate the workgroup considers most
promising and falls back to the next as evidence accumulates.

**Constant representations across the bridge.** The public API collapses
several distinct internal shapes (`ValTree`, `ConstValue`) into a single
byte-level `Allocation`. That collapse loses placeless type-identity
semantics (the FIXME at `compiler/rustc_public/src/ty.rs:149` reads simply
"These should be a valtree") and makes the `TyConst -> MirConst` lowering
lossy, since padding and allocation identifiers survive the bridge even
though they carry no const-identity information. The diagram below shows the
type surface on each side. Variants marked `(no link)` have no constructor
in the bridge: for most of these the internal converter hits
`unimplemented!()` or `unreachable!()`; `ConstantKind::Param` on the public
side is declared but never produced. The `ZSTValue` special case on the
public side is a one-type-class preservation of placelessness; extending it
to general `ValTree`-shaped types is a scaling of an existing pattern, not a
new direction.

```text
INTERNAL (rustc) PUBLIC (rustc_public)
================= ======================

ty::Const / ty::ConstKind TyConst / TyConstKind
+----------------------+ +------------------------------+
| Param | -- mirror ----> | Param |
| Unevaluated(Def,..) | -- mirror ----> | Unevaluated(ConstDef,Args) |
| Value(ValTree) | -- collapse --> | Value(Ty, Allocation) |
| ValTreeKind: | | ValTree to | |
| Leaf(ScalarInt) | | Allocation | |
| Branch([...]) | +------------> | ZSTValue(Ty) |
| Error etc. (no link) | or ZSTValue | |
| Bound (no link) | | Bound (no link) |
+----------+-----------+ +--------------+---------------+
| |
| wrapped as | wrapped as
| mir::Const::Ty | ConstantKind::Ty
v v
mir::Const MirConst / ConstantKind
+----------------------+ +-------------------------------+
| Ty(Ty, ty::Const) | -- mirror ----> | Ty(TyConst) |
| Unevaluated(.., Ty) | -- mirror ----> | Unevaluated(UnevaluatedConst) |
| Val(ConstValue, Ty) | -- collapse --> | Allocated(Allocation) |
| ConstValue: | | ConstValue to | |
| Scalar | | Allocation | |
| ZeroSized | +------------> | ZeroSized |
| Slice{..} | or ZeroSized | |
| Indirect{..} | | Param(ParamConst) (no link) |
+----------------------+ +-------------------------------+
```

### Assumptions

- Changes land upstream in `rust-lang/rust`. Patches carried only in a fork do
not satisfy the challenge.
- Performance and correctness of the existing codegen path are preserved.
Approaches that produce slower or incorrect code, even as an opt-in, are out
of scope unless the performance cost is demonstrably small and the opt-in is
off by default.
- The goal is high-level constant visibility for downstream tools, whether
surfaced as `UnevaluatedConst` or as an equivalent structured
representation. Exposing that visibility through `rustc_public` is in
scope only to the extent needed to demonstrate the feature; broader
`rustc_public` stabilisation work belongs to a separate challenge.

### Success Criteria

This is an infrastructure challenge. The engagement targets a landed
implementation upstream; the criteria below describe that target and the
analysis that accompanies it.

**A. Implementation landed upstream.** An opt-in change is merged into
`rust-lang/rust` that makes a high-level constant form (`UnevaluatedConst`,
or an equivalent structured representation) visible to downstream tools
consuming monomorphisation-path MIR. The engagement begins with the
candidate the workgroup considers most promising and pivots between
candidates as feasibility evidence and upstream review dictate.

**B. Engineering analysis.** The implementation is delivered alongside a
written analysis of the candidate approaches the engagement explored: the
codegen invariants each preserves or breaks, performance cost or benefit
(measured where measurable), and any compiler-internal assumptions or
upstream feedback that constrained the choice. The analysis grounds the
candidate that landed and records the reasoning behind those that did
not, so future contributors revisiting the design space have a starting
point. In the tail case where no candidate clears upstream review during
the engagement, the analysis substitutes for the merged change as the
deliverable.

## Correctness Criteria

The Rust reference's list of undefined behaviour does not apply to this
challenge; the deliverable is compiler infrastructure rather than a run-time
construct. Correctness properties instead:

- Any MIR construct reachable through existing APIs before the change is
reachable after the change, with the same semantics, unless the change is
opt-in and the caller has opted in.
- High-level constant visibility, where achieved, is additive: a tool that
does not engage with constants at all observes no difference in the MIR
it consumes.
- Any non-default mode that surfaces a high-level constant form preserves
the cross-item reachability the existing eval path discovers via
allocation provenance. Where reachability is recovered through a
different mechanism, gaps relative to the eval-path baseline are
documented.
- The default behaviour of the compiler, including optimisation quality and
codegen output, is unchanged outside of the specific opt-in path introduced
by the change.
- Solutions are landed as merged pull requests against `rust-lang/rust`.

Note: All solutions to verification challenges need to satisfy the criteria
established in the [challenge book](../general-rules.md) in addition to the
ones listed above.
Loading