Skip to content

intrinsic-test: sve support #2160

Open
davidtwco wants to merge 15 commits into
rust-lang:mainfrom
davidtwco:intrinsic-test-sve
Open

intrinsic-test: sve support #2160
davidtwco wants to merge 15 commits into
rust-lang:mainfrom
davidtwco:intrinsic-test-sve

Conversation

@davidtwco

@davidtwco davidtwco commented Jun 15, 2026

Copy link
Copy Markdown
Member

This patch contains all the changes necessary for intrinsic-test to generate tests for the SVE intrinsics:

@rustbot

This comment was marked as resolved.

@davidtwco davidtwco force-pushed the intrinsic-test-sve branch 2 times, most recently from 95f286a to ed132a7 Compare June 17, 2026 10:15
Comment thread crates/intrinsic-test/src/common/intrinsic.rs
} = **self
{
let quad = if self.num_lanes() * bl > 64 { "q" } else { "" };
fn comparison_function(&self) -> String {

@sayantn sayantn Jun 17, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some way to print the diff for SVE? I would imagine it is difficult due to SVE being non-const sized, but is there some way? That would massively improve debuggability. One approach I can suggest is have some small functions that convert from SVE vectors to &[T], e.g.

#[target_feature(enable = "sve")]
pub fn svfloat32_to_slice(a: &svfloat32_t) -> &[NanEqF32] {
    unsafe {
        core::slice::from_raw_parts(core::ptr::from_ref(a).cast(), svcntw() as usize)
    }
}

this might work, with significantly less complexity

View changes since the review

@davidtwco davidtwco Jun 18, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I've needed to debug them, I've disabled the invocation of intrinsic-test in intrinsic-test.sh and added these snippets to the generated tests that I wanted to debug.

For non-bool vectors, just replace the x0_val and x1_val with the vectors that you want to inspect, e.g. __rust_return_value or __c_return_value:

{
    let num_elems = svcnth() as usize;
    let mut x0_buf = Vec::with_capacity(num_elems);
    let mut x1_buf = Vec::with_capacity(num_elems);
    svst1_u16(__pred, x0_buf.as_mut_ptr(), x0_val);
    x0_buf.set_len(num_elems);
    svst1_u16(__pred, x1_buf.as_mut_ptr(), x1_val);
    x1_buf.set_len(num_elems);
    for i in 0..num_elems {
        let x0_val = x0_buf[i];
        dbg!(i, x0_val);
    }
    for i in 0..num_elems {
        let x1_val = x1_buf[i];
        dbg!(i, x1_val);
    }
}

Similarly for bool vectors:

{
    let num_elems = svcntb() as usize;
    let mut _op_val = Vec::with_capacity(num_elems);
    svst1_u8(op_val, _op_val.as_mut_ptr(), svdup_n_u8(1));
    _op_val.set_len(num_elems);
    for i in 0..num_elems {
        let _op_val_el = if _op_val[i] == 1 { true } else { false };
        dbg!(i, _op_val_el);
    }
}

I'm not sure if what you've suggested will work, happy to try. I'm more than happy to make some improvements here for debuggability of the tests when they do fail, but could that be left to a follow-up?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up doing this in the current patch because it was useful for working out what was going on with CI failures

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, didn't understand what you meant by this and the current patch, the GH version shows the sveor version. Also, https://godbolt.org/z/a481YdW1P seems to work.

"
{intrinsics}
"#,
intrinsics = intrinsics.iter().format_with("", |intrinsic, fmt| {

@sayantn sayantn Jun 17, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor nit, but you can use something like intrinsic.specializations().for_each(|imm_values| fmt(...))

View changes since the review

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't have a fmt to call if I used for_each

@sayantn

sayantn commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

also, it would be nice if this can be split into a few focused PRs (I can help if you want)

@davidtwco

Copy link
Copy Markdown
Member Author

Submitted rust-lang/rust#158088 to fix the failure in debug mode - should pass CI after that

@davidtwco

Copy link
Copy Markdown
Member Author

also, it would be nice if this can be split into a few focused PRs (I can help if you want)

I'll try and split some out tomorrow

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jun 18, 2026
…o-no-opt, r=lqd

codegen_ssa: no dbginfo for scalable vec local w/ `-O0`

LLVM uses GlobalISel with `-O0` that doesn't support scalable vectors. It normally falls back to SDAG which does support scalable vectors, but there's a bug that means that isn't happening for debuginfo - so temporarily don't emit debuginfo for scalable vector locals when there are no optimisations until that bug is fixed.

cc llvm/llvm-project#204585
cc rust-lang/stdarch#2160

r? @lqd
rust-timer added a commit to rust-lang/rust that referenced this pull request Jun 18, 2026
Rollup merge of #158088 - davidtwco:scalable-vector-debuginfo-no-opt, r=lqd

codegen_ssa: no dbginfo for scalable vec local w/ `-O0`

LLVM uses GlobalISel with `-O0` that doesn't support scalable vectors. It normally falls back to SDAG which does support scalable vectors, but there's a bug that means that isn't happening for debuginfo - so temporarily don't emit debuginfo for scalable vector locals when there are no optimisations until that bug is fixed.

cc llvm/llvm-project#204585
cc rust-lang/stdarch#2160

r? @lqd
github-actions Bot pushed a commit that referenced this pull request Jun 19, 2026
… r=lqd

codegen_ssa: no dbginfo for scalable vec local w/ `-O0`

LLVM uses GlobalISel with `-O0` that doesn't support scalable vectors. It normally falls back to SDAG which does support scalable vectors, but there's a bug that means that isn't happening for debuginfo - so temporarily don't emit debuginfo for scalable vector locals when there are no optimisations until that bug is fixed.

cc llvm/llvm-project#204585
cc #2160

r? @lqd
@davidtwco

Copy link
Copy Markdown
Member Author

Split out the first batch of changes in #2169

@davidtwco

Copy link
Copy Markdown
Member Author

Split out the second batch of changes in #2170

@davidtwco

Copy link
Copy Markdown
Member Author

Split out the third batch of changes in #2171

@davidtwco

Copy link
Copy Markdown
Member Author

Split out fourth batch of changes in #2172

@davidtwco

Copy link
Copy Markdown
Member Author

Split out final batch of changes in #2173. Shouldn't be much less other than the actual SVE testing implementation here after those are merged, but there will be a few merge conflicts between them as they land.

@rustbot

This comment has been minimized.

@davidtwco davidtwco force-pushed the intrinsic-test-sve branch from ea73af3 to 060d803 Compare June 21, 2026 21:30
@davidtwco davidtwco marked this pull request as ready for review June 21, 2026 21:31
@davidtwco

Copy link
Copy Markdown
Member Author

Rebased after all the prerequisites landed - should pass CI now, works in dev and release locally and the rustc patches are all in nightly - but I'll check back in tomorrow

davidtwco added 14 commits June 22, 2026 11:10
This is just more helpful for knowing what all needs to be fixed when CI
fails.
Replacing `iter_specializations` (which repeatedly invokes a callback)
with an iterator implementation allows `Itertools::format_with` to be
used more broadly, which in turn allows disparate string interpolation
to be combined and hopefully provide greater context to the reader.
Updates `get_load_function` to return `svld{n}_{ty}` when loading a
scalable vector type. Caller of `get_load_function` will still need
updated to handle passing the predicate arguments to these load
functions.
Updates the headers used by generated C code and the target feature flags
passed to the C compiler to enable SVE.
Some SVE intrinsics take booleans as arguments, so there is a need to
support generating a test value array for booleans.
Constraints that correspond to enum types - such as `svpattern` and
`svprfop` - need to be converted to the enum type in order to be used in
a generic instantiation - so introduce a const function for both types
that provides this mapping.
Predicate arguments of type `svbool_t` do not need test value arrays to
be generated as the same enable-all-lanes predicate will be passed to all
invocations of the intrinsic under test. There is no `svld1` equivalent
for `svbool_t` that could be used even if there were test values to
use.
Support defining a local variable containing the predicate that will be
used with all subsequent scalable vector intrinsics.
Implementation of `get_comparison_function` and `get_predicate_function`
for SVE which uses the relevant SVE intrinsics.
Instead of assuming that any scalable boolean argument is a predicate,
handle predicates specifically and generate test values for `svbool_t`
values.
SVE isn't a baseline target feature for `aarch64-unknown-linux-gnu` but
should be enabled when running tests.
Like with non-SVE test generation, comparison of float results in
scalable vectors need special-handling of comparisons.
Enables generation of tests for SVE intrinsics leveraging the changes
from the previous commits.
This makes it far easier to debug what's potentially gone wrong with an
intrinsic test.
@davidtwco davidtwco force-pushed the intrinsic-test-sve branch from 060d803 to c04381b Compare June 22, 2026 12:35
Same as previous redefinitions in stdarch#2163 - these were missed in
that PR because the hardware being tested on was missing the hardware
feature required for the instructions these use.
@davidtwco

davidtwco commented Jun 22, 2026

Copy link
Copy Markdown
Member Author

Pushed c89cacc to fix the failures with svtrn{1,2}, missed these locally as they require f32mm/f64mm and my local test machine didn't have those, but it's the same fix as #2163; and also pushed c04381b to print out the contents of scalable vectors when the tests fail.

Only remaining failures are svcreate and svset. These are curious because I already fixed them locally a while ago with a rustc patch that has long landed. They only fail with the second, third and fourth vectors in the tuple, and when I had investigated this previously, we were generating a memcpy that just didn't copy the second, third and fourth vectors. I fixed that in rust-lang/rust#157915 by correcting how rustc understood the size of these types and that fixed these tests locally for me - but they seem to still be failing in a very similar way here, it's very strange.

@davidtwco

Copy link
Copy Markdown
Member Author

Only remaining failures are svcreate and svset. These are curious because I already fixed them locally a while ago with a rustc patch that has long landed. They only fail with the second, third and fourth vectors in the tuple, and when I had investigated this previously, we were generating a memcpy that just didn't copy the second, third and fourth vectors. I fixed that in rust-lang/rust#157915 by correcting how rustc understood the size of these types and that fixed these tests locally for me - but they seem to still be failing in a very similar way here, it's very strange.

I think it's just that rust-lang/rust#157915 is wrong - it works by chance for me locally because of the vector length my hardware has (and let me assume that maybe llvm.memcpy has some special handling when the type being copied was a scalable vector), but under QEMU w/ sve512, it doesn't copy enough.

@davidtwco

Copy link
Copy Markdown
Member Author

This will be fixed once rust-lang/rust#158253 lands and is on nightly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants