
[runtime/onnxruntime] Paraformer::Forward heap-buffer-overflow: encoder_out_lens reads ONNX int32 tensor as int64_t #2848

@wangxiuwen

Description


Summary

Paraformer::Forward in runtime/onnxruntime/src/paraformer.cpp reads an ONNX int32 tensor (outputTensor[1], encoder_out_lens) as int64_t*, dereferencing 8 bytes from a 4-byte allocation. AddressSanitizer catches a heap-buffer-overflow read on every single inference call.

Location

runtime/onnxruntime/src/paraformer.cpp (current main), line 512 (and consumers at lines 529, 531, 538, 540):

auto outputTensor = m_session_->Run(...);
...
auto encoder_out_lens = outputTensor[1].GetTensorMutableData<int64_t>();
...
result = GreedySearch(floatData, *encoder_out_lens, outputShape[2]);   // line 538

For at least the paraformer-large-contextual ONNX model (speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx from ModelScope), outputTensor[1] is allocated as a 4-byte int32 tensor. Treating it as int64_t* and dereferencing reads 8 bytes, overflowing 4 bytes past the buffer.

ASAN evidence

Rebuilt with -fsanitize=address and triggered a single inference call:

==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x... at pc 0x... thread T...
READ of size 8 at 0x... thread T...
    #0 0x... in funasr::Paraformer::Forward[abi:cxx11](float**, int*, bool, ...) /onnxruntime/src/paraformer.cpp:538
    #1 0x... in FunTpassInferBuffer ... /onnxruntime/src/funasrruntime.cpp:575
    #2 0x... in funasr_tpass_offline_infer ... /onnxruntime/src/funasr_capi.cpp:287

0x...d44 is located 0 bytes to the right of 4-byte region [0x...d40, 0x...d44)
allocated by thread T... here:
    #0 0x... in __interceptor_posix_memalign
    #1 0x... in libonnxruntime.so.1.14.0
    #2 0x... in libonnxruntime.so.1.14.0

The allocation is exactly 4 bytes (the int32 element). The read is 8 bytes (int64_t). Triggered every time the model runs, on the very first inference.

Why production typically doesn't crash

The 4-byte overrun reads into ONNX Runtime's posix_memalign padding or glibc tcache freelist metadata, which usually contains zeros. The int64_t obtained from *encoder_out_lens then truncates back to int when passed to GreedySearch/BeamSearch, and the result happens to equal the correct value.

This is undefined behavior. Any change in glibc allocator behavior (a tcache fill pattern change in a future glibc version, swapping in jemalloc, ASAN/MSAN/HWASAN builds, or simply different load patterns) can make those 4 bytes return garbage and produce wrong decoder output — silently, with no crash to flag it.
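The truncation mechanics can be reproduced in isolation. The sketch below is illustrative only (not the FunASR code, and the names are invented): it lays out a 4-byte int32 length followed by 4 bytes of "whatever the allocator left there", then performs the 8-byte read and the truncation to int. Unlike the real code, the buffer here is a legal 8 bytes, so the demo itself has no UB.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative sketch of the issue's scenario: a 4-byte int32 length with
// adjacent allocator bytes, read as an 8-byte int64 and truncated to int.
struct LenDemo {
    unsigned char buf[8];

    LenDemo(int32_t len, unsigned char padding_byte) : buf{} {
        std::memcpy(buf, &len, sizeof len);  // the real 4-byte allocation
        buf[7] = padding_byte;               // bytes 4..7: adjacent heap contents
    }

    // The 8-byte read the current code performs via GetTensorMutableData<int64_t>()
    int64_t WideRead() const {
        int64_t v;
        std::memcpy(&v, buf, sizeof v);
        return v;
    }

    // What GreedySearch/BeamSearch actually receive after truncation to int
    int TruncatedRead() const { return static_cast<int>(WideRead()); }
};
```

On a little-endian machine the truncation to int discards exactly the four overrun bytes, which is why the decoded result "happens to equal the correct value" even when the padding is non-zero. The UB in the real code remains regardless: the 4 extra bytes belong to another allocation, so the read aborts under ASAN and can in principle fault at an allocation boundary.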

Proposed fix

GreedySearch(float*, int n_len, ...) and BeamSearch(... int len, ...) (declared in the same file) take their length argument as int. Reading the tensor as int32_t* and passing *encoder_out_lens (4 bytes) to a function expecting int is semantically identical to the current int64_t* → truncate path, but without the UB:

- auto encoder_out_lens = outputTensor[1].GetTensorMutableData<int64_t>();
+ auto encoder_out_lens = outputTensor[1].GetTensorMutableData<int32_t>();

Verified: with the one-line change, ASAN no longer reports the OOB on inference, and the decoded text is byte-identical to what the existing build produces. Tested on paraformer-large-contextual end-to-end with hundreds of requests.

Suggested follow-up

If the ONNX model can in principle emit either int32 or int64 for this output across model variants, the proper fix is to inspect the tensor element type at runtime via GetTensorTypeAndShapeInfo().GetElementType() and branch. But for at least the published ModelScope paraformer-large-contextual model, the emitted dtype is unambiguously int32, and the current code is reading it incorrectly.
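If the runtime branch is adopted, it could look like the following sketch. The ElemType tag is a stand-in for ONNX Runtime's ONNXTensorElementDataType, which the real code would obtain from outputTensor[1].GetTensorTypeAndShapeInfo().GetElementType(); the function name and structure are illustrative, not FunASR code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for ONNXTensorElementDataType
// (ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32 / ..._INT64).
enum class ElemType { Int32, Int64 };

// Read the first element of a length tensor as int, branching on the element
// type so a 4-byte int32 buffer is never dereferenced through an int64_t*.
int ReadEncoderOutLen(const void* data, ElemType type) {
    if (type == ElemType::Int32) {
        int32_t v;
        std::memcpy(&v, data, sizeof v);  // reads exactly 4 bytes
        return static_cast<int>(v);
    }
    int64_t v;
    std::memcpy(&v, data, sizeof v);      // 8-byte read only for a true int64 tensor
    return static_cast<int>(v);
}
```

Either branch ends in the int that GreedySearch/BeamSearch already expect, so the dispatch changes nothing downstream; it only guarantees the read never exceeds the tensor's element size.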
