Adapt to KernelIntrinsics #868

Draft
christiangnrd wants to merge 5 commits into JuliaGPU:main from christiangnrd:intrinsics

@christiangnrd

Member

No description provided.

@christiangnrd

christiangnrd commented Dec 18, 2025

Member Author

The id for AMDGPU is used to determine whether we need to launch hostcalls, based on the specific id of the global variable. It is not needed for KA, so it can be dropped, as in https://github.com/JuliaGPU/AMDGPU.jl/pull/868/files#diff-082b94339c8f038178ee472ca9b6feec6f27f434c469138f168031f248f223f9R197.

(From JuliaGPU/KernelAbstractions.jl#666)

Is my changing static local code generation from LLVMExternalLinkage to LLVMInternalLinkage going to cause any unintended effects? I had to do so to prevent multiple local memory allocations in the same kernel from being treated as a single allocation. Otherwise, I think we'll have to re-add the id parameter.

@pxl-th

pxl-th commented Dec 19, 2025

Member

> Is my changing static local code generation from LLVMExternalLinkage to LLVMInternalLinkage going to cause any unintended effects?

I don't think so. I think it is a remnant from pre-HIP times.

These always run on the CPU backend, which is currently broken in 1.12, creating false negatives for the GPU tests.

@github-actions github-actions Bot left a comment

Contributor

AMDGPU.jl Benchmarks

| Benchmark suite | Current: ac192c6 | Previous: 756602c | Ratio |
|---|---|---|---|
| amdgpu/synchronization/context/device | 590 ns | 600 ns | 0.98 |
| amdgpu/synchronization/stream/blocking | 250 ns | 240 ns | 1.04 |
| amdgpu/synchronization/stream/nonblocking | 330 ns | 340 ns | 0.97 |
| array/accumulate/Float32/1d | 86881 ns | 86251 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 391965 ns | 393845 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 134921 ns | 131681 ns | 1.02 |
| array/accumulate/Float32/dims=2 | 131872 ns | 103022 ns | 1.28 |
| array/accumulate/Float32/dims=2L | 2698909 ns | 2827930 ns | 0.95 |
| array/accumulate/Int64/1d | 94831 ns | 96412 ns | 0.98 |
| array/accumulate/Int64/dims=1 | 284074 ns | 285244 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 170523 ns | 160812 ns | 1.06 |
| array/accumulate/Int64/dims=2 | 125882 ns | 120772 ns | 1.04 |
| array/accumulate/Int64/dims=2L | 2927692 ns | 3014433 ns | 0.97 |
| array/broadcast | 117332 ns | 128932 ns | 0.91 |
| array/construct | 1700 ns | 1680 ns | 1.01 |
| array/copy | 37031 ns | 39371 ns | 0.94 |
| array/copyto!/cpu_to_gpu | 120822 ns | 114832 ns | 1.05 |
| array/copyto!/gpu_to_cpu | 124632 ns | 152432 ns | 0.82 |
| array/copyto!/gpu_to_gpu | 98461 ns | 88321 ns | 1.11 |
| array/iteration/findall/bool | 180012 ns | 181912 ns | 0.99 |
| array/iteration/findall/int | 192042 ns | 190933 ns | 1.01 |
| array/iteration/findfirst/bool | 119882 ns | 114451 ns | 1.05 |
| array/iteration/findfirst/int | 115222 ns | 116331 ns | 0.99 |
| array/iteration/findmin/1d | 168752 ns | 166203 ns | 1.02 |
| array/iteration/findmin/2d | 157093 ns | 156173 ns | 1.01 |
| array/iteration/logical | 355825 ns | 346025 ns | 1.03 |
| array/iteration/scalar | 289824 ns | 289864 ns | 1.00 |
| array/permutedims/2d | 75511 ns | 64761 ns | 1.17 |
| array/permutedims/3d | 76151 ns | 73791 ns | 1.03 |
| array/permutedims/4d | 86581 ns | 76481 ns | 1.13 |
| array/random/rand/Float32 | 52520 ns | 51540 ns | 1.02 |
| array/random/rand/Int64 | 60340 ns | 56210 ns | 1.07 |
| array/random/rand!/Float32 | 93992 ns | 142162 ns | 0.66 |
| array/random/rand!/Int64 | 124721 ns | 141832 ns | 0.88 |
| array/random/randn/Float32 | 102982 ns | 86921 ns | 1.18 |
| array/random/randn!/Float32 | 109462 ns | 152202 ns | 0.72 |
| array/reductions/mapreduce/Float32/1d | 131772 ns | 132902 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1 | 94731 ns | 95052 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 774772 ns | 777081 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 96521 ns | 96731 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 302164 ns | 299584 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 132812 ns | 133322 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 94811 ns | 78081 ns | 1.21 |
| array/reductions/mapreduce/Int64/dims=1L | 780651 ns | 783471 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 95942 ns | 96252 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 300344 ns | 308254 ns | 0.97 |
| array/reductions/reduce/Float32/1d | 132252 ns | 132802 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1 | 94301 ns | 94832 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 774352 ns | 774621 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 96151 ns | 96802 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 295874 ns | 307245 ns | 0.96 |
| array/reductions/reduce/Int64/1d | 132822 ns | 129672 ns | 1.02 |
| array/reductions/reduce/Int64/dims=1 | 94751 ns | 78151 ns | 1.21 |
| array/reductions/reduce/Int64/dims=1L | 781571 ns | 781931 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 95961 ns | 96192 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 306864 ns | 298414 ns | 1.03 |
| array/reverse/1d | 43951 ns | 44380 ns | 0.99 |
| array/reverse/1dL | 75331 ns | 74131 ns | 1.02 |
| array/reverse/1dL_inplace | 114591 ns | 108282 ns | 1.06 |
| array/reverse/1d_inplace | 78351 ns | 86471 ns | 0.91 |
| array/reverse/2d | 51401 ns | 50661 ns | 1.01 |
| array/reverse/2dL | 100962 ns | 100341 ns | 1.01 |
| array/reverse/2dL_inplace | 126372 ns | 117622 ns | 1.07 |
| array/reverse/2d_inplace | 72271 ns | 95391 ns | 0.76 |
| array/sorting/1d | 342675 ns | 341945 ns | 1.00 |
| integration/byval/reference | 38860 ns | 38830 ns | 1.00 |
| integration/byval/slices=1 | 39931 ns | 40880 ns | 0.98 |
| integration/byval/slices=2 | 135882 ns | 158462 ns | 0.86 |
| integration/byval/slices=3 | 239904 ns | 238013 ns | 1.01 |
| integration/volumerhs | 5044422 ns | 4942659 ns | 1.02 |
| kernel/indexing | 54831 ns | 43630 ns | 1.26 |
| kernel/indexing_checked | 63931 ns | 128022 ns | 0.50 |
| kernel/launch | 1310 ns | 1290 ns | 1.02 |
| kernel/rand | 197883 ns | 106671 ns | 1.86 |
| latency/import | 1570558951 ns | 1501349912 ns | 1.05 |
| latency/precompile | 11972948400 ns | 12041117438 ns | 0.99 |
| latency/ttfp | 13963592706 ns | 10491950084 ns | 1.33 |

This comment was automatically generated by workflow using github-action-benchmark.
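
For reading the table above: the Ratio column appears to be the Current time divided by the Previous time, rounded to two decimals, so values above 1 mean the current commit is slower. The following Python sketch checks that reading against a few rows copied from the table. The helper names `ratio` and `slower_than` and the 1.25 threshold are hypothetical choices made for illustration. They are not part of github-action-benchmark or AMDGPU.jl.

```python
# Sample rows copied from the benchmark table: (name, current_ns, previous_ns).
ROWS = [
    ("amdgpu/synchronization/context/device", 590, 600),
    ("kernel/indexing_checked", 63931, 128022),
    ("kernel/rand", 197883, 106671),
    ("latency/ttfp", 13963592706, 10491950084),
]


def ratio(current_ns, previous_ns):
    """Current / Previous, rounded to two decimals (assumed table convention)."""
    return round(current_ns / previous_ns, 2)


def slower_than(rows, threshold):
    """Names of benchmarks whose ratio exceeds a (hypothetical) threshold."""
    return [name for name, cur, prev in rows if ratio(cur, prev) > threshold]


if __name__ == "__main__":
    for name, cur, prev in ROWS:
        print(f"{name}: {ratio(cur, prev):.2f}")
    # Benchmarks more than 25% slower than the previous commit.
    print(slower_than(ROWS, 1.25))
```

Under this reading, `kernel/rand` (1.86) and `latency/ttfp` (1.33) are the largest slowdowns in the sample, while `kernel/indexing_checked` (0.50) runs in about half the previous time.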
