Adapt to KernelIntrinsics#868
30edeac to e0c661b
(From JuliaGPU/KernelAbstractions.jl#666) Is my changing static local code generation from |
I don't think so. I think it is a remnant from pre-HIP times.
bc6c228 to 7810464
7810464 to 7195d90
These always run on the CPU backend, which is currently broken in 1.12, creating false negatives for the GPU tests.
c2a607e to ac192c6
AMDGPU.jl Benchmarks
| Benchmark suite | Current: ac192c6 | Previous: 756602c | Ratio |
|---|---|---|---|
| amdgpu/synchronization/context/device | 590 ns | 600 ns | 0.98 |
| amdgpu/synchronization/stream/blocking | 250 ns | 240 ns | 1.04 |
| amdgpu/synchronization/stream/nonblocking | 330 ns | 340 ns | 0.97 |
| array/accumulate/Float32/1d | 86881 ns | 86251 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 391965 ns | 393845 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 134921 ns | 131681 ns | 1.02 |
| array/accumulate/Float32/dims=2 | 131872 ns | 103022 ns | 1.28 |
| array/accumulate/Float32/dims=2L | 2698909 ns | 2827930 ns | 0.95 |
| array/accumulate/Int64/1d | 94831 ns | 96412 ns | 0.98 |
| array/accumulate/Int64/dims=1 | 284074 ns | 285244 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 170523 ns | 160812 ns | 1.06 |
| array/accumulate/Int64/dims=2 | 125882 ns | 120772 ns | 1.04 |
| array/accumulate/Int64/dims=2L | 2927692 ns | 3014433 ns | 0.97 |
| array/broadcast | 117332 ns | 128932 ns | 0.91 |
| array/construct | 1700 ns | 1680 ns | 1.01 |
| array/copy | 37031 ns | 39371 ns | 0.94 |
| array/copyto!/cpu_to_gpu | 120822 ns | 114832 ns | 1.05 |
| array/copyto!/gpu_to_cpu | 124632 ns | 152432 ns | 0.82 |
| array/copyto!/gpu_to_gpu | 98461 ns | 88321 ns | 1.11 |
| array/iteration/findall/bool | 180012 ns | 181912 ns | 0.99 |
| array/iteration/findall/int | 192042 ns | 190933 ns | 1.01 |
| array/iteration/findfirst/bool | 119882 ns | 114451 ns | 1.05 |
| array/iteration/findfirst/int | 115222 ns | 116331 ns | 0.99 |
| array/iteration/findmin/1d | 168752 ns | 166203 ns | 1.02 |
| array/iteration/findmin/2d | 157093 ns | 156173 ns | 1.01 |
| array/iteration/logical | 355825 ns | 346025 ns | 1.03 |
| array/iteration/scalar | 289824 ns | 289864 ns | 1.00 |
| array/permutedims/2d | 75511 ns | 64761 ns | 1.17 |
| array/permutedims/3d | 76151 ns | 73791 ns | 1.03 |
| array/permutedims/4d | 86581 ns | 76481 ns | 1.13 |
| array/random/rand/Float32 | 52520 ns | 51540 ns | 1.02 |
| array/random/rand/Int64 | 60340 ns | 56210 ns | 1.07 |
| array/random/rand!/Float32 | 93992 ns | 142162 ns | 0.66 |
| array/random/rand!/Int64 | 124721 ns | 141832 ns | 0.88 |
| array/random/randn/Float32 | 102982 ns | 86921 ns | 1.18 |
| array/random/randn!/Float32 | 109462 ns | 152202 ns | 0.72 |
| array/reductions/mapreduce/Float32/1d | 131772 ns | 132902 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1 | 94731 ns | 95052 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 774772 ns | 777081 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 96521 ns | 96731 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 302164 ns | 299584 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 132812 ns | 133322 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 94811 ns | 78081 ns | 1.21 |
| array/reductions/mapreduce/Int64/dims=1L | 780651 ns | 783471 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 95942 ns | 96252 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 300344 ns | 308254 ns | 0.97 |
| array/reductions/reduce/Float32/1d | 132252 ns | 132802 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1 | 94301 ns | 94832 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 774352 ns | 774621 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 96151 ns | 96802 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 295874 ns | 307245 ns | 0.96 |
| array/reductions/reduce/Int64/1d | 132822 ns | 129672 ns | 1.02 |
| array/reductions/reduce/Int64/dims=1 | 94751 ns | 78151 ns | 1.21 |
| array/reductions/reduce/Int64/dims=1L | 781571 ns | 781931 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 95961 ns | 96192 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 306864 ns | 298414 ns | 1.03 |
| array/reverse/1d | 43951 ns | 44380 ns | 0.99 |
| array/reverse/1dL | 75331 ns | 74131 ns | 1.02 |
| array/reverse/1dL_inplace | 114591 ns | 108282 ns | 1.06 |
| array/reverse/1d_inplace | 78351 ns | 86471 ns | 0.91 |
| array/reverse/2d | 51401 ns | 50661 ns | 1.01 |
| array/reverse/2dL | 100962 ns | 100341 ns | 1.01 |
| array/reverse/2dL_inplace | 126372 ns | 117622 ns | 1.07 |
| array/reverse/2d_inplace | 72271 ns | 95391 ns | 0.76 |
| array/sorting/1d | 342675 ns | 341945 ns | 1.00 |
| integration/byval/reference | 38860 ns | 38830 ns | 1.00 |
| integration/byval/slices=1 | 39931 ns | 40880 ns | 0.98 |
| integration/byval/slices=2 | 135882 ns | 158462 ns | 0.86 |
| integration/byval/slices=3 | 239904 ns | 238013 ns | 1.01 |
| integration/volumerhs | 5044422 ns | 4942659 ns | 1.02 |
| kernel/indexing | 54831 ns | 43630 ns | 1.26 |
| kernel/indexing_checked | 63931 ns | 128022 ns | 0.50 |
| kernel/launch | 1310 ns | 1290 ns | 1.02 |
| kernel/rand | 197883 ns | 106671 ns | 1.86 |
| latency/import | 1570558951 ns | 1501349912 ns | 1.05 |
| latency/precompile | 11972948400 ns | 12041117438 ns | 0.99 |
| latency/ttfp | 13963592706 ns | 10491950084 ns | 1.33 |
This comment was automatically generated by workflow using github-action-benchmark.
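
For readers checking the table by hand: the Ratio column appears to be the current timing divided by the previous one, so values above 1 mean the current commit is slower. The sketch below reproduces a few rows under that assumption. The `summarize` helper and the 1.25 cutoff are hypothetical choices for illustration, not settings taken from this workflow.

```python
# Minimal sketch: recompute the Ratio column as current / previous.
# Timings (in ns) are copied from four rows of the table above.
ROWS = {
    "kernel/rand": (197883, 106671),
    "kernel/indexing_checked": (63931, 128022),
    "latency/ttfp": (13963592706, 10491950084),
    "array/random/rand!/Float32": (93992, 142162),
}

# Hypothetical cutoff for calling a row a slowdown; not from the workflow config.
THRESHOLD = 1.25


def ratio(current_ns, previous_ns):
    """Return current / previous; above 1.0 means the current commit is slower."""
    return current_ns / previous_ns


def summarize(rows, threshold=THRESHOLD):
    """Map each benchmark name to (ratio rounded to 2 places, exceeds threshold)."""
    result = {}
    for name, (current_ns, previous_ns) in rows.items():
        r = ratio(current_ns, previous_ns)
        result[name] = (round(r, 2), r > threshold)
    return result


if __name__ == "__main__":
    for name, (r, slower) in summarize(ROWS).items():
        flag = "slower" if slower else "ok"
        print(f"{name}: {r:.2f} ({flag})")
```

Single-run timings like these are noisy, so a ratio slightly above or below 1 is not necessarily meaningful on its own.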