
Register Bank conflict fix #348

Closed
jel221 wants to merge 2 commits into vortexgpgpu:bug_fixes from jel221:bug_fixes


jel221 (Collaborator) commented May 10, 2026

Performance improvement

MNK = 32x32
Operand bank conflicts (TCU): 0x280 => 0x100
PERF (before): instrs=3088, cycles=10596, IPC=0.291
PERF (after): instrs=3088, cycles=9661, IPC=0.320

MNK = 128x128
Operand bank conflicts (TCU): 0xa000 => 0x4000
PERF (before): instrs=142864, cycles=534563, IPC=0.267
PERF (after): instrs=142864, cycles=405638, IPC=0.352

The remaining conflicts are unavoidable
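
For reference, the figures above can be cross-checked with a few lines of Python. This is only a sketch of the arithmetic (IPC = instrs / cycles, speedup = cycles before / cycles after); the function and dictionary names are made up for illustration and are not part of the patch or of the perf counter code.

```python
def summarize(instrs, cycles_before, cycles_after, conflicts_before, conflicts_after):
    """Derive IPC, speedup, and conflict reduction from the reported counters."""
    return {
        "ipc_before": instrs / cycles_before,
        "ipc_after": instrs / cycles_after,
        # Same instruction count in both runs, so speedup is the cycle ratio.
        "speedup": cycles_before / cycles_after,
        "conflicts_removed": conflicts_before - conflicts_after,
        "conflict_reduction": 1 - conflicts_after / conflicts_before,
    }

# Numbers copied from the report above (conflict counts are hexadecimal).
runs = {
    "32x32": summarize(3088, 10596, 9661, 0x280, 0x100),
    "128x128": summarize(142864, 534563, 405638, 0xA000, 0x4000),
}

for name, r in runs.items():
    print(
        f"{name}: IPC {r['ipc_before']:.3f} -> {r['ipc_after']:.3f}, "
        f"speedup {r['speedup']:.2f}x, "
        f"conflicts -{r['conflict_reduction']:.0%}"
    )
```

Both sizes show the same 60% drop in operand bank conflicts, while the cycle savings are larger for the 128x128 case.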

Let me know if you want the perf counter code.

The mx meta registers do not seem to be used in the current RTL, so either:

  1. micro-ops for future mx support may need to be hardcoded, or
  2. the current changes may need to be refactored

NikhilRout force-pushed the bug_fixes branch 11 times, most recently from aaaa491 to 22a0999 on May 17, 2026 16:24
NikhilRout force-pushed the bug_fixes branch 17 times, most recently from ca39687 to 0c4c451 on May 29, 2026 17:07
NikhilRout force-pushed the bug_fixes branch 7 times, most recently from d53968c to 2452382 on May 31, 2026 17:35
tinebp (Collaborator) commented Jun 14, 2026

Thanks for tracking this down, @jel221. The register-bank conflict this targets has since been fixed on master via the bank-conflict-free register permutation (bcfree/rb_idx) merged in e1e7715 from the wmma_fixes work, which permutes the B-fragment data→register mapping rather than shifting TCU_RB. The surrounding code has also moved (kernel header → sw/kernel/include/, SimX WMMA → macro-op sequencer), so this patch no longer applies. Closing as already-resolved — appreciated the catch.
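
As a rough illustration of why permuting a data→register mapping can remove bank conflicts, here is a toy model. Everything in it is an assumption made for the sketch: the four-bank layout, the `reg % NUM_BANKS` bank function, the register ranges, and the neighbour-swap permutation. It is not the bcfree/rb_idx logic from e1e7715.

```python
NUM_BANKS = 4  # hypothetical bank count, chosen only for this toy model


def bank(reg):
    """Toy bank function: registers are striped across banks by index."""
    return reg % NUM_BANKS


def count_conflicts(a_regs, b_regs):
    """Count steps where the A and B operands are read from the same bank."""
    return sum(1 for a, b in zip(a_regs, b_regs) if bank(a) == bank(b))


steps = range(8)
a_regs = [k for k in steps]            # A fragment held in r0..r7
b_naive = [8 + k for k in steps]       # B element k stored in r(8 + k)
b_perm = [8 + (k ^ 1) for k in steps]  # B element k stored in its neighbour's slot

print("naive mapping conflicts:   ", count_conflicts(a_regs, b_naive))
print("permuted mapping conflicts:", count_conflicts(a_regs, b_perm))
```

In this model the permuted mapping uses the same set of registers as the naive one; only the assignment of B elements to registers changes, so the operand pair read at each step no longer lands in one bank.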

tinebp closed this Jun 14, 2026