
Provide Neuron kmods #407

Merged: piyush-jena merged 5 commits into bottlerocket-os:develop from piyush-jena:neuron-div on May 28, 2026


@piyush-jena (Contributor) commented Apr 16, 2026

Description of changes:
This series of changes moves the Neuron driver into standalone kmod packages. We are doing this so that compiling the out-of-tree kernel module cannot accidentally interfere with or modify the existing kernel build artifacts.
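As an illustration of the one-package-per-kernel layout, the hypothetical helper below maps a kernel release (as printed by `uname -r`) to the kmod package that would carry its Neuron driver. It assumes the `kmod-<major>.<minor>-neuron` naming of the package directories in this PR (`kmod-6.1-neuron`, `kmod-6.12-neuron`, `kmod-6.18-neuron`) and ignores the separate `-extras` package; the function name is made up for this sketch.

```shell
# Hypothetical helper: derive the standalone Neuron kmod package name for a
# kernel release, assuming the kmod-<major>.<minor>-neuron naming scheme.
neuron_kmod_package() {
  local release="$1" major rest minor
  case "$release" in
    *.*) ;;                          # need at least "major.minor"
    *) echo "unrecognized kernel release: $release" >&2; return 1 ;;
  esac
  major="${release%%.*}"             # "6.12.77" -> "6"
  rest="${release#*.}"               # "6.12.77" -> "12.77"
  minor="${rest%%.*}"                # "12.77"   -> "12"
  # Both components must be non-empty and purely numeric.
  case "$major" in
    ''|*[!0-9]*) echo "unrecognized kernel release: $release" >&2; return 1 ;;
  esac
  case "$minor" in
    ''|*[!0-9]*) echo "unrecognized kernel release: $release" >&2; return 1 ;;
  esac
  printf 'kmod-%s.%s-neuron\n' "$major" "$minor"
}
```

Usage: `neuron_kmod_package "$(uname -r)"` prints `kmod-6.12-neuron` on a 6.12.77 kernel.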

Testing done:

  1. Both x86_64 and aarch64 hosts boot.
  2. Tested inf1 and inf2 instances with each kernel (6.1, 6.12, and 6.18) and confirmed that the driver was loaded:
[root@admin]# sheltie
bash-5.2# modinfo neuron
filename:       /lib/modules/6.12.77/kernel/drivers/neuron/neuron.ko
alias:          pci:v00001d0fd00007064sv*sd*bc*sc*i*
version:        2.24.13.0
license:        GPL
description:    Neuron Driver, built from SHA: f3d11aaca3951440c7e47a8c74361815fc8ddee7
import_ns:      DMA_BUF
srcversion:     F148FDF5D12696D55723F7F
depends:
name:           neuron
retpoline:      Y
vermagic:       6.12.77 SMP preempt mod_unload modversions
parm:           userver_ctl:ultraserver election control (int)
parm:           userver_etimeout:ultraserver election timeout (int)
parm:           force_userver:Force Neuron UltraServer (int)
parm:           force_die_flip:Force Neuron Core Mapping APIs to give back DIE flip mappings (int)
parm:           nmetric_metric_post_delay:Minimum time to wait (in milliseconds) before posting metrics again (uint)
parm:           nmetric_log_posts:1: send metrics to CW, 2: send metrics to trace, 3: send metrics to both (uint)
parm:           no_reset:Dont reset device (int)
parm:           nc_per_dev_param:Number of neuron cores (int)
parm:           dev_nc_map:Map of active neuron cores (int)
parm:           mempool_min_alloc_size:Minimum size for memory allocation (int)
parm:           mempool_host_memory_size:Host memory to reserve(in bytes) (int)
parm:           mempool_small_pool_size:Size of genpool for small allocations (in bytes) (ulong)
parm:           mempool_small_alloc_max:Threshold (in bytes) for deciding if an allocation is small (ulong)
parm:           dup_helper_enable:enable duplicate routing id unload helper (int)
parm:           wc_enable:enable write combining (int)
bash-5.2# systemctl status load-neuron-inf1-modules
● load-neuron-inf1-modules.service - Load Neuron Inf1 modules
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/load-neuron-inf1-modules.service; enabled; preset: enabled)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d
             └─00-aws-config.conf, 10-requires-tmp.conf
     Active: active (exited) since Thu 2026-04-16 01:01:24 UTC; 1min 57s ago
 Invocation: f5b556a05a4f424cbf9f98288b2f7f1f
   Main PID: 1567 (code=exited, status=0/SUCCESS)
   Mem peak: 153.2M
        CPU: 253ms

Apr 16 01:01:23 localhost systemd[1]: Starting Load Neuron Inf1 modules...
Apr 16 01:01:23 localhost driverdog[1538]: 01:01:23 [INFO] Copied neuron.ko
Apr 16 01:01:24 localhost driverdog[1567]: 01:01:24 [INFO] Updated modules dependencies
Apr 16 01:01:24 localhost driverdog[1567]: 01:01:24 [INFO] Loaded kernel modules
Apr 16 01:01:24 localhost systemd[1]: Finished Load Neuron Inf1 modules.
bash-5.2# uname -r
6.12.77

bash-5.2# modinfo neuron
filename:       /lib/modules/6.1.166/kernel/drivers/neuron/neuron.ko.gz
alias:          pci:v00001d0fd00007064sv*sd*bc*sc*i*
version:        2.24.13.0
license:        GPL
description:    Neuron Driver, built from SHA: f3d11aaca3951440c7e47a8c74361815fc8ddee7
import_ns:      DMA_BUF
srcversion:     F148FDF5D12696D55723F7F
depends:
retpoline:      Y
name:           neuron
vermagic:       6.1.166 SMP preempt mod_unload modversions
parm:           userver_ctl:ultraserver election control (int)
parm:           userver_etimeout:ultraserver election timeout (int)
parm:           force_userver:Force Neuron UltraServer (int)
parm:           force_die_flip:Force Neuron Core Mapping APIs to give back DIE flip mappings (int)
parm:           nmetric_metric_post_delay:Minimum time to wait (in milliseconds) before posting metrics again (uint)
parm:           nmetric_log_posts:1: send metrics to CW, 2: send metrics to trace, 3: send metrics to both (uint)
parm:           no_reset:Dont reset device (int)
parm:           nc_per_dev_param:Number of neuron cores (int)
parm:           dev_nc_map:Map of active neuron cores (int)
parm:           mempool_min_alloc_size:Minimum size for memory allocation (int)
parm:           mempool_host_memory_size:Host memory to reserve(in bytes) (int)
parm:           mempool_small_pool_size:Size of genpool for small allocations (in bytes) (ulong)
parm:           mempool_small_alloc_max:Threshold (in bytes) for deciding if an allocation is small (ulong)
parm:           dup_helper_enable:enable duplicate routing id unload helper (int)
parm:           wc_enable:enable write combining (int)
bash-5.2# systemctl status load-neuron-inf1-modules
● load-neuron-inf1-modules.service - Load Neuron Inf1 modules
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/load-neuron-inf1-modules.service; enabled; preset: enabled)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d
             └─00-aws-config.conf, 10-requires-tmp.conf
     Active: active (exited) since Thu 2026-04-16 01:03:18 UTC; 2min 35s ago
   Main PID: 1162 (code=exited, status=0/SUCCESS)
        CPU: 224ms

Apr 16 01:03:11 localhost systemd[1]: Starting Load Neuron Inf1 modules...
Apr 16 01:03:11 localhost driverdog[1159]: 01:03:11 [INFO] Copied neuron.ko.gz
Apr 16 01:03:18 localhost driverdog[1162]: 01:03:18 [INFO] Updated modules dependencies
Apr 16 01:03:18 localhost driverdog[1162]: 01:03:18 [INFO] Loaded kernel modules
Apr 16 01:03:18 localhost systemd[1]: Finished Load Neuron Inf1 modules.
bash-5.2# uname -r
6.1.166

bash-5.2# modinfo neuron
filename:       /lib/modules/6.18.16/kernel/drivers/neuron/neuron.ko
alias:          pci:v00001d0fd00007064sv*sd*bc*sc*i*
version:        2.26.10.0
license:        GPL
description:    Neuron Driver, built from SHA: dd4b234dc87473b76914082d9dc25785149c978a
import_ns:      DMA_BUF
srcversion:     02A0F96A8B18C6B24CA5EF9
depends:
name:           neuron
retpoline:      Y
vermagic:       6.18.16 SMP preempt mod_unload modversions
parm:           userver_pds_node_cnt:pds ultraserver node count (int)
parm:           userver_pds_server_id:pds ultraserver id (int)
parm:           userver_ctl:ultraserver election control (int)
parm:           userver_etimeout:ultraserver election timeout (int)
parm:           force_userver:Force Neuron UltraServer (int)
parm:           force_die_flip:Force Neuron Core Mapping APIs to give back DIE flip mappings (int)
parm:           nmetric_metric_post_delay:Minimum time to wait (in milliseconds) before posting metrics again (uint)
parm:           nmetric_log_posts:1: send metrics to CW, 2: send metrics to trace, 3: send metrics to both (uint)
parm:           no_reset:Dont reset device (int)
parm:           nc_per_dev_param:Number of neuron cores (int)
parm:           dev_nc_map:Map of active neuron cores (int)
parm:           dma_teardown_on_exit:Reset the DMA state on user process exit (int)
parm:           zerocopy_trn1_override:override zerocopy for trn1 (int)
parm:           mempool_min_alloc_size:Minimum size for memory allocation (int)
parm:           mempool_host_memory_size:Host memory to reserve(in bytes) (int)
parm:           mempool_small_pool_size:Size of genpool for small allocations (in bytes) (ulong)
parm:           mempool_small_alloc_max:Threshold (in bytes) for deciding if an allocation is small (ulong)
parm:           dup_helper_enable:enable duplicate routing id unload helper (int)
parm:           wc_enable:enable write combining (int)
bash-5.2# systemctl status load-neuron-latest-modules
● load-neuron-latest-modules.service - Load Neuron Latest modules
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/load-neuron-latest-modules.service; enabled; preset: enabled)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d
             └─00-aws-config.conf, 10-requires-tmp.conf
     Active: active (exited) since Thu 2026-04-16 01:01:44 UTC; 5min ago
 Invocation: b6d3177b620e4a61afad68a20f2d5540
   Main PID: 1530 (code=exited, status=0/SUCCESS)
   Mem peak: 156M
        CPU: 243ms

Apr 16 01:01:43 localhost systemd[1]: Starting Load Neuron Latest modules...
Apr 16 01:01:43 localhost driverdog[1517]: 01:01:43 [INFO] Copied neuron.ko
Apr 16 01:01:44 localhost driverdog[1530]: 01:01:44 [INFO] Updated modules dependencies
Apr 16 01:01:44 localhost driverdog[1530]: 01:01:44 [INFO] Loaded kernel modules
Apr 16 01:01:44 localhost systemd[1]: Finished Load Neuron Latest modules.
bash-5.2# uname -r
6.18.16

bash-5.2# modinfo neuron
filename:       /lib/modules/6.12.77/kernel/drivers/neuron/neuron.ko
alias:          pci:v00001d0fd00007064sv*sd*bc*sc*i*
version:        2.26.10.0
license:        GPL
description:    Neuron Driver, built from SHA: dd4b234dc87473b76914082d9dc25785149c978a
import_ns:      DMA_BUF
srcversion:     FC1EE0EDC8DBFD28B7008D3
depends:
name:           neuron
retpoline:      Y
vermagic:       6.12.77 SMP preempt mod_unload modversions
parm:           userver_pds_node_cnt:pds ultraserver node count (int)
parm:           userver_pds_server_id:pds ultraserver id (int)
parm:           userver_ctl:ultraserver election control (int)
parm:           userver_etimeout:ultraserver election timeout (int)
parm:           force_userver:Force Neuron UltraServer (int)
parm:           force_die_flip:Force Neuron Core Mapping APIs to give back DIE flip mappings (int)
parm:           nmetric_metric_post_delay:Minimum time to wait (in milliseconds) before posting metrics again (uint)
parm:           nmetric_log_posts:1: send metrics to CW, 2: send metrics to trace, 3: send metrics to both (uint)
parm:           no_reset:Dont reset device (int)
parm:           nc_per_dev_param:Number of neuron cores (int)
parm:           dev_nc_map:Map of active neuron cores (int)
parm:           dma_teardown_on_exit:Reset the DMA state on user process exit (int)
parm:           zerocopy_trn1_override:override zerocopy for trn1 (int)
parm:           mempool_min_alloc_size:Minimum size for memory allocation (int)
parm:           mempool_host_memory_size:Host memory to reserve(in bytes) (int)
parm:           mempool_small_pool_size:Size of genpool for small allocations (in bytes) (ulong)
parm:           mempool_small_alloc_max:Threshold (in bytes) for deciding if an allocation is small (ulong)
parm:           dup_helper_enable:enable duplicate routing id unload helper (int)
parm:           wc_enable:enable write combining (int)
bash-5.2# systemctl status load-neuron-latest-modules
● load-neuron-latest-modules.service - Load Neuron Latest modules
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/load-neuron-latest-modules.service; enabled; preset: enabled)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d
             └─00-aws-config.conf, 10-requires-tmp.conf
     Active: active (exited) since Thu 2026-04-16 01:05:23 UTC; 4min 1s ago
 Invocation: 62eb0c3b6a7f40e885b9583dd0b190ef
   Main PID: 1624 (code=exited, status=0/SUCCESS)
   Mem peak: 153.2M
        CPU: 244ms

Apr 16 01:05:22 localhost systemd[1]: Starting Load Neuron Latest modules...
Apr 16 01:05:22 localhost driverdog[1611]: 01:05:22 [INFO] Copied neuron.ko
Apr 16 01:05:23 localhost driverdog[1624]: 01:05:23 [INFO] Updated modules dependencies
Apr 16 01:05:23 localhost driverdog[1624]: 01:05:23 [INFO] Loaded kernel modules
Apr 16 01:05:23 localhost systemd[1]: Finished Load Neuron Latest modules.
bash-5.2# uname -r
6.12.77

bash-5.2# modinfo neuron
filename:       /lib/modules/6.18.16/kernel/drivers/neuron/neuron.ko
alias:          pci:v00001d0fd00007064sv*sd*bc*sc*i*
version:        2.24.13.0
license:        GPL
description:    Neuron Driver, built from SHA: f3d11aaca3951440c7e47a8c74361815fc8ddee7
import_ns:      DMA_BUF
srcversion:     66370BC42B5C7647DD5723F
depends:
name:           neuron
retpoline:      Y
vermagic:       6.18.16 SMP preempt mod_unload modversions
parm:           userver_ctl:ultraserver election control (int)
parm:           userver_etimeout:ultraserver election timeout (int)
parm:           force_userver:Force Neuron UltraServer (int)
parm:           force_die_flip:Force Neuron Core Mapping APIs to give back DIE flip mappings (int)
parm:           nmetric_metric_post_delay:Minimum time to wait (in milliseconds) before posting metrics again (uint)
parm:           nmetric_log_posts:1: send metrics to CW, 2: send metrics to trace, 3: send metrics to both (uint)
parm:           no_reset:Dont reset device (int)
parm:           nc_per_dev_param:Number of neuron cores (int)
parm:           dev_nc_map:Map of active neuron cores (int)
parm:           mempool_min_alloc_size:Minimum size for memory allocation (int)
parm:           mempool_host_memory_size:Host memory to reserve(in bytes) (int)
parm:           mempool_small_pool_size:Size of genpool for small allocations (in bytes) (ulong)
parm:           mempool_small_alloc_max:Threshold (in bytes) for deciding if an allocation is small (ulong)
parm:           dup_helper_enable:enable duplicate routing id unload helper (int)
parm:           wc_enable:enable write combining (int)
bash-5.2# systemctl status load-neuron-inf1-modules
● load-neuron-inf1-modules.service - Load Neuron Inf1 modules
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/load-neuron-inf1-modules.service; enabled; preset: enabled)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d
             └─00-aws-config.conf, 10-requires-tmp.conf
     Active: active (exited) since Thu 2026-04-16 01:09:40 UTC; 1min 4s ago
 Invocation: 9b977d428aa148229b18304c5e9fe3b3
   Main PID: 1496 (code=exited, status=0/SUCCESS)
   Mem peak: 156.1M
        CPU: 242ms

Apr 16 01:09:33 localhost systemd[1]: Starting Load Neuron Inf1 modules...
Apr 16 01:09:33 localhost driverdog[1480]: 01:09:33 [INFO] Copied neuron.ko
Apr 16 01:09:40 localhost driverdog[1496]: 01:09:40 [INFO] Updated modules dependencies
Apr 16 01:09:40 localhost driverdog[1496]: 01:09:40 [INFO] Loaded kernel modules
Apr 16 01:09:40 localhost systemd[1]: Finished Load Neuron Inf1 modules.
bash-5.2# uname -r
6.18.16

bash-5.2# modinfo neuron
filename:       /lib/modules/6.1.166/kernel/drivers/neuron/neuron.ko.gz
alias:          pci:v00001d0fd00007064sv*sd*bc*sc*i*
version:        2.26.10.0
license:        GPL
description:    Neuron Driver, built from SHA: dd4b234dc87473b76914082d9dc25785149c978a
import_ns:      DMA_BUF
srcversion:     FC1EE0EDC8DBFD28B7008D3
depends:
retpoline:      Y
name:           neuron
vermagic:       6.1.166 SMP preempt mod_unload modversions
parm:           userver_pds_node_cnt:pds ultraserver node count (int)
parm:           userver_pds_server_id:pds ultraserver id (int)
parm:           userver_ctl:ultraserver election control (int)
parm:           userver_etimeout:ultraserver election timeout (int)
parm:           force_userver:Force Neuron UltraServer (int)
parm:           force_die_flip:Force Neuron Core Mapping APIs to give back DIE flip mappings (int)
parm:           nmetric_metric_post_delay:Minimum time to wait (in milliseconds) before posting metrics again (uint)
parm:           nmetric_log_posts:1: send metrics to CW, 2: send metrics to trace, 3: send metrics to both (uint)
parm:           no_reset:Dont reset device (int)
parm:           nc_per_dev_param:Number of neuron cores (int)
parm:           dev_nc_map:Map of active neuron cores (int)
parm:           dma_teardown_on_exit:Reset the DMA state on user process exit (int)
parm:           zerocopy_trn1_override:override zerocopy for trn1 (int)
parm:           mempool_min_alloc_size:Minimum size for memory allocation (int)
parm:           mempool_host_memory_size:Host memory to reserve(in bytes) (int)
parm:           mempool_small_pool_size:Size of genpool for small allocations (in bytes) (ulong)
parm:           mempool_small_alloc_max:Threshold (in bytes) for deciding if an allocation is small (ulong)
parm:           dup_helper_enable:enable duplicate routing id unload helper (int)
parm:           wc_enable:enable write combining (int)
bash-5.2# systemctl status load-neuron-latest-modules
● load-neuron-latest-modules.service - Load Neuron Latest modules
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/load-neuron-latest-modules.service; enabled; preset: enabled)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d
             └─00-aws-config.conf, 10-requires-tmp.conf
     Active: active (exited) since Thu 2026-04-16 01:09:06 UTC; 2min 47s ago
   Main PID: 1233 (code=exited, status=0/SUCCESS)
        CPU: 268ms

Apr 16 01:09:02 localhost systemd[1]: Starting Load Neuron Latest modules...
Apr 16 01:09:02 localhost driverdog[1222]: 01:09:02 [INFO] Copied neuron.ko.gz
Apr 16 01:09:06 localhost driverdog[1233]: 01:09:06 [INFO] Updated modules dependencies
Apr 16 01:09:06 localhost driverdog[1233]: 01:09:06 [INFO] Loaded kernel modules
Apr 16 01:09:06 localhost systemd[1]: Finished Load Neuron Latest modules.
bash-5.2# uname -r
6.1.166
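The loaded-driver checks in the transcripts above (matching the `vermagic` reported by `modinfo` against `uname -r`) can be scripted. A minimal sketch follows; it is not part of the PR. `modinfo -F <field>` and the `/sys/module/<name>` directories are standard kmod and sysfs interfaces, while the function names are hypothetical.

```shell
# Hypothetical helper: succeed if a module's vermagic string was built for the
# given kernel release. vermagic starts with the release, e.g.
# "6.12.77 SMP preempt mod_unload modversions".
vermagic_matches() {
  local vermagic="$1" release="$2"
  [ "${vermagic%% *}" = "$release" ]
}

# Hypothetical check for a live host: the neuron.ko found by modinfo matches
# the running kernel, and the module is currently loaded.
check_neuron_module() {
  local release vermagic
  release="$(uname -r)"
  vermagic="$(modinfo -F vermagic neuron)" || return 1
  if ! vermagic_matches "$vermagic" "$release"; then
    echo "neuron.ko vermagic '$vermagic' does not match kernel $release" >&2
    return 1
  fi
  # Loaded modules appear as directories under /sys/module.
  if [ ! -d /sys/module/neuron ]; then
    echo "neuron module is not loaded" >&2
    return 1
  fi
  echo "neuron $(cat /sys/module/neuron/version 2>/dev/null || echo unknown) loaded for $release"
}
```

Comparing only the first field of `vermagic` keeps the check exact: `6.1.16` does not match a `6.1.166` kernel.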
  3. On an inf2 instance, I also switched the Neuron driver to test the additional packaged driver versions:
[root@admin]# sheltie
bash-5.2# driverdog link-modules
19:26:30 [INFO] Copied neuron.ko
19:26:30 [INFO] Copied neuron.ko
bash-5.2# ls -al /lib/modules/6.12.77/kernel/drivers/neuron/neuron.ko
-rw-r--r--. 1 root root 862848 Apr 23 19:26 /lib/modules/6.12.77/kernel/drivers/neuron/neuron.ko
bash-5.2# find . -name "neuron.ko"
./var/lib/kernel-modules/.overlay/upper/6.12.77/kernel/drivers/neuron/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/modules/6.12.77/kernel/drivers/neuron/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2_24/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_7372/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_7693/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_8072/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_latest/neuron.ko
bash-5.2# modprobe neuron
bash-5.2# modinfo neuron
filename:       /lib/modules/6.12.77/kernel/drivers/neuron/neuron.ko
alias:          pci:v00001d0fd00007064sv*sd*bc*sc*i*
version:        2.24.13.0
license:        GPL
description:    Neuron Driver, built from SHA: f3d11aaca3951440c7e47a8c74361815fc8ddee7
import_ns:      DMA_BUF
srcversion:     F148FDF5D12696D55723F7F
depends:
name:           neuron
retpoline:      Y
vermagic:       6.12.77 SMP preempt mod_unload modversions
parm:           userver_ctl:ultraserver election control (int)
parm:           userver_etimeout:ultraserver election timeout (int)
parm:           force_userver:Force Neuron UltraServer (int)
parm:           force_die_flip:Force Neuron Core Mapping APIs to give back DIE flip mappings (int)
parm:           nmetric_metric_post_delay:Minimum time to wait (in milliseconds) before posting metrics again (uint)
parm:           nmetric_log_posts:1: send metrics to CW, 2: send metrics to trace, 3: send metrics to both (uint)
parm:           no_reset:Dont reset device (int)
parm:           nc_per_dev_param:Number of neuron cores (int)
parm:           dev_nc_map:Map of active neuron cores (int)
parm:           mempool_min_alloc_size:Minimum size for memory allocation (int)
parm:           mempool_host_memory_size:Host memory to reserve(in bytes) (int)
parm:           mempool_small_pool_size:Size of genpool for small allocations (in bytes) (ulong)
parm:           mempool_small_alloc_max:Threshold (in bytes) for deciding if an allocation is small (ulong)
parm:           dup_helper_enable:enable duplicate routing id unload helper (int)
parm:           wc_enable:enable write combining (int)
bash-5.2# rm /lib/modules/6.12.77/kernel/drivers/neuron/neuron.ko
bash-5.2# cp ./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_8072/neuron.ko /lib/modules/6.12.77/kernel/drivers/neuron/neuron.ko
bash-5.2# modprobe neuron
bash-5.2# modinfo neuron
filename:       /lib/modules/6.12.77/kernel/drivers/neuron/neuron.ko
alias:          pci:v00001d0fd00007064sv*sd*bc*sc*i*
version:        2.x.8072.0
license:        GPL
description:    Neuron Driver, built from SHA: 381f0e424131a6a8ab4599de7edd7170ee3388c1
import_ns:      DMA_BUF
srcversion:     55CAB5764D30359F7D21735
depends:
name:           neuron
retpoline:      Y
vermagic:       6.12.77 SMP preempt mod_unload modversions
parm:           userver_pds_node_cnt:pds ultraserver node count (int)
parm:           userver_pds_server_id:pds ultraserver id (int)
parm:           userver_ctl:ultraserver election control (int)
parm:           userver_etimeout:ultraserver election timeout (int)
parm:           force_userver:Force Neuron UltraServer (int)
parm:           force_die_flip:Force Neuron Core Mapping APIs to give back DIE flip mappings (int)
parm:           nmetric_metric_post_delay:Minimum time to wait (in milliseconds) before posting metrics again (uint)
parm:           nmetric_log_posts:1: send metrics to CW, 2: send metrics to trace, 3: send metrics to both (uint)
parm:           no_reset:Dont reset device (int)
parm:           nc_per_dev_param:Number of neuron cores (int)
parm:           dev_nc_map:Map of active neuron cores (int)
parm:           dma_teardown_on_exit:Reset the DMA state on user process exit (int)
parm:           zerocopy_trn1_override:override zerocopy for trn1 (int)
parm:           mempool_min_alloc_size:Minimum size for memory allocation (int)
parm:           mempool_host_memory_size:Host memory to reserve(in bytes) (int)
parm:           mempool_small_pool_size:Size of genpool for small allocations (in bytes) (ulong)
parm:           mempool_small_alloc_max:Threshold (in bytes) for deciding if an allocation is small (ulong)
parm:           dup_helper_enable:enable duplicate routing id unload helper (int)
parm:           wc_enable:enable write combining (int)
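The manual switch above can be wrapped in a few shell functions. This is a hedged sketch, not a supported Bottlerocket workflow: the `libexec/neuron/<variant>` layout comes from the `find` output above, and the function names and the `NEURON_LIBEXEC` override are hypothetical. Unlike the transcript, it unloads the running module with `modprobe -r` before copying, because `modinfo` describes the `.ko` file on disk rather than the loaded module, and `modprobe neuron` returns without doing anything when a `neuron` module is already loaded. Reading `/sys/module/neuron/version` afterwards shows the version actually in memory.

```shell
# Hypothetical helpers for switching between the packaged Neuron driver
# variants on a Bottlerocket x86_64 host (run as root, e.g. inside sheltie).
# NEURON_LIBEXEC is an assumed override knob; the default path comes from the
# find output in the transcript.
NEURON_LIBEXEC="${NEURON_LIBEXEC:-/x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron}"

# Print each packaged variant name, i.e. every <variant> directory under
# NEURON_LIBEXEC that contains a neuron.ko (e.g. neuron_2x_8072).
list_neuron_variants() {
  local ko
  for ko in "$NEURON_LIBEXEC"/*/neuron.ko; do
    [ -f "$ko" ] || continue          # unmatched glob stays literal; skip it
    ko="${ko%/neuron.ko}"
    printf '%s\n' "${ko##*/}"
  done
}

# Print the path of one packaged variant's neuron.ko.
neuron_variant_path() {
  printf '%s/%s/neuron.ko\n' "$NEURON_LIBEXEC" "$1"
}

# Replace the active neuron.ko with a packaged variant and reload it.
swap_neuron_variant() {
  local src dst
  src="$(neuron_variant_path "$1")"
  dst="/lib/modules/$(uname -r)/kernel/drivers/neuron/neuron.ko"
  if [ ! -f "$src" ]; then
    echo "no packaged variant at $src" >&2
    return 1
  fi
  # modprobe is a no-op if a module named neuron is already loaded, so
  # unload it first. This fails while the device is still in use.
  if [ -d /sys/module/neuron ]; then
    modprobe -r neuron || return 1
  fi
  cp "$src" "$dst" || return 1
  modprobe neuron || return 1
  # sysfs reports the version of the module actually in memory.
  echo "loaded neuron $(cat /sys/module/neuron/version 2>/dev/null || echo unknown)"
}
```

Usage: run `list_neuron_variants` to see what is packaged, then, for example, `swap_neuron_variant neuron_2x_8072`.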

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Comment thread packages/kernel-6.1/EFACMakeLists.txt.in Outdated
@piyush-jena force-pushed the neuron-div branch 2 times, most recently from 87e0d2e to 3d44809 on April 17, 2026 18:36
@piyush-jena (Contributor, Author):

The force pushes above fix the following:

  1. I had mistakenly removed latest-kernel-srpm-url.sh from kernel-6.1; it is now restored.
  2. I had forgotten to remove the references to the Neuron kmods from Cargo.toml in kernel-6.1; they are now removed.

Comment thread packages/kernel-6.1/EFACMakeLists.txt.in
Comment thread packages/kmod-6.1-neuron/kmod-6.1-neuron.spec
Comment thread packages/kernel-6.1/kernel-6.1.spec
Comment thread packages/kernel-6.1/kernel-6.1.spec
@piyush-jena force-pushed the neuron-div branch 3 times, most recently from 89fb0e1 to 9dd6af2 on April 23, 2026 05:16
@piyush-jena (Contributor, Author):

The force pushes above address the following:

  1. Move the leftover EFA-related CMakeLists changes into a separate commit.
  2. Add separate latest, inf1, and extras kmod packages, so that each RPM package carries the correct version.
  3. Add an end-of-support BRSA for kernel-{kmajor}-modules-neuron, since we will no longer ship that sub-package.
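In the tests above, `load-neuron-inf1-modules.service` loaded the 2.24.13.0 driver on inf1, while `load-neuron-latest-modules.service` loaded 2.26.10.0 on inf2. On the host, the choice is made by hardware detection (`ghostdog` skips the inf1 unit through an exec condition when no inf1 hardware is found). The sketch below only mirrors that outcome from an EC2 instance type string, as a quick way to know which unit's status to inspect; the function name and the instance-type heuristic are assumptions.

```shell
# Hypothetical illustration: which Neuron load unit is expected to load the
# driver for a given EC2 instance type. On a real host this is decided by
# hardware detection, not by the instance type string.
neuron_load_unit() {
  case "$1" in
    inf1.*) echo "load-neuron-inf1-modules.service" ;;
    *)      echo "load-neuron-latest-modules.service" ;;
  esac
}
```

On the inf2 host in the final round of tests, for example, the latest unit was `active (exited)` while the inf1 unit reported `Result: exec-condition`.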

@piyush-jena force-pushed the neuron-div branch 3 times, most recently from 130fa05 to e539103 on May 11, 2026 20:39
Comment thread packages/kmod-6.18-neuron-extras/Cargo.toml Outdated
@piyush-jena force-pushed the neuron-div branch 3 times, most recently from d4fc253 to bed6da6 on May 22, 2026 09:49
…package

Signed-off-by: Piyush Jena <jepiyush@amazon.com>
@piyush-jena (Contributor, Author) commented May 27, 2026

Rebased, removed the commit for RPM v4 package verification, and merged the -inf1 packages into the main kmod-neuron packages.

Use https://github.com/bottlerocket-os/bottlerocket-kernel-kit/compare/a17a3d21e08ba225487de29a3a8cbffcf7f91be7..01b0328ec526ed3f5deb9631ba33fa84b95b3f66 to compare

Comment thread packages/kmod-6.1-neuron/aws-neuronx-dkms-2.26.10.0.noarch.rpm Outdated
Comment thread packages/kmod-6.18-neuron/kmod-6.18-neuron.spec Outdated
Comment thread packages/kmod-6.18-neuron/kmod-6.18-neuron.spec
@@ -0,0 +1,183 @@
%global kmajor 6.18
Reviewer (Contributor):

This doesn't have any _cross_arch checks. Functionally that's fine, since with these changes we only pull this into kernel packages on x86_64, but it means builds will now do this work for both aarch64 and x86_64.

@piyush-jena (Author):

Good catch! Let me think about whether there is a good way to skip it. Put the entire thing inside an if block?

Reviewer (Contributor):

If it builds (which it does), leave it be. It doesn't hurt to build it, and even better if Neuron supports Graviton eventually.

Comment thread packages/kmod-6.12-neuron/kmod-6.12-neuron.spec Outdated
Comment thread packages/kmod-6.12-neuron/kmod-6.12-neuron.spec Outdated
@piyush-jena force-pushed the neuron-div branch 2 times, most recently from 4b2086d to 07eb2ed on May 28, 2026 00:29
Signed-off-by: Piyush Jena <jepiyush@amazon.com>
Signed-off-by: Piyush Jena <jepiyush@amazon.com>
Signed-off-by: Piyush Jena <jepiyush@amazon.com>
Signed-off-by: Piyush Jena <jepiyush@amazon.com>

@piyush-jena (Contributor, Author):

Final round of tests:

  1. AMIs with kernel-6.1, 6.12, and 6.18 build for both architectures.
  2. Instances launched from the above AMIs join the cluster.
  3. Results on kernel-6.18 on an inf2 instance: the correct default driver is loaded, and all packaged drivers are available and can be switched.
[root@admin]# sheltie
bash-5.2# apiclient get os
{
  "os": {
    "arch": "x86_64",
    "build_id": "8ef015e0-dirty",
    "pretty_name": "Bottlerocket OS 1.61.0 (aws-k8s-1.35)",
    "variant_id": "aws-k8s-1.35",
    "version_id": "1.61.0"
  }
}
bash-5.2# uname -r
6.18.30
bash-5.2# modinfo neuron
filename:       /lib/modules/6.18.30/kernel/drivers/neuron/neuron.ko
alias:          pci:v00001d0fd00007064sv*sd*bc*sc*i*
version:        2.26.10.0
license:        GPL
description:    Neuron Driver, built from SHA: dd4b234dc87473b76914082d9dc25785149c978a
import_ns:      DMA_BUF
srcversion:     02A0F96A8B18C6B24CA5EF9
depends:
name:           neuron
retpoline:      Y
vermagic:       6.18.30 SMP preempt mod_unload modversions
parm:           userver_pds_node_cnt:pds ultraserver node count (int)
parm:           userver_pds_server_id:pds ultraserver id (int)
parm:           userver_ctl:ultraserver election control (int)
parm:           userver_etimeout:ultraserver election timeout (int)
parm:           force_userver:Force Neuron UltraServer (int)
parm:           force_die_flip:Force Neuron Core Mapping APIs to give back DIE flip mappings (int)
parm:           nmetric_metric_post_delay:Minimum time to wait (in milliseconds) before posting metrics again (uint)
parm:           nmetric_log_posts:1: send metrics to CW, 2: send metrics to trace, 3: send metrics to both (uint)
parm:           no_reset:Dont reset device (int)
parm:           nc_per_dev_param:Number of neuron cores (int)
parm:           dev_nc_map:Map of active neuron cores (int)
parm:           dma_teardown_on_exit:Reset the DMA state on user process exit (int)
parm:           zerocopy_trn1_override:override zerocopy for trn1 (int)
parm:           mempool_min_alloc_size:Minimum size for memory allocation (int)
parm:           mempool_host_memory_size:Host memory to reserve(in bytes) (int)
parm:           mempool_small_pool_size:Size of genpool for small allocations (in bytes) (ulong)
parm:           mempool_small_alloc_max:Threshold (in bytes) for deciding if an allocation is small (ulong)
parm:           dup_helper_enable:enable duplicate routing id unload helper (int)
parm:           wc_enable:enable write combining (int)
bash-5.2# systemctl status load-neuron-latest-modules
● load-neuron-latest-modules.service - Load Neuron Latest modules
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/load-neuron-latest-modules.service; enabled; preset: enabled)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d
             └─00-aws-config.conf, 10-requires-tmp.conf
     Active: active (exited) since Thu 2026-05-28 02:25:54 UTC; 6min ago
 Invocation: e1e2b0ea0d734d7fa03f55a9981dc240
   Main PID: 1558 (code=exited, status=0/SUCCESS)
   Mem peak: 156.3M
        CPU: 251ms

May 28 02:25:53 localhost systemd[1]: Starting Load Neuron Latest modules...
May 28 02:25:53 localhost driverdog[1545]: 02:25:53 [INFO] Copied neuron.ko
May 28 02:25:53 localhost driverdog[1558]: 02:25:53 [INFO] Updated modules dependencies
May 28 02:25:54 localhost driverdog[1558]: 02:25:54 [INFO] Loaded kernel modules
May 28 02:25:54 localhost systemd[1]: Finished Load Neuron Latest modules.
bash-5.2# systemctl status load-neuron-inf1-modules
○ load-neuron-inf1-modules.service - Load Neuron Inf1 modules
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/load-neuron-inf1-modules.service; enabled; preset: enabled)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/service.d
             └─00-aws-config.conf, 10-requires-tmp.conf
     Active: inactive (dead) (Result: exec-condition) since Thu 2026-05-28 02:25:53 UTC; 6min ago
 Invocation: 7ee3585774264d16bbb8d6d74860fa59
  Condition: start condition unmet at Thu 2026-05-28 02:25:53 UTC; 6min ago
   Mem peak: 1.2M
        CPU: 7ms

May 28 02:25:53 localhost systemd[1]: Starting Load Neuron Inf1 modules...
May 28 02:25:53 localhost ghostdog[1536]: Error: Did not detect inf1 hardware
May 28 02:25:53 localhost systemd[1]: load-neuron-inf1-modules.service: Skipped due to 'exec-condition'.
May 28 02:25:53 localhost systemd[1]: Condition check resulted in Load Neuron Inf1 modules being skipped.
bash-5.2# driverdog link-modules
02:33:18 [INFO] Copied neuron.ko
02:33:18 [INFO] Copied neuron.ko
bash-5.2# ls -al /lib/modules/6.18
6.18/    6.18.30/
bash-5.2# find . -name "neuron.ko"
./var/lib/kernel-modules/.overlay/upper/6.18.30/kernel/drivers/neuron/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/modules/6.18.30/kernel/drivers/neuron/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2_24/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_7372/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_7693/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_8072/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_8689/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_latest/neuron.ko
bash-5.2# modprobe neuron
bash-5.2# rm /lib/modules/6.18.30/kernel/drivers/neuron/neuron.ko
bash-5.2# cp ./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_7372/neuron.ko /lib/modules/6.18.30/kernel/drivers/neuron/neuron.ko
bash-5.2# modinfo neuron
filename:       /lib/modules/6.18.30/kernel/drivers/neuron/neuron.ko
alias:          pci:v00001d0fd00007064sv*sd*bc*sc*i*
version:        2.x.7372.0
license:        GPL
description:    Neuron Driver, built from SHA: ed2eca4781405ab537f14c42850ecae3fd99dad4
import_ns:      DMA_BUF
srcversion:     CDEA03FCFB55975F7FF1A1F
depends:
name:           neuron
retpoline:      Y
vermagic:       6.18.30 SMP preempt mod_unload modversions
parm:           userver_pds_node_cnt:pds ultraserver node count (int)
parm:           userver_pds_server_id:pds ultraserver id (int)
parm:           userver_ctl:ultraserver election control (int)
parm:           userver_etimeout:ultraserver election timeout (int)
parm:           force_userver:Force Neuron UltraServer (int)
parm:           force_die_flip:Force Neuron Core Mapping APIs to give back DIE flip mappings (int)
parm:           nmetric_metric_post_delay:Minimum time to wait (in milliseconds) before posting metrics again (uint)
parm:           nmetric_log_posts:1: send metrics to CW, 2: send metrics to trace, 3: send metrics to both (uint)
parm:           no_reset:Dont reset device (int)
parm:           nc_per_dev_param:Number of neuron cores (int)
parm:           dev_nc_map:Map of active neuron cores (int)
parm:           zerocopy_trn1_override:override zerocopy for trn1 (int)
parm:           mempool_min_alloc_size:Minimum size for memory allocation (int)
parm:           mempool_host_memory_size:Host memory to reserve(in bytes) (int)
parm:           mempool_small_pool_size:Size of genpool for small allocations (in bytes) (ulong)
parm:           mempool_small_alloc_max:Threshold (in bytes) for deciding if an allocation is small (ulong)
parm:           dup_helper_enable:enable duplicate routing id unload helper (int)
parm:           wc_enable:enable write combining (int)
bash-5.2# cp ./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_8689/neuron.ko /lib/modules/6.18.30/kernel/drivers/neuron/neuron.ko
bash-5.2# modinfo neuron
filename:       /lib/modules/6.18.30/kernel/drivers/neuron/neuron.ko
alias:          pci:v00001d0fd00007064sv*sd*bc*sc*i*
version:        2.x.8689.0
license:        GPL
description:    Neuron Driver, built from SHA: 623632eb9c3d3312be233cf19079301468d42497
import_ns:      DMA_BUF
srcversion:     808B48438BF1C0893B0DBF6
depends:
name:           neuron
retpoline:      Y
vermagic:       6.18.30 SMP preempt mod_unload modversions
parm:           pds_reservation_id:pds reservation id (int)
parm:           userver_ctl:ultraserver election control (int)
parm:           userver_etimeout:ultraserver election timeout (int)
parm:           force_userver:Force Neuron UltraServer (int)
parm:           force_die_flip:Force Neuron Core Mapping APIs to give back DIE flip mappings (int)
parm:           nmetric_metric_post_delay:Minimum time to wait (in milliseconds) before posting metrics again (uint)
parm:           nmetric_log_posts:1: send metrics to CW, 2: send metrics to trace, 3: send metrics to both (uint)
parm:           no_reset:Dont reset device (int)
parm:           reset_top_dma:Reset top-level DMAs during TPB reset (int)
parm:           nc_per_dev_param:Number of neuron cores (int)
parm:           dev_nc_map:Map of active neuron cores (int)
parm:           dma_teardown_on_exit:Reset the DMA state on user process exit (int)
parm:           zerocopy_trn1_override:override zerocopy for trn1 (int)
parm:           mempool_min_alloc_size:Minimum size for memory allocation (int)
parm:           mempool_host_memory_size:Host memory to reserve(in bytes) (int)
parm:           mempool_small_pool_size:Size of genpool for small allocations (in bytes) (ulong)
parm:           mempool_small_alloc_max:Threshold (in bytes) for deciding if an allocation is small (ulong)
parm:           dup_helper_enable:enable duplicate routing id unload helper (int)
parm:           wc_enable:enable write combining (int)
bash-5.2# cp ./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_8072/neuron.ko /lib/modules/6.18.30/kernel/drivers/neuron/neuron.ko
bash-5.2# modinfo neuron
filename:       /lib/modules/6.18.30/kernel/drivers/neuron/neuron.ko
alias:          pci:v00001d0fd00007064sv*sd*bc*sc*i*
version:        2.x.8072.0
license:        GPL
description:    Neuron Driver, built from SHA: 381f0e424131a6a8ab4599de7edd7170ee3388c1
import_ns:      DMA_BUF
srcversion:     3804F7651E8952DA90F42A0
depends:
name:           neuron
retpoline:      Y
vermagic:       6.18.30 SMP preempt mod_unload modversions
parm:           userver_pds_node_cnt:pds ultraserver node count (int)
parm:           userver_pds_server_id:pds ultraserver id (int)
parm:           userver_ctl:ultraserver election control (int)
parm:           userver_etimeout:ultraserver election timeout (int)
parm:           force_userver:Force Neuron UltraServer (int)
parm:           force_die_flip:Force Neuron Core Mapping APIs to give back DIE flip mappings (int)
parm:           nmetric_metric_post_delay:Minimum time to wait (in milliseconds) before posting metrics again (uint)
parm:           nmetric_log_posts:1: send metrics to CW, 2: send metrics to trace, 3: send metrics to both (uint)
parm:           no_reset:Dont reset device (int)
parm:           nc_per_dev_param:Number of neuron cores (int)
parm:           dev_nc_map:Map of active neuron cores (int)
parm:           dma_teardown_on_exit:Reset the DMA state on user process exit (int)
parm:           zerocopy_trn1_override:override zerocopy for trn1 (int)
parm:           mempool_min_alloc_size:Minimum size for memory allocation (int)
parm:           mempool_host_memory_size:Host memory to reserve(in bytes) (int)
parm:           mempool_small_pool_size:Size of genpool for small allocations (in bytes) (ulong)
parm:           mempool_small_alloc_max:Threshold (in bytes) for deciding if an allocation is small (ulong)
parm:           dup_helper_enable:enable duplicate routing id unload helper (int)
parm:           wc_enable:enable write combining (int)
bash-5.2# cp ./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_7693/neuron.ko /lib/modules/6.18.30/kernel/drivers/neuron/neuron.ko
bash-5.2# modinfo neuron
filename:       /lib/modules/6.18.30/kernel/drivers/neuron/neuron.ko
alias:          pci:v00001d0fd00007064sv*sd*bc*sc*i*
version:        2.x.7693.0
license:        GPL
description:    Neuron Driver, built from SHA: 559cb9bf14a5b9e98b28b11fe8c7d70d659bd308
import_ns:      DMA_BUF
srcversion:     4E77A4CB04E32F68022400E
depends:
name:           neuron
retpoline:      Y
vermagic:       6.18.30 SMP preempt mod_unload modversions
parm:           userver_pds_node_cnt:pds ultraserver node count (int)
parm:           userver_pds_server_id:pds ultraserver id (int)
parm:           userver_ctl:ultraserver election control (int)
parm:           userver_etimeout:ultraserver election timeout (int)
parm:           force_userver:Force Neuron UltraServer (int)
parm:           force_die_flip:Force Neuron Core Mapping APIs to give back DIE flip mappings (int)
parm:           nmetric_metric_post_delay:Minimum time to wait (in milliseconds) before posting metrics again (uint)
parm:           nmetric_log_posts:1: send metrics to CW, 2: send metrics to trace, 3: send metrics to both (uint)
parm:           no_reset:Dont reset device (int)
parm:           nc_per_dev_param:Number of neuron cores (int)
parm:           dev_nc_map:Map of active neuron cores (int)
parm:           dma_teardown_on_exit:Reset the DMA state on user process exit (int)
parm:           zerocopy_trn1_override:override zerocopy for trn1 (int)
parm:           mempool_min_alloc_size:Minimum size for memory allocation (int)
parm:           mempool_host_memory_size:Host memory to reserve(in bytes) (int)
parm:           mempool_small_pool_size:Size of genpool for small allocations (in bytes) (ulong)
parm:           mempool_small_alloc_max:Threshold (in bytes) for deciding if an allocation is small (ulong)
parm:           dup_helper_enable:enable duplicate routing id unload helper (int)
parm:           wc_enable:enable write combining (int)
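The manual swap above (`rm`, `cp`, then `modinfo` for each variant) can also be done without touching `/lib/modules`, because `modinfo` accepts a path to a `.ko` file directly. Below is a minimal sketch. It assumes the x86_64 libexec layout shown in the `find` output (an aarch64 host would use its own sys-root path). `NEURON_LIBEXEC` and `list_neuron_variants` are hypothetical names and are not part of Bottlerocket or driverdog.

```shell
#!/usr/bin/env bash
# Sketch: print the version and target kernel of every packaged neuron.ko
# variant without copying it into /lib/modules.
# ASSUMPTION: libexec layout taken from the `find` output; NEURON_LIBEXEC is a
# hypothetical override, not a Bottlerocket setting.
NEURON_LIBEXEC="${NEURON_LIBEXEC:-/x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron}"

list_neuron_variants() {
  local dir="${1:-$NEURON_LIBEXEC}" ko variant version vermagic
  for ko in "$dir"/*/neuron.ko; do
    # An unmatched glob stays literal, so skip anything that does not exist.
    [ -e "$ko" ] || continue
    variant="$(basename "$(dirname "$ko")")"
    # modinfo reads fields straight from the file; nothing is installed.
    version="$(modinfo -F version "$ko")"
    vermagic="$(modinfo -F vermagic "$ko")"
    # Keep only the first vermagic field, i.e. the kernel release.
    printf '%s\t%s\t%s\n' "$variant" "$version" "${vermagic%% *}"
  done
}
```

When sourced in a sheltie shell, `list_neuron_variants` prints one tab-separated line per variant: the directory name, the driver version, and the kernel release from vermagic. This makes it easy to confirm that every variant was built against the running kernel before swapping one in.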

@piyush-jena piyush-jena merged commit 2c88c66 into bottlerocket-os:develop May 28, 2026
2 checks passed
@piyush-jena
Contributor Author

piyush-jena commented May 28, 2026

Tested on an inf1 instance:

bash-5.2# modinfo neuron
filename:       /lib/modules/6.12.88/kernel/drivers/neuron/neuron.ko
alias:          pci:v00001d0fd00007064sv*sd*bc*sc*i*
version:        2.24.13.0
license:        GPL
description:    Neuron Driver, built from SHA: f3d11aaca3951440c7e47a8c74361815fc8ddee7
import_ns:      DMA_BUF
srcversion:     F148FDF5D12696D55723F7F
depends:
name:           neuron
retpoline:      Y
vermagic:       6.12.88 SMP preempt mod_unload modversions
parm:           userver_ctl:ultraserver election control (int)
parm:           userver_etimeout:ultraserver election timeout (int)
parm:           force_userver:Force Neuron UltraServer (int)
parm:           force_die_flip:Force Neuron Core Mapping APIs to give back DIE flip mappings (int)
parm:           nmetric_metric_post_delay:Minimum time to wait (in milliseconds) before posting metrics again (uint)
parm:           nmetric_log_posts:1: send metrics to CW, 2: send metrics to trace, 3: send metrics to both (uint)
parm:           no_reset:Dont reset device (int)
parm:           nc_per_dev_param:Number of neuron cores (int)
parm:           dev_nc_map:Map of active neuron cores (int)
parm:           mempool_min_alloc_size:Minimum size for memory allocation (int)
parm:           mempool_host_memory_size:Host memory to reserve(in bytes) (int)
parm:           mempool_small_pool_size:Size of genpool for small allocations (in bytes) (ulong)
parm:           mempool_small_alloc_max:Threshold (in bytes) for deciding if an allocation is small (ulong)
parm:           dup_helper_enable:enable duplicate routing id unload helper (int)
parm:           wc_enable:enable write combining (int)
bash-5.2# find . -name neuron.ko
./var/lib/kernel-modules/.overlay/upper/6.12.88/kernel/drivers/neuron/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/modules/6.12.88/kernel/drivers/neuron/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2_24/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_7372/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_7693/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_2x_8072/neuron.ko
./x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron/neuron_latest/neuron.ko
bash-5.2# uname -r
6.12.88
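To connect the inf1 output back to the packaged variants, you can compare the installed module against each copy under libexec and check its vermagic against `uname -r`. The sketch below assumes driverdog copies the selected `neuron.ko` byte-for-byte, which the "Copied neuron.ko" log lines suggest but do not prove. `match_variant` and `vermagic_matches` are hypothetical helpers, and the libexec default path is taken from the `find` output.

```shell
#!/usr/bin/env bash
# Sketch: identify which packaged neuron.ko variant is installed, and check
# that a module's vermagic matches a kernel release.
# ASSUMPTION: driverdog copies the variant unchanged; the path mirrors `find`.
NEURON_LIBEXEC="${NEURON_LIBEXEC:-/x86_64-bottlerocket-linux-gnu/sys-root/usr/libexec/neuron}"

# Print the variant directory whose neuron.ko is byte-identical to $1.
# If several variants are identical, the first in glob order wins.
match_variant() {
  local installed="$1" dir="${2:-$NEURON_LIBEXEC}" ko want
  [ -r "$installed" ] || return 1
  want="$(sha256sum "$installed" | cut -d' ' -f1)"
  for ko in "$dir"/*/neuron.ko; do
    [ -e "$ko" ] || continue
    if [ "$(sha256sum "$ko" | cut -d' ' -f1)" = "$want" ]; then
      basename "$(dirname "$ko")"
      return 0
    fi
  done
  return 1
}

# Succeed when the first vermagic field equals the given kernel release.
vermagic_matches() {
  local vermagic="$1" release="$2"
  [ "${vermagic%% *}" = "$release" ]
}
```

For example, `match_variant "/lib/modules/$(uname -r)/kernel/drivers/neuron/neuron.ko"` prints the matching variant directory name, and `vermagic_matches "$(modinfo -F vermagic neuron)" "$(uname -r)"` succeeds when the module was built for the running kernel. In the output above, both values are 6.12.88.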



4 participants