
support NVMe Deallocate #1105

Merged

ahrens merged 1 commit into master from mahrens/trim on May 6, 2026

@ahrens (Contributor) commented Apr 8, 2026

This implements #990. It makes propolis advertise support for the "Dataset Management" NVMe command. It uses ioctl(DKIOCFREE) to pass these requests through to local disks (FileBackend). The requests are ignored on distributed disks (Crucible).
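A minimal sketch of the two behaviors described above. It assumes hypothetical names (`ONCS_DSM`, `Backend`, `DiscardAction`, `advertised_oncs`, `handle_discard`) that are not Propolis's actual API. In NVMe 1.0e, bit 2 of the Identify Controller ONCS field advertises Dataset Management support. A deallocate request is then either passed through as byte extents (the FileBackend case, which this PR implements with ioctl(DKIOCFREE), not shown here) or completed without effect (the Crucible case).

```rust
/// ONCS (Optional NVM Command Support) in Identify Controller: bit 2 means
/// the Dataset Management command is supported.
const ONCS_DSM: u16 = 1 << 2;

/// Hypothetical stand-ins for the two backend kinds named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    File,
    Crucible,
}

/// What happens to a deallocate request (illustrative only).
#[derive(Debug, PartialEq, Eq)]
enum DiscardAction {
    /// Byte extents handed to the host; the PR does this with ioctl(DKIOCFREE).
    PassThrough(Vec<(u64, u64)>),
    /// Completed successfully without touching storage.
    Ignore,
}

/// The DSM bit is set unconditionally, whatever backends the VM has.
fn advertised_oncs(base: u16) -> u16 {
    base | ONCS_DSM
}

/// Route a deallocate request by backend kind.
fn handle_discard(backend: Backend, extents: Vec<(u64, u64)>) -> DiscardAction {
    match backend {
        Backend::File => DiscardAction::PassThrough(extents),
        Backend::Crucible => DiscardAction::Ignore,
    }
}
```

Ignoring the request on Crucible is allowed by the 1.0e wording, since the device only "may" deallocate the ranges.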

Note that our devices are NVMe 1.0e, which specifies that the disk "may" deallocate all provided ranges, and "shall return all zeros, all ones, or the last data written to the associated LBA". The 1.0e spec has no mechanism for telling the guest which of these semantics is actually happening. Future work may include migrating to NVMe 1.1 or later, which can use the DRB (Deallocation Read Behavior) field to tell the guest whether the blocks are actually zeroed or not.
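To make the 1.0e semantics concrete, here is a sketch of a check a guest-side test could apply after deallocating an LBA and reading it back. `DeallocatedRead` and `classify` are hypothetical names. When the last data written was itself all zeros or all ones, the outcomes cannot be told apart, which is the ambiguity the DRB field is meant to resolve.

```rust
/// The three results NVMe 1.0e permits when reading an LBA after a
/// Dataset Management deallocate covering it.
#[derive(Debug, PartialEq, Eq)]
enum DeallocatedRead {
    Zeroes,
    Ones,
    LastWritten,
}

/// Classify `read` against the data last written to the same LBA.
/// Returns None if the result is not one the 1.0e spec allows.
/// "LastWritten" is checked first, so a zero-filled LBA that reads back as
/// zeros is reported as LastWritten: the guest cannot tell the difference.
fn classify(read: &[u8], last_written: &[u8]) -> Option<DeallocatedRead> {
    if read == last_written {
        Some(DeallocatedRead::LastWritten)
    } else if read.iter().all(|&b| b == 0x00) {
        Some(DeallocatedRead::Zeroes)
    } else if read.iter().all(|&b| b == 0xFF) {
        Some(DeallocatedRead::Ones)
    } else {
        None
    }
}
```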

This requires the changes for https://github.com/oxidecomputer/stlouis/issues/940 which are under review here.

A few details to be aware of or provide input on:

  1. I am always advertising support for the Dataset Management command (with a bit in the ONCS field). I think this is OK since there is currently no live migration (even if the VM has no local disks), so each time a VM boots, it will be on a specific version of Propolis which either advertises Dataset Management or not.
  2. block::Operation::Discard now means to discard multiple ranges from the client-provided list, not just one range. probes::block_begin_discard used to take an offset and length, but I have changed it to take the number of ranges instead. Is this OK? Are there consumers that need to change? We could fire a probe for each range, but then there would be multiple “begin” probes for one devqid, which could be confusing because begin/complete probes would not match up. Similar for probes::nvme_discard_enqueue.
  3. In VirtualDiskStats, should we add stats for Discard? How do we monitor these? Would we need to add support in consumers?
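For item 2, the sketch below shows one way a multi-range Discard could be built from the guest's Dataset Management buffer. It follows the NVMe 1.0e range layout: 16-byte entries with the length at bytes 07:04 and the starting LBA at bytes 15:08, NR in CDW10 bits 7:0 as a zero-based count, and the Deallocate attribute in CDW11 bit 2. `LbaRange`, `Operation`, `parse_ranges`, `to_discard`, and `probe_arg` are hypothetical and do not mirror Propolis's actual block::Operation or probe definitions. The point is that one operation, and so one begin/complete probe pair, carries all of the ranges, with the range count as the probe argument.

```rust
/// Attribute - Deallocate (AD): bit 2 of CDW11 in a Dataset Management command.
const DSM_ATTR_DEALLOCATE: u32 = 1 << 2;
/// Each range definition in the guest-supplied list is 16 bytes.
const DSM_RANGE_SIZE: usize = 16;

/// One guest-provided LBA range (context attributes are ignored here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LbaRange {
    slba: u64, // starting LBA, bytes 15:08 of the range definition
    nlb: u32,  // length in logical blocks, bytes 07:04
}

/// Hypothetical block-layer operation: a single Discard carries every range
/// as (byte offset, byte length) pairs.
#[derive(Debug, PartialEq, Eq)]
enum Operation {
    Discard(Vec<(u64, u64)>),
}

/// Decode a little-endian integer of up to 8 bytes.
fn le(bytes: &[u8]) -> u64 {
    bytes.iter().rev().fold(0u64, |acc, &b| (acc << 8) | u64::from(b))
}

/// Parse the range list. NR is zero-based, so a guest can describe between
/// 1 and 256 ranges. Returns None if the buffer is too short for NR.
fn parse_ranges(cdw10: u32, buf: &[u8]) -> Option<Vec<LbaRange>> {
    let nr = (cdw10 & 0xff) as usize + 1;
    let list = buf.get(..nr * DSM_RANGE_SIZE)?;
    Some(
        list.chunks_exact(DSM_RANGE_SIZE)
            .map(|c| LbaRange { nlb: le(&c[4..8]) as u32, slba: le(&c[8..16]) })
            .collect(),
    )
}

/// Build one Discard for all ranges. Returns None when AD is clear (the
/// command carries only hints, so there is nothing to free) or when a range
/// overflows during the LBA-to-byte conversion.
fn to_discard(cdw11: u32, ranges: &[LbaRange], lba_size: u64) -> Option<Operation> {
    if cdw11 & DSM_ATTR_DEALLOCATE == 0 {
        return None;
    }
    let mut exts = Vec::with_capacity(ranges.len());
    for r in ranges {
        let off = r.slba.checked_mul(lba_size)?;
        let len = u64::from(r.nlb).checked_mul(lba_size)?;
        exts.push((off, len));
    }
    Some(Operation::Discard(exts))
}

/// Argument for a single begin/complete probe pair: the number of ranges.
fn probe_arg(op: &Operation) -> usize {
    match op {
        Operation::Discard(exts) => exts.len(),
    }
}
```

Firing one probe per range would instead require each begin probe to carry its own correlation key so it can be matched with a completion.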

@ahrens ahrens requested review from iximeow and rmustacc April 8, 2026 17:34
@ahrens (Contributor, Author) commented Apr 8, 2026

testing: installed the ZFS changes and this Propolis on mb-1. In the Linux guest VM, I created a zpool on a local disk and used zpool trim and zpool set autotrim=on to cause it to issue Deallocate commands. I used dtrace to observe that vdev_disk_issue_trim() is called on the underlying physical disk.

@iximeow (Member) left a comment

nice, this will be nice to have, thanks for working on this. hopefully none of the comments are too surprising!

Comment thread lib/propolis/src/hw/nvme/cmds.rs Outdated
Comment thread lib/propolis/src/hw/nvme/cmds.rs Outdated
Comment thread lib/propolis/src/hw/nvme/cmds.rs Outdated
Comment thread lib/propolis/src/hw/nvme/cmds.rs Outdated
Comment thread lib/propolis/src/hw/nvme/cmds.rs Outdated
Comment thread lib/propolis/src/hw/nvme/requests.rs Outdated
Comment thread lib/propolis/src/hw/nvme/requests.rs Outdated
Comment thread lib/propolis/src/block/file.rs Outdated
Comment thread bin/propolis-server/src/lib/stats/virtual_disk.rs Outdated
Comment thread lib/propolis/src/block/crucible.rs Outdated
@ahrens ahrens force-pushed the mahrens/trim branch 2 times, most recently from e370d49 to eed15c6 Compare April 13, 2026 19:47
@iximeow (Member) left a comment

FWIW I'm a solid +1 on this now, thanks! if you're going to add a test about DatasetManagementCmd with bogus prp1/prp2 then I'll be happy to take a look at that here too. if not (or you have surprise issues), this is a nice improvement as-is.
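On the suggested test: a Dataset Management range list is at most 256 ranges of 16 bytes, or 4096 bytes, so it fits in a single page only when PRP1 is page aligned. Otherwise it spills into the page named by PRP2. The hypothetical helper below (not Propolis's PRP code) makes the bogus-pointer cases easy to enumerate in a unit test. It assumes a 4 KiB memory page size, that PRP1 must be dword aligned, and that a PRP2 used as a direct page pointer must have a zero page offset.

```rust
/// Assumed memory page size (CC.MPS = 0 gives 4 KiB pages).
const PAGE_SIZE: u64 = 4096;

/// Split a transfer of `len` bytes described by PRP1/PRP2 into
/// guest-physical (addr, len) segments. Only the no-PRP-list case is
/// handled, which is sufficient for a range list of up to 4096 bytes.
fn prp_segments(prp1: u64, prp2: u64, len: u64) -> Result<Vec<(u64, u64)>, &'static str> {
    if prp1 & 0x3 != 0 {
        return Err("PRP1 not dword aligned");
    }
    // Bytes available from PRP1 up to the end of its page.
    let first = (PAGE_SIZE - (prp1 % PAGE_SIZE)).min(len);
    let rest = len - first;
    if rest == 0 {
        return Ok(vec![(prp1, first)]);
    }
    if rest > PAGE_SIZE {
        return Err("transfer would need a PRP list");
    }
    if prp2 % PAGE_SIZE != 0 {
        return Err("PRP2 not page aligned");
    }
    Ok(vec![(prp1, first), (prp2, rest)])
}
```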

@ahrens ahrens force-pushed the mahrens/trim branch 3 times, most recently from 05eafbf to 0d67440 Compare April 15, 2026 17:34
@rmustacc left a comment

Thanks for the work on this.

Comment thread lib/propolis/src/hw/nvme/bits.rs Outdated
Comment thread lib/propolis/src/hw/nvme/cmds.rs
Comment thread lib/propolis/src/hw/nvme/cmds.rs
Comment thread lib/propolis/src/block/mod.rs
Comment thread lib/propolis/src/block/file.rs Outdated
Comment thread lib/propolis/src/block/file.rs
Comment thread bin/propolis-server/src/lib/stats/virtual_disk.rs Outdated
This makes propolis advertise support for the "Dataset Management" NVMe
command. It uses ioctl(DKIOCFREE) to pass these requests through to local
disks (FileBackend). The requests are ignored on distributed disks
(Crucible).
@ahrens ahrens merged commit 3b21bdc into master May 6, 2026
14 checks passed
@ahrens ahrens deleted the mahrens/trim branch May 6, 2026 17:23

6 participants