
libnvme: fix NULL handle dereference in discovery and identify paths (#3341)

Open
martin-belanger wants to merge 1 commit into linux-nvme:master from martin-belanger:fix-libnvme-null-hdl

Conversation

@martin-belanger

Problem

libnvme_ctrl_get_transport_handle() uses lazy open semantics: it opens /dev/nvmeX on first use and returns NULL if the open fails. None of the callers validated the returned handle before passing it down the call chain, leading to a segmentation fault.

The crash was observed in an nvmf-connect@.service instance triggered by udevd. The service fires when a discovery controller emits an AEN, but udevd can process the event after the controller has already begun teardown. By that point the kernel returns EAGAIN from open() because the controller is no longer in LIVE state, libnvme_open() fails, and the NULL handle is eventually dereferenced in libnvme_get_log().

Fix

  • nvme_discovery_log() — guard after libnvme_ctrl_get_transport_handle() with a LIBNVME_LOG_DEBUG message to aid tracing
  • nvmf_dim() — same guard, placed after the existing pre-condition checks where ctx is known valid, with a LIBNVME_LOG_ERR message consistent with the surrounding style
  • libnvme_ctrl_identify() — same guard with a LIBNVME_LOG_DEBUG message
  • libnvme_get_log() — silent safety net with a comment as a last line of defence for any future caller that omits the check

Testing

Verified by running nvme-stas tests that rapidly create and delete NVMe devices, which reliably triggered the crash before this fix. No crash observed after applying the patch.

libnvme_ctrl_get_transport_handle() opens the device lazily and returns
NULL if the open fails. This can happen when a udev-triggered
nvmf-connect@.service fires for a discovery controller that is already
in the process of being removed: the device is still visible in sysfs
at scan time but the kernel returns EAGAIN on open because the
controller is no longer in LIVE state.

None of the callers checked the returned handle before passing it to
libnvme_get_log() or libnvme_submit_admin_passthru(), causing a
segmentation fault.

Add NULL handle guards with appropriate log messages in
nvme_discovery_log(), nvmf_dim(), and libnvme_ctrl_identify(). Add a
silent safety net with a comment in libnvme_get_log() itself as a last
line of defence against future callers that omit the check.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Assisted-by: Claude:claude-sonnet-4-6 [Claude Code]
@martin-belanger force-pushed the fix-libnvme-null-hdl branch from 0aabdf9 to ca62b42 on May 5, 2026 at 22:09
@igaw
Collaborator

igaw commented May 7, 2026

Ah, that's how you were able to get a NULL hdl. I think we have many more places where a check would be necessary. Maybe instead of testing for a NULL hdl everywhere, we could just do it where we actually use it, in the passthru code? Where did it crash — in libnvme_submit_admin_passthru?

@martin-belanger
Author

The crash was in libnvme_get_log() at nvme-cmds.c:88, not in libnvme_submit_admin_passthru. The hdl was passed through directly from nvme_discovery_log() without a check.

I agree with your suggestion. Centralizing the guard in the passthrough layer is cleaner and more future-proof. The current patch already adds the safety net in libnvme_get_log(). If you also want a guard in libnvme_submit_admin_passthru(), I can add it there and drop the individual call-site checks — though I'd prefer to keep the per-site LIBNVME_LOG_DEBUG messages since they make it easy to trace which code path triggered the NULL (especially useful during rapid device teardown).

Let me know if you'd prefer: (a) keep as-is, (b) safety nets only in the passthrough functions with no per-site checks, or (c) both.

@igaw
Collaborator

igaw commented May 7, 2026

I think we should have the check only in ioctl.c to keep it inside the 'ioctl' layer. I think we could push

		if (hdl->uring_enabled)
			ret = libnvme_submit_admin_passthru_async(hdl, cmd);
		else
			ret = libnvme_submit_admin_passthru(hdl, cmd);

and

	if (hdl->uring_enabled) {
		ret = libnvme_wait_complete_passthru(hdl);
		if (ret)
			return ret;
	}

down into the ioctl layer, so callers only ever call libnvme_submit_admin_passthru and always libnvme_submit_admin_complete (or whatever the name ends up being). That way we don't have random checks for a NULL hdl scattered around.

@martin-belanger
Author

Understood — that makes sense architecturally. A few clarifying questions before I rework this:

  1. For the uring refactor: would the unified call remain named libnvme_submit_admin_passthru() (routing uring vs. ioctl internally), or do you have a different name in mind for the new combined entry point?
  2. Is this refactoring in scope for this PR, or should we merge the current minimal fix now and follow up with the uring unification separately?

The current patch fixes the immediate crash. I'm happy to do the larger refactor here or in a follow-up — just want to make sure we agree on scope before I start.

@igaw
Collaborator

igaw commented May 8, 2026

I'd say we refactor the code first and then add the NULL pointer checks. That should avoid a bit of code churn.

Yes, I was thinking of having just libnvme_submit_admin_passthru and libnvme_wait_complete_passthru as the API. Come to think of it, that means we should also touch every call site of libnvme_submit_admin_passthru to pair it with libnvme_wait_complete_passthru. Only then will the io_uring API be used correctly throughout the code base.
