Skip to content

task: tighten AUTO read-ahead wait-clear to match MDI (#3650)#3971

Draft
grandixximo wants to merge 1 commit intoLinuxCNC:masterfrom
grandixximo:fix/3650-queue-not-empty-probe
Draft

task: tighten AUTO read-ahead wait-clear to match MDI (#3650)#3971
grandixximo wants to merge 1 commit intoLinuxCNC:masterfrom
grandixximo:fix/3650-queue-not-empty-probe

Conversation

@grandixximo
Copy link
Copy Markdown

@grandixximo grandixximo commented Apr 26, 2026

Fixes the long-standing "Queue is not empty after probing" error.

mdi_execute_hook has, since 2012, gated wait-clear on execState==DONE && motion.traj.queue==0 && io.status==DONE. readahead_reading (AUTO mode) checks only execState==DONE, on the assumption that the prior EMC_TASK_PLAN_SYNCH precondition (WAITING_FOR_MOTION_AND_IO) had already drained motion.

That assumption breaks across cycles: emcTaskIssueCommand writes EMC_TRAJ_PROBE to motion via shared memory, then the same task cycle's emcMotionUpdate may snapshot emcmotStatus before the servo loop has added the new segment. motion.status reads DONE (stale), the SYNCH precondition transitions immediately, and the next Interp::_read runs read_inputs against a queue that is now non-empty. CHKS fires.

Patch: add the same two conjuncts to the AUTO check.

Closes #3650, #662, #263.

Complementary to #3537 (already merged): that fix handled probe re-trip during deceleration. This one handles the inter-command snapshot-lag race that lets the same family of bugs leak through.

cc @andypugh, @DauntlessAq, @rmu75, @BsAtHome: would value review on the wait-clear semantics.
cc @tiket18, @Cromaglious: could you test against your reproducing workload? Can use built artifacts for testing

Test plan

  • Cyclic G38.3/G38.5 sim loop (>20k iterations), error gone
  • Single-probe + tool-setter still latch `#5061..#5069` correctly
  • Reporter validation against XXYYZ scan + smartprobe workloads
  • 2.9 backport candidate after master soak

Auto mode cleared the read-ahead wait on execState==DONE alone, so a
G38 issued just before an emcMotionUpdate snapshot could let the next
read fire while motion.traj.queue was still non-zero, tripping
NCE_QUEUE_IS_NOT_EMPTY_AFTER_PROBING. Add the queue==0 + io.status
==DONE conjuncts already used by mdi_execute_hook.

Closes LinuxCNC#3650, LinuxCNC#662, LinuxCNC#263.
@rmu75 rmu75 added bug v2.9 candidate Things that would be nice to have for 2.9 2.10-candidate would be nice to have fixed in 2.10 labels Apr 26, 2026
@rmu75
Copy link
Copy Markdown
Collaborator

rmu75 commented Apr 26, 2026

Not sure I really qualify to comment, but explanation makes sense.

@grandixximo
Copy link
Copy Markdown
Author

Need testers, I will try to post in the forum as well

@grandixximo
Copy link
Copy Markdown
Author

I'll send a 2.9 PR just so I can have debs built, users on the forum want 2.9 to test the fix

@DauntlessAq
Copy link
Copy Markdown
Contributor

I'm slightly confused, does this issue occur when the probe move starts, or when it ends?

Because the error message implies it is at the end, that is to say that the probe move (EMC_TRAJ_PROBE) is still in the queue when it should have been removed?

Aka, currently it is possible that a state sync check at the end of the probe move passed when it should not do so. This would be because execState==DONE was set before the move was REMOVED from the queue, due to a race condition.
So just checking execState==DONE isn't completely reliable, hence your more stringent checks being a fix.

This makes sense to me as an explanation that also matches the error message (but of course it might not be correct)

 

But when you say that:

emcTaskIssueCommand writes EMC_TRAJ_PROBE to motion via shared memory, then the same task cycle's emcMotionUpdate may snapshot emcmotStatus before the servo loop has added the new segment

Are you saying that the error actually occurs during a state sync BEFORE the probe move takes place, instead of AFTER? Because you're talking about a sync where EMC_TRAJ_PROBE is the move which is written?

Or are you talking about the END of the probe move, and the following move just happens to be EMC_TRAJ_PROBE in your testing because you are using consecutive probe moves in your testing (e.g. both towards and away from the part)?

But either way, you're saying that the race condition is that the move is ADDED prematurely, not that the condition passed prematurely because the move has not been fully REMOVED?

Surely the move wouldn't be added prematurely because the addition of the move to the interpreter queue would itself be gated behind the state sync check of execState==DONE so that the moves aren't added to the queue until the sync has occurred?
Else the next move would always be added during a probe move, so this "Queue is not empty after probing" error would happen every probe move?

 

Or are you saying that is the race condition, that upon completion of the probe move there is a brief period where moves can be added to the queue before the sync is commanded?

And this can allow the next move to be added to the queue before the sync takes place, causing the current sync logic to fail because the next move has been added to the queue despite the machine otherwise being in a state where the sync should pass?

In which case, while tightening up this logic might be a fix, we'd ideally also want to fix the issue closer to its orgin by preventing moves which require a sync beforehand from being added to the queue before a sync is confirmed to have occurred?

 

Just to show where the error message originates:

This is where the "Queue is not empty after probing" message is defined:

#define NCE_QUEUE_IS_NOT_EMPTY_AFTER_PROBING _("Queue is not empty after probing")

And this is how the message is called:

if (settings->probe_flag) {

if (settings->probe_flag) {
CHKS((GET_EXTERNAL_QUEUE_EMPTY() == 0),
     NCE_QUEUE_IS_NOT_EMPTY_AFTER_PROBING);
set_probe_data(&_setup);
settings->probe_flag = false;
}

The message is triggered when the interpreter is commanded to read an input, if the probe_flag is true and the queue is not empty then the error occurs. So the error definitely seems to occur AFTER a probe move.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2.10-candidate would be nice to have fixed in 2.10 bug v2.9 candidate Things that would be nice to have for 2.9

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Queue is not empty after probing

3 participants