Fix two-pass state alignment not reaching the final state by lenzo-ka · Pull Request #468 · cmusphinx/pocketsphinx

lenzo-ka · 2026-04-27T13:47:01Z

Fix two-pass state alignment not reaching the final state

Two-pass alignment (ps_set_alignment + second decode) was unreliable:
in many utterances the Viterbi search failed to reach the final phone's
exit state, producing no result. The cause was a pair of related bugs
in state_align_search.c.

ef boundary was off by one (prune_hmms + phone_transition condition)

ef[i] is the exclusive end frame for phone i, set to start + duration.
prune_hmms used nf > ef[i] to gate advancement, so a phone could
advance to ef[i] and be scored there — one frame past its word window.
phone_transition used hmm_frame(hmm) != nf as its fire condition,
which only triggers when the phone has just been advanced. With the
old prune gate, a phone at its boundary had hmm_frame == nf, so it
did fire. But the semantics were wrong: the phone was active at a
frame it should not occupy, and the history recorded in that extra
frame shifted the backtrace by one frame relative to the first-pass
word boundaries.

Fixed by changing the prune gate to nf >= ef[i] (phone stops at
ef[i]-1) and the fire condition to hmm_frame(hmm) < frame_idx
(any phone active this frame can transition).

Active successor's hmm_frame was bumped by a later competitor

Within a word, multiple phones can compete to enter the same successor
via Viterbi competition. The original code called hmm_enter() in both
the inactive and active cases. hmm_enter() sets hmm_frame = nf
unconditionally, so an intra-word predecessor firing later in the
phone_transition loop could overwrite a successor's hmm_frame with a
value one frame higher than it should be. This caused the successor
to stay "active" for an extra frame and re-enter the phone after that
with a shifted boundary, corrupting the word timing in the backtrace.

Fixed by separating the two cases: inactive successors get a full
hmm_enter(); active successors only have their score and in_history
updated if the incoming score is better.

These two fixes interact: the ef-boundary change exposes the
active-successor frame bump, which is why the entry-logic fix lands
first. Both test_word_align and test_state_align pass; the backtrace
continues to recover all structural levels (word, phone, state) without
change to the caller API.

When phone_transition fires and the successor HMM is already active (hmm_frame >= frame_idx), calling hmm_enter() overwrites hmm_frame with nf even though the phone is already being tracked at the current frame. Split the entry into two cases: inactive successors get a full hmm_enter() with the new frame assignment; active ones only have their in_score and in_history updated if the incoming score is better. This is the correct Viterbi competition rule.

prune_hmms used nf > ef[i] to gate frame advancement, where ef[i] is the exclusive end frame (start + duration). This allowed a phone to advance to ef[i] and be scored there, one frame past its window. phone_transition used hmm_frame(hmm) != nf to decide whether to fire, which meant only phones advanced by prune_hmms could transition. Phones stopped at the ef boundary had hmm_frame == frame_idx rather than nf, so the check silently suppressed transitions. In practice this meant the last phone of a word could not enter the first phone of the following word, causing two-pass alignment to fail to reach the final state. Change the prune gate to nf >= ef[i] so a phone's last active frame is ef[i]-1. Change the transition condition to hmm_frame(hmm) < frame_idx so any phone active this frame can fire, whether or not it was advanced by prune.

dhdaines · 2026-05-01T15:28:27Z

Hi! Sorry it took me a few days to get to this. I remember from writing this code that there were a lot of finicky details around whether ef is inclusive or exclusive - in the reporting of the alignments, if I remember correctly, it's inclusive, which may have caused the confusion here.

It took me a while to reason through the "Viterbi competition" change but I think I get it - phones should be activated in the next frame based on path scores for their own states, and not because of incoming states from previous phones.

I'll need to test this but I think it's correct.

dhdaines · 2026-05-01T16:21:03Z

Is it possible to supply an audio file + transcription which demonstrates the fix?

lenzo-ka · 2026-05-05T18:04:55Z

Is it possible to supply an audio file + transcription which demonstrates the fix?

test/data/goforward.raw works as the demonstration.

test_state_align.c runs the full second-pass state alignment path against it and checks specific per-word start/duration values; those values are wrong under the unpatched code because the ef boundary off-by-one and the active-successor frame bump interact to corrupt the backtrace.

For the Python API, alignment_test.py::test_alignment uses the same audio and iterates the full word → phone → state hierarchy, which with the old code would produce a final phone that never exits cleanly.

test/data/librivox/sense_and_sensibility_01_austen_64kb-0880.wav is already used by test_word_align.c with transcript "he was not an ill disposed young man", and this should also demonstrate it

dhdaines · 2026-05-06T14:14:58Z

test_state_align.c runs the full second-pass state alignment path against it and checks specific per-word start/duration values; those values are wrong under the unpatched code because the ef boundary off-by-one and the active-successor frame bump interact to corrupt the backtrace.

I don't understand - these values don't change with your PR, as far as I can tell, in both cases I get exactly the same output from the updated test_state_align.c that dumps out the phone and state durations (attached here)

I also don't get any errors aside from the "consider disabling bestpath search" ones.

I think the idea of your PR is sound but I really would like to see a reproduction of the problem, with the actual error or incorrect output that you get.

main.txt
kal.txt

dhdaines · 2026-05-06T14:15:50Z

Like, if the output is incorrect, then the tests should fail, right? And if they don't, we should update them to compare against the right expected output?

dhdaines · 2026-05-06T14:23:20Z

test_state_align.c runs the full second-pass state alignment path against it and checks specific per-word start/duration values; those values are wrong under the unpatched code because the ef boundary off-by-one and the active-successor frame bump interact to corrupt the backtrace.

I'm really puzzled by this, because, as you say, test_state_align.c already checks these values. Shouldn't it be failing in the CI run? Or are the values that it tests against incorrect? Here, for instance:

pocketsphinx/test/unit/test_state_align.c

Line 118 in 73e5cc0

TEST_EQUAL(ps_alignment_iter_get(itor)->start, 212);

In that case please update them in the code in this PR...

lenzo-ka · 2026-05-06T18:14:23Z

Ah I'm sorry to have spun your wells on the bad tests. I though they came up there but they do not! This is happening on custom models where the edge cases are exposed more. I'll narrow the testing and make sure to have something reproduceable for you.

dhdaines · 2026-05-06T18:47:24Z

Ah I'm sorry to have spun your wells on the bad tests. I though they came up there but they do not! This is happening on custom models where the edge cases are exposed more.

Phew! I thought my computer was playing tricks on me... yes, if you could come up with a MRE (which can become a regression test) that would be really helpful. I do think the proposed change is correct, I'm thinking it may be useful to refactor the code somewhat to make it more obvious for future readers...

lenzo-ka · 2026-05-07T17:07:49Z

Yeah, those shouldn't be distributed... can you edit out the link to the attachment too, if you have what you need? Apparently one cannot delete attachments.

dhdaines · 2026-05-07T18:58:38Z

Yeah, those shouldn't be distributed... can you edit out the link to the attachment too, if you have what you need? Apparently one cannot delete attachments.

Ah, ooops - yes, I've deleted both comments, not sure if it's still possible to find the attachment in the edit history.

lenzo-ka added 2 commits April 23, 2026 13:35

lenzo-ka requested a review from dhdaines April 27, 2026 13:47

lenzo-ka mentioned this pull request Apr 27, 2026

Add separate beam defaults for alignment #464

Closed

2 tasks

refactor: remove a stray space

1c18c6a

dhdaines mentioned this pull request May 5, 2026

Fix some failures to reach final state in alignment ReadAlongs/SoundSwallower#78

Draft

Merge remote-tracking branch 'origin/main' into kal-align-2pass

a19653f

cmusphinx deleted a comment from lenzo-ka May 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix two-pass state alignment not reaching the final state#468

Fix two-pass state alignment not reaching the final state#468
lenzo-ka wants to merge 4 commits intomainfrom
kal-align-2pass

lenzo-ka commented Apr 27, 2026

Uh oh!

dhdaines commented May 1, 2026

Uh oh!

dhdaines commented May 1, 2026

Uh oh!

lenzo-ka commented May 5, 2026

Uh oh!

dhdaines commented May 6, 2026

Uh oh!

dhdaines commented May 6, 2026

Uh oh!

dhdaines commented May 6, 2026

Uh oh!

lenzo-ka commented May 6, 2026

Uh oh!

dhdaines commented May 6, 2026

Uh oh!

lenzo-ka commented May 7, 2026

Uh oh!

dhdaines commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Conversation

lenzo-ka commented Apr 27, 2026

Uh oh!

dhdaines commented May 1, 2026

Uh oh!

dhdaines commented May 1, 2026

Uh oh!

lenzo-ka commented May 5, 2026

Uh oh!

dhdaines commented May 6, 2026

Uh oh!

dhdaines commented May 6, 2026

Uh oh!

dhdaines commented May 6, 2026

Uh oh!

lenzo-ka commented May 6, 2026

Uh oh!

dhdaines commented May 6, 2026

Uh oh!

lenzo-ka commented May 7, 2026

Uh oh!

dhdaines commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants