Skip to content

Fix two-pass state alignment not reaching the final state#468

Open
lenzo-ka wants to merge 4 commits intomainfrom
kal-align-2pass
Open

Fix two-pass state alignment not reaching the final state#468
lenzo-ka wants to merge 4 commits intomainfrom
kal-align-2pass

Conversation

@lenzo-ka
Copy link
Copy Markdown
Contributor

Fix two-pass state alignment not reaching the final state

Two-pass alignment (ps_set_alignment + second decode) was unreliable:
in many utterances the Viterbi search failed to reach the final phone's
exit state, producing no result. The cause was a pair of related bugs
in state_align_search.c.

ef boundary was off by one (prune_hmms + phone_transition condition)

ef[i] is the exclusive end frame for phone i, set to start + duration.
prune_hmms used nf > ef[i] to gate advancement, so a phone could
advance to ef[i] and be scored there — one frame past its word window.
phone_transition used hmm_frame(hmm) != nf as its fire condition,
which only triggers when the phone has just been advanced. With the
old prune gate, a phone at its boundary had hmm_frame == nf, so it
did fire. But the semantics were wrong: the phone was active at a
frame it should not occupy, and the history recorded in that extra
frame shifted the backtrace by one frame relative to the first-pass
word boundaries.

Fixed by changing the prune gate to nf >= ef[i] (phone stops at
ef[i]-1) and the fire condition to hmm_frame(hmm) < frame_idx
(any phone active this frame can transition).

Active successor's hmm_frame was bumped by a later competitor

Within a word, multiple phones can compete to enter the same successor
via Viterbi competition. The original code called hmm_enter() in both
the inactive and active cases. hmm_enter() sets hmm_frame = nf
unconditionally, so an intra-word predecessor firing later in the
phone_transition loop could overwrite a successor's hmm_frame with a
value one frame higher than it should be. This caused the successor
to stay "active" for an extra frame and re-enter the phone after that
with a shifted boundary, corrupting the word timing in the backtrace.

Fixed by separating the two cases: inactive successors get a full
hmm_enter(); active successors only have their score and in_history
updated if the incoming score is better.

These two fixes interact: the ef-boundary change exposes the
active-successor frame bump, which is why the entry-logic fix lands
first. Both test_word_align and test_state_align pass; the backtrace
continues to recover all structural levels (word, phone, state) without
change to the caller API.

When phone_transition fires and the successor HMM is already active
(hmm_frame >= frame_idx), calling hmm_enter() overwrites hmm_frame
with nf even though the phone is already being tracked at the current
frame.  Split the entry into two cases: inactive successors get a
full hmm_enter() with the new frame assignment; active ones only have
their in_score and in_history updated if the incoming score is better.
This is the correct Viterbi competition rule.
prune_hmms used nf > ef[i] to gate frame advancement, where ef[i]
is the exclusive end frame (start + duration).  This allowed a phone
to advance to ef[i] and be scored there, one frame past its window.

phone_transition used hmm_frame(hmm) != nf to decide whether to
fire, which meant only phones advanced by prune_hmms could
transition.  Phones stopped at the ef boundary had hmm_frame ==
frame_idx rather than nf, so the check silently suppressed
transitions.  In practice this meant the last phone of a word could
not enter the first phone of the following word, causing two-pass
alignment to fail to reach the final state.

Change the prune gate to nf >= ef[i] so a phone's last active frame
is ef[i]-1.  Change the transition condition to hmm_frame(hmm) <
frame_idx so any phone active this frame can fire, whether or not it
was advanced by prune.
@lenzo-ka lenzo-ka requested a review from dhdaines April 27, 2026 13:47
@dhdaines
Copy link
Copy Markdown
Contributor

dhdaines commented May 1, 2026

Hi! Sorry it took me a few days to get to this. I remember from writing this code that there were a lot of finicky details around whether ef is inclusive or exclusive - in the reporting of the alignments, if I remember correctly, it's inclusive, which may have caused the confusion here.

It took me a while to reason through the "Viterbi competition" change but I think I get it - phones should be activated in the next frame based on path scores for their own states, and not because of incoming states from previous phones.

I'll need to test this but I think it's correct.

@dhdaines
Copy link
Copy Markdown
Contributor

dhdaines commented May 1, 2026

Is it possible to supply an audio file + transcription which demonstrates the fix?

@lenzo-ka
Copy link
Copy Markdown
Contributor Author

lenzo-ka commented May 5, 2026

Is it possible to supply an audio file + transcription which demonstrates the fix?

test/data/goforward.raw works as the demonstration.

test_state_align.c runs the full second-pass state alignment path against it and checks specific per-word start/duration values; those values are wrong under the unpatched code because the ef boundary off-by-one and the active-successor frame bump interact to corrupt the backtrace.

For the Python API, alignment_test.py::test_alignment uses the same audio and iterates the full word → phone → state hierarchy, which with the old code would produce a final phone that never exits cleanly.

test/data/librivox/sense_and_sensibility_01_austen_64kb-0880.wav is already used by test_word_align.c with transcript "he was not an ill disposed young man", and this should also demonstrate it

@dhdaines
Copy link
Copy Markdown
Contributor

dhdaines commented May 6, 2026

test_state_align.c runs the full second-pass state alignment path against it and checks specific per-word start/duration values; those values are wrong under the unpatched code because the ef boundary off-by-one and the active-successor frame bump interact to corrupt the backtrace.

I don't understand - these values don't change with your PR, as far as I can tell, in both cases I get exactly the same output from the updated test_state_align.c that dumps out the phone and state durations (attached here)

I also don't get any errors aside from the "consider disabling bestpath search" ones.

I think the idea of your PR is sound but I really would like to see a reproduction of the problem, with the actual error or incorrect output that you get.

main.txt
kal.txt

@dhdaines
Copy link
Copy Markdown
Contributor

dhdaines commented May 6, 2026

Like, if the output is incorrect, then the tests should fail, right? And if they don't, we should update them to compare against the right expected output?

@dhdaines
Copy link
Copy Markdown
Contributor

dhdaines commented May 6, 2026

test_state_align.c runs the full second-pass state alignment path against it and checks specific per-word start/duration values; those values are wrong under the unpatched code because the ef boundary off-by-one and the active-successor frame bump interact to corrupt the backtrace.

I'm really puzzled by this, because, as you say, test_state_align.c already checks these values. Shouldn't it be failing in the CI run? Or are the values that it tests against incorrect? Here, for instance:

TEST_EQUAL(ps_alignment_iter_get(itor)->start, 212);

In that case please update them in the code in this PR...

@lenzo-ka
Copy link
Copy Markdown
Contributor Author

lenzo-ka commented May 6, 2026

Ah I'm sorry to have spun your wells on the bad tests. I though they came up there but they do not! This is happening on custom models where the edge cases are exposed more. I'll narrow the testing and make sure to have something reproduceable for you.

@dhdaines
Copy link
Copy Markdown
Contributor

dhdaines commented May 6, 2026

Ah I'm sorry to have spun your wells on the bad tests. I though they came up there but they do not! This is happening on custom models where the edge cases are exposed more.

Phew! I thought my computer was playing tricks on me... yes, if you could come up with a MRE (which can become a regression test) that would be really helpful. I do think the proposed change is correct, I'm thinking it may be useful to refactor the code somewhat to make it more obvious for future readers...

@lenzo-ka
Copy link
Copy Markdown
Contributor Author

lenzo-ka commented May 7, 2026

Yeah, those shouldn't be distributed... can you edit out the link to the attachment too, if you have what you need? Apparently one cannot delete attachments.

@cmusphinx cmusphinx deleted a comment from lenzo-ka May 7, 2026
@dhdaines
Copy link
Copy Markdown
Contributor

dhdaines commented May 7, 2026

Yeah, those shouldn't be distributed... can you edit out the link to the attachment too, if you have what you need? Apparently one cannot delete attachments.

Ah, ooops - yes, I've deleted both comments, not sure if it's still possible to find the attachment in the edit history.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants