
Add Nemotron-ASR streaming inference to Rust SDK#613

Open
rui-ren wants to merge 11 commits into main from ruiren/live-audio-stream-rust

Conversation


@rui-ren rui-ren commented Apr 8, 2026


Description

Ports the C# live audio transcription feature (PR #485) to the Rust SDK with full API parity.

The existing AudioClient only supports file-based transcription. This PR introduces LiveAudioTranscriptionSession that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as an async stream.

What's included

New files

  • sdk/rust/src/openai/live_audio_client.rs — Streaming session with start(), append(), get_transcription_stream(), stop(), plus types, cancellation support, and unit tests
  • sdk/rust/tests/integration/live_audio_test.rs — E2E integration test with synthetic PCM audio
  • samples/rust/live-audio-transcription-example/ — Full sample with real microphone capture (cpal) and resampling

Modified files

  • sdk/rust/src/detail/core_interop.rs — Added StreamingRequestBuffer FFI struct and execute_command_with_binary() for binary audio data
  • sdk/rust/src/openai/audio_client.rs — Added create_live_transcription_session() factory method
  • sdk/rust/src/detail/model.rs, model_variant.rs — Wired factory method to Model
  • sdk/rust/src/openai/mod.rs, src/lib.rs — Module registration and public exports
  • sdk/rust/Cargo.toml — Added tokio-util dependency for CancellationToken
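The StreamingRequestBuffer and the empty-slice handling mentioned above can be sketched roughly as follows. Field names and layout here are illustrative assumptions; the real definition lives in sdk/rust/src/detail/core_interop.rs.

```rust
use std::os::raw::c_char;

/// Hypothetical C-compatible buffer pairing JSON parameters with a binary
/// PCM payload, as passed to execute_command_with_binary().
#[repr(C)]
struct StreamingRequestBuffer {
    json_params: *const c_char, // NUL-terminated JSON string
    binary_data: *const u8,     // PCM bytes; null when the chunk is empty
    binary_len: usize,
}

/// Returns a pointer suitable for the FFI boundary: null for an empty slice,
/// so a dangling pointer is never handed to native code.
fn binary_ptr(data: &[u8]) -> *const u8 {
    if data.is_empty() {
        std::ptr::null()
    } else {
        data.as_ptr()
    }
}

fn main() {
    let chunk: &[u8] = &[0u8, 1, 2, 3];
    assert!(binary_ptr(&[]).is_null());
    assert!(!binary_ptr(chunk).is_null());
    println!("null-for-empty convention holds");
}
```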

API surface

let audio_client = model.create_audio_client();
let mut session = audio_client.create_live_transcription_session();

session.settings.sample_rate = 16000;
session.settings.channels = 1;
session.settings.language = Some("en".into());

session.start(None).await?;

// Push audio from microphone callback
session.append(&pcm_bytes, None).await?;

// Read results as async stream
use tokio_stream::StreamExt;
let mut stream = session.get_transcription_stream()?;
while let Some(result) = stream.next().await {
    let result = result?;
    println!("{}", result.content[0].text);
}

session.stop(None).await?;

C# API parity

C# → Rust
  • CreateLiveTranscriptionSession() → create_live_transcription_session()
  • StartAsync(CancellationToken) → start(Option<CancellationToken>)
  • AppendAsync(ReadOnlyMemory<byte>, CancellationToken) → append(&[u8], Option<CancellationToken>)
  • GetTranscriptionStream(CancellationToken) → get_transcription_stream()
  • StopAsync(CancellationToken) + cancel-safe cleanup → stop(Option<CancellationToken>) + cancel-safe cleanup
  • IAsyncDisposable.DisposeAsync() → Drop with best-effort native stop
  • LiveAudioTranscriptionResponse.Content[0].Text → response.content[0].text
  • LiveAudioTranscriptionResponse.Content[0].Transcript → response.content[0].transcript
  • LiveAudioTranscriptionResponse.IsFinal → response.is_final
  • LiveAudioTranscriptionResponse.StartTime/EndTime → response.start_time / response.end_time
  • LiveAudioTranscriptionOptions (SampleRate, Channels, BitsPerSample, Language, PushQueueCapacity) → LiveAudioTranscriptionOptions (sample_rate, channels, bits_per_sample, language, push_queue_capacity)
  • CoreErrorResponse.TryParse() → CoreErrorResponse::try_parse()
  • Native commands audio_stream_start, audio_stream_push, audio_stream_stop → same commands via execute_command / execute_command_with_binary

Design highlights

  • CancellationToken support — start/append/stop accept Option<CancellationToken> via tokio_util::sync::CancellationToken
  • Cancel-safe stop — stop() always performs the native audio_stream_stop even if the token fires, preventing native session leaks (matches the C# StopAsync pattern)
  • Response envelope — LiveAudioTranscriptionResponse uses content: Vec<ContentPart>, matching C#'s ConversationItem.Content[0].Text/Transcript
  • Bounded push queue — Backpressure via bounded channel (capacity=100); prevents unbounded memory growth
  • Push loop on blocking thread — execute_command_with_binary FFI calls run on spawn_blocking, keeping the async runtime free
  • Settings freeze — Audio format settings are cloned at start() and immutable during the session
  • Drop safety — Best-effort synchronous audio_stream_stop in Drop to prevent native session leaks
  • FFI null pointer safety — Empty binary slices use std::ptr::null() to avoid dangling pointer across FFI boundary
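The bounded push queue described above can be illustrated with a std-only sketch. The SDK itself uses a tokio bounded channel with capacity 100; here a tiny std::sync::mpsc::sync_channel shows the same backpressure behavior: a non-blocking send fails once the queue is full instead of buffering without bound.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

fn main() {
    // Tiny capacity for demonstration; the SDK uses 100.
    let (tx, rx) = sync_channel::<Vec<u8>>(2);

    tx.try_send(vec![0u8; 4]).unwrap();
    tx.try_send(vec![1u8; 4]).unwrap();

    // Queue is full: the send is rejected rather than growing memory.
    assert!(matches!(tx.try_send(vec![2u8; 4]), Err(TrySendError::Full(_))));

    // Draining one chunk frees a slot, so pushing can resume.
    let _ = rx.recv().unwrap();
    assert!(tx.try_send(vec![2u8; 4]).is_ok());
    println!("bounded queue enforces backpressure");
}
```

A blocking `send()` on the same channel would instead park the producer until a slot frees up, which is the behavior the async push path relies on for backpressure.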

Verified working

  • ✅ SDK build succeeds (0 errors, 0 clippy warnings)
  • ✅ 13 unit tests passing (JSON deserialization, settings defaults, error parsing, content envelope)
  • ✅ E2E pipeline: Microphone (48kHz/2ch/F32) → Resample (16kHz/mono/16-bit) → SDK → Core.dll → onnxruntime-genai.dll → nemotron model
  • ✅ Synthetic audio test: 30 chunks (96KB PCM) pushed with clean session lifecycle
  • ✅ Live microphone test: real-time capture, session start/stop, no native errors
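The conversion stage of the E2E pipeline above (48 kHz / 2-channel / F32 capture down to 16 kHz / mono / 16-bit PCM) can be sketched as follows. The real sample uses cpal plus a proper resampler; this simplified version averages stereo to mono and uses naive 3:1 decimation (48000 / 16000), which is a stand-in for real resampling, not the sample's actual algorithm.

```rust
/// Convert interleaved stereo f32 samples at 48 kHz into mono i16 at 16 kHz.
fn convert_audio(input: &[f32]) -> Vec<i16> {
    input
        .chunks_exact(2)                 // interleaved stereo frames [L, R]
        .map(|f| (f[0] + f[1]) / 2.0)    // downmix to mono
        .step_by(3)                      // naive 3:1 decimation: 48 kHz -> 16 kHz
        .map(|s| (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16)
        .collect()
}

fn main() {
    // 48 stereo frames (1 ms at 48 kHz) -> 16 mono samples (1 ms at 16 kHz).
    let input = vec![0.5f32; 96];
    let out = convert_audio(&input);
    assert_eq!(out.len(), 16);
    println!("converted 48 stereo frames to {} mono samples", out.len());
}
```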

Stats

  • 14 files changed, 1,329 additions, 2 deletions

Copilot AI review requested due to automatic review settings April 8, 2026 19:45

Copilot AI left a comment

Pull request overview

This PR adds live (chunked) PCM audio transcription streaming to the Foundry Local Rust SDK, aligning the Rust API with the existing C# live audio transcription session feature and extending the SDK beyond file-based transcription.

Changes:

  • Introduces LiveAudioTranscriptionSession + associated response/options/types and stream wrapper in the Rust SDK.
  • Extends the Rust FFI bridge with execute_command_with_binary() to send JSON params + binary PCM payloads.
  • Adds integration test coverage and a new Rust sample demonstrating microphone capture (cpal) and streaming transcription.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

Summary per file
  • sdk/rust/src/openai/live_audio_client.rs — New streaming session implementation, types, cancellation support, and unit tests
  • sdk/rust/src/detail/core_interop.rs — Adds StreamingRequestBuffer + optional execute_command_with_binary symbol support
  • sdk/rust/src/openai/audio_client.rs — Adds create_live_transcription_session() factory method
  • sdk/rust/src/detail/model.rs — Exposes Model::create_live_transcription_session()
  • sdk/rust/src/detail/model_variant.rs — Wires variant factory for live transcription sessions
  • sdk/rust/src/openai/mod.rs — Registers and re-exports live audio transcription module/types
  • sdk/rust/src/lib.rs — Public re-exports for the new live transcription session/types
  • sdk/rust/Cargo.toml — Adds tokio-util for CancellationToken
  • sdk/rust/tests/integration/main.rs — Registers the new integration test module
  • sdk/rust/tests/integration/live_audio_test.rs — New E2E-ish integration test using synthetic PCM audio
  • samples/rust/live-audio-transcription-example/src/main.rs — New microphone/synthetic streaming transcription sample
  • samples/rust/live-audio-transcription-example/Cargo.toml — Declares sample dependencies (cpal, tokio, sdk path dep)
  • samples/rust/Cargo.toml — Adds the new sample crate to the workspace
  • codex-feedback.md — Adds review/validation notes for the feature


rui-ren changed the title from "Add live audio transcription streaming support to Foundry Local Rust SDK" to "Add Nemotron-ASR streaming inference to Rust SDK" on Apr 14, 2026
ruiren_microsoft and others added 9 commits April 23, 2026 15:13
Port the C# live audio transcription feature (PR #485) to the Rust SDK
with full API parity.

New files:
- src/openai/live_audio_client.rs: LiveAudioTranscriptionSession with
  start/append/get_transcription_stream/stop lifecycle, response types,
  CoreErrorResponse, and unit tests
- tests/integration/live_audio_test.rs: E2E test with synthetic PCM audio

Modified files:
- src/detail/core_interop.rs: StreamingRequestBuffer FFI struct and
  execute_command_with_binary method for binary audio data
- src/openai/audio_client.rs: create_live_transcription_session() factory
- src/detail/model.rs, model_variant.rs: create_live_transcription_session()
- src/openai/mod.rs, src/lib.rs: Module and public type exports

API surface:
  let audio_client = model.create_audio_client();
  let session = audio_client.create_live_transcription_session();
  session.settings.sample_rate = 16000;
  session.start().await?;
  session.append(&pcm_bytes).await?;
  let mut stream = session.get_transcription_stream()?;
  // use tokio_stream::StreamExt;
  while let Some(result) = stream.next().await { ... }
  session.stop().await?;

Design highlights:
- Bounded push channel with backpressure (capacity=100)
- Push loop runs on blocking thread via spawn_blocking
- Fail-fast on native errors (no retry logic)
- Settings frozen at start() via clone snapshot
- Output channel completed on stop() after final result

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- core_interop.rs: Use std::ptr::null() for empty binary_data slices
  to avoid passing dangling pointer across FFI boundary
- live_audio_client.rs: Call native audio_stream_stop synchronously
  in Drop to prevent native session leaks when stop() is not called

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address codex-feedback.md parity gaps:

1. CancellationToken support: start/append/stop now accept
   Option<CancellationToken> (via tokio_util::sync::CancellationToken).
   stop() uses a cancel-safe pattern matching C# StopAsync: the native
   session stop is always performed even if the token fires.

2. Response envelope matches C#: LiveAudioTranscriptionResponse now
   has content: Vec<ContentPart> with text/transcript fields, so
   callers use result.content[0].text (identical to C# Content[0].Text).

3. Added tokio-util dependency for CancellationToken.

Updated E2E sample and integration test to use new API shape.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Sample: update download progress callback from &str to f64 to match
  upstream API change (PR #608)
- Apply cargo fmt to all SDK and sample files for CI compliance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The function and its AudioClient import triggered -D warnings (dead_code)
in the CI build. The E2E test creates the session directly via
model.create_audio_client() and doesn't use this helper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SDK: fix append() deadlock (clone tx before await), start()
cancellation leak, double-stop, get_transcription_stream() async,
remove unused field, fix error parsing, remove manual Unpin,
improve error messages, refactor stop() into helpers, rewrite
push_loop per reviewer suggestion.

Sample: extract convert_audio(), edition 2024, remove
codex-feedback.md, document BufferSize::Default, replace
Arc+spawn with channel-based mic forwarding.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

/// Await the start future with cancellation safety. If cancelled, any
/// native session that was already created is cleaned up.
async fn await_start(
Contributor:

Will this leak the handle if we cancel? Should cancellation call audio_stream_stop?

Contributor Author:

Fixed the leak concerns, thanks.

Comment thread sdk/rust/src/detail/model.rs Outdated

state
.push_tx
.as_ref()
Contributor:

Just clone

@nenad1002 (Contributor):

Run cargo fmt & cargo clippy

Comment thread sdk/rust/src/detail/model_variant.rs Outdated
EmbeddingClient::new(&self.info.id, Arc::clone(&self.core))
}

pub(crate) fn create_live_transcription_session(&self) -> LiveAudioTranscriptionSession {
Contributor:
Same comment as here

…implify clone

- Remove create_live_transcription_session() from Model and ModelVariant
  per bmehta001/kunal-vaishnavi: session should only be created via
  AudioClient, not directly from Model (matches C# pattern)
- Simplify .as_ref().cloned() to .clone() per nenad1002

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Always await start_future to completion; if cancellation was requested,
check is_cancelled() afterwards and call audio_stream_stop to clean up
any native session that was created. This prevents native handle leaks
when CancellationToken fires during start().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.



let _ = output_tx.send(Err(FoundryLocalError::CommandExecution {
reason: format!("Push failed (code={code}): {e}"),
}));
break;

Copilot AI Apr 24, 2026


On a fatal push error, the push loop breaks but the output channel sender stored in session state remains alive, so get_transcription_stream() will not terminate (it will yield the error once, then await forever) unless stop()/Drop is called to drop output_tx. Consider mirroring the JS/C# behavior by ensuring the stream is completed on fatal push errors (e.g., drop/close the state-held sender, or send a control event to an async task that finalizes state and closes the output channel).

Suggested change:
- break;
+ // Fatal push failures are terminal for the transcription stream.
+ // Drop the loop-owned sender and return immediately so the stream
+ // can complete once no other senders remain.
+ drop(output_tx);
+ return;

Comment on lines +70 to +76
while let Some(result) = stream.next().await {
match result {
Ok(r) => results_clone.lock().await.push(r),
Err(e) => {
eprintln!("Stream error: {e}");
break;
}

Copilot AI Apr 24, 2026


This integration test swallows transcription stream errors: the spawned read task only logs Err(e) and breaks, but the test still passes (especially if results ends up empty, since the final assertions are inside a for loop). To make this test meaningful as an E2E signal, propagate the first stream error back to the test (e.g., store it in shared state and assert!(error.is_none()) after stop()), and/or assert that at least one valid response was received once the session completes.

Comment on lines +550 to +552
let code = CoreErrorResponse::try_parse(&e.to_string())
.map(|ei| ei.code)
.unwrap_or_else(|| "UNKNOWN".into());

Copilot AI Apr 24, 2026


CoreErrorResponse::try_parse(&e.to_string()) will never successfully parse JSON from the native core because FoundryLocalError’s Display format prefixes the message (e.g., "command execution error: ..."). This makes code always fall back to "UNKNOWN" even when the core returns structured JSON. Match on FoundryLocalError::CommandExecution { reason } (or otherwise access the raw reason string) and pass that to try_parse instead of e.to_string().

Suggested change:
- let code = CoreErrorResponse::try_parse(&e.to_string())
-     .map(|ei| ei.code)
-     .unwrap_or_else(|| "UNKNOWN".into());
+ let code = match &e {
+     FoundryLocalError::CommandExecution { reason } => {
+         CoreErrorResponse::try_parse(reason)
+             .map(|ei| ei.code)
+             .unwrap_or_else(|| "UNKNOWN".into())
+     }
+     _ => "UNKNOWN".into(),
+ };
