feat: add Speech Recognition sample and enable Phi on GPU by haoliuu · Pull Request #610 · microsoft/ai-dev-gallery

haoliuu · 2026-06-10T08:27:12Z

Adds a Speech Recognition sample under Windows AI APIs that transcribes audio locally on device, and upgrades the Windows App SDK, which also enables Phi on GPU for the existing Phi samples.

What's included

Live microphone transcription with interim + final results (streaming).
File recognition from an audio file, with a choice of Batch (full transcript) or Streaming (incremental) via a dropdown.
Microphone permission handling with a settings prompt when access is denied.
Phi on GPU: The Windows App SDK upgrade enables Phi to run on GPU. The existing Phi samples require no code changes and benefit automatically.

Changes

New SpeechRecognition sample (SpeechRecognition.xaml / .cs).
Registers the Speech API in apis.json, WcrApiHelpers.cs, and WcrApiCodeSnippet.cs.
Adds the microphone capability to both app manifests.
Upgrades Microsoft.WindowsAppSDK to 2.2.2-experimental9 (Microsoft.WindowsAppSDK.ML 2.1.75-experimental).

Testing

Verified on a Copilot+ PC: live mic and file recognition (batch + streaming) work.

Copilot

Pull request overview

Adds a new Speech Recognition sample to AI Dev Gallery under Windows AI APIs (WCRAPIs), enabling on-device transcription from both the microphone (streaming) and audio files (batch/streaming), and wires the API into the gallery’s WCR API registration infrastructure.

Changes:

Adds a new SpeechRecognition sample page (XAML + code-behind) implementing mic + file transcription flows.
Registers the new Speech Recognition WCR API in apis.json, WcrApiHelpers.cs, and WcrApiCodeSnippet.cs.
Updates app capabilities/manifests for microphone access and bumps Windows App SDK / ML package versions.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| Directory.Packages.props | Updates Windows App SDK and WindowsAppSDK.ML package versions used across the solution. |
| AIDevGallery/Samples/WCRAPIs/SpeechRecognition.xaml.cs | Implements the Speech Recognition sample logic (model load, mic streaming, file batch/streaming, cleanup). |
| AIDevGallery/Samples/WCRAPIs/SpeechRecognition.xaml | Adds the Speech Recognition sample UI (transcription view, start/stop, file recognition dropdown). |
| AIDevGallery/Samples/Definitions/WcrApis/WcrApiHelpers.cs | Registers Speech Recognition availability/EnsureReady wiring for the WCR API experience. |
| AIDevGallery/Samples/Definitions/WcrApis/WcrApiCodeSnippet.cs | Adds a Speech Recognition code snippet for docs/in-app display. |
| AIDevGallery/Samples/Definitions/WcrApis/apis.json | Registers the Speech Recognition API definition metadata (name, docs link, sample id, category). |
| AIDevGallery/Package.Store.appxmanifest | Adds `microphone` device capability for Store packaging. |
| AIDevGallery/Package.appxmanifest | Adds `microphone` device capability for local packaging. |

timotiusmargo · 2026-06-12T09:58:27Z

+                    return false;
+                }
+            }
+#pragma warning restore CA1416


Is there a different way to do a runtime OS check instead of silencing the warning? This is not blocking and can be resolved in a future change.

haoliuu · Jun 15, 2026

Done. Replaced the pragma with a runtime OS check.
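For readers following the thread, here is a minimal sketch of what a runtime OS check in place of the `#pragma warning disable CA1416` can look like. It is not the code from this PR: the class name, method name, and the minimum build number are assumptions made for illustration. The only real API used is `OperatingSystem.IsWindowsVersionAtLeast` (.NET 5 and later), which the platform-compatibility analyzer recognizes as a guard.

```csharp
using System;

// Hypothetical helper; names and the build threshold are illustrative only.
public static class SpeechApiGuard
{
    // Assumed minimum Windows build. The real threshold depends on the API being guarded.
    private const int AssumedMinimumBuild = 26100;

    public static bool IsSpeechApiSupported()
    {
        // The CA1416 analyzer treats this call as a platform guard, so
        // version-gated APIs invoked inside the 'if' need no pragma.
        if (OperatingSystem.IsWindowsVersionAtLeast(10, 0, AssumedMinimumBuild))
        {
            // Version-gated calls would go here.
            return true;
        }

        // Older Windows builds and non-Windows platforms fall through.
        return false;
    }
}
```

The guard keeps the analyzer active for the rest of the file, so any other unguarded platform-specific call would still be flagged.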

timotiusmargo · 2026-06-12T10:00:05Z

+
+    private static void RewriteWavAsCanonicalPcm(string sourcePath, string destPath)
+    {
+        var src = File.ReadAllBytes(sourcePath);


A very large audio file might exhaust memory. Consider adding a size guard or using stream-based header rewriting.

haoliuu · Jun 15, 2026

Nice catch! I've changed this to a streaming pattern.
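As an illustration of the stream-based header rewriting suggested above, the sketch below walks the RIFF chunks, writes a canonical 44-byte header, and copies the audio data through a fixed-size buffer instead of calling `File.ReadAllBytes`. It is not the PR's implementation: `WavStreaming.CopyAsCanonicalPcm` is a hypothetical name, the source stream is assumed to be seekable, and the 16 format bytes are copied unchanged on the assumption that they already describe PCM audio.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not the method from the PR.
public static class WavStreaming
{
    public static void CopyAsCanonicalPcm(Stream source, Stream dest)
    {
        using var reader = new BinaryReader(source, Encoding.ASCII, leaveOpen: true);
        using var writer = new BinaryWriter(dest, Encoding.ASCII, leaveOpen: true);

        if (ReadTag(reader) != "RIFF") throw new InvalidDataException("Not a RIFF file.");
        reader.ReadUInt32(); // overall RIFF size; recomputed for the output
        if (ReadTag(reader) != "WAVE") throw new InvalidDataException("Not a WAVE file.");

        byte[] fmt = Array.Empty<byte>();
        uint dataSize;
        while (true)
        {
            string id = ReadTag(reader);
            uint size = reader.ReadUInt32(); // throws EndOfStreamException if no data chunk exists
            if (id == "data") { dataSize = size; break; }

            if (id == "fmt ")
            {
                if (size < 16) throw new InvalidDataException("fmt chunk too small.");
                fmt = reader.ReadBytes(16);                               // core format fields
                source.Seek(size - 16 + (size & 1), SeekOrigin.Current); // skip extension + pad byte
            }
            else
            {
                source.Seek(size + (size & 1), SeekOrigin.Current);      // skip LIST, fact, etc.
            }
        }
        if (fmt.Length != 16) throw new InvalidDataException("Missing fmt chunk before data.");

        // Canonical 44-byte header: RIFF, size, WAVE, fmt (16 bytes), data, size.
        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        writer.Write(36u + dataSize);
        writer.Write(Encoding.ASCII.GetBytes("WAVE"));
        writer.Write(Encoding.ASCII.GetBytes("fmt "));
        writer.Write(16u);
        writer.Write(fmt);
        writer.Write(Encoding.ASCII.GetBytes("data"));
        writer.Write(dataSize);
        writer.Flush();

        // Copy the samples in 64 KB pieces so memory use stays constant.
        var buffer = new byte[64 * 1024];
        long remaining = dataSize;
        while (remaining > 0)
        {
            int read = source.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (read == 0) throw new EndOfStreamException("data chunk is truncated.");
            dest.Write(buffer, 0, read);
            remaining -= read;
        }
    }

    private static string ReadTag(BinaryReader reader) =>
        Encoding.ASCII.GetString(reader.ReadBytes(4));
}
```

With files, this would be called with two `FileStream` instances (for example from `File.OpenRead` and `File.Create`), so peak memory is bounded by the buffer size regardless of the audio length.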

timotiusmargo · 2026-06-12T10:01:27Z

+
+        // Tear down off the UI thread (a synchronous wait would deadlock the DispatcherQueue), stopping
+        // and awaiting the session before disposal to avoid corrupting the on-disk model cache.
+        _ = Task.Run(async () =>


Is this task needed because StopContinuousRecognition() is deadlocking when it is being called from the UI thread? Perhaps it is an API issue that needs to be addressed.

haoliuu · Jun 15, 2026

It's not a deadlock; it's a native crash. When I called StopContinuousRecognition() and disposed while a file-streaming recognition was still running, the app hit a native fail-fast (0xc0000409) in Microsoft.Windows.AI.Speech.dll. Stop returns immediately, but the engine is still draining buffered audio, and there is no completion signal to indicate when teardown is safe. The Task plus cancellation lets us await a terminal state before disposing, which avoids the crash. As discussed offline, we can switch back to StopContinuousRecognition() once the API fixes the issue.
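The teardown order described in this reply (signal cancellation, await a terminal state, then dispose) can be sketched generically as below. This is an illustration under stated assumptions, not the PR's code, and it deliberately avoids the Microsoft.Windows.AI.Speech types: `RecognitionRun`, the `engine` object, and the `recognizeAsync` delegate are hypothetical stand-ins for the speech session and its recognition loop.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: owns one recognition run and tears it down in a safe order.
public sealed class RecognitionRun : IAsyncDisposable
{
    private readonly CancellationTokenSource _cts = new();
    private readonly IDisposable _engine;
    private readonly Task _worker;

    public RecognitionRun(IDisposable engine, Func<CancellationToken, Task> recognizeAsync)
    {
        _engine = engine;
        // Run the recognition loop off the calling (UI) thread.
        _worker = Task.Run(() => recognizeAsync(_cts.Token));
    }

    public async ValueTask DisposeAsync()
    {
        // 1. Ask the loop to stop.
        _cts.Cancel();

        // 2. Wait until the loop has actually finished, including any draining it does.
        try
        {
            await _worker.ConfigureAwait(false);
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the expected way for the loop to end.
        }

        // 3. Only now is it safe to release the engine.
        _engine.Dispose();
        _cts.Dispose();
    }
}
```

From a page's unload handler, a caller could start the teardown without blocking the UI thread, for example with `_ = run.DisposeAsync().AsTask();`, which mirrors the fire-and-forget `Task.Run` in the diff above.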

Copilot AI review requested due to automatic review settings June 10, 2026 08:27

Copilot started reviewing on behalf of haoliuu June 10, 2026 08:27

Copilot AI reviewed Jun 10, 2026

Comment thread AIDevGallery/Samples/Definitions/WcrApis/WcrApiHelpers.cs

Comment thread AIDevGallery/Samples/WCRAPIs/SpeechRecognition.xaml

Comment thread AIDevGallery/Samples/Definitions/WcrApis/WcrApiCodeSnippet.cs

haoliuu added 2 commits June 10, 2026 16:42

fix: Remove stale AutomationProperties.Name on StartStopButton

8e71213

Guard Speech snippet against unavailable states like other WCR snippets

8ad4d7c

haoliuu changed the title from "feat: Speech Recognition sample" to "feat: add Speech Recognition sample and enable Phi on GPU" Jun 11, 2026

haoliuu added 4 commits June 12, 2026 11:37

Hide Windows Update download messaging for Speech Recognition

61ec10d

Play audio during file streaming recognition and fix stop handling

0789416

Fix double spaces between recognized speech segments

4ba3c1f

Unify speech input selection into a single source dropdown

61a7d4e

timotiusmargo reviewed Jun 12, 2026

haoliuu added 3 commits June 15, 2026 11:14

Replace CA1416 pragma with runtime OS version check

8729b46

Stream WAV data instead of buffering the whole file in memory

6f80bfb

Defer speech model disposal to avoid crash when navigating away mid-recognition

8b1ca70

haoliuu wants to merge 10 commits into main from haoliu/speech-recognition-sample

3 participants
