feat: add Speech Recognition sample and enable Phi on GPU #610
Adds a Speech Recognition sample under WCRAPIs that transcribes audio locally on device: live microphone streaming plus batch and streaming recognition from an audio file. Registers the Speech API in apis.json, WcrApiHelpers, and WcrApiCodeSnippet, adds the microphone capability to the app manifests, and upgrades Microsoft.WindowsAppSDK to 2.2.2-experimental9 (Microsoft.WindowsAppSDK.ML 2.1.75-experimental).
Pull request overview
Adds a new Speech Recognition sample to AI Dev Gallery under Windows AI APIs (WCRAPIs), enabling on-device transcription from both the microphone (streaming) and audio files (batch/streaming), and wires the API into the gallery’s WCR API registration infrastructure.
Changes:
- Adds a new `SpeechRecognition` sample page (XAML + code-behind) implementing mic + file transcription flows.
- Registers the new Speech Recognition WCR API in `apis.json`, `WcrApiHelpers.cs`, and `WcrApiCodeSnippet.cs`.
- Updates app capabilities/manifests for microphone access and bumps Windows App SDK / ML package versions.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| Directory.Packages.props | Updates Windows App SDK and WindowsAppSDK.ML package versions used across the solution. |
| AIDevGallery/Samples/WCRAPIs/SpeechRecognition.xaml.cs | Implements the Speech Recognition sample logic (model load, mic streaming, file batch/streaming, cleanup). |
| AIDevGallery/Samples/WCRAPIs/SpeechRecognition.xaml | Adds the Speech Recognition sample UI (transcription view, start/stop, file recognition dropdown). |
| AIDevGallery/Samples/Definitions/WcrApis/WcrApiHelpers.cs | Registers Speech Recognition availability/EnsureReady wiring for the WCR API experience. |
| AIDevGallery/Samples/Definitions/WcrApis/WcrApiCodeSnippet.cs | Adds a Speech Recognition code snippet for docs/in-app display. |
| AIDevGallery/Samples/Definitions/WcrApis/apis.json | Registers the Speech Recognition API definition metadata (name, docs link, sample id, category). |
| AIDevGallery/Package.Store.appxmanifest | Adds microphone device capability for Store packaging. |
| AIDevGallery/Package.appxmanifest | Adds microphone device capability for local packaging. |
```csharp
        return false;
    }
}
#pragma warning restore CA1416
```
Is there a different way to do a runtime OS check instead of silencing the warning? This is not blocking and can be resolved in a future change.
Done. Replaced the pragma with a runtime OS check.
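A minimal sketch of that kind of runtime check, assuming .NET 5 or later: the CA1416 platform-compatibility analyzer treats `OperatingSystem.IsWindowsVersionAtLeast` as a guard, so a Windows-only call placed inside the guarded branch compiles without a `#pragma`. `QueryWindowsOnlyFeature` and the `10.0.26100` build number are hypothetical placeholders, not the Speech API's documented requirement or the PR's actual code.

```csharp
using System;
using System.Runtime.Versioning;

public static class SpeechAvailability
{
    // Hypothetical stand-in for a Windows-only API. The attribute is what makes
    // CA1416 flag unguarded callers; the version string is an assumed placeholder.
    [SupportedOSPlatform("windows10.0.26100")]
    private static bool QueryWindowsOnlyFeature() => true;

    public static bool IsAvailable()
    {
        // The analyzer recognizes this call as a platform guard, so the
        // Windows-only call inside the branch needs no warning suppression.
        if (OperatingSystem.IsWindowsVersionAtLeast(10, 0, 26100))
        {
            return QueryWindowsOnlyFeature();
        }

        // Older Windows builds and non-Windows platforms fall through here.
        return false;
    }
}
```

Keeping the check in code rather than in a suppression also means that unsupported OS builds take the `return false` path at runtime instead of only silencing the compile-time warning.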
```csharp
private static void RewriteWavAsCanonicalPcm(string sourcePath, string destPath)
{
    var src = File.ReadAllBytes(sourcePath);
```
A very large audio file might exhaust memory. Consider adding a size guard or using stream-based header rewriting.
Nice catch! I've changed this to a streaming pattern.
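A minimal sketch of stream-based header rewriting, assuming the input is integer PCM WAV. It walks the RIFF chunks with a `BinaryReader`, skips everything except `fmt ` and `data`, writes a canonical 44-byte PCM header, and copies the sample payload through a fixed 80 KB buffer, so memory use no longer scales with file size. The method name matches the excerpt above, but the body is an illustration under those assumptions, not the PR's actual implementation; `WavRewriter`, `ReadTag`, `WriteCanonicalHeader`, and `CopyBounded` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavRewriter
{
    // Sketch only: assumes integer PCM samples. IEEE-float or compressed input would be
    // mislabeled, because the output header always declares WAVE_FORMAT_PCM (1).
    public static void RewriteWavAsCanonicalPcm(string sourcePath, string destPath)
    {
        using var input = File.OpenRead(sourcePath);
        using var reader = new BinaryReader(input, Encoding.ASCII, leaveOpen: true);

        if (ReadTag(reader) != "RIFF") throw new InvalidDataException("Missing RIFF header.");
        reader.ReadUInt32(); // RIFF size from the source; recomputed for the output
        if (ReadTag(reader) != "WAVE") throw new InvalidDataException("Not a WAVE file.");

        ushort channels = 0, bitsPerSample = 0;
        uint sampleRate = 0;
        bool haveFmt = false;

        // Walk the chunk list; only 'fmt ' and 'data' matter, everything else is skipped.
        while (input.Position + 8 <= input.Length)
        {
            string id = ReadTag(reader);
            uint size = reader.ReadUInt32();
            long next = input.Position + size + (size & 1); // chunks are padded to even sizes

            if (id == "fmt ")
            {
                reader.ReadUInt16(); // source format tag; output is always PCM
                channels = reader.ReadUInt16();
                sampleRate = reader.ReadUInt32();
                reader.ReadUInt32(); // byte rate; recomputed
                reader.ReadUInt16(); // block align; recomputed
                bitsPerSample = reader.ReadUInt16();
                haveFmt = true;
            }
            else if (id == "data")
            {
                if (!haveFmt) throw new InvalidDataException("'data' chunk appears before 'fmt '.");

                // Trust only the bytes actually present, in case the header overstates the size.
                long dataSize = Math.Min(size, input.Length - input.Position);
                using var output = File.Create(destPath);
                WriteCanonicalHeader(output, channels, sampleRate, bitsPerSample, (uint)dataSize);
                CopyBounded(input, output, dataSize);
                if ((dataSize & 1) == 1) output.WriteByte(0); // RIFF pad byte
                return;
            }

            input.Position = next;
        }

        throw new InvalidDataException("No 'data' chunk found.");
    }

    private static string ReadTag(BinaryReader reader) =>
        Encoding.ASCII.GetString(reader.ReadBytes(4));

    private static void WriteCanonicalHeader(
        Stream output, ushort channels, uint sampleRate, ushort bitsPerSample, uint dataSize)
    {
        ushort blockAlign = (ushort)(channels * bitsPerSample / 8);
        using var w = new BinaryWriter(output, Encoding.ASCII, leaveOpen: true);
        w.Write(Encoding.ASCII.GetBytes("RIFF"));
        w.Write(36u + dataSize + (dataSize & 1u)); // includes the pad byte, if any
        w.Write(Encoding.ASCII.GetBytes("WAVE"));
        w.Write(Encoding.ASCII.GetBytes("fmt "));
        w.Write(16u);                      // size of a plain PCM fmt chunk
        w.Write((ushort)1);                // WAVE_FORMAT_PCM
        w.Write(channels);
        w.Write(sampleRate);
        w.Write(sampleRate * blockAlign);  // byte rate
        w.Write(blockAlign);
        w.Write(bitsPerSample);
        w.Write(Encoding.ASCII.GetBytes("data"));
        w.Write(dataSize);
    }

    // Copies exactly `count` bytes (or until EOF) through a fixed-size buffer.
    private static void CopyBounded(Stream input, Stream output, long count)
    {
        var buffer = new byte[81920];
        while (count > 0)
        {
            int read = input.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
            if (read == 0) break;
            output.Write(buffer, 0, read);
            count -= read;
        }
    }
}
```

Clamping the `data` size to the bytes actually present also tolerates writers that leave a placeholder size (such as `0xFFFFFFFF`) in the header while streaming.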
```csharp
// Tear down off the UI thread (a synchronous wait would deadlock the DispatcherQueue), stopping
// and awaiting the session before disposal to avoid corrupting the on-disk model cache.
_ = Task.Run(async () =>
```
Is this task needed because `StopContinuousRecognition()` deadlocks when called from the UI thread? If so, that may be an API issue that needs to be addressed.
It's not a deadlock; it's actually a native crash. When I called `StopContinuousRecognition()` and disposed the session while a file-streaming recognition was still running, the app hit a native fail-fast (0xc0000409) in Microsoft.Windows.AI.Speech.dll. Stop returns immediately, but the engine is still draining buffered audio, and there's no completion signal to indicate when teardown is safe. The Task plus cancellation lets us await a terminal state before disposing, which avoids the crash. As discussed offline, we can switch back to `StopContinuousRecognition()` once the API issue is fixed.
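The teardown ordering described in this reply (cancel, wait for the recognition task to reach a terminal state, and only then dispose) can be sketched as follows. The Speech API's own types are not shown in this PR, so the session is abstracted behind two delegates; `RecognitionController`, `Start`, and `TeardownAsync` are hypothetical names, not the sample's code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch of "cancel, await a terminal state, then dispose". The delegates stand in
// for the real Speech API calls (hypothetical wiring, not the sample's code).
public sealed class RecognitionController
{
    private readonly Func<CancellationToken, Task> _runRecognition;
    private readonly Action _disposeSession;
    private readonly CancellationTokenSource _cts = new();
    private Task _recognition = Task.CompletedTask;
    private int _tornDown;

    public RecognitionController(Func<CancellationToken, Task> runRecognition, Action disposeSession)
    {
        _runRecognition = runRecognition;
        _disposeSession = disposeSession;
    }

    // Runs recognition (for example, streaming an audio file) on the thread pool.
    public void Start() => _recognition = Task.Run(() => _runRecognition(_cts.Token));

    // Safe to call more than once; only the first call tears down.
    public async Task TeardownAsync()
    {
        if (Interlocked.Exchange(ref _tornDown, 1) == 1)
        {
            return;
        }

        _cts.Cancel();
        try
        {
            // Wait until recognition has fully stopped before touching the session.
            await _recognition.ConfigureAwait(false);
        }
        catch (OperationCanceledException)
        {
            // Expected: cancellation is how recognition was asked to stop.
        }
        finally
        {
            // Dispose only after the recognition task is in a terminal state
            // (completed, canceled, or faulted; a fault still propagates to the caller).
            _disposeSession();
            _cts.Dispose();
        }
    }
}
```

From a UI handler this could be invoked as `_ = Task.Run(() => controller.TeardownAsync());`, matching the fire-and-forget shape of the excerpt above, so the UI thread never waits on the engine. The `Interlocked.Exchange` guard keeps a repeated stop or unload from disposing the session twice.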
Adds a Speech Recognition sample under Windows AI APIs that transcribes audio locally on device, and upgrades the Windows App SDK, which also enables Phi on GPU for the existing Phi samples.
What's included
Changes
- Adds the `SpeechRecognition` sample (`SpeechRecognition.xaml`/`.cs`).
- Registers the Speech API in `apis.json`, `WcrApiHelpers.cs`, and `WcrApiCodeSnippet.cs`.
- Adds the `microphone` capability to both app manifests.
- Upgrades `Microsoft.WindowsAppSDK` to `2.2.2-experimental9` (`Microsoft.WindowsAppSDK.ML` `2.1.75-experimental`).

Testing