
feat: add Speech Recognition sample and enable Phi on GPU#610

Open
haoliuu wants to merge 10 commits into main from haoliu/speech-recognition-sample

Conversation

@haoliuu

@haoliuu haoliuu commented Jun 10, 2026

Collaborator

Adds a Speech Recognition sample under Windows AI APIs that transcribes audio locally on device, and upgrades the Windows App SDK, which also enables Phi on GPU for the existing Phi samples.

What's included

  • Live microphone transcription with interim + final results (streaming).
  • File recognition from an audio file, with a choice of Batch (full transcript) or Streaming (incremental) via a dropdown.
  • Microphone permission handling with a settings prompt when access is denied.
  • Phi on GPU: The Windows App SDK upgrade enables Phi to run on GPU. The existing Phi samples require no code changes and benefit automatically.

Changes

  • New SpeechRecognition sample (SpeechRecognition.xaml / .cs).
  • Registers the Speech API in apis.json, WcrApiHelpers.cs, and WcrApiCodeSnippet.cs.
  • Adds the microphone capability to both app manifests.
  • Upgrades Microsoft.WindowsAppSDK to 2.2.2-experimental9 (Microsoft.WindowsAppSDK.ML 2.1.75-experimental).

Testing

  • Verified on a Copilot+ PC: live mic and file recognition (batch + streaming) work.

Adds a Speech Recognition sample under WCRAPIs that transcribes audio locally on device: live microphone streaming plus batch and streaming recognition from an audio file. Registers the Speech API in apis.json, WcrApiHelpers, and WcrApiCodeSnippet, adds the microphone capability to the app manifests, and upgrades Microsoft.WindowsAppSDK to 2.2.2-experimental9 (Microsoft.WindowsAppSDK.ML 2.1.75-experimental).
Copilot AI review requested due to automatic review settings June 10, 2026 08:27

Copilot AI left a comment

Contributor

Pull request overview

Adds a new Speech Recognition sample to AI Dev Gallery under Windows AI APIs (WCRAPIs), enabling on-device transcription from both the microphone (streaming) and audio files (batch/streaming), and wires the API into the gallery’s WCR API registration infrastructure.

Changes:

  • Adds a new SpeechRecognition sample page (XAML + code-behind) implementing mic + file transcription flows.
  • Registers the new Speech Recognition WCR API in apis.json, WcrApiHelpers.cs, and WcrApiCodeSnippet.cs.
  • Updates app capabilities/manifests for microphone access and bumps Windows App SDK / ML package versions.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| Directory.Packages.props | Updates Windows App SDK and WindowsAppSDK.ML package versions used across the solution. |
| AIDevGallery/Samples/WCRAPIs/SpeechRecognition.xaml.cs | Implements the Speech Recognition sample logic (model load, mic streaming, file batch/streaming, cleanup). |
| AIDevGallery/Samples/WCRAPIs/SpeechRecognition.xaml | Adds the Speech Recognition sample UI (transcription view, start/stop, file recognition dropdown). |
| AIDevGallery/Samples/Definitions/WcrApis/WcrApiHelpers.cs | Registers Speech Recognition availability/EnsureReady wiring for the WCR API experience. |
| AIDevGallery/Samples/Definitions/WcrApis/WcrApiCodeSnippet.cs | Adds a Speech Recognition code snippet for docs/in-app display. |
| AIDevGallery/Samples/Definitions/WcrApis/apis.json | Registers the Speech Recognition API definition metadata (name, docs link, sample id, category). |
| AIDevGallery/Package.Store.appxmanifest | Adds microphone device capability for Store packaging. |
| AIDevGallery/Package.appxmanifest | Adds microphone device capability for local packaging. |

Comment thread AIDevGallery/Samples/Definitions/WcrApis/WcrApiHelpers.cs
Comment thread AIDevGallery/Samples/WCRAPIs/SpeechRecognition.xaml
Comment thread AIDevGallery/Samples/Definitions/WcrApis/WcrApiCodeSnippet.cs
@haoliuu haoliuu changed the title feat: Speech Recognition sample feat: add Speech Recognition sample and enable Phi on GPU Jun 11, 2026
return false;
}
}
#pragma warning restore CA1416

Is there a different way to do a runtime OS check instead of silencing the warning? This is not blocking and can be resolved in a future change.

Collaborator Author

Done. Replaced the pragma with a runtime OS check.
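
For readers following this thread, here is a minimal sketch of what a runtime OS check in place of `#pragma warning disable CA1416` can look like. It is not the code from this PR: the class name, method names, and the build number 10.0.19041 are hypothetical placeholders, and the real minimum version for the Speech API should come from its documentation. The only real API used is `OperatingSystem.IsWindowsVersionAtLeast` (.NET 5 and later), which the platform compatibility analyzer recognizes as a guard.

```csharp
using System;

public static class SpeechPlatformGuard
{
    // Assumption: 10.0.19041 is an illustrative build number only,
    // not the Speech API's documented minimum.
    public static bool IsSpeechRecognitionSupported() =>
        OperatingSystem.IsWindowsVersionAtLeast(10, 0, 19041);

    // Callers branch on the guard instead of suppressing CA1416;
    // Windows-only calls placed inside the guarded branch no longer warn.
    public static string DescribeAvailability() =>
        IsSpeechRecognitionSupported()
            ? "Speech recognition can be attempted on this OS."
            : "Speech recognition is unavailable on this OS.";
}
```

The benefit over the pragma is that an unsupported OS takes the fallback branch at run time, rather than the warning being hidden at compile time.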


private static void RewriteWavAsCanonicalPcm(string sourcePath, string destPath)
{
var src = File.ReadAllBytes(sourcePath);

A very large audio file could exhaust memory here. Consider adding a size guard or switching to stream-based header rewriting.

Collaborator Author

Nice catch! I've changed this to a streaming pattern.
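
As an illustration of the streaming pattern discussed here, the sketch below walks the RIFF chunks with a `BinaryReader`, writes a 44-byte PCM header, and copies the sample data through a fixed-size buffer, so memory use stays bounded regardless of file size. It is a guess at the general shape, not the PR's implementation: `WavRewriter`, `RewriteWavAsCanonicalPcmStreaming`, and the assumption that "canonical" means a plain 16-byte `fmt ` chunk with format tag 1 are all hypothetical.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

public static class WavRewriter
{
    // Hypothetical streaming variant; not the PR's actual implementation.
    public static void RewriteWavAsCanonicalPcmStreaming(string sourcePath, string destPath)
    {
        using var input = File.OpenRead(sourcePath);
        using var reader = new BinaryReader(input, Encoding.ASCII, leaveOpen: true);

        if (ReadTag(reader) != "RIFF") throw new InvalidDataException("Missing RIFF tag.");
        reader.ReadUInt32(); // declared RIFF size; recomputed on output
        if (ReadTag(reader) != "WAVE") throw new InvalidDataException("Missing WAVE tag.");

        byte[] fmt = Array.Empty<byte>();
        while (input.Position + 8 <= input.Length)
        {
            string id = ReadTag(reader);
            uint size = reader.ReadUInt32();

            if (id == "fmt ")
            {
                // Real fmt chunks are 16, 18, or 40 bytes; reject anything implausible.
                if (size < 16 || size > 1024) throw new InvalidDataException("Bad fmt chunk.");
                fmt = reader.ReadBytes((int)size);
                if (fmt.Length != size) throw new InvalidDataException("Truncated fmt chunk.");
                input.Seek(size & 1, SeekOrigin.Current); // chunks are padded to even length
            }
            else if (id == "data")
            {
                if (fmt.Length == 0) throw new InvalidDataException("data chunk precedes fmt.");
                long dataLength = Math.Min(size, input.Length - input.Position);
                WriteCanonical(input, destPath, fmt, dataLength);
                return;
            }
            else
            {
                // Skip chunks such as LIST without reading them into memory.
                input.Seek((long)size + (size & 1), SeekOrigin.Current);
            }
        }
        throw new InvalidDataException("No data chunk found.");
    }

    private static void WriteCanonical(Stream input, string destPath, byte[] fmt, long dataLength)
    {
        ushort tag = BinaryPrimitives.ReadUInt16LittleEndian(fmt.AsSpan(0, 2));
        // WAVE_FORMAT_EXTENSIBLE (0xFFFE) keeps the real format code in the
        // first two bytes of the SubFormat GUID, at offset 24 of the fmt chunk.
        if (tag == 0xFFFE && fmt.Length >= 26)
            tag = BinaryPrimitives.ReadUInt16LittleEndian(fmt.AsSpan(24, 2));
        if (tag != 1) throw new NotSupportedException("Only integer PCM is handled in this sketch.");

        long pad = dataLength & 1;
        using var output = File.Create(destPath);
        using var writer = new BinaryWriter(output, Encoding.ASCII, leaveOpen: true);
        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        writer.Write(checked((uint)(36 + dataLength + pad)));
        writer.Write(Encoding.ASCII.GetBytes("WAVEfmt "));
        writer.Write(16u);          // canonical fmt chunk size
        writer.Write((ushort)1);    // WAVE_FORMAT_PCM
        writer.Write(fmt, 2, 14);   // channels, sample rate, byte rate, block align, bits per sample
        writer.Write(Encoding.ASCII.GetBytes("data"));
        writer.Write(checked((uint)dataLength));
        writer.Flush();

        // Copy samples in fixed-size blocks instead of File.ReadAllBytes.
        var buffer = new byte[81920];
        long remaining = dataLength;
        while (remaining > 0)
        {
            int read = input.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (read == 0) break;
            output.Write(buffer, 0, read);
            remaining -= read;
        }
        if (pad == 1) output.WriteByte(0);
    }

    private static string ReadTag(BinaryReader reader) =>
        Encoding.ASCII.GetString(reader.ReadBytes(4));
}
```

A size guard (rejecting files above some threshold before opening them) would be the simpler alternative the reviewer mentions; the streaming form avoids picking an arbitrary limit.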


// Tear down off the UI thread (a synchronous wait would deadlock the DispatcherQueue), stopping
// and awaiting the session before disposal to avoid corrupting the on-disk model cache.
_ = Task.Run(async () =>

Is this task needed because StopContinuousRecognition() is deadlocking when it is being called from the UI thread? Perhaps it is an API issue that needs to be addressed.

Collaborator Author

It's not a deadlock; it's a native crash. When I called StopContinuousRecognition() and disposed the session while a file-streaming recognition was still running, the app hit a native fail-fast (0xc0000409) in Microsoft.Windows.AI.Speech.dll. Stop returns immediately, but the engine is still draining buffered audio, and there is no completion signal to indicate when teardown is safe. The Task plus cancellation lets us await a terminal state before disposing, which avoids the crash. As discussed offline, we can switch back to StopContinuousRecognition() once the API fixes the issue.
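
The "cancel, await a terminal state, then dispose" ordering described above can be sketched generically as follows. This deliberately uses no Speech API calls: `SafeTeardown`, `TeardownAsync`, and the parameters are hypothetical stand-ins, with `recognitionTask` representing whatever task completes when the recognition loop has fully stopped and `session` representing the recognizer object.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SafeTeardown
{
    // Hypothetical helper: runs teardown on the thread pool so the UI thread
    // never blocks, and disposes the session only after the recognition task
    // has reached a terminal state.
    public static Task TeardownAsync(
        CancellationTokenSource cts, Task recognitionTask, IDisposable session) =>
        Task.Run(async () =>
        {
            cts.Cancel(); // ask the recognition loop to stop
            try
            {
                await recognitionTask.ConfigureAwait(false); // wait for it to finish draining
            }
            catch (OperationCanceledException)
            {
                // Expected when the loop observes the cancellation token.
            }
            session.Dispose(); // safe only now that nothing is still using the session
            cts.Dispose();
        });
}
```

Awaiting the task is what supplies the missing completion signal; if the API later exposes one, this wrapper becomes unnecessary.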

3 participants