Reintroduce AI image description work by seanbudd · Pull Request #19451 · nvaccess/nvda

seanbudd · 2026-01-16T06:32:09Z

This reverts commit 9f3aecb.

Link to issue number:

Reintroduces #19425
Blocked by #18662 #19337 #19338
Closes #16281

Summary of the issue:

Description of user facing changes:

Description of developer facing changes:

Description of development approach:

Testing strategy:

Known issues with pull request:

Ensure #19298 and #19299 are not reintroduced

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.


source/_localCaptioner/imageDescriber.py

+
+
+# Module-level configuration
+_localCaptioner = None


source/_localCaptioner/modelConfig.py

+
+
+# Default configuration instances
+_DEFAULT_ENCODER_CONFIG: _EncoderConfig | None = None
+_DEFAULT_DECODER_CONFIG: _DecoderConfig | None = None
+_DEFAULT_GENERATION_CONFIG: _GenerationConfig | None = None
+_DEFAULT_MODEL_CONFIG: _ModelConfig | None = None
+_DEFAULT_PREPROCESSOR_CONFIG: _PreprocessorConfig | None = None


source/_localCaptioner/modelConfig.py

+		_DEFAULT_GENERATION_CONFIG, \
+		_DEFAULT_MODEL_CONFIG, \
+		_DEFAULT_PREPROCESSOR_CONFIG
+	_DEFAULT_ENCODER_CONFIG = _EncoderConfig()
+	_DEFAULT_DECODER_CONFIG = _DecoderConfig()
+	_DEFAULT_GENERATION_CONFIG = _GenerationConfig()

Copilot

Pull request overview

This PR reintroduces experimental on-device AI image description functionality to NVDA. The feature allows users to generate image descriptions locally using ONNX models without sending data externally.

Changes:

- Adds local AI image captioning using ONNX Runtime with Vision Transformer encoder and GPT-2 decoder
- Includes model downloader for fetching ~235MB of model files from HuggingFace
- Adds new settings panel and keyboard commands (NVDA+g) for generating image descriptions

Reviewed changes

Copilot reviewed 28 out of 31 changed files in this pull request and generated 11 comments.


| File | Description |
| --- | --- |
| uv.lock | Downgraded multiple dependencies and added new ones (numpy, onnxruntime, etc.) |
| pyproject.toml | Added onnxruntime==1.23.2, numpy==2.3.5 dependencies |
| source/_localCaptioner/ | Core implementation of image captioning functionality |
| source/gui/settingsDialogs.py | Added AI Image Descriptions settings panel |
| source/config/configSpec.py | Added automatedImageDescriptions configuration section |
| user_docs/en/ | Added user documentation for the feature |
| tests/ | Added unit and system tests |


Copilot · 2026-01-16T06:38:01Z

source/config/configSpec.py

 	fontFormattingDisplay = featureFlag(optionsEnum="FontFormattingBrailleModeFlag", behaviorOfDefault="LIBLOUIS")
 	[[auto]]
-		excludedDisplays = string_list(default=list("dotPad"))
+    	excludedDisplays = string_list(default=list("dotPad"))


Inconsistent indentation detected. This line uses tabs and spaces mixed incorrectly compared to the rest of the file. The line should use a single tab for indentation to match the file's indentation pattern.

Suggested change

-    	excludedDisplays = string_list(default=list("dotPad"))
+		excludedDisplays = string_list(default=list("dotPad"))

Copilot · 2026-01-16T06:38:02Z

source/config/configSpec.py

 	[[speech]]
 		# LearningDisability, Blindness, LowVision
-		impairment = string(default="Blindness")
+    	impairment = string(default="Blindness")


Inconsistent indentation detected across lines 376-392. These lines should use a single tab for indentation to match the file's standard indentation pattern, but they appear to use mixed tabs and spaces.

Copilot · 2026-01-16T06:38:02Z

source/_localCaptioner/modelDownloader.py

+		:return: Absolute path of the *models* directory.
+		:raises OSError: When the directory cannot be created.
+		"""
+		modelsDir = os.path.abspath(config.conf["automatedImageDescriptions"]["defaultModel"])


The configuration value 'defaultModel' contains a model name (e.g., 'Xenova/vit-gpt2-image-captioning'), not a directory path. Using os.path.abspath() on this will create an incorrect path. This should construct the path by joining WritePaths.modelsDir with the defaultModel value.

Suggested change

-		modelsDir = os.path.abspath(config.conf["automatedImageDescriptions"]["defaultModel"])
+		defaultModelName = config.conf["automatedImageDescriptions"]["defaultModel"]
+		modelsDir = os.path.join(WritePaths.modelsDir, defaultModelName)
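The difference between the two approaches can be seen in a minimal sketch. `MODELS_DIR` is a hypothetical stand-in for `WritePaths.modelsDir`, and the model name is the example quoted in the comment above; neither is taken from NVDA's actual configuration.

```python
import os

# Hypothetical stand-in for WritePaths.modelsDir; the real value is defined by NVDA.
MODELS_DIR = os.path.join(os.sep, "nvda", "models")
# Example model name quoted in the review comment.
MODEL_NAME = "Xenova/vit-gpt2-image-captioning"


def resolveWithAbspath(modelName: str) -> str:
	"""Resolve the name against the current working directory (the reported problem)."""
	return os.path.abspath(modelName)


def resolveWithJoin(modelsDir: str, modelName: str) -> str:
	"""Resolve the name against a fixed models directory (the suggested fix)."""
	return os.path.normpath(os.path.join(modelsDir, modelName))


if __name__ == "__main__":
	# The first result changes with the directory the process was started from.
	print(resolveWithAbspath(MODEL_NAME))
	# The second result always sits under MODELS_DIR.
	print(resolveWithJoin(MODELS_DIR, MODEL_NAME))
```

With `os.path.abspath`, the resulting directory depends on the process's working directory, which is why the comment suggests anchoring the path explicitly.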

Copilot · 2026-01-16T06:38:02Z

source/_localCaptioner/captioner/vitGpt2.py

+	@lru_cache()
+	def generateCaption(
+		self,
+		image: str | bytes,
+		maxLength: int | None = None,
+	) -> str:


Using @lru_cache() on a method that accepts bytes input is problematic. bytes objects are hashable, so the call works, but the cache keeps a strong reference to every argument, including self and the full image data, which can cause cache bloat and memory issues. Either remove the cache decorator or implement a custom caching mechanism that handles image data appropriately.
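One possible shape for the custom caching mechanism mentioned above is a small bounded cache keyed by a digest of the image rather than by the image itself. The sketch below is illustrative only: `DigestCache`, `getOrCompute`, and the `maxSize` default are hypothetical names, not part of NVDA, and the captioning call is replaced by any callable that maps bytes to a string.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class DigestCache:
	"""Small LRU cache keyed by the SHA-256 digest of the image bytes.

	Only the 32-byte digest and the caption are retained, never the image itself.
	"""

	def __init__(self, maxSize: int = 16) -> None:
		self._maxSize = maxSize
		self._entries: OrderedDict[bytes, str] = OrderedDict()

	def getOrCompute(self, image: bytes, compute: Callable[[bytes], str]) -> str:
		key = hashlib.sha256(image).digest()
		if key in self._entries:
			# Mark the entry as most recently used.
			self._entries.move_to_end(key)
			return self._entries[key]
		caption = compute(image)
		self._entries[key] = caption
		if len(self._entries) > self._maxSize:
			# Evict the least recently used entry.
			self._entries.popitem(last=False)
		return caption

	def __len__(self) -> int:
		return len(self._entries)
```

Hashing costs one pass over the image per call, which is usually small next to model inference, but that trade-off is an assumption and would need measuring.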

Copilot · 2026-01-16T06:38:02Z

source/gui/_localCaptioner/messageDialogs.py

+		self.modelDownloader.requestCancel()
+		ImageDescDownloader._downloadThread = None
+		self._progressDialog.Hide()
+		self._progressDialog.Destroy()
+		self._progressDialog = None


The _stopped method attempts to destroy the progress dialog without checking if it's None first. If modelDownloader.requestCancel() is called when _progressDialog is None (e.g., during initialization), this will raise an AttributeError.

Suggested change

-		self.modelDownloader.requestCancel()
-		ImageDescDownloader._downloadThread = None
-		self._progressDialog.Hide()
-		self._progressDialog.Destroy()
-		self._progressDialog = None
+		if self.modelDownloader is not None:
+			self.modelDownloader.requestCancel()
+		ImageDescDownloader._downloadThread = None
+		if self._progressDialog is not None:
+			self._progressDialog.Hide()
+			self._progressDialog.Destroy()
+			self._progressDialog = None

Copilot · 2026-01-16T06:38:03Z

source/gui/blockAction.py

 		pgettext("remote", "Action unavailable when Remote Access is disabled"),
 	)
+	SCREEN_CURTAIN = (
+		lambda: _isScreenCurtainEnabled(),


This 'lambda' is just a simple wrapper around a callable object. Use that object directly.

Suggested change

-		lambda: _isScreenCurtainEnabled(),
+		_isScreenCurtainEnabled,
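The two forms behave the same in the common case, with one difference worth keeping in mind: the lambda looks the name up on every call, while the direct reference is bound once. A minimal sketch, using a hypothetical `isCurtainEnabled` in place of `_isScreenCurtainEnabled`:

```python
def isCurtainEnabled() -> bool:
	"""Hypothetical predicate standing in for _isScreenCurtainEnabled."""
	return True


viaLambda = lambda: isCurtainEnabled()  # resolves the global name on each call
direct = isCurtainEnabled  # captures the function object at assignment time

before = (viaLambda(), direct())


def _patched() -> bool:
	return False


# Rebinding the module-level name (for example, in a test) only affects the lambda.
isCurtainEnabled = _patched

after = (viaLambda(), direct())
```

If nothing rebinds the name after the enum is defined, the direct reference is equivalent and simpler, which is the case the suggestion assumes.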

Copilot · 2026-01-16T06:38:03Z

source/_localCaptioner/imageDescriber.py

+"""
+
+import io
+import threading


Module 'threading' is imported with both 'import' and 'import from'.

Suggested change

-import threading

Copilot · 2026-01-16T06:38:03Z

source/gui/_localCaptioner/messageDialogs.py

+from threading import Thread
+import wx
+import ui
+import _localCaptioner
+
+
+class ImageDescDownloader:
+	_downloadThread: Thread | None = None


Module 'threading' is imported with both 'import' and 'import from'.

Suggested change

-from threading import Thread
 import wx
 import ui
 import _localCaptioner
 
 
 class ImageDescDownloader:
-	_downloadThread: Thread | None = None
+	_downloadThread: threading.Thread | None = None

Copilot · 2026-01-16T06:38:04Z

source/_localCaptioner/modelDownloader.py

+				except OSError:
+					pass


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change

-				except OSError:
-					pass
+				except OSError as e:
+					# Best-effort cleanup: log failure but do not interrupt the download flow.
+					log.warning(f"Failed to remove local file '{localPath}': {e}")
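As a standalone illustration of the best-effort cleanup described above, the sketch below uses the standard library `logging` module and a hypothetical helper `removeQuietly`. NVDA's own `log` object and the surrounding download loop are not reproduced here.

```python
import logging
import os
import tempfile

# Hypothetical logger name; NVDA code would use its own log object instead.
log = logging.getLogger("modelDownloaderSketch")


def removeQuietly(localPath: str) -> bool:
	"""Try to delete a file; log and carry on if that fails.

	Returns True when the file was removed, False otherwise.
	"""
	try:
		os.remove(localPath)
	except OSError as e:
		# Best-effort cleanup: record the failure without interrupting the caller.
		log.warning("Failed to remove local file %r: %s", localPath, e)
		return False
	return True


if __name__ == "__main__":
	fd, path = tempfile.mkstemp()
	os.close(fd)
	print(removeQuietly(path))  # the file exists, so it is removed
	print(removeQuietly(path))  # the file is gone; the failure is logged, not raised
```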

Copilot · 2026-01-16T06:38:04Z

source/_localCaptioner/modelDownloader.py

+				except OSError:
+					pass


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change

-				except OSError:
-					pass
+				except OSError as e:
+					log.warning(f"Failed to remove partial download '{localPath}': {e}")


seanbudd · 2026-01-30T01:58:24Z

@tianzeshi-study has created a build with a newer model. This model is still very lightweight, so its quality is limited. Please test the build and let us know if it is an improvement over the previous model.

Revert "Revert AI image description work (#19425)"

eb00c18

This reverts commit 9f3aecb.

seanbudd requested a review from a team as a code owner January 16, 2026 06:32

Copilot AI review requested due to automatic review settings January 16, 2026 06:32


seanbudd requested a review from Qchristensen January 16, 2026 06:32

seanbudd added the blocked label Jan 16, 2026

seanbudd requested a review from SaschaCowley January 16, 2026 06:32

seanbudd marked this pull request as draft January 16, 2026 06:32

Copilot started reviewing on behalf of seanbudd January 16, 2026 06:32

github-code-quality bot found potential problems Jan 16, 2026


Copilot AI reviewed Jan 16, 2026


seanbudd temporarily deployed to snapshot January 16, 2026 07:07 — with GitHub Actions Inactive

github-actions bot requested a deployment to snapshot January 16, 2026 07:08 Abandoned

replace the default model with Mozilla's distilvit (#19530)

ea5825b

Description of user facing changes: replace the default model with Mozilla's distilvit
Description of developer facing changes: None
Description of development approach: None

fix unit test for image description (#19535)

6e78df1

seanbudd self-assigned this Feb 19, 2026

seanbudd wants to merge 3 commits into master from try-image-desc


3 participants



