Normalize decorative Unicode letters not handled by NFKC by bramd · Pull Request #19608 · nvaccess/nvda

bramd · 2026-02-12T18:57:34Z

Link to issue number:

Summary of the issue:

NVDA's Unicode normalization (NFKC) does not decompose certain decorative Unicode letter characters, causing them to be read as their full Unicode name or as silence. This affects negative squared Latin capital letters (U+1F170–U+1F189), negative circled Latin capital letters (U+1F150–U+1F169), and regional indicator symbol letters (U+1F1E6–U+1F1FF).
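The behaviour described above can be reproduced with the standard library alone. The snippet below is only an illustrative check, not code from this PR; the `samples` dictionary and its labels are made up for the example.

```python
import unicodedata

# One sample from each affected range, plus two characters NFKC does handle.
samples = {
    "negative squared A": "\U0001F170",
    "negative circled A": "\U0001F150",
    "regional indicator A": "\U0001F1E6",
    "circled A": "\u24B6",
    "fullwidth A": "\uFF21",
}

for label, char in samples.items():
    normalized = unicodedata.normalize("NFKC", char)
    status = "unchanged" if normalized == char else f"becomes {normalized!r}"
    print(f"U+{ord(char):04X} {unicodedata.name(char)}: {status}")
```

The first three samples come back unchanged from NFKC, while the circled and fullwidth forms are reduced to a plain "A".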

Description of user facing changes:

When Unicode normalization is enabled, decorative Unicode letters such as negative squared (🅰🅱🅲), negative circled, and regional indicator symbol characters are now correctly read as their base Latin letters (A, B, C, etc.) in both speech and braille.

Description of developer facing changes:

Added _buildSupplementaryNormalizationTable() which builds a translation table for the three affected Unicode ranges.
Extended unicodeNormalize() to apply the supplementary table via str.translate() before standard NFKC normalization.
Extended isUnicodeNormalized() to detect characters in the supplementary table.
Applied the supplementary table in UnicodeNormalizationOffsetConverter.__init__() so the braille code path also handles these characters correctly.

Description of development approach:

Unicode defines no compatibility decomposition for these Supplementary Multilingual Plane characters, so the standard unicodedata.normalize("NFKC", ...) leaves them unchanged. A supplementary translation table maps each codepoint to its plain Latin letter. This table is applied via str.translate() before NFKC normalization in both the speech path (unicodeNormalize()) and the braille path (UnicodeNormalizationOffsetConverter). Since all mappings are single-codepoint to single-codepoint, the existing offset converter logic handles them correctly without changes to the offset mapping algorithm.
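
To illustrate why a one-to-one mapping is harmless for offsets, here is a small sketch. It assumes offsets are counted as Python `str` indices (code points); the one-entry `table` is hypothetical and not the PR's table.

```python
import unicodedata

# Hypothetical one-entry table: negative circled A -> "A".
table = {0x1F150: "A"}

text = "x\U0001F150y"
translated = text.translate(table)

# One code point in, one code point out: every index still lines up.
same_length = len(translated) == len(text)

# Contrast: NFKC itself can change length, e.g. the "fi" ligature U+FB01,
# which is the case an offset converter has to account for.
ligature = "\uFB01"
expanded = unicodedata.normalize("NFKC", ligature)

print(translated, same_length, len(ligature), len(expanded))
```

Because the translated string has the same length as the input, an offset computed before the supplementary step still points at the same character afterwards.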

Testing strategy:

All 89 existing test_textUtils unit tests pass.
Manual verification that unicodeNormalize() correctly maps all three character ranges to A-Z.
Manual verification that isUnicodeNormalized() returns False for supplementary characters.
Regression check that standard NFKC characters (circled Ⓐ, fullwidth Ａ) still normalize correctly.
Manual testing of speech and braille output on a Unicode character table in Firefox.
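
The manual checks above could be expressed as automated assertions along these lines. This is a hedged sketch only: `TABLE`, `is_normalized`, `normalize_text` and `run_checks` are hypothetical helpers written for this example (covering just the negative circled range for brevity), not the functions in textUtils.

```python
import unicodedata

# Hypothetical table covering only the negative circled range, for brevity.
TABLE = {0x1F150 + i: chr(ord("A") + i) for i in range(26)}


def is_normalized(text: str) -> bool:
    """False if text holds a supplementary character or is not in NFKC form."""
    if any(ord(ch) in TABLE for ch in text):
        return False
    return unicodedata.is_normalized("NFKC", text)


def normalize_text(text: str) -> str:
    """Apply the supplementary table, then standard NFKC."""
    return unicodedata.normalize("NFKC", text.translate(TABLE))


def run_checks() -> None:
    # Every letter in the range maps to A-Z.
    decorated = "".join(chr(0x1F150 + i) for i in range(26))
    assert normalize_text(decorated) == "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    # Supplementary characters count as "not normalized".
    assert not is_normalized("\U0001F150")
    assert is_normalized("A")
    # Regression: characters NFKC already handles still normalize.
    assert normalize_text("\u24B6\uFF21") == "AA"


run_checks()
```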

Known issues with pull request:

None.

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.

Extend unicodeNormalize to apply a supplementary translation table for decorative Unicode letter characters that unicodedata.normalize("NFKC") does not decompose: negative squared Latin capitals (U+1F170-U+1F189), negative circled Latin capitals (U+1F150-U+1F169), and regional indicator symbol letters (U+1F1E6-U+1F1FF).

seanbudd · 2026-02-19T02:55:21Z

should this be reported to NFKC so it eventually gets fixed upstream?

…ormalization Remove Regional Indicator Symbol Letters (U+1F1E6-U+1F1FF) from the supplementary normalization table as decomposing them breaks flag emoji (e.g. 🇺🇸 would become "US" instead of being recognized as a flag). Exclude 4 Negative Squared codepoints that have emoji semantics: U+1F170 (🅰 blood type A), U+1F171 (🅱 blood type B), U+1F17E (🅾 blood type O), U+1F17F (🅿 P button). The remaining 22 squared letters are still decomposed.
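
A sketch of what the narrowed table amounts to, using the code points listed in the commit message above. The names `EMOJI_EXCLUSIONS`, `build_revised_table` and `revised` are hypothetical and only serve this example.

```python
# Hypothetical names; the code points come from the commit message above.
NEGATIVE_CIRCLED_START = 0x1F150
NEGATIVE_SQUARED_START = 0x1F170
# Negative squared code points left alone because they double as emoji.
EMOJI_EXCLUSIONS = frozenset({0x1F170, 0x1F171, 0x1F17E, 0x1F17F})


def build_revised_table() -> dict[int, str]:
    """Map the two remaining ranges to A-Z, skipping the emoji code points."""
    table: dict[int, str] = {}
    for start in (NEGATIVE_CIRCLED_START, NEGATIVE_SQUARED_START):
        for offset in range(26):
            code_point = start + offset
            if code_point in EMOJI_EXCLUSIONS:
                continue
            table[code_point] = chr(ord("A") + offset)
    return table


revised = build_revised_table()
print(len(revised))  # 26 circled + 22 squared entries
```

Regional indicator symbols are simply absent from this table, so a flag sequence passes through `str.translate()` untouched.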

bramd · 2026-02-19T21:52:55Z

> should this be reported to NFKC so it eventually gets fixed upstream?

Good question. I have looked further into the origin of these characters.
The characters (Negative Circled and Negative Squared Latin Capital Letters) were added in Unicode 6.0 (2010), sourced from the Japanese ARIB broadcasting standard. Unlike the older Circled Latin Letters (added in Unicode 1.1), which were given <circle> compatibility decompositions because they were considered typographic variants, the newer characters were encoded as autonomous symbols. It seems the policy was not to provide decompositions for characters imported from external standards.
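
The difference is visible in the character database that ships with Python. This is just an illustrative lookup using `unicodedata.decomposition()`; the variable names are arbitrary.

```python
import unicodedata

# Circled Latin capital A (U+24B6) carries a <circle> compatibility mapping...
circled = unicodedata.decomposition("\u24B6")

# ...whereas the negative circled and negative squared forms carry none,
# so decomposition() returns an empty string for them.
negative_circled = unicodedata.decomposition("\U0001F150")
negative_squared = unicodedata.decomposition("\U0001F170")

print(repr(circled), repr(negative_circled), repr(negative_squared))
```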

More importantly, the Unicode Normalization Stability Policy guarantees that once a character is encoded without a decomposition mapping, it can never receive one. Adding decomposition mappings retroactively would break the guarantee that normalized strings remain stable across Unicode versions.

Additionally, several codepoints in the Negative Squared range have emoji semantics (blood type A, blood type B, blood type O, P button). Decomposing those to plain letters would destroy their meaning. I've excluded those 4 emoji codepoints from the supplementary table in the latest commit, along with removing the Regional Indicator Symbols (U+1F1E6 - U+1F1FF) entirely since decomposing those would break flag emoji (e.g. 🇺🇸 becoming "US"). These regional symbols were specifically added to create flags without encoding many different flags in the standard and probably won't render properly as single characters/letters, so I don't expect them to be used as such anyway.
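
To make the flag problem concrete, here is a small sketch. `naive_table` is a hypothetical mapping of the kind the first version of this PR applied to the regional indicator range; it is shown only to demonstrate the breakage.

```python
import unicodedata

# The US flag emoji as stored: two regional indicator code points.
flag = "\U0001F1FA\U0001F1F8"

names = [unicodedata.name(ch) for ch in flag]

# Hypothetical naive table mapping regional indicators straight to A-Z.
naive_table = {0x1F1E6 + i: chr(ord("A") + i) for i in range(26)}
flattened = flag.translate(naive_table)

print(names)
print(flattened)
```

With the naive table the flag collapses into two plain letters, which is why the range was dropped from the supplementary table.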

seanbudd · 2026-02-25T23:38:58Z

I think if we are going to include this, we need to specifically list all additional processing we do on top of NFKC in the user guide.

accident

seanbudd · 2026-02-25T23:55:11Z

Otherwise, I think this looks good

bramd force-pushed the fix/unicode-normalization-squared-17120 branch from efa23a1 to 6282518 February 12, 2026 18:59

bramd added 2 commits February 12, 2026 20:40

Empty commit to trigger CI

c0ebc25

Merge branch 'master' into fix/unicode-normalization-squared-17120

e430607

bramd marked this pull request as ready for review February 15, 2026 14:30

bramd requested a review from a team as a code owner February 15, 2026 14:30

bramd requested a review from SaschaCowley February 15, 2026 14:30

seanbudd assigned SaschaCowley Feb 19, 2026

seanbudd added the blocked/needs-product-decision label (A product decision needs to be made. Decisions about NVDA UX or supported use-cases.) Feb 25, 2026

seanbudd requested review from seanbudd and removed request for SaschaCowley February 25, 2026 23:08

seanbudd assigned seanbudd and unassigned SaschaCowley Feb 25, 2026

seanbudd previously approved these changes Feb 25, 2026

seanbudd self-requested a review February 25, 2026 23:54

seanbudd requested a review from SaschaCowley February 26, 2026 05:18

seanbudd assigned SaschaCowley Feb 26, 2026

bramd wants to merge 4 commits into nvaccess:master from bramd:fix/unicode-normalization-squared-17120

3 participants
