[model] support Unlimited_OCR by z0o0ey · Pull Request #9645 · modelscope/ms-swift

z0o0ey · 2026-06-25T09:15:30Z

Support for PaddlePaddle/Unlimited-OCR.

Changes:

swift/model/constant.py: Add MLLMModelType.unlimited_ocr
swift/model/model_arch.py: Add MLLMModelArch.unlimited_ocr
swift/model/models/deepseek.py: Add UnlimitedOCRLoader with multi-GPU device_map patch
swift/template/constant.py: Add MLLMTemplateType.unlimited_ocr
swift/template/templates/deepseek.py: Add UnlimitedOCR template

Usage:

swift sft \
    --model PaddlePaddle/Unlimited-OCR \
    --model_type unlimited-ocr \
    --template unlimited_ocr \
    --dataset AI-ModelScope/LaTeX_OCR \
    --lazy_tokenize true

swift infer \
    --adapters <checkpoint> \
    --load_data_args true \
    --stream true

- Add MLLMModelType.unlimited_ocr and MLLMTemplateType.unlimited_ocr
- Add UnlimitedOCRLoader with multi-GPU device_map patch
- Fix torch.cat device mismatch for image_newline/view_seperator
- Fix masked_scatter_ device mismatch caused by hard-coded .cuda()
- Add UnlimitedOCR template inheriting from DeepseekOCR
- Override image_placeholder to remove trailing newline
- Add _fix_device() for parameter device alignment
- Register model: PaddlePaddle/Unlimited-OCR

Tested: LoRA fine-tuning on the LaTeX_OCR dataset with 8 GPUs; inference verified with 4/5 exact matches on the validation set.

gemini-code-assist

Code Review

This pull request adds support for the unlimited-ocr model, including its model type, architecture registration, loader, and template. Key feedback points out a potential race condition when dynamically patching global PyTorch functions inside the forward pass, a redundancy in calling super() in UnlimitedOCRLoader.get_model which bypasses the parent class's patching logic, and an inconsistency in using ModelArch instead of MLLMModelArch during architecture registration.


gemini-code-assist · 2026-06-25T09:17:11Z

+    def _apply_multi_gpu_patch():
+        """
+        Fixed two bugs affecting `UnlimitedOCRModel` in multi-GPU scenarios using `device_map='auto'`:
+
+        Bug 1 - Device mismatch in `torch.cat`:
+            `image_newline` and `view_seperator` are `nn.Parameter`s;
+            under `device_map='auto'`, their device placement might not align
+            with the image features.
+
+        Bug 2 - Device mismatch in `masked_scatter_`:
+            Hard-coded `.cuda()` usage caused a conflict where `images_in_this_batch`
+            resided on the projector's device (e.g., `cuda:7`),
+            while `inputs_embeds` resided on the device hosting `embed_tokens` (e.g., `cuda:0`).
+
+        Fix strategy: Temporarily replace `torch.cat` and `torch.Tensor.masked_scatter_` during the forward pass
+        to handle device placement automatically, then restore the original methods after execution.
+        """
+        import sys
+        import torch
+
+        modeling_module = None
+        for mod_name, mod in sys.modules.items():
+            if 'modeling_unlimitedocr' in mod_name:
+                modeling_module = mod
+                break
+
+        if modeling_module is None:
+            return False
+
+        UnlimitedOCRModel = getattr(modeling_module, 'UnlimitedOCRModel', None)
+        if UnlimitedOCRModel is None:
+            return False
+
+        # Avoid redundant patching
+        if getattr(UnlimitedOCRModel, '_swift_multi_gpu_patched', False):
+            return True
+
+        _original_forward = UnlimitedOCRModel.forward
+
+        def _patched_forward(self, *args, **kwargs):
+            _orig_cat = torch.cat
+            _orig_masked_scatter_ = torch.Tensor.masked_scatter_
+
+            def _safe_cat(tensors, dim=0, **cat_kwargs):
+                # Using the device of the first tensor as the reference, the others are aligned to it.
+                ref_device = None
+                for t in tensors:
+                    if isinstance(t, torch.Tensor):
+                        ref_device = t.device
+                        break
+                if ref_device is None:
+                    return _orig_cat(tensors, dim, **cat_kwargs)
+                aligned = [
+                    t.to(ref_device) if isinstance(t, torch.Tensor) and t.device != ref_device else t for t in tensors
+                ]
+                return _orig_cat(aligned, dim, **cat_kwargs)
+
+            def _safe_masked_scatter_(tensor_self, mask, source):
+                # Use the device of tensor_self (inputs_embeds[idx]) as the reference.
+                dev = tensor_self.device
+                if mask.device != dev:
+                    mask = mask.to(dev)
+                if source.device != dev:
+                    source = source.to(dev)
+                return _orig_masked_scatter_(tensor_self, mask, source)
+
+            # Simultaneously replace the module namespace and the global scope (double insurance).
+            modeling_module.torch.cat = _safe_cat
+            torch.cat = _safe_cat
+            torch.Tensor.masked_scatter_ = _safe_masked_scatter_
+            try:
+                return _original_forward(self, *args, **kwargs)
+            finally:
+                # Restore the state to avoid contaminating other modules.
+                modeling_module.torch.cat = _orig_cat
+                torch.cat = _orig_cat
+                torch.Tensor.masked_scatter_ = _orig_masked_scatter_
+
+        UnlimitedOCRModel.forward = _patched_forward
+        UnlimitedOCRModel._swift_multi_gpu_patched = True
+        return True


Dynamically patching global functions like torch.cat and torch.Tensor.masked_scatter_ inside the forward pass and restoring them in a finally block is highly prone to race conditions in concurrent/multi-threaded environments (e.g., multi-threaded inference servers). If two threads execute the forward pass concurrently, they can overwrite each other's saved original functions, leading to permanent global pollution of torch.cat with the patched version.

Instead, apply the safe wrapper patches globally once during the initialization of the patch. This completely avoids any race conditions and simplifies the forward pass logic.

Suggested change

    @staticmethod
    def _apply_multi_gpu_patch():
        """
        Fixed two bugs affecting `UnlimitedOCRModel` in multi-GPU scenarios using `device_map='auto'`:

        Bug 1 - Device mismatch in `torch.cat`:
            `image_newline` and `view_seperator` are `nn.Parameter`s;
            under `device_map='auto'`, their device placement might not align
            with the image features.

        Bug 2 - Device mismatch in `masked_scatter_`:
            Hard-coded `.cuda()` usage caused a conflict where `images_in_this_batch`
            resided on the projector's device (e.g., `cuda:7`),
            while `inputs_embeds` resided on the device hosting `embed_tokens` (e.g., `cuda:0`).

        Fix strategy: Replace `torch.cat` and `torch.Tensor.masked_scatter_` globally once
        with thread-safe wrappers that handle device alignment automatically.
        """
        import sys
        import torch

        modeling_module = None
        for mod_name, mod in sys.modules.items():
            if 'modeling_unlimitedocr' in mod_name:
                modeling_module = mod
                break

        if modeling_module is None:
            return False

        UnlimitedOCRModel = getattr(modeling_module, 'UnlimitedOCRModel', None)
        if UnlimitedOCRModel is None:
            return False

        # Avoid redundant patching
        if getattr(UnlimitedOCRModel, '_swift_multi_gpu_patched', False):
            return True

        _orig_cat = torch.cat

        def _safe_cat(tensors, dim=0, **cat_kwargs):
            ref_device = None
            has_mismatch = False
            for t in tensors:
                if isinstance(t, torch.Tensor):
                    if ref_device is None:
                        ref_device = t.device
                    elif t.device != ref_device:
                        has_mismatch = True
            if not has_mismatch or ref_device is None:
                return _orig_cat(tensors, dim, **cat_kwargs)
            aligned = [
                t.to(ref_device) if isinstance(t, torch.Tensor) and t.device != ref_device else t for t in tensors
            ]
            return _orig_cat(aligned, dim, **cat_kwargs)

        _orig_masked_scatter_ = torch.Tensor.masked_scatter_

        def _safe_masked_scatter_(tensor_self, mask, source):
            dev = tensor_self.device
            if mask.device != dev:
                mask = mask.to(dev)
            if source.device != dev:
                source = source.to(dev)
            return _orig_masked_scatter_(tensor_self, mask, source)

        torch.cat = _safe_cat
        if hasattr(modeling_module, 'torch'):
            modeling_module.torch.cat = _safe_cat
        torch.Tensor.masked_scatter_ = _safe_masked_scatter_
        UnlimitedOCRModel._swift_multi_gpu_patched = True
        return True
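
The race described in this comment can be reproduced without threads or PyTorch by interleaving two save/patch/restore cycles by hand. The sketch below is only an illustration: `make_lib`, `PerCallPatch`, and `patch_once` are hypothetical names, and the `SimpleNamespace` stands in for a shared namespace such as `torch`. It shows why the per-call approach can leave a wrapper installed, and how a guarded one-time patch avoids that.

```python
import threading
import types


def make_lib():
    """Build a stand-in for a shared namespace such as `torch` (hypothetical)."""
    return types.SimpleNamespace(cat=lambda parts: [x for p in parts for x in p])


class PerCallPatch:
    """Mimics one forward pass that swaps lib.cat and restores it afterwards."""

    def __init__(self, lib):
        self.lib = lib
        self.saved = None

    def enter(self):
        # Save whatever is installed right now, which may already be a wrapper.
        self.saved = self.lib.cat
        inner = self.saved
        self.lib.cat = lambda parts: inner(parts)

    def exit(self):
        self.lib.cat = self.saved


def leaks_after_interleaving():
    """Return True if lib.cat is NOT the original after two overlapping calls."""
    lib = make_lib()
    original = lib.cat
    a, b = PerCallPatch(lib), PerCallPatch(lib)
    a.enter()  # A saves the original
    b.enter()  # B saves A's wrapper
    a.exit()   # A restores the original
    b.exit()   # B "restores" A's wrapper, so the wrapper stays installed
    return lib.cat is not original


_PATCH_LOCK = threading.Lock()


def patch_once(lib):
    """Install the wrapper a single time; later calls are no-ops."""
    with _PATCH_LOCK:
        if getattr(lib, "_patched", False):
            return False
        inner = lib.cat
        lib.cat = lambda parts: inner(parts)
        lib._patched = True
        return True


if __name__ == "__main__":
    print("per-call patch leaked:", leaks_after_interleaving())
    lib = make_lib()
    print("first call patched:", patch_once(lib))
    print("second call patched:", patch_once(lib))
```

The lock around the guard is an addition for illustration; the suggested change above relies on the `_swift_multi_gpu_patched` flag alone.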

gemini-code-assist · 2026-06-25T09:17:12Z


+register_model_arch(
+    MultiModelKeys(
+        ModelArch.unlimited_ocr,


For consistency with all other model architecture registrations in this file, use MLLMModelArch.unlimited_ocr instead of ModelArch.unlimited_ocr.

Suggested change

-        ModelArch.unlimited_ocr,
+        MLLMModelArch.unlimited_ocr,
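
The reviewer frames this as a consistency issue rather than a functional bug. A minimal sketch of why both spellings could resolve to the same value, assuming `ModelArch` is a combined class that inherits from `MLLMModelArch` (an assumption about the file layout that the PR text does not confirm; the classes below are simplified stand-ins):

```python
class LLMModelArch:
    # Hypothetical text-only architecture name, for illustration.
    llama = 'llama'


class MLLMModelArch:
    # The multimodal architecture name added by this PR.
    unlimited_ocr = 'unlimited_ocr'


class ModelArch(LLMModelArch, MLLMModelArch):
    # Assumed combined namespace: attribute lookup falls through to the parents.
    pass


if __name__ == "__main__":
    # Both names point at the same string object, so registration would behave
    # identically; only the more specific spelling documents the intent.
    print(ModelArch.unlimited_ocr is MLLMModelArch.unlimited_ocr)
```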

gemini-code-assist · 2026-06-25T09:17:12Z

+        self.auto_model_cls = self.auto_model_cls or AutoModel
+        model = super(DeepseekOCRLoader, self).get_model(model_dir, *args, **kwargs)
+        patch_output_clone(model.model.embed_tokens)
+        patch_output_to_input_device(model.model.sam_model)
+        patch_output_to_input_device(getattr(model.model, self.visual_name))
+        patch_output_to_input_device(model.model.projector)
+        patch_output_to_input_device(model.model)


Since UnlimitedOCRLoader inherits from DeepseekOCRLoader, calling super(DeepseekOCRLoader, self).get_model(...) bypasses DeepseekOCRLoader.get_model and duplicates all of its patching logic. This is highly redundant and hard to maintain.

Instead, use super().get_model(...) to let DeepseekOCRLoader apply its patches, and then simply apply the additional patch_output_to_input_device(model.model) patch.

Suggested change

-        self.auto_model_cls = self.auto_model_cls or AutoModel
-        model = super(DeepseekOCRLoader, self).get_model(model_dir, *args, **kwargs)
-        patch_output_clone(model.model.embed_tokens)
-        patch_output_to_input_device(model.model.sam_model)
-        patch_output_to_input_device(getattr(model.model, self.visual_name))
-        patch_output_to_input_device(model.model.projector)
-        patch_output_to_input_device(model.model)
+        model = super().get_model(model_dir, *args, **kwargs)
+        patch_output_to_input_device(model.model)
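
The difference between the two `super` forms can be seen in a small standalone sketch. The class names below are hypothetical stand-ins for the loaders, and each "patch" is just a string appended to a list so the call order is visible:

```python
class BaseLoader:
    def get_model(self):
        return ["base load"]


class ParentLoader(BaseLoader):
    """Plays the role of the parent loader that applies its own patches."""

    def get_model(self):
        steps = super().get_model()
        steps.append("parent patches")
        return steps


class BypassingLoader(ParentLoader):
    """Two-argument super starts the lookup AFTER ParentLoader in the MRO."""

    def get_model(self):
        steps = super(ParentLoader, self).get_model()  # skips ParentLoader.get_model
        steps.append("copied parent patches")          # logic must be duplicated by hand
        steps.append("extra patch")
        return steps


class DelegatingLoader(ParentLoader):
    """Zero-argument super runs ParentLoader.get_model, then adds one step."""

    def get_model(self):
        steps = super().get_model()
        steps.append("extra patch")
        return steps


if __name__ == "__main__":
    print(BypassingLoader().get_model())
    print(DelegatingLoader().get_model())
```

With the delegating form, any later change to the parent's patches is picked up automatically, which is the maintainability point the comment raises.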

gemini-code-assist Bot reviewed Jun 25, 2026


z0o0ey added 2 commits June 26, 2026 10:36

add:model support documentation

68d4c24

Revise:Correction of issues

e85659b

Jintao-Huang approved these changes Jun 26, 2026


z0o0ey added 3 commits June 26, 2026 17:04

fix: remove duplicate torch import in deepseek.py

b259d6f

add:test case

1c5498b

add:test case（Corrected comments）

a1ebfc8

Jintao-Huang merged commit 0c3e6ea into modelscope:main Jun 26, 2026
2 of 3 checks passed

Jintao-Huang merged 6 commits into modelscope:main from z0o0ey:support_unlimited_ocr
