GOT-OCR 2.0
A 580M-parameter General OCR Theory model that handles plain text, math formulas, chemical formulas, and geometric shapes with a single end-to-end encoder-decoder architecture.
GOT-OCR 2.0 (General OCR Theory) replaces specialized recognition pipelines with one unified 580M-parameter model. It handles diverse inputs, including multi-page documents, sheet music, and complex formulas that it transcribes as LaTeX. A high-resolution vision encoder takes 1024x1024 inputs and feeds a language decoder through a linear projection layer; the model is reported to achieve state-of-the-art results on the OCRBench benchmark. It also supports region-level OCR on a user-specified crop of the image, and it can emit formatted output such as Markdown or TikZ, which makes it useful for digitizing structured data from static images.
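To make the fixed 1024x1024 input and the crop-based OCR more concrete, here is a minimal sketch of how a page of arbitrary size could be fitted to a square input and how a crop box could be mapped into that space. It assumes aspect-preserving "letterbox" resizing with centered padding, which is an illustration only and may not match the model's actual preprocessing. The names `Letterbox`, `letterbox`, and `map_box` are hypothetical helpers, not part of any GOT-OCR API.

```python
from dataclasses import dataclass

TARGET = 1024  # encoder input side length, as stated in the description


@dataclass(frozen=True)
class Letterbox:
    scale: float  # factor applied to original pixel coordinates
    new_w: int    # width of the resized image inside the square canvas
    new_h: int    # height of the resized image inside the square canvas
    pad_x: int    # left padding in pixels
    pad_y: int    # top padding in pixels


def letterbox(width: int, height: int, target: int = TARGET) -> Letterbox:
    """Fit a width x height image into a target x target square.

    Assumption: aspect ratio is preserved and the image is centered.
    The real model may simply resize without padding.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_x = (target - new_w) // 2
    pad_y = (target - new_h) // 2
    return Letterbox(scale, new_w, new_h, pad_x, pad_y)


def map_box(box, lb: Letterbox):
    """Map an (x0, y0, x1, y1) crop box from original pixels to the square input."""
    x0, y0, x1, y1 = box
    return (
        round(x0 * lb.scale) + lb.pad_x,
        round(y0 * lb.scale) + lb.pad_y,
        round(x1 * lb.scale) + lb.pad_x,
        round(y1 * lb.scale) + lb.pad_y,
    )


if __name__ == "__main__":
    lb = letterbox(2048, 1024)                 # a wide 2048x1024 scan
    print(lb)
    print(map_box((100, 200, 500, 400), lb))   # region selected for crop OCR
```

For a 2048x1024 scan the image is halved to 1024x512 and padded by 256 pixels above and below, so a crop box chosen on the original page shifts down by that padding after scaling. Keeping the geometry in a small dataclass makes it easy to map results back to the original page by inverting the same scale and offsets.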