ModelDocument AI models

Unlimited OCR

Unlimited OCR is Baidu's open model and code release for one-shot long-horizon OCR and document parsing across images, multi-page inputs, and PDF-style workflows.

Why it matters

Long documents are still one of the hardest inputs for AI workflows. Unlimited OCR matters because it connects OCR, PDF parsing, document-to-knowledge-base ingestion, and multimodal model serving into a concrete open release with repository, paper, model, and demo surfaces.

Source-backed summary

The official Baidu repository links the GitHub code, Hugging Face model, arXiv paper, demo space, and inference paths for Transformers, vLLM, and SGLang. Hacker News discussion shows strong current demand around long-document OCR quality, layout handling, and how such models fit into document AI pipelines.

Primary use cases

Parse long document images and multi-page inputs before LLM ingestion.
Evaluate PDF-to-knowledge-base and document-automation workflows.
Compare OCR model serving through Transformers, vLLM, or SGLang.
Plan domain-specific extraction for contracts, invoices, papers, or forms.

What the official sources provide

Baidu publishes Unlimited OCR as a repository with model links, paper link, demo link, and inference examples. The README separates single-image and multi-page or PDF-style parsing paths and documents several serving options.

Model surface: Hugging Face hosts the Baidu Unlimited OCR model.
Research surface: the arXiv paper frames the one-shot long-horizon parsing approach.
Serving surface: the README includes Transformers, vLLM, and SGLang inference sections.

Reader decision path

A reader usually wants to know whether this is a normal OCR library, a vision-language document model, or an API-ready document parser. The safest answer is that the public evidence supports an open model and code release, while production adoption still needs task-specific checks on language coverage, table extraction, scanned PDFs, latency, GPU memory, and licensing.

Evidence boundary

Use Baidu repository, paper, model card, and demo surfaces for factual claims. Use Hacker News for adoption questions and skepticism only; community comments should not be used as benchmark proof or capability claims.

Related concepts

AI Model API

Model serving choices still need provider IDs, deployment fields, cost, and limits before catalog use.

Agent Harness

Document agents need OCR and parsing tools before they can reason over long files reliably.

Sources

Source confidence

official-docs

Unlimited OCR GitHub repository

GitHub / baidu

official-docs

Unlimited OCR model card

Hugging Face / Baidu

arXiv

Hugging Face Spaces / Baidu

kol-community

Hacker News Unlimited OCR discussion

Hacker News

Unlimited OCR FAQ

Page-level questions for Unlimited OCR.

Is Unlimited OCR only an OCR library?+

No. The official sources position it as an open model and code release for long-horizon document parsing, with OCR as the core job and model-serving paths for multimodal document workflows.

Can Unlimited OCR parse PDFs?+

The README includes a multi-page and PDF-style path that converts pages to images before parsing. Production users should still test their own PDFs, languages, tables, scans, and layout types before relying on it.

Why is Unlimited OCR not a `/models` record yet?+

It has an official model card, but a GetLLMs model-directory record also needs stable catalog fields such as provider-facing model ID, IO schema, pricing or deployment notes, and examples in the project data shape. This batch uses an entity page first.