Does chunks.md upload my files?

No. OCR runs entirely in your browser via WebAssembly and WebGPU. Files never leave your device.

Is chunks.md free?

Yes: free, no account, no upload limits. Models download once and are cached locally.

Which OCR model should I use?

PaddleOCR v5 for most documents, PDF Embed Text for PDFs that already contain text, MangaOCR for Japanese manga, and Unlimited-OCR (DeepSeek-based) for complex layouts on a WebGPU-capable device.
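As a rough illustration, the guidance above could be written as a small decision function. This is a sketch only: the type names, the model identifiers, and the priority order of the checks are assumptions made for the example, not chunks.md's actual code.

```typescript
// Hypothetical identifiers; the real internal model IDs are not documented here.
type ModelId = "paddleocr-v5" | "pdf-embed-text" | "mangaocr" | "unlimited-ocr";

// Hypothetical description of the document the user wants to process.
interface DocumentTraits {
  isPdfWithEmbeddedText: boolean;
  isJapaneseManga: boolean;
  hasComplexLayout: boolean;
}

// Maps the FAQ guidance to a recommendation. The order of the checks is an assumption.
function recommendModel(doc: DocumentTraits, hasWebGpu: boolean): ModelId {
  // A PDF that already contains text needs no OCR pass at all.
  if (doc.isPdfWithEmbeddedText) return "pdf-embed-text";
  // Japanese manga gets the specialized model.
  if (doc.isJapaneseManga) return "mangaocr";
  // Complex layouts go to Unlimited-OCR, but only on a WebGPU-capable device.
  if (doc.hasComplexLayout && hasWebGpu) return "unlimited-ocr";
  // Everything else falls back to the general-purpose default.
  return "paddleocr-v5";
}
```

In a browser, the `hasWebGpu` flag might be derived from whether `navigator.gpu` is defined, although that property alone does not guarantee a usable adapter.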


What is chunks.md?

chunks.md runs eight OCR models, including PaddleOCR, SmolDocling, MangaOCR, and the DeepSeek-based Unlimited-OCR, directly in your browser. One site, every model, zero uploads. Your device does the work.

Documents in, markdown out

Drop a PDF, photo, or scan into chunks.md and get structured markdown back. Text, headings, tables, code blocks — extracted and formatted, ready to paste into your workflow.

Everything runs in your browser. Your files never leave your device. No account required, no upload limits, no data collection.

Eight models, one interface

PaddleOCR v5

The latest PaddleOCR model. Highest accuracy across 11 language packs. Best for most documents. 92 MB.

PaddleOCR v5 Mobile

Balanced accuracy at just 12 MB. The default on mobile devices.

PaddleOCR v3

Lightweight and fast at ~1 s/page. Just 11 MB. Good for quick extractions when speed matters more than precision.

SmolDocling 256M

IBM vision-language model that understands document layout. Outputs structured markdown with tables, formulas, and headings. English-only.

PaddleOCR-VL 1.5

Vision-language model combining a NaViT encoder with an ERNIE decoder. Understands visual context, tables, and page structure. Customizable prompts. INT8 and FP32 variants.

Unlimited-OCR (DeepSeek-based)

DeepEncoder-architecture vision model with WebGPU acceleration. Layout-aware, multilingual including Japanese. Needs a WebGPU-capable device.

MangaOCR

TrOCR-based model specialized for Japanese text in manga and print. Handles vertical and horizontal text, furigana, and varied font styles. INT8 and FP32 variants.

PDF Embed Text

Instant extraction of text already embedded in a PDF — no OCR pass at all. The fastest path for digital-native PDFs.

Formats & languages

PDF, JPG, PNG, and WebP. Multi-page PDFs are processed page-by-page with real-time progress.
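Page-by-page processing with progress reporting can be sketched as a simple loop. The function and parameter names below (`processPages`, `recognize`, `onProgress`) are hypothetical, and the recognizer is passed in as a stand-in for whichever model is selected; this is not the site's actual pipeline.

```typescript
// Runs a recognizer over each page in order and reports progress after every page.
// `recognize` is a placeholder for any OCR model that returns markdown for one page.
async function processPages<P>(
  pages: readonly P[],
  recognize: (page: P) => Promise<string>,
  onProgress: (done: number, total: number) => void,
): Promise<string[]> {
  const results: string[] = [];
  let done = 0;
  for (const page of pages) {
    // Pages are handled sequentially, so only one page is in flight at a time.
    results.push(await recognize(page));
    done += 1;
    onProgress(done, pages.length);
  }
  return results;
}
```

Processing sequentially keeps memory use bounded to one page at a time, which is one plausible reason to prefer it over running all pages in parallel on a user's device.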

10+ language groups: English, Chinese, Japanese, Korean, Arabic, Hindi, Latin scripts, Cyrillic, Thai, Greek, Tamil, and Telugu.

Works offline

Models are cached locally in your browser after first use. From your second visit onward, chunks.md works without an internet connection for any model you have already downloaded.
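The download-once behavior follows a cache-first pattern, sketched below. The storage mechanism chunks.md actually uses is not stated here, so the `ModelStore` interface, the `Downloader` type, and `loadModel` are all hypothetical; in a browser the store might be backed by the Cache API or IndexedDB.

```typescript
// Hypothetical persistent store for model weights, keyed by URL.
interface ModelStore {
  get(url: string): Promise<Uint8Array | undefined>;
  set(url: string, bytes: Uint8Array): Promise<void>;
}

// Hypothetical network fetcher; only called when the store has no copy.
type Downloader = (url: string) => Promise<Uint8Array>;

// Cache-first load: use the stored copy if present, otherwise download and store it.
async function loadModel(
  url: string,
  store: ModelStore,
  download: Downloader,
): Promise<{ bytes: Uint8Array; fromCache: boolean }> {
  const cached = await store.get(url);
  if (cached !== undefined) {
    // No network access is needed on this path, which is what makes offline use possible.
    return { bytes: cached, fromCache: true };
  }
  const bytes = await download(url);
  await store.set(url, bytes);
  return { bytes, fromCache: false };
}
```

With this shape, the first call for a given model hits the network and every later call is served from the local store.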

One site to access them all

No hunting for separate tools. PaddleOCR, SmolDocling, MangaOCR — every model in a single tab. Switch models instantly and compare results side-by-side.

From quick text extraction to full document understanding to Japanese manga — every model runs locally, privately, and for free. The markdown you get back is ready to feed into AI assistants as clean, structured knowledge.
