What is chunks.md?
Five OCR engines, including PaddleOCR, SmolDocling, and MangaOCR, running directly in your browser. One site, every model, zero uploads. Your device does the work.
Documents in, markdown out
Drop a PDF, photo, or scan into chunks.md and get structured markdown back. Text, headings, tables, code blocks — extracted and formatted, ready to paste into your workflow.
Everything runs in your browser. Your files never leave your device. No account required, no upload limits, no data collection.
Five engines, one interface
PaddlePaddle-OCRv5
The latest PaddleOCR model. Highest accuracy across 11 language packs. Best for most documents. 92 MB.
PaddlePaddle-OCRv3
Lightweight and fast at ~1 s/page. Just 11 MB. Good for quick extractions when speed matters more than precision.
SmolDocling 256M
IBM vision-language model that understands document layout. Outputs structured markdown with tables, formulas, and headings. English-only.
PaddleOCR-VL 1.5
Vision-language model combining a NaViT encoder with an ERNIE decoder. Understands visual context, tables, and page structure. Customizable prompts. INT8 and FP32 variants.
MangaOCR
TrOCR-based model specialized for Japanese text in manga and print. Handles vertical and horizontal text, furigana, and varied font styles. INT8 and FP32 variants.
Formats & languages
PDF, JPG, PNG, and WebP. Multi-page PDFs are processed page-by-page with real-time progress.
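For the curious, page-by-page processing with live progress can be sketched roughly as below. This is an illustration only, not the code chunks.md actually runs: `processPages`, `recognizePage`, and the `Progress` shape are hypothetical names, and `recognizePage` stands in for whichever engine is selected.

```typescript
// Hypothetical sketch: every name here is an assumption, not the chunks.md API.
type Progress = { done: number; total: number; percent: number };

async function processPages<P>(
  pages: P[],
  // Stand-in for the selected OCR engine; returns markdown for one page.
  recognizePage: (page: P, index: number) => Promise<string>,
  // Called once after each page so a UI can update in real time.
  onProgress: (progress: Progress) => void,
): Promise<string> {
  const parts: string[] = [];
  let done = 0;
  for (const page of pages) {
    // Pages are handled one at a time, in order.
    parts.push(await recognizePage(page, done));
    done += 1;
    onProgress({
      done,
      total: pages.length,
      percent: Math.round((done / pages.length) * 100),
    });
  }
  // Join per-page markdown with a blank line between pages.
  return parts.join("\n\n");
}
```

Handling one page at a time keeps only a single page's image in memory and lets the progress indicator advance after every page instead of waiting for the whole document.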
10+ language groups: English, Chinese, Japanese, Korean, Arabic, Hindi, Latin scripts, Cyrillic, Thai, Greek, Tamil, and Telugu.
Works offline
Models are cached locally in your browser after first use. From your second visit onward, chunks.md works without an internet connection for any model you have already loaded.
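The cache-first idea behind offline use can be sketched as follows. This is a minimal illustration under stated assumptions, not the site's implementation: `ModelStore`, `MemoryStore`, and `loadModel` are hypothetical names, and the page does not say which browser storage is used. In a real browser the store might be backed by the Cache API or IndexedDB; an in-memory map is used here so the sketch runs anywhere.

```typescript
// Hypothetical storage contract; the actual storage mechanism is an assumption.
interface ModelStore {
  get(key: string): Promise<Uint8Array | undefined>;
  set(key: string, bytes: Uint8Array): Promise<void>;
}

// In-memory stand-in for browser storage, for illustration only.
class MemoryStore implements ModelStore {
  private items = new Map<string, Uint8Array>();
  async get(key: string): Promise<Uint8Array | undefined> {
    return this.items.get(key);
  }
  async set(key: string, bytes: Uint8Array): Promise<void> {
    this.items.set(key, bytes);
  }
}

async function loadModel(
  key: string,
  store: ModelStore,
  // Stand-in for a network fetch of the model weights.
  download: (key: string) => Promise<Uint8Array>,
): Promise<{ bytes: Uint8Array; fromCache: boolean }> {
  const cached = await store.get(key);
  if (cached !== undefined) {
    // Cache hit: no network access is needed.
    return { bytes: cached, fromCache: true };
  }
  // Cache miss: this is the only step that requires a connection.
  const bytes = await download(key);
  await store.set(key, bytes);
  return { bytes, fromCache: false };
}
```

With this pattern, the first load of a model goes to the network and every later load is served from local storage, which is why a previously used model keeps working offline.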
One site to access them all
No hunting for separate tools. PaddleOCR, SmolDocling, MangaOCR — every engine in a single tab. Switch models instantly and compare results side-by-side.
From quick text extraction to full document understanding to Japanese manga — every model runs locally, privately, and for free.