How to Use
- Select a PDF: choose a local .pdf.
- Pick language: jpn/eng/jpn+eng.
- Set DPI: 150 / 200 / 300 (higher = better accuracy, more time/memory).
- Page ranges (optional): 1-3,7 (1-based, comma-separated, inclusive). Empty = all pages.
- Run OCR: click Start OCR; progress appears in the log.
- Save result: click Save TXT to download pdf-ocr.txt (plain text with page headers).
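
The page-range syntax above (1-based, comma-separated, inclusive, empty = all pages) could be parsed roughly as follows. This is a minimal sketch, not the tool's actual code: the function name `parsePageRanges` is hypothetical, and rejecting malformed or out-of-bounds input with an error is an assumption about behavior the page does not describe.

```typescript
// Hypothetical helper: parse a 1-based, inclusive page-range string such as "1-3,7".
// An empty string selects every page from 1 to totalPages.
function parsePageRanges(spec: string, totalPages: number): number[] {
  const trimmed = spec.trim();
  if (trimmed === "") {
    return Array.from({ length: totalPages }, (_, i) => i + 1);
  }
  const pages = new Set<number>();
  for (const part of trimmed.split(",")) {
    // Accept either a single page ("7") or a range ("1-3").
    const match = /^\s*(\d+)\s*(?:-\s*(\d+))?\s*$/.exec(part);
    if (match === null) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    // Assumption: out-of-bounds or reversed ranges are rejected, not clamped.
    if (start < 1 || end < start || end > totalPages) {
      throw new Error(`Page range out of bounds: "${part}"`);
    }
    for (let p = start; p <= end; p++) {
      pages.add(p);
    }
  }
  // Deduplicate overlapping ranges and return pages in ascending order.
  return Array.from(pages).sort((a, b) => a - b);
}
```

For example, `parsePageRanges("1-3,7", 10)` yields pages 1, 2, 3, and 7, while `parsePageRanges("", 10)` yields all ten pages.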
Outputs / Results
- Extracted text file (.txt) for the selected pages
- Per-page preview canvas plus a snippet of the first 800 characters
- Log info (total pages, selected pages, progress, completion)
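
To illustrate the outputs listed above, the sketch below assembles a plain-text result with page headers and cuts an 800-character preview snippet. The header format (`--- Page N ---`), the `PageText` type, and both function names are assumptions made for illustration; the tool's real header text may differ.

```typescript
// Hypothetical shape for one OCR'd page.
interface PageText {
  page: number; // 1-based page number
  text: string; // recognized text for that page
}

// Join per-page text into one document, each page preceded by a header line.
// Assumption: the header format below is illustrative only.
function buildOutput(pages: PageText[]): string {
  return pages
    .map(({ page, text }) => `--- Page ${page} ---\n${text.trim()}`)
    .join("\n\n");
}

// Return at most `limit` characters for the preview.
// Array.from splits by code point, so surrogate pairs are not cut in half.
function previewSnippet(text: string, limit: number = 800): string {
  const chars = Array.from(text);
  return chars.length <= limit ? text : chars.slice(0, limit).join("");
}
```

Counting by code point rather than UTF-16 unit is a design choice that matters mostly for rare characters outside the Basic Multilingual Plane; common Japanese and English text behaves the same either way.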
Notes / Caveats
- Client-side only: no uploads; everything runs in your browser.
- Dependencies: loads pdf.js 3.11.174 and tesseract.js 5.0.4 from CDNs; blocked/offline environments will fail.
- Accuracy tips:
  - Use 200–300 DPI; prefer 300 for small fonts/low-quality scans.
  - Skewed/low-contrast pages increase errors; better source quality helps.
- Layout: no layout retention; vertical Japanese, ruby, columns may yield jumbled order/line breaks.
- Limitations:
  - Password-protected PDFs not supported (no password prompt).
  - Figures/equations remain images; only textual pixels are OCR’d.
  - Very large/high-res PDFs may hit browser memory limits; process in parts.
- First-run delay: language models load on first use; subsequent runs are faster due to caching.
- Shadow DOM isolation: site-level CSS/JS won’t affect the widget.
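
The DPI and memory notes above can be made concrete with a little arithmetic. PDF page sizes are measured in points (72 per inch), so a target DPI maps to a render scale of `dpi / 72`, and an RGBA canvas needs 4 bytes per pixel. The sketch below is an estimate under those assumptions, not the tool's actual code; the function names are hypothetical, and real browser memory use will be higher than the raw canvas buffer.

```typescript
// PDF user space uses 72 points per inch.
const PDF_POINTS_PER_INCH = 72;

// Convert a target DPI into a render scale factor
// (the kind of value one would pass as `scale` when building a pdf.js viewport).
function dpiToScale(dpi: number): number {
  return dpi / PDF_POINTS_PER_INCH;
}

// Estimate the raw RGBA pixel buffer size for one rendered page.
// widthPt / heightPt are the page dimensions in PDF points.
function estimateCanvasBytes(widthPt: number, heightPt: number, dpi: number): number {
  const scale = dpiToScale(dpi);
  const widthPx = Math.ceil(widthPt * scale);
  const heightPx = Math.ceil(heightPt * scale);
  return widthPx * heightPx * 4; // 4 bytes per pixel (R, G, B, A)
}

// Example: an A4 page is 595 x 842 points.
const a4At150 = estimateCanvasBytes(595, 842, 150);
const a4At300 = estimateCanvasBytes(595, 842, 300);
console.log(`A4 @150 DPI: ${(a4At150 / 1e6).toFixed(1)} MB`);
console.log(`A4 @300 DPI: ${(a4At300 / 1e6).toFixed(1)} MB`);
```

Doubling the DPI roughly quadruples the pixel count, which is why 300 DPI on long documents is the case most likely to hit browser memory limits and benefit from processing in parts.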
