OCR and Text Recognition
OCR (optical character recognition) turns scanned pages into selectable, searchable text. It is meant for scanned PDFs, where the text is part of an image and cannot be selected. Accuracy depends on scan quality and on choosing the right language.
When to Use OCR
If you can already highlight and copy text from a PDF, you usually do not need OCR. Use OCR when the page is a scanned image or when text selection does not work.
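The same check can be expressed in code. The sketch below is only an illustration, not Papr's implementation: the `PageText` shape, the `pagesNeedingOcr` helper, and the 20-character threshold are all assumptions. It flags pages whose existing text layer is empty or nearly empty, which is a rough sign that the page is a scanned image.

```typescript
// Hypothetical shape: text already extractable from each page's text layer.
interface PageText {
  pageNumber: number;
  extractedText: string;
}

// Return the page numbers that probably need OCR.
// minChars is an assumed threshold; tune it for your documents.
function pagesNeedingOcr(pages: PageText[], minChars: number = 20): number[] {
  return pages
    .filter(function (p) {
      // Ignore whitespace so a layer of blank lines does not count as text.
      return p.extractedText.replace(/\s+/g, "").length < minChars;
    })
    .map(function (p) {
      return p.pageNumber;
    });
}

// Made-up sample data for illustration.
const samplePages: PageText[] = [
  { pageNumber: 1, extractedText: "Invoice for consulting services rendered in March." },
  { pageNumber: 2, extractedText: "   \n " },
  { pageNumber: 3, extractedText: "" },
];

console.log(pagesNeedingOcr(samplePages)); // pages 2 and 3 have no usable text layer
```

A character-count threshold is a blunt heuristic. A page with a short caption and a large scanned image would pass the check, so trying to select text by hand remains the more reliable test.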
What Improves Accuracy
- High contrast between text and background.
- Clean, straight scans without heavy skew.
- Scans at 300 DPI or higher, especially for small text.
- The correct OCR language pack for the document's language.
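To check whether a scan meets the 300 DPI guideline, divide the image width in pixels by the physical page width in inches. The helper names below (`effectiveDpi`, `meetsDpiTarget`) are hypothetical and are not part of Papr. This is a minimal sketch, assuming you know the original paper size.

```typescript
// Effective resolution of a scan: pixels across the page / inches across the page.
function effectiveDpi(pixelWidth: number, pageWidthInches: number): number {
  if (pixelWidth <= 0 || pageWidthInches <= 0) {
    throw new RangeError("pixelWidth and pageWidthInches must be positive");
  }
  return pixelWidth / pageWidthInches;
}

// True when the scan reaches the target resolution (300 DPI by default).
function meetsDpiTarget(
  pixelWidth: number,
  pageWidthInches: number,
  targetDpi: number = 300
): boolean {
  return effectiveDpi(pixelWidth, pageWidthInches) >= targetDpi;
}

// A US Letter page is 8.5 inches wide.
console.log(effectiveDpi(2550, 8.5)); // 300
console.log(meetsDpiTarget(1275, 8.5)); // false, this scan is only 150 DPI
```

If a scan falls short, rescanning at a higher resolution usually helps more than enlarging the existing image, because upscaling does not add detail.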
Running OCR in Papr
Open your PDF, choose the OCR tool, and select the correct language. Papr processes the pages locally in your browser and adds searchable text to the document.
Language Packs
Papr downloads OCR language data the first time you use it. These files are cached in your browser so OCR can run offline after the initial download.
Common Limitations
- Handwriting and stylized fonts can reduce accuracy.
- Low-resolution scans may produce missing or incorrect characters.
- Tables and multi-column layouts can be harder to interpret.
Review the Output
OCR is not perfect. Always review the extracted text before using it in a final document, especially for legal or financial content.
After OCR, use search to verify key phrases and fix any errors with text annotations.
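If you have copied the OCR text out of the document, the key-phrase check can also be scripted. The sketch below is an illustration under stated assumptions: `missingPhrases` and `normalizeText` are hypothetical helpers, not Papr features, and the sample text is made up. It lists the phrases that do not appear in the OCR output, ignoring case and differences in spacing or line breaks.

```typescript
// Lowercase and collapse whitespace so line breaks and double spaces do not
// cause false mismatches.
function normalizeText(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Return the phrases that cannot be found in the OCR text.
function missingPhrases(ocrText: string, phrases: string[]): string[] {
  const haystack = normalizeText(ocrText);
  return phrases.filter(function (phrase) {
    return haystack.indexOf(normalizeText(phrase)) === -1;
  });
}

// Made-up OCR output where "terms" was misread as "terrns" (m read as r + n).
const sampleOcrText = "Total  amount due:\n1,250.00 USD\nPayment terrns: net 30";

console.log(
  missingPhrases(sampleOcrText, ["Total amount due", "Payment terms", "net 30"])
); // only "Payment terms" is reported as missing
```

A missing phrase only shows that something went wrong near it. You still need to open the page and compare it with the scan to find and correct the actual error.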