Add documentation about OCR testing #424

Description

@duckduckgrayduck

When reviewing #420, I realized that setting up OCR in the local dev environment requires two things:

  • Tesseract data in the form of language files that you need to put in the directory documentcloud/documents/processing/ocr/tesseract/tessdata/

  • Running the Django management command upload_languages, which takes the language files stored in tessdata and puts them into the ocr_languages bucket in Minio, where they can be accessed at runtime.

This isn't documented anywhere. We need to add two things to the local dev environment instructions: a sample command for downloading a language file (probably curl, as it is cross-platform) and the step of running the upload_languages command.
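
As a starting point for that documentation, here is a rough sketch of the two steps. It rests on several assumptions not confirmed in this issue: that language files come from the upstream tesseract-ocr/tessdata repository, that the script is run from the repository root, and that the management command is invoked through a plain `python manage.py` (the local dev setup may wrap this differently, for example inside a container). The helper names `download_language` and `upload_languages` are made up for illustration.

```shell
#!/bin/sh
# Directory named in this issue, relative to the repository root.
TESSDATA_DIR="documentcloud/documents/processing/ocr/tesseract/tessdata"

# Assumption: language files are fetched from the upstream tessdata repository.
TESSDATA_URL="https://github.com/tesseract-ocr/tessdata/raw/main"

# Download one language file (for example "eng") into the tessdata directory.
download_language() {
    lang_code="$1"
    mkdir -p "$TESSDATA_DIR"
    # -f fails on HTTP errors, -L follows GitHub's redirect to the raw file.
    curl -fL -o "$TESSDATA_DIR/$lang_code.traineddata" \
        "$TESSDATA_URL/$lang_code.traineddata"
}

# Push the files in tessdata to the ocr_languages bucket in Minio.
# Assumption: manage.py is reachable this way in the local dev environment.
upload_languages() {
    python manage.py upload_languages
}
```

With these defined, `download_language eng && upload_languages` would fetch the English data and then upload it; the exact invocation should be checked against how other management commands are run locally.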