Document Processing and OCR Explained

A primer on how Casefleet prepares uploaded sources for review.

Written by Meg Hall
Updated over a week ago

When you add a source to your case in Casefleet, the document goes through two phases: (1) uploading the original file to Casefleet, and (2) processing it for viewing in the document reviewer.

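This article covers:

  • Phase 1: Document Upload

  • Duplicate Detection

  • Phase 2: Document Processing

  • Text Recognition (OCR)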

Phase 1: Document Upload

In Casefleet, you can add documents to your case on the Sources and Case Home tabs. To learn more about uploading documents as sources and for a list of file types supported by our document reviewer, click here.

NOTE: The following are the maximum file sizes for an individual file upload -

Once the file has been successfully uploaded to Casefleet's servers, you will see the "Completed!" message in the pop-up display.

NOTE: Document upload speeds in Phase 1 depend on your local internet connection.

Duplicate Detection

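When uploading sources, Casefleet will automatically notify you if a particular document already exists in the case. Duplicates are identified by comparing MD5 hashes. A hash is computed from a file's contents, so two files with identical contents produce the same hash, while files that differ will, in practice, produce different hashes.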

When a duplicate is detected, you still have the option to upload the duplicate file to your case. However, Casefleet recommends uploading only one copy of each unique file.

NOTE: You can view a document's MD5 hash under the metadata section of the source detail page. For more information about MD5 hashes, read this blog post.
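
As a rough illustration of how hash-based duplicate detection works in general, the Python sketch below computes an MD5 hash from a file's bytes and looks it up among the hashes already seen. It is not Casefleet's implementation; the function names and the known_hashes dictionary are hypothetical and exist only for this example.

```python
import hashlib


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def register_upload(stream, name, known_hashes):
    """Report a duplicate, or remember the new file.

    known_hashes is a hypothetical dict mapping MD5 hex digest -> source name.
    Returns the name of the existing source if the contents match one already
    seen, otherwise records the new file and returns None.
    """
    digest = md5_of_stream(stream)
    if digest in known_hashes:
        return known_hashes[digest]
    known_hashes[digest] = name
    return None
```

To try it on a local file, open the file in binary mode (for example, open(path, "rb")) and pass the file object as the stream. Because the hash depends only on the file's contents, renaming a file does not change its hash, which is why a renamed copy is still flagged as a duplicate.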

Phase 2: Document Processing

Once the document is uploaded, Casefleet's processor prepares it for use in our document reviewer. This includes:

  • Converting the document to a format that can be reviewed from any device

  • Extracting all relevant metadata

  • Performing OCR, if required (see below)

  • Extracting all text and making it fully searchable

All of these steps preserve the original document in case you need to re-download it later. As a result, the processing phase can take anywhere from several minutes to about an hour, depending on file type, document size, and the current processing load on our servers.

After uploading a document to your case, you will see granular status updates as we prepare it for review within Casefleet. These status updates include:

  • Queued for processing

  • Scanning for security risks

  • Extracting metadata

A progress bar will be displayed next to the source so you can see its status. Once processing is complete, you will see the "Launch Reviewer" button.

NOTE: If it appears that a document is "stuck" at the processing step for an extended period of time (more than an hour or two), feel free to contact us via the in-app support chat or email for assistance.
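
For teams that keep their own upload log, the "stuck" rule of thumb above can be written as a small check. This is only a sketch: the status labels are the ones quoted in this article (the full list may be longer), and the function name and the two-hour default are assumptions made for the example, not part of any Casefleet API.

```python
from datetime import datetime, timedelta

# Status labels quoted from this article; the real list may include others.
IN_PROGRESS_STATUSES = {
    "Queued for processing",
    "Scanning for security risks",
    "Extracting metadata",
}


def looks_stuck(status, started_at, now, limit=timedelta(hours=2)):
    """Return True if a source has shown an in-progress status longer than limit."""
    return status in IN_PROGRESS_STATUSES and (now - started_at) > limit
```

A source that has shown "Extracting metadata" for three hours would be flagged, while one that has been processing for thirty minutes would not.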

Text Recognition (OCR)

During the processing phase, Casefleet checks to see if a document requires optical character recognition (OCR). OCR converts text found in images and scanned documents into searchable data. In Casefleet, this enables you to highlight text in the document reviewer or use the full-text search functionality.

Learn more about Casefleet's recent OCR upgrade in this blog post.

If a file already contains a substantial amount of detectable text, the system assumes it has already been OCR'd and skips this step. OCR is not 100% accurate, so if the document provided to Casefleet already contains usable text, we leave that text in place by default.
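
The skip decision described above can be pictured as a simple heuristic. The Python sketch below is an assumption about how such a check might look, not Casefleet's actual logic: the function name and both threshold values are made up for illustration.

```python
def needs_ocr(page_texts, min_chars_per_page=50, min_text_page_ratio=0.9):
    """Decide whether to run OCR, given the text already embedded in each page.

    page_texts is a list with one string per page (empty if no text was found).
    Both thresholds are illustrative values only, not Casefleet's.
    """
    if not page_texts:
        return True  # no extractable text at all
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars_per_page
    )
    # Skip OCR only when nearly every page already carries usable text.
    return pages_with_text / len(page_texts) < min_text_page_ratio
```

Under these assumed thresholds, a scanned PDF with no embedded text would be sent to OCR, while a PDF exported from a word processor would be left as-is.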

For documents where OCR is skipped during the initial processing phase, you can manually reprocess the document using Casefleet's OCR engine. This often resolves display issues or difficulties with highlighting text in the document reviewer. For more information, click here.
