Practical Approaches to Document Processing with AI: Challenges and Solutions

As businesses continue to digitize their workflows, the demand for intelligent document processing (IDP) has skyrocketed. Organizations—from banks handling loan applications to healthcare providers managing patient records—are turning to AI to extract structured data from unstructured documents. Done well, this can save countless hours of manual labor, reduce errors, and unlock valuable insights buried in PDFs, scanned images, and forms.

But building an AI-driven document processing system isn't as simple as pointing an OCR engine at a file and calling it a day. Real-world documents are messy. They vary in layout, language, and formatting. They include handwritten notes, checkboxes, low-resolution scans—and sometimes even coffee stains.

Building Real Solutions for Real Problems

At Inellipse, we faced these challenges firsthand while developing a custom IDP solution for a client in the mortgage industry. We quickly realized that off-the-shelf tools weren't enough. We had to engineer custom approaches, combining practical experimentation with pipelines tailored to messy, real-world data.

Two of the most persistent challenges we tackled were:

  1. Detecting checkboxes and accurately determining whether they were selected, even in noisy, scanned documents with inconsistent markings.
  2. Designing effective AI prompts to extract exactly the data we needed from the document text—without relying on fragile keyword searches or guesswork.

These issues might seem minor at first glance, but solving them was critical to the overall accuracy and usability of the system. Below, we share how we approached each challenge, what didn't work, what eventually did, and the key lessons we took away.

1. Checkbox Detection: A Surprisingly Complex Problem

At first glance, detecting a checkbox in a document seems simple: just look at the square and check if there's a mark inside. But when we started working with real-world documents—scanned PDFs and low-resolution images—we quickly realized it was far from trivial.

Why It's Hard

Checkboxes are small, often faint, and their appearance varies widely:

  1. Some are filled with a checkmark (✓), others with an "X", a dot, or a scribble.
  2. The checkbox may not be a perfect square—sometimes it's just a faint outline.
  3. Scanning quality varies, introducing contrast issues and smudging.
  4. Users may tick slightly outside the box or make unclear marks nearby.

These variations make traditional image processing techniques brittle and unreliable.

What Didn't Work

Our team at Inellipse started by testing some common OpenCV-based techniques:

Thresholding + contour detection: Worked in clean documents, but failed when marks were faint or partially cut off.

Pixel density checks: We measured the percentage of dark pixels in the checkbox area. Unfortunately, this approach produced too many false positives, especially when noise or text artifacts were present nearby.
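
To make the failure mode concrete, here is a minimal sketch of that kind of density check (the threshold values are illustrative placeholders, not our production settings):

```python
import numpy as np

def naive_density_check(gray_crop: np.ndarray,
                        dark_threshold: int = 128,
                        ratio_cutoff: float = 0.10) -> bool:
    # Flag the box as ticked if enough pixels fall below a fixed darkness threshold.
    # Borders, nearby text, and scan noise all count as "dark" here, which is
    # exactly why this kind of check generates so many false positives.
    dark_ratio = (gray_crop < dark_threshold).mean()
    return bool(dark_ratio > ratio_cutoff)
```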

What Worked Better

After considerable testing, we settled on a robust, multi-step approach that combined precise region isolation, preprocessing, and multiple detection methods. Here's what our improved pipeline looked like:

Step 1: Crop the Checkbox Region

  1. Using document templates or layout analysis, we defined exact bounding boxes for each checkbox.
  2. By isolating only the checkbox area, we drastically reduced background noise and ensured our detection logic focused on the relevant region.
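
A minimal sketch of this step, assuming OpenCV and a hypothetical template entry (the coordinates, file name, and the crop_checkbox helper are illustrative, not our actual template):

```python
import cv2

# Illustrative template entry: checkbox label -> (x, y, width, height) in pixels.
CHECKBOX_TEMPLATE = {
    "diabetes": (412, 873, 28, 28),
}

def crop_checkbox(page_image, box):
    # Slice out just the checkbox region so later steps never see surrounding text.
    x, y, w, h = box
    return page_image[y:y + h, x:x + w]

page = cv2.imread("scanned_page.png")  # hypothetical scanned page
crop = crop_checkbox(page, CHECKBOX_TEMPLATE["diabetes"])
```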

Step 2: Apply Preprocessing Filters

  1. Grayscale conversion simplified the image, removing unnecessary color information.
  2. Adaptive thresholding allowed us to handle uneven lighting and bring out marks, making faint or inconsistent ticks more detectable.
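
A simplified OpenCV sketch of this preprocessing, assuming a BGR crop as input; the adaptive-threshold parameters shown here are illustrative and would need tuning per scan quality:

```python
import cv2

def preprocess_checkbox(crop):
    # Grayscale strips colour information that does not help mark detection.
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    # Adaptive thresholding copes with uneven lighting; THRESH_BINARY_INV makes
    # ink white (255) on a black background, which simplifies counting later.
    binary = cv2.adaptiveThreshold(gray, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV,
                                   15, 10)
    return binary
```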

Step 3: Analyze for Markings

We combined multiple strategies for maximum reliability:

  1. Count non-white pixels – A simple measure of dark pixels gave a strong signal of whether a checkbox was marked.
  2. Detect contours or shapes – By identifying edges and geometric patterns, we could distinguish intentional marks from stray noise.
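
Combining the two signals might look roughly like this; the cut-off constants are hypothetical placeholders and in practice have to be tuned on real samples:

```python
import cv2

MARK_RATIO_CUTOFF = 0.08   # hypothetical; share of "ink" pixels needed to count as a mark
MIN_CONTOUR_AREA = 12.0    # hypothetical; filters out specks of scan noise

def is_checked(binary_crop):
    # Signal 1: share of non-white pixels inside the cropped, thresholded box.
    ink_ratio = cv2.countNonZero(binary_crop) / binary_crop.size
    # Signal 2: at least one contour large enough to be a deliberate stroke.
    contours, _ = cv2.findContours(binary_crop, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    has_stroke = any(cv2.contourArea(c) > MIN_CONTOUR_AREA for c in contours)
    return ink_ratio > MARK_RATIO_CUTOFF and has_stroke
```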

This pipeline was surprisingly robust: it delivered consistent results even on low-quality scans and was layout-agnostic—once the crop zones were defined, we didn't need to retrain models for each form variation.

2. Writing AI Prompts: Getting the Right Answers Requires the Right Questions

Once the document's visual elements were processed, the next challenge was extracting meaningful information from the text using a language model.

We learned quickly that vague or generic prompts led to poor results. For example, asking "Is this checkbox selected?" without specifying which checkbox "this" referred to often confused the model or produced inconsistent answers.

Our Solution: Clear, Context-Rich Prompts

We realized that the quality of answers depended heavily on the clarity and specificity of our questions. To improve results, we redesigned our approach around a few key principles:

Provide explicit context – Instead of asking general questions, we gave the model clear instructions about where in the document we were focusing. For example: "In the section labeled 'Medical History,' is the checkbox for 'Diabetes' selected?" This left no room for confusion.

Define the expected answer format – Ambiguity in answers can be just as problematic as ambiguity in questions. By telling the model exactly how we wanted the response structured (e.g., "Answer only with YES or NO"), we eliminated guesswork and ensured consistency across documents.
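
To make these two principles concrete, here is a simplified sketch of how such a prompt could be assembled and its answer normalized (the helper names and wording are illustrative, not our exact production prompts):

```python
def build_checkbox_prompt(section: str, label: str, document_text: str) -> str:
    # Explicit context (section + checkbox label) plus a strict answer format
    # leaves the model no room to guess what "this checkbox" means.
    return (
        "You are extracting data from a mortgage document.\n\n"
        f"Document text:\n{document_text}\n\n"
        f"In the section labeled '{section}', is the checkbox for '{label}' selected?\n"
        "Answer only with YES or NO."
    )

def normalize_answer(raw: str) -> bool:
    # Defensive parsing in case the model adds punctuation or extra whitespace.
    return raw.strip().upper().startswith("YES")
```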

Iterate and refine through testing – Prompt design wasn't a one-time task. For every new document type, we created multiple prompt variations, tested them against real samples, and refined the language until we achieved reliable results.

By following this process, we turned the AI from a generic assistant into a domain-specific extractor that could reliably process the messy, unpredictable language of real-world documents in the mortgage domain.

Got an idea? We can take it further.
