
As businesses continue to digitize their workflows, the demand for intelligent document processing (IDP) has skyrocketed. Organizations—from banks handling loan applications to healthcare providers managing patient records—are turning to AI to extract structured data from unstructured documents. Done well, this can save countless hours of manual labor, reduce errors, and unlock valuable insights buried in PDFs, scanned images, and forms.
But building an AI-driven document processing system isn't as simple as pointing an OCR engine at a file and calling it a day. Real-world documents are messy. They vary in layout, language, and formatting. They include handwritten notes, checkboxes, low-resolution scans—and sometimes even coffee stains.
At Inellipse, we faced these challenges firsthand while developing a custom IDP solution for a client in the mortgage industry. We quickly realized that off-the-shelf tools weren't enough. We had to engineer custom approaches, combining practical experimentation with pipelines tailored to messy, real-world data.
Two of the most persistent challenges we tackled were:
Reliably detecting checkbox states in low-quality scanned documents.
Designing prompts precise enough for a language model to extract data consistently.
These issues might seem minor at first glance, but solving them was critical to the overall accuracy and usability of the system. Below, we share how we approached each challenge, what didn't work, what eventually did, and the key lessons we took away.
At first glance, detecting a checkbox in a document seems simple: just look at the square and check if there's a mark inside. But when we started working with real-world documents—scanned PDFs and low-resolution images—we quickly realized it was far from trivial.
Checkboxes are small, often faint, and their appearance varies widely: marks may be faint, partial, or cut off, and scan resolution and noise differ from document to document.
These variations make traditional image processing techniques brittle and unreliable.
Our team at Inellipse started by testing some common OpenCV-based techniques:
Thresholding + contour detection: Worked in clean documents, but failed when marks were faint or partially cut off.
Pixel density checks: We measured the percentage of dark pixels in the checkbox area. Unfortunately, this approach produced too many false positives, especially when noise or text artifacts were present nearby.
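To illustrate why the pixel density approach broke down, here is a minimal sketch of such a check (NumPy only; the threshold values and function names are illustrative, not the ones from our actual pipeline):

```python
import numpy as np

def dark_pixel_ratio(region: np.ndarray, dark_threshold: int = 128) -> float:
    """Fraction of pixels in a grayscale crop darker than `dark_threshold`."""
    return float((region < dark_threshold).mean())

def is_checked_by_density(region: np.ndarray, min_ratio: float = 0.12) -> bool:
    """Naive rule: the box is 'checked' if enough of its pixels are dark.

    This is the kind of check that produced false positives for us: stray
    text or scanner noise near the box raises the ratio just like a real mark.
    """
    return dark_pixel_ratio(region) >= min_ratio

# A clean, empty 40x40 box: only the printed 1-pixel border is dark.
empty = np.full((40, 40), 255, dtype=np.uint8)
empty[0, :] = empty[-1, :] = empty[:, 0] = empty[:, -1] = 0

# The same box with a small bleed-through artifact near one corner:
# no human mark, yet the density rule now flags it as checked.
noisy = empty.copy()
noisy[3:10, 3:10] = 0
```

The empty box sits just under the ratio threshold, while a small noise blob pushes it over, which is exactly the false-positive failure mode described above.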

After considerable testing, we settled on a robust, multi-step approach that combined precise region isolation, preprocessing, and multiple detection methods. Here's what our improved pipeline looked like:
Step 1: Crop the Checkbox Region
Step 2: Apply Preprocessing Filters
Step 3: Analyze for Markings
We combined multiple strategies for maximum reliability.
This pipeline was surprisingly robust: it delivered consistent results even on low-quality scans and was layout-agnostic—once the crop zones were defined, we didn't need to retrain models for each form variation.
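Assuming the crop zones are known pixel rectangles and pages arrive as grayscale arrays, the three steps above can be sketched roughly as follows (NumPy stands in here for the OpenCV calls a real pipeline would use; the thresholds and signal choices are illustrative):

```python
import numpy as np

def crop_checkbox(page: np.ndarray, zone: tuple[int, int, int, int]) -> np.ndarray:
    """Step 1: isolate the checkbox using a predefined crop zone (y, x, h, w)."""
    y, x, h, w = zone
    return page[y:y + h, x:x + w]

def preprocess(region: np.ndarray) -> np.ndarray:
    """Step 2: binarize the crop (a real pipeline would use OpenCV thresholding)."""
    return (region < region.mean()).astype(np.uint8)  # 1 = ink, 0 = paper

def analyze(binary: np.ndarray) -> bool:
    """Step 3: combine two weak signals and require both to agree."""
    interior = binary[2:-2, 2:-2]                          # ignore the printed border
    enough_ink = interior.mean() > 0.05                    # overall ink density inside
    spread_out = (interior.sum(axis=1) > 0).mean() > 0.3   # ink spans many rows
    return bool(enough_ink and spread_out)

def synthetic_box(checked: bool) -> np.ndarray:
    """A 30x30 grayscale box on light paper, optionally marked with an 'X'."""
    box = np.full((30, 30), 230, dtype=np.uint8)
    box[0, :] = box[-1, :] = box[:, 0] = box[:, -1] = 20
    if checked:
        for i in range(3, 27):
            box[i, i] = box[i, 29 - i] = 20
    return box

page = np.full((60, 60), 230, dtype=np.uint8)
page[10:40, 10:40] = synthetic_box(True)
result = analyze(preprocess(crop_checkbox(page, (10, 10, 30, 30))))  # True
```

Requiring two independent signals to agree is what makes this style of pipeline tolerant of noise: a stray blob can raise density, but it rarely also spreads across many rows the way a real mark does.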
Once the document's visual elements were processed, the next challenge was extracting meaningful information from the text using a language model.
We learned quickly that vague or generic prompts led to poor results. For example, asking "Is this checkbox selected?" without specifying what "this" referred to often confused the model or returned inconsistent answers.
We realized that the quality of answers depended heavily on the clarity and specificity of our questions. To improve results, we redesigned our approach around a few key principles:
Provide explicit context – Instead of asking general questions, we gave the model clear instructions about where in the document we were focusing. For example: "In the section labeled 'Medical History,' is the checkbox for 'Diabetes' selected?" This left no room for confusion.
Define the expected answer format – Ambiguity in answers can be just as problematic as ambiguity in questions. By telling the model exactly how we wanted the response structured (e.g., "Answer only with YES or NO"), we eliminated guesswork and ensured consistency across documents.
Iterate and refine through testing – Prompt design wasn't a one-time task. For every new document type, we created multiple prompt variations, tested them against real samples, and refined the language until we achieved reliable results.
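To make the first two principles concrete, here is a hypothetical sketch of a prompt builder and answer parser (the template wording follows the examples above; the function names and strict-validation behavior are ours for illustration, not from a specific library):

```python
def build_checkbox_prompt(section: str, field: str) -> str:
    """Compose a prompt with explicit context and a fixed answer format."""
    return (
        f"In the section labeled '{section}', "
        f"is the checkbox for '{field}' selected? "
        "Answer only with YES or NO."
    )

def parse_answer(raw: str) -> bool:
    """Enforce the answer contract; anything off-format is rejected loudly."""
    answer = raw.strip().upper()
    if answer not in {"YES", "NO"}:
        raise ValueError(f"Model broke the YES/NO contract: {raw!r}")
    return answer == "YES"

prompt = build_checkbox_prompt("Medical History", "Diabetes")
```

Constraining the output format this way also makes downstream parsing trivial: an off-format response can be caught and retried instead of silently corrupting the extracted data.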
By following this process, we turned the AI from a generic assistant into a domain-specific extractor that could reliably process the messy, unpredictable language of real-world documents in the mortgage domain.