End-to-end intelligent document processing and data extraction solution using a combination of low-cost open-source tools and advanced ML techniques

Client

Our client, a well-recognized global Fintech company, processes large volumes of data from financial documents, often placed through an agent under specified conditions of sale, which it receives from external agencies as email attachments. More than 20 agencies send data, each in 12 to 15 different file formats. In total, the client receives 20,000 to 25,000 emails and documents daily.


Objectives

  • Simplify the manual process by digitizing these documents so that they can be analyzed and classified and the data they contain can be extracted
  • Work across documents of different sizes, layouts, and formats at scale
  • Integrate the extracted information with backend platforms in real time, with relevant audit logs
  • Maintain accuracy while sustaining the transactions-per-minute throughput needed to handle tens of thousands of documents

Solution

We built a custom solution that extracts data from documents of various sizes, layouts, and formats at scale.

  • Developed an independent microservice to read emails and fetch attached documents at scale
  • Leveraged multiple technologies, including Python, Adobe, Pandas, NumPy, and AWS Textract, to handle document complexity and improve the readability of documents and data
  • Built run-time intelligence to minimize the use of high-cost components and keep the solution cost-effective
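
The two ideas in the list above, fetching attachments from email and routing each document to the cheapest component that can handle it, can be illustrated with a minimal Python sketch. It is not the client's implementation. The function names, the `min_chars` threshold, and the "try a cheap text decode first" heuristic are all assumptions made for illustration, and the `costly_extract` argument is a hypothetical stand-in for a paid OCR service such as AWS Textract, which is not called here.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import Callable, List, Tuple


def fetch_attachments(raw_email: bytes) -> List[Tuple[str, bytes]]:
    """Return (filename, payload) pairs for every attachment in a raw email."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    attachments = []
    for part in msg.iter_attachments():
        name = part.get_filename() or "unnamed"
        attachments.append((name, part.get_payload(decode=True)))
    return attachments


def cheap_extract(payload: bytes) -> str:
    """Hypothetical low-cost path: succeeds only if the file is plain UTF-8 text."""
    try:
        return payload.decode("utf-8").strip()
    except UnicodeDecodeError:
        return ""


def extract(
    payload: bytes,
    costly_extract: Callable[[bytes], str],
    min_chars: int = 20,  # assumed threshold, not taken from the case study
) -> Tuple[str, str]:
    """Try the cheap path first; fall back to the costly one only when needed."""
    text = cheap_extract(payload)
    if len(text) >= min_chars:
        return "cheap", text
    return "costly", costly_extract(payload)


def sample_email() -> bytes:
    """Build a demo email with one text attachment and one binary attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Daily statement"
    msg.set_content("Please find the documents attached.")
    msg.add_attachment(
        b"account,amount\n1001,250.00\n",
        maintype="application", subtype="octet-stream", filename="a.csv",
    )
    msg.add_attachment(
        b"\xff\xd8\xff\xe0scan",  # not valid UTF-8, so the cheap path fails
        maintype="application", subtype="octet-stream", filename="b.jpg",
    )
    return msg.as_bytes()


if __name__ == "__main__":
    # Stub standing in for a paid OCR call; a real service would go here.
    stub_ocr = lambda payload: "stub-ocr"
    for filename, payload in fetch_attachments(sample_email()):
        route, _ = extract(payload, stub_ocr)
        print(filename, route)
```

In this sketch only the binary attachment reaches the costly path. A production router would likely use richer signals, such as file type, an embedded text layer, or agency-specific templates, to make the same decision.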

Outcomes

The custom-built solution helped the client meet all processing SLAs and deliver business-ready information on a day-to-day basis. This resulted in high reconciliation accuracy and substantial cost and time savings. Because the process now runs with negligible human intervention, human errors have been reduced and the company's overall performance has improved.