×
Case Studies Blog Careers Technologies
Data Warehouse Modernisation for a Large Broking and Financial Services enterprise: Migrating from Excel-based MIS system to a Data Lakehouse Platform with a Real-time Interactive Dashboard
Industry
Capital Markets
Technologies
Python, Microsoft .Net, SQL Server, AWS, Databricks
About Client

The client is a diversified and well-integrated financial services enterprise providing full range of top quality Trading Services, Asset Management, Wealth Management, Investment Banking, Investment, Advisory Services, Proprietary Trading, Life Insurance and Brokerage to a diverse range of institutional investors.

Problem Statement

The client was using an Excel based reporting system as visualization tool for their business analytics purpose. The system had several limitations, including: 

  • Lack of auto real-time refresh
  • Manual and time-consuming data processing
  • Poor performance and slow reporting
  • Limited data analysis capabilities

The client’s reporting system relied on IT teams to manually Extract, Transform and Load (ETL) data from various sources into a SQL Server database. The data was then queried and processed to generate Excel based reports. This process was time consuming and prone to errors, leading to delays in data availability for decision making.

Client's existing ETL pipeline was as shown below:

 

 

Oneture's Role

The client recognized that their reporting system needed to be modernized to enable faster, more efficient data processing and analysis. Oneture helped client in assessment of existing system and proposing recommendation on implementation plan. Post assessment of existing ETL pipeline, Oneture discovered that the problem goes beyond replacing Excel based reporting with BI dashboards. The pipeline also has manual intervention in pulling data from data sources, multiple points of failure, limited automation and no capabilities of auto restart or self healing in event of failure.

Oneture proposed a modernization strategy that involved automating and optimizing current ETL process, developing BI dashboard and migrating the client’s onprem MIS reporting system to cloud based Data Lakehouse with a real-time interactive dashboard. The Data Lakehouse approach will enable the client to store all their data in one place, providing a single source of truth for all data queries. It will also provide scalability and flexibility to handle the client’s future data needs.

To maintain business continuity during modernization journey and prioritizing business outcome, the proposed modernization strategy involved the following steps:

  • Optimize the current onprem ETL process to ensure smooth data flow and processing
  • Design and develop a real-time interactive dashboard to enable data visualization and analysis
  • Migrate the existing data to a cloud-based Data Lakehouse

The modernization strategy will also leverage Databricks and AWS cloud services to enable scalability and reduce infrastructure costs.

Solution

Step1: Optimizing current ETL process

 This involved multiple tasks including

  1. Re-Engineering of the existing ETL Job to Fix Pefomance issues

This process includes resolving performance issues, optimizing data transformations, improving data quality checks, and ensuring scalability and fault-tolerance. Oneture team followed standard data engineering principles, avoided redundant storage, and identified potential issues to make the ETL process more efficient. Best practices for data modeling, storage, and processing were used to handle large volumes of data. We helped client by migrating failed ADO.NET based Exe loader jobs to standard and reliable tool using SSIS.

  1. Identify and Fix Data quality issues

After consulting with business stakeholders, our developers identified and corrected existing data quality issues. The data quality issues were analyzed using data profiling tools. The root causes of the data quality issues were identified, and data quality controls were implemented to prevent data quality issues from recurring. Moreover, the data quality controls should be scalable and able to handle large data volumes.

  1. RPA Bots to automate manual download and upload of files from data sources

This involved using RPA tools to automate data retrieval and updating processes, ensuring that the data in excel sheets is up-to-date and accurate leveraging multiple techniques including scraping from websites using selenium, API integration with data source provider, JDBC based database intergration etc. RPA bots also helped in fetching data from the DB using SQL Query and populate the excel sheet with the latest data to avoid manual intervention.

  1. Reporting Queries

This involves working with SQL and database management systems to optimize queries and improve data retrieval performance. We also need to understand the client's data model and business requirements to design effective and efficient queries. Oneture team worked with client to make complex queries joining multiple tables to create final View queries that give a unified view of data, which has significant business value with one click using power query feature of excel to tackle business problem for short term by providing required reporting.

 

Step2: Design and develop a real-time interactive dashboard to enable data visualization and analysis

Oneture is working with client in providing an alternative approach of current Excel based visualization to support near real-time updates. Client also requires role and access management with AD connector support. Oneture team divided the work with following tasks:

  1. Proof Of Concept on Visualization Dashboard using PowerBI connecting with UAT SQL server database and evaluating PowerBI licence and pricing as per usage requirements
  2. PoC on other popular analytics tools like Apache Superset, Redash and Grafana (common for application monitoring but well known for real-time refresh)
  3. Comparing pricing, features, compatibility, performance and long term feasibility to shortlist the right tool for usecase
  4. AD connector integration testing with shortlisted dashboard tool
  5. Role and Access Control feature to restrict data access for users
  6. Creating required views of 40+ reports of current Excel based approach on Finalized BI tool
  7. Scheduler to export and send Excel/CSV report from dashboard to relevant stakeholders on email. Oneture will leverage AWS S3, EC2, Quicksight, AD connector, VPC and Direct Connect for the purpose of this initiative

 

Step3:  Migrate the existing data to a cloud-based Data Lakehouse

As part of long term plan, Oneture team will be working with client which involves moving the existing data warehouse and reporting components to the cloud and adopting a lakehouse architecture, which combines the benefits of a data lake and a data warehouse. This migration is aimed at achieving benefits such as better scalability, cost-efficiency, improved data access, and more flexibility in data processing.

 

Future Plans

The modernization initiative is an ongoing process, and our company continues to work with the client to identify areas for improvement and optimization. Our team is currently working on optimizing the ETL process and creating realtime BI dashboard  to ensure smooth data flow processing and visualization. Later we will be moving on migrating to data lakehouse architecture. We are also exploring the use of machine learning algorithms to enable predictive analytics and automate data processing.

Technologies
  • Backend : Python, Selenium,SQL Server, .Net, SSIS, Grafana, PowerBI, Superset, Redash, Quicksight
  • Cloud : AWS S3, Kinesis, Glue, Athena, Lambda, AD Connect, EC2
Value Delivered
  • Real-time data availability for decision making
  • Faster data processing and reporting
  • Improved data analysis capabilities
  • Scalability and flexibility to handle future data needs at exponential volume
  • Reduced infrastructure costs due to cloud-based technology
  • The realtime interactive dashboard provided the client with a powerful tool for data visualization and analysis, enabling them to make data driven decisions quickly and efficiently.
  • The improved data analysis capabilities also enabled the client to identify new business opportunities and optimize their operations.