Step 1: Optimizing the current ETL process
This involved multiple tasks, including:
- Re-engineering the existing ETL job to fix performance issues
This process included resolving performance issues, optimizing data transformations, improving data quality checks, and ensuring scalability and fault tolerance. The Oneture team followed standard data engineering principles, avoided redundant storage, and identified potential bottlenecks to make the ETL process more efficient. Best practices for data modeling, storage, and processing were applied to handle large volumes of data. We also helped the client migrate failing ADO.NET-based executable loader jobs to SSIS, a standard and reliable tool.
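The batched-load pattern behind this kind of re-engineering can be sketched in a few lines. This is a minimal Python illustration using sqlite3; the `trades_raw`/`trades_clean` schema, the batch size, and the cleaning rules are assumptions for the example, not the client's actual SSIS pipeline:

```python
import sqlite3

BATCH_SIZE = 5000  # tune to available memory; avoids loading the whole table at once


def run_etl(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Copy and transform rows in fixed-size batches (illustrative schema)."""
    dst.execute(
        "CREATE TABLE IF NOT EXISTS trades_clean (symbol TEXT, qty INTEGER, price REAL)"
    )
    cur = src.execute("SELECT symbol, qty, price FROM trades_raw")
    total = 0
    while True:
        batch = cur.fetchmany(BATCH_SIZE)
        if not batch:
            break
        # Example transformation step: normalize symbols, drop non-positive quantities
        cleaned = [(s.strip().upper(), q, p) for s, q, p in batch if q > 0]
        dst.executemany("INSERT INTO trades_clean VALUES (?, ?, ?)", cleaned)
        total += len(cleaned)
    dst.commit()
    return total
```

Processing in fixed-size batches keeps memory use bounded regardless of source-table size, which is the core of making such a job scalable.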
- Identifying and fixing data quality issues
After consulting with business stakeholders, our developers identified and corrected existing data quality issues, which were analyzed using data profiling tools. The root causes were identified, and data quality controls were implemented to prevent the issues from recurring; these controls were designed to be scalable and to handle large data volumes.
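A profiling pass of this kind boils down to counting rule violations over the data. The sketch below is a hedged, simplified stand-in for the profiling tools mentioned above; the rule names (`null_key`, `duplicate_key`, `negative_amount`) and the `amount` field are illustrative assumptions:

```python
def profile(rows, key):
    """Minimal data-profiling pass: count rows violating simple
    data-quality rules. Rules shown here are illustrative examples."""
    issues = {"null_key": 0, "duplicate_key": 0, "negative_amount": 0}
    seen = set()
    for row in rows:
        k = row.get(key)
        if k is None:
            issues["null_key"] += 1          # missing primary key
        elif k in seen:
            issues["duplicate_key"] += 1     # key already encountered
        else:
            seen.add(k)
        if (row.get("amount") or 0) < 0:
            issues["negative_amount"] += 1   # out-of-range value
    return issues
```

Because it is a single streaming pass with constant per-row work, the same structure scales to large volumes; a production control would run it on each load and alert when any count exceeds a threshold.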
- RPA bots to automate the manual download and upload of files from data sources
This involved using RPA tools to automate data retrieval and update processes, ensuring that the data in Excel sheets stays up to date and accurate. The bots leverage multiple techniques, including website scraping with Selenium, API integration with the data source provider, and JDBC-based database integration. They also fetch data from the database via SQL queries and populate the Excel sheets with the latest data, avoiding manual intervention.
- Reporting queries
This involved working with SQL and database management systems to optimize queries and improve data retrieval performance, which required understanding the client's data model and business requirements to design effective and efficient queries. The Oneture team worked with the client to build complex queries joining multiple tables into final views that give a unified, one-click view of the data with significant business value. As a short-term measure, these views were surfaced through Excel's Power Query feature to provide the required reporting.
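The shape of such a unified view is a join plus aggregation. The snippet below shows the pattern using sqlite3; the `accounts`/`trades` tables, column names, and the view name are invented for illustration and are not the client's schema:

```python
import sqlite3

# Illustrative only: table, column, and view names are assumptions.
UNIFIED_VIEW_SQL = """
CREATE VIEW IF NOT EXISTS v_position_summary AS
SELECT a.account_id,
       a.account_name,
       SUM(t.qty * t.price) AS exposure
FROM accounts a
JOIN trades t ON t.account_id = a.account_id
GROUP BY a.account_id, a.account_name
"""


def create_unified_view(conn: sqlite3.Connection) -> None:
    """Define the one-click unified view on top of the base tables."""
    conn.execute(UNIFIED_VIEW_SQL)
```

Once the view exists, the reporting layer (here, Power Query) only ever issues a simple `SELECT` against it, keeping the join complexity inside the database.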
Step 2: Design and develop a real-time interactive dashboard to enable data visualization and analysis
Oneture is working with the client to provide an alternative to the current Excel-based visualization that supports near real-time updates. The client also requires role and access management with AD connector support. The Oneture team divided the work into the following tasks:
- Proof of concept (PoC) of a visualization dashboard using Power BI connected to the UAT SQL Server database, and evaluation of Power BI licensing and pricing against usage requirements
- PoC of other popular analytics tools such as Apache Superset, Redash, and Grafana (commonly used for application monitoring and well known for real-time refresh)
- Comparing pricing, features, compatibility, performance, and long-term feasibility to shortlist the right tool for the use case
- AD connector integration testing with the shortlisted dashboard tool
- Role- and access-control features to restrict data access for users
- Creating the required views for the 40+ reports of the current Excel-based approach in the finalized BI tool
- A scheduler to export Excel/CSV reports from the dashboard and email them to the relevant stakeholders. Oneture will leverage AWS S3, EC2, QuickSight, AD Connector, VPC, and Direct Connect for this initiative
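The report-mailing step of the scheduler can be sketched with Python's `email` module. This builds the message only; actual delivery would go through `smtplib.SMTP` or a service like AWS SES on the schedule, and all addresses and names below are placeholders:

```python
from email.message import EmailMessage


def build_report_email(sender, recipients, report_name, csv_bytes):
    """Assemble a scheduled-report email with the exported CSV attached.
    Placeholder addresses; delivery (SMTP/SES) is out of scope here."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"Scheduled report: {report_name}"
    msg.set_content("Please find the attached report.")
    # Attach the exported report as a CSV file
    msg.add_attachment(csv_bytes, maintype="text", subtype="csv",
                       filename=f"{report_name}.csv")
    return msg
```

Keeping message construction separate from delivery makes the scheduler easy to test and lets the transport (SMTP, SES, etc.) be swapped without touching the report logic.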
Step 3: Migrate the existing data to a cloud-based data lakehouse
As part of the long-term plan, the Oneture team will work with the client to move the existing data warehouse and reporting components to the cloud and adopt a lakehouse architecture, which combines the benefits of a data lake and a data warehouse. The migration aims to deliver better scalability, cost efficiency, improved data access, and greater flexibility in data processing.
The modernization initiative is an ongoing process, and our company continues to work with the client to identify areas for improvement and optimization. Our team is currently optimizing the ETL process and building the real-time BI dashboard to ensure smooth data flow, processing, and visualization; migration to the data lakehouse architecture will follow. We are also exploring the use of machine learning algorithms to enable predictive analytics and automate data processing.