Client is a Global Appliance Maker
Our Client wanted to put in place an enterprise-wide initiative to become a data-driven organization: to treat data as a strategic asset, and then to build the capabilities to put that asset to use not just for big decisions but also for everyday action on the frontline.
The aim is to measure 30+ enterprise-wide metrics such as brand value, consumer habits, adoption of IoT, and the carbon footprint of plants and appliances. Each metric is to be generated on either a monthly or a yearly basis. Together, the metrics are meant to present a clear overall picture of the company’s operations and to provide information for decision-making.
This project is meant to accomplish the following
The Oneture Team’s involvement has been in the backend engineering segment of the project. Essentially, this involves handling data from its source through to storage in Synapse tables. The creation of the Synapse views and Power BI reports falls under the scope of another team.
The process has involved a great deal of consultation with the team’s project managers, data source owners and metric stakeholders to establish what the scope of each metric is, what data is available from the source to develop the metric, and how metric definitions need to be altered given data availability constraints. In addition to developing the Databricks preprocessing code for most of the metrics, the team also monitors data quality issues coming from various sources, such as inconsistencies in how certain columns are recorded, mismatched data types and missing data. We also coordinate with the Platform team to resolve data access issues and to ensure effective code deployment.
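As an illustration of the kind of data quality monitoring described above, here is a minimal sketch in plain Python. The column names, the expected types and the sample rows are all hypothetical, and the actual checks run in Databricks and may look quite different.

```python
from collections import Counter

# Hypothetical expected schema (assumption): column name -> Python type.
EXPECTED_TYPES = {"device_id": str, "region": str, "units_sold": int}

def quality_report(rows):
    """Count missing values and type mismatches per column."""
    missing = Counter()
    wrong_type = Counter()
    for row in rows:
        for column, expected in EXPECTED_TYPES.items():
            value = row.get(column)
            if value is None or value == "":
                missing[column] += 1      # absent or empty value
            elif not isinstance(value, expected):
                wrong_type[column] += 1   # e.g. a count stored as text
    return {"missing": dict(missing), "wrong_type": dict(wrong_type)}

# Made-up sample rows, for illustration only.
sample_rows = [
    {"device_id": "A1", "region": "EU", "units_sold": 10},
    {"device_id": "A2", "region": "", "units_sold": "7"},
    {"device_id": None, "region": "US", "units_sold": 3},
]
report = quality_report(sample_rows)
print(report)
```

A report of this shape could be produced per source and per load, so that recurring issues can be raised with the relevant data source owner.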
All the metrics follow the same general structure: raw data is preprocessed in the Datalake, stored in a SQL table, exposed through metric-specific views built on these tables, and finally visualized through Power BI reports. This project has been developed with Microsoft Azure and its associated tools.
To understand this flow in detail, let us consider the end-to-end flow of an individual metric that counts the monthly number of newly onboarded IoT-enabled devices. In this case, the raw data needs to record the onboarding of each IoT appliance, and it comes from three different sources. In the first Datalake layer, the onboarding data from the three sources is aggregated. In successive layers, the data is further preprocessed using Databricks notebooks. For this metric, that means computing monthly counts of onboarded devices, broken down by certain other parameters. The end result is stored in the final layer of the Datalake, and from there it is copied to a Synapse SQL table. Not every metric has its own SQL table; for this metric, the data is copied to a general SQL table. The tables in Synapse are then used to build views for the respective metrics. Unlike the tables, these views are specific to each metric. The metrics are finally displayed as reports on a Power BI dashboard.
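The aggregation and monthly counting steps for this metric can be sketched as follows. This is a simplified plain-Python illustration, not the Databricks code used in the project: the field names `device_id` and `onboarded_on`, the sample records, and the rule that a device appearing in more than one source is counted once are all assumptions.

```python
from collections import Counter
from datetime import date

# Hypothetical onboarding records from three sources (made-up data).
source_a = [
    {"device_id": "A1", "onboarded_on": date(2023, 1, 5)},
    {"device_id": "A2", "onboarded_on": date(2023, 1, 20)},
]
source_b = [
    {"device_id": "B1", "onboarded_on": date(2023, 2, 2)},
    {"device_id": "A2", "onboarded_on": date(2023, 1, 20)},  # also in source_a
]
source_c = [
    {"device_id": "C1", "onboarded_on": date(2023, 2, 14)},
    {"device_id": "C2", "onboarded_on": date(2023, 2, 28)},
]

def monthly_onboarding_counts(*sources):
    """Combine all sources and count newly onboarded devices per month."""
    seen = set()
    counts = Counter()
    for source in sources:
        for record in source:
            # Assumption: a device seen in several sources counts only once.
            if record["device_id"] in seen:
                continue
            seen.add(record["device_id"])
            counts[record["onboarded_on"].strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))

monthly_counts = monthly_onboarding_counts(source_a, source_b, source_c)
print(monthly_counts)
```

In the real pipeline, a result of this kind would be written to the final Datalake layer and then copied to the general Synapse table, with the metric-specific view built on top of it.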
This initiative has been developed with Microsoft Azure and its associated tools, specifically:
Most of the obstacles facing this project come from the breadth of its scope. The project involves data coming from all parts of the Client’s platform and has required a great deal of coordination between multiple stakeholders.
One of the key challenges facing this project is the lack of homogeneity in the way data is recorded across the platform. For example, appliance data can be recorded in different ways in different regions, as well as in different data sources. One of the most important tasks in this project has been standardizing product data, sales data, consumer data and other classes of data across different data sources and geographic regions. This has involved numerous discussions with people from different teams across the company in order to arrive at a standardized way of representing data.
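To make the standardization task concrete, here is a small sketch of mapping differently recorded product categories onto one agreed representation. The alias table, the canonical names and the sample records are hypothetical; in practice, the agreed mapping comes out of the discussions with the teams that own each source.

```python
# Hypothetical alias table (assumption): source spelling -> canonical category.
CATEGORY_ALIASES = {
    "washing machine": "WASHER",
    "washer": "WASHER",
    "fridge": "REFRIGERATOR",
    "refrigerator": "REFRIGERATOR",
}

def standardize_category(raw):
    """Normalize case and whitespace, then look up the canonical category."""
    key = " ".join(raw.strip().lower().split())
    # Unmapped values are flagged rather than guessed.
    return CATEGORY_ALIASES.get(key, "UNKNOWN")

def standardize_record(record):
    """Return a copy of the record with a standardized category."""
    return {**record, "category": standardize_category(record["category"])}

# Made-up records as they might arrive from different regions.
raw_records = [
    {"region": "EU", "category": " Washing  Machine"},
    {"region": "US", "category": "FRIDGE"},
    {"region": "APAC", "category": "Dryer"},
]
standardized = [standardize_record(r) for r in raw_records]
print(standardized)
```

Flagging unmapped values as `UNKNOWN` keeps them visible, so that new spellings can be reviewed with the source owners instead of being silently dropped.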
As of writing, roughly 70% of the metrics have been completed up to the front-end stage, and the corresponding reports are now being developed in Power BI. The remaining metrics are still in the back-end stage, which covers table definition, data aggregation, testing and deployment. In the coming months, as more metrics become available, the focus will shift towards the front-end and to helping the intended end users learn how to use the dashboard.