All the metrics follow the same general structure: raw data is preprocessed in the Datalake, stored in a SQL table, exposed through metric-specific views built on those tables, and finally visualized through Power BI reports. This project has been developed with Microsoft Azure and its associated tools.
To understand this flow in detail, consider the end-to-end path of a single metric: the monthly count of newly onboarded IoT-enabled devices. For this metric, the raw data must record the onboarding of each IoT appliance, and it arrives from three different sources. In the Datalake layer, the onboarding data from these three sources is aggregated. In successive layers, the data is further preprocessed using Databricks notebooks; for this metric, that means computing monthly counts of onboarded devices, broken down by certain other parameters. The result is stored in the final layer of the Datalake and from there copied to a Synapse SQL table. There is not necessarily a dedicated SQL table per metric; in this case, the data is copied to a general-purpose table. The Synapse tables are then used to build views for the respective metrics, and these views, unlike the tables, are specific to a single metric. The metrics are finally displayed as reports on a Power BI dashboard.
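The preprocessing step for this metric can be sketched in plain Python. This is a minimal illustration, not the actual Databricks notebook: the record shape (device ID plus onboarding date) and the three hypothetical source lists are assumptions for the example, and a real notebook would operate on Datalake files rather than in-memory lists.

```python
from collections import Counter
from datetime import date

# Hypothetical onboarding records from three sources; in the real pipeline
# these land in the Datalake layer before aggregation.
source_a = [("dev-001", date(2023, 1, 5)), ("dev-002", date(2023, 1, 20))]
source_b = [("dev-003", date(2023, 2, 3))]
source_c = [("dev-004", date(2023, 2, 14)), ("dev-005", date(2023, 2, 28))]

def monthly_onboarding_counts(*sources):
    """Aggregate records from all sources, then count devices per (year, month)."""
    counts = Counter()
    for records in sources:
        for device_id, onboarded in records:
            counts[(onboarded.year, onboarded.month)] += 1
    return dict(counts)

print(monthly_onboarding_counts(source_a, source_b, source_c))
# {(2023, 1): 2, (2023, 2): 3}
```

The resulting per-month counts correspond to the rows that would be written to the final Datalake layer and copied into the general Synapse table.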
This initiative has been developed with Microsoft Azure and its associated tools, specifically:
- Azure Data Lake Storage
- Azure Databricks
- Azure Data Factory
- Azure Synapse SQL
- Power BI