Microsoft Modern Data Warehouse Architecture
#Meetup - Les gentils développeurs Data Platform
Sauget Charles-Henri - MVP Data Platform
Ben Zahra Anouar - MVP AI
28/12/2019
SAUGET Charles-Henri
Data Platform consultant since 2009
MAIL: [email protected]
TWITTER: @SaugetCh
GITHUB: https://github.com/chsauget
BLOG: www.sauget-ch.fr
WWW.INSIDERS.COOP
Anouar BEN ZAHRA
Data Platform consultant
MAIL: [email protected]
TWITTER: @Anouarbenzahra
GITHUB: https://github.com/AnouarBenZahra
NUGET PACKAGE: https://www.nuget.org/profiles/Anouar
WWW.INSIDERS.COOP
Data Evolution
Data Abundance
Variety: support for many types of data
  Structured (tables)
  Semi-structured (JSON)
  Unstructured (images)
Volume
  Small data (< 10 GB)
  Medium-range data (10 GB to 1 TB)
  Big data (> 1 TB, up to hundreds of petabytes)
Velocity
  Capacity to handle a large throughput (Gb/s?)
  Elasticity of the architectures
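The volume tiers above can be expressed as a small classification helper. This is only an illustrative sketch: the function name `volume_tier`, the tier labels, and the use of decimal units (1 GB = 10^9 bytes) are assumptions, not something stated on the slide.

```python
# Hypothetical helper mirroring the slide's volume tiers.
# Assumption: decimal units, i.e. 1 GB = 10**9 bytes and 1 TB = 10**12 bytes.
GB = 10**9
TB = 10**12


def volume_tier(size_bytes: int) -> str:
    """Return 'small', 'medium' or 'big' for a dataset of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < 10 * GB:
        return "small"    # under 10 GB
    if size_bytes <= 1 * TB:
        return "medium"   # 10 GB to 1 TB
    return "big"          # above 1 TB


if __name__ == "__main__":
    for label, size in [("3 GB", 3 * GB), ("20 GB", 20 * GB), ("5 TB", 5 * TB)]:
        print(label, "->", volume_tier(size))
```

With these thresholds, the 3 GB and 20 GB files used in the demo would fall in the small and medium tiers respectively.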
Structured vs unstructured Data
On-Premises vs Cloud
Computing Environment
Licensing Model
Maintainability
Scalability
Availability
Data Warehousing evolution
Data Engineering job responsibilities
New skills for new platforms
The technology has changed, and so has the job!
Changing loading approaches
From implementing to provisioning
Modern Data Warehouse Architecture
Separation of Storage & Compute
https://azure.microsoft.com/fr-fr/solutions/architecture/modern-data-warehouse/
DEMO – Modern DWH
Lake gen2: Comments.xml (20 GB), Users.xml (3 GB)
Lake gen2: Parquet files
Azure Synapse
Azure AS
Power BI
Azure Data Factory
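Multi-gigabyte XML files such as the ones in this demo are usually read as a stream rather than loaded into memory before being written to a columnar format. The sketch below shows only that streaming step, using the Python standard library. The `<row .../>` layout, the attribute names, and the function name `iter_rows` are assumptions for illustration. The slides do not say which tool performs the XML-to-Parquet conversion in the demo.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Dict, Iterator


def iter_rows(source: IO[bytes], tag: str = "row") -> Iterator[Dict[str, str]]:
    """Yield the attributes of each matching element, one at a time."""
    # iterparse reads the file incrementally instead of building the full tree first.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield dict(elem.attrib)  # copy the attributes before clearing
            elem.clear()             # drop the row's data once it has been emitted


# Assumed sample layout: one <row> element per record, fields stored as attributes.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<comments>
  <row Id="1" PostId="10" Score="2" Text="Nice answer" />
  <row Id="2" PostId="11" Score="0" Text="Thanks" />
</comments>"""

rows = list(iter_rows(io.BytesIO(SAMPLE)))

if __name__ == "__main__":
    for row in rows:
        print(row)
```

In a real pipeline, each batch of rows would be handed to a Parquet writer instead of being collected in a list.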
Our demo architecture

When you need a low-cost, high-throughput data store.
When you need to store NoSQL data.
When you do not need to query the data directly. No ad hoc query support.
Suits the storage of archive or relatively static data.
Suits acting as an HDInsight Hadoop data store.

When you need a low-cost, high-throughput data store.
Unlimited storage for NoSQL data.
When you do not need to query the data directly. No ad hoc query support.
Suits the storage of archive or relatively static data.
Suits acting as a Databricks, HDInsight and IoT data store.

Eases the deployment of a Spark-based cluster.
Enables the fastest processing of Machine Learning solutions.
Enables collaboration between data engineers and data scientists.
Provides tight enterprise security integration with Azure Active Directory.
Integration with other Azure services and Power BI.

When you require a relational data store.
When you need to manage transactional workloads.
When you need to manage a high volume of inserts and reads.
When you need a service that requires high concurrency.
When you require a solution that can scale elastically.

When you require a relational data store.
When you need to manage analytical workloads.
When you need low-cost storage.
When you require the ability to pause and restart the compute.
When you require a solution that can scale elastically.
Our demo architecture

When you require a fully managed event processing engine.
When you require temporal analysis of streaming data.
Support for analyzing IoT streaming data.
Support for analyzing application data through Event Hubs.
Ease of use with a Stream Analytics Query Language.

When you want to orchestrate the batch movement of data.
When you want to connect to a wide range of data platforms.
When you want to transform or enrich the data in movement.
When you want to integrate with SSIS packages.
Enables verbose logging of data processing activities.

When you require documentation of your data stores.
When you require a multi-user approach to documentation.
When you need to annotate data sources with descriptive metadata.
A fully managed cloud service whose users can discover the data sources.
When you require a solution that can help business users understand their data.
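The "temporal analysis of streaming data" mentioned above typically means aggregating events over time windows. The sketch below illustrates a tumbling (fixed-size, non-overlapping) window count in plain Python. It is not written in the Stream Analytics Query Language. The function name `tumbling_window_counts`, the event shape (timestamp in seconds, key), and the sample data are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key) over fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for timestamp, key in events:
        # Each event belongs to exactly one window: the one starting at the
        # largest multiple of window_seconds that is <= timestamp.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Made-up sample events: (seconds since start, hashtag).
events = [
    (0.5, "#Azure"),
    (3.0, "#Azure"),
    (9.9, "#SynapseAnalytics"),
    (10.0, "#Azure"),
    (14.2, "#SynapseAnalytics"),
]
result = tumbling_window_counts(events, 10)

if __name__ == "__main__":
    for (start, tag), n in sorted(result.items()):
        print(f"[{start}s, {start + 10}s) {tag}: {n}")
```

A managed streaming engine adds what this sketch leaves out, such as late or out-of-order events and emitting results as each window closes.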
DEMO – Streaming
Lake gen2: Comments.xml (20 GB), Users.xml (3 GB)
Lake gen2: Parquet files, Avro files
Azure Synapse
Azure AS
Power BI
Twitter data (#Azure, #SynapseAnalytics)
Event Hub
Stream Analytics
Power BI Stream
Azure Data Factory
DEMO – Streaming
Lake gen2: Comments.xml (20 GB), Users.xml (3 GB)
Lake gen2: Parquet files, Avro files
Azure Synapse
Azure AS
Power BI
Power BI Direct Query
Twitter data (#Azure, #SynapseAnalytics)
Event Hub
Stream Analytics
Power BI Stream
Azure Data Factory
Power BI Premium vs Azure AS?
https://powerbi.microsoft.com/en-us/blog/power-bi-premium-and-azure-analysis-services/
Synapse future
Data, Develop, Orchestrate, Monitor
SQL 2019 – Big Data Cluster
Same capabilities as the previous architecture, but on-premises!
SQL 2019 – Big Data Cluster
Supported by a Kubernetes (K8s) infrastructure!
CI/CD