Microsoft Fabric E2E data analytics platform Sep-2023
Speaker Info
Eiki Sui
Fabric CAT (ex-Power BI CAT), Customer Advisory Team, Microsoft Fabric Product Team
Helping Fabric adoption worldwide
Microsoft Fabric: Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehousing, Synapse Real Time Analytics, Power BI, Data Activator
The world of data
A world awash with data…
How do we translate data into competitive advantage?
Microsoft Analytics Portfolio: Data Factory, Data Explorer, Synapse DW, Azure AI, Power BI, Purview, Event Hub, Synapse Spark, Azure Databricks
The one thing all services have in common is “they all process data”
Metrics for Microsoft Analytics workloads
• 420K+ organizations in 187 countries
• ~8 trillion messages per day
• 33+ exabytes (= 33+ bn GB) of data under management
1 Exabyte = 1,000 Petabytes; 1 Petabyte = 1,000 Terabytes; 1 Terabyte = 1,000 GB
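The unit arithmetic behind the "33+ exabytes = 33+ bn GB" figure can be checked with a minimal sketch. It assumes the decimal (factor of 1,000) units stated on the slide; the function name is only illustrative.

```python
# Decimal storage units as stated on the slide: each step is a factor of 1,000.
GB_PER_TB = 1_000
TB_PER_PB = 1_000
PB_PER_EB = 1_000


def exabytes_to_gigabytes(exabytes: int) -> int:
    """Convert exabytes to gigabytes using decimal units (illustrative helper)."""
    return exabytes * PB_PER_EB * TB_PER_PB * GB_PER_TB


gb = exabytes_to_gigabytes(33)
print(f"33 EB = {gb:,} GB")  # 33,000,000,000 GB, i.e. 33 billion GB
```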
5 Million Developers
Data handled varies by industry, company size, etc
Industry x Size
⚫ Industry group: Manufacturing & Wholesale, Retail/Wholesale, Finance, IT & Media, Energy & Resources, Other Combined
⚫ Business size: Large, Medium, Small
⚫ Division inside each group: Production Division, Sales Marketing & MD, Headquarters, Corporate Planning, Administration, Others
Data exists for every combination of industry and business size
A wide variety of data types as well as data volumes
Data Sources Structured, semi-structured and unstructured data is everywhere
What is the point of making the most of data?
The significance of using data ⚫ Irrespective of one's role, the importance of leveraging data is evident ⚫ Opportunities for business growth – Securing a competitive edge in the top-line – Faster decision-making through BI utilization – Enhancing sales forecasts with ML and AI, etc ⚫ Monitoring the utilization of specific tools, such as log analysis – Elevating internal productivity within the organization. ⚫ Sharing information with external parties, including shareholders and financial institutions, etc Delivering value and benefits to users through data
Before Power BI
• Until ~2010: Enterprise BI dominated the world, centered around SSAS (MDX)
• With Tableau's emergence, the world shifted towards self-service BI
• 2010 onwards: Excel Power Pivot
• July 2015 onwards: Birth of Power BI
Microsoft had been promoting Excel as a BI tool, but since 2015 it has been all-in on Power BI, a SaaS service
Power BI has become the No. 1 choice worldwide for BI thanks to its ease of use, robust features, and strong compatibility with Microsoft Office
However, businesses now demand even more robust tools for data analytics
Investment in SaaS services that combine the user-friendliness of self-service with enterprise-level functionality
The introduction of Microsoft Fabric
“Microsoft Fabric will be our biggest data-related product announcement since SQL Server” Microsoft CEO Satya Nadella
All industries that utilize technology are at a crossroads
AI is causing a massive platform shift
Massive fragmentation of the modern data stack
Landscape in ML, AI, and data fields in 2023 With so many options for building E2E solutions, there is a high probability of data fragmentation
Every CDO, Every Enterprise “Simplify, I am the CDO and don’t want to be the CIO” CDO: Chief Data Officer CIO: Chief Integration Officer
Announcing Microsoft Fabric The data platform for the era of AI
Solving the Modern Data Stack Woes
• Many types of data: Fabric is lake-centric and stores all types of data
• Storage that can work with various workloads: OneLake, the OneDrive for data, will solve all your problems
• Handling data in ML and real-time analysis: Fabric comes with workloads that support AI / ML and real-time analytics by default
• Visualize with BI: Power BI can be used as it is
No data duplication at all
Microsoft Fabric The data platform for the era of AI
Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehousing, Synapse Real Time Analytics, Power BI, Data Activator, OneLake
Terms to be used: Workloads, Experiences, Products / Services
Features inside workloads: Items (Power BI Dataset, etc), Artifacts
Microsoft Fabric The data platform for the era of AI
Transforming what was originally provided as PaaS on Azure into a SaaS model aligned with Power BI's user interface (PaaS: Platform as a Service; SaaS: Software as a Service)
All in one SaaS Platform
• Data Factory (Data pipeline): Solve the most complex data integration and ETL scenarios with cloud-scale data movement and ETL services
• Synapse Data Engineering (ETL): Create a lakehouse and use Apache Spark to transform and prepare organizational data to be shared with the business
• Synapse Data Science (ML): Explore data, build ML models, and incorporate predictive analytics and classification into analytical solutions and applications
• Synapse Data Warehousing (DWH): Build a petabyte-scale, top-performing, secure, OneLake open data format SQL warehouse
• Synapse Real Time Analytics (Real Time Analysis): Quickly ingest and convert any data source and format from 1GB to 1PB, execute queries, and visualize analysis results
• Power BI (BI tool; PBI remains as it is): Use rich visuals to find analytical information, track progress, and make decisions faster
• Data Activator (New Feature; Data Monitoring & Action): Enables business analysts to automatically drive action from data
Azure Synapse Analytics -FYI The data platform for the era of AI Azure Synapse Analytics is a multi-functional analytics platform that integrates data integration, data warehousing, and big data analytics Covers enterprise-grade data infrastructure, but being PaaS, requires knowledge and expertise in Azure
Microsoft Fabric The data platform for the era of AI Items Workloads Items
Microsoft Fabric The data platform for the era of AI Data Factory Synapse Data Engineering Synapse Data Science Synapse Data Warehousing Synapse Real Time Analytics Power BI OneLake Open, standard Delta Parquet format (Delta Lake format) for fast analytical storage Accessible from all Microsoft Fabric items (more to be explained later) Data Activator
Email from Bill Gates to CEO Satya
“This is an amazing release!! Just the integration they have done of all of our data products and what that does for simplification and performance is fantastic. Customers don’t have to think about the pieces like they had to before. Even without the AI this would be one of the best releases for customers we have done.” Bill Gates
Microsoft Fabric The data platform for the era of AI Complete Analytics Platform Lake centric and open Empower Every Business User AI Powered Everything, unified OneLake Familiar and intuitive Copilot accelerated SaaS-ified One Copy Built into Microsoft 365 ChatGPT on your data Secured and governed Open at every tier Insight to action AI driven insights
Microsoft Fabric The data platform for the era of AI Single… Data Factory Synapse Data Engineering Synapse Data Science Synapse Data Warehousing Synapse Real Time Analytics AI Assisted Shared Workspaces Universal Compute Capacities OneSecurity OneLake Intelligent data foundation Power BI Data Activator Onboarding and trials Sign-on Navigation model UX model Workspace organization Collaboration experience Data Lake Storage format Data copy for all engines Security model CI/CD Monitoring hub Data Hub Governance & compliance
“All-in-one” to remove silos
Scattered Best of Breed, Diverse Persona: Data Silos, Technology Silos, Silos of Skills
To a single SaaS Platform: Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehousing, OneLake, Synapse Real Time Analytics, Power BI, Data Activator
Microsoft Fabric The collaboration between platforms and data pros to date Use of data Data preparation Integrated Data Platform The remaining minor instances of fragmentation, data re-importation, etc SQL • Building data model • Create reports Analyst • DWH/Lakehouse and Operation • Developing batch / real-time processing Multiple Data Lakes Hard to collaborate • Statistics & Hypothesis Testing • Develop and manage ML models Data Scientist
Microsoft Fabric Seamless collaboration among persona with Fabric Microsoft Fabric Data Factory Synapse Data Engineering Fully seamless integrated data platform Synapse Data Warehousing Synapse Real Time Analytics • Building data model • Create reports Analyst • DWH/Lakehouse and Operation • Developing batch / real-time processing Power BI Easy to collaborate • Statistics & Hypothesis Testing • Develop and manage ML models Data Scientist Synapse Data Science Always single data lake OneLake
Power BI Real-Time Analytics Data Warehouse Data Integration Persona optimized experiences Data Science
Microsoft Fabric: SaaS-ified Solution “It just works" 5x5 experiences Success by default Centralized administration Frictionless onboarding Minimal knobs Tenant-wide governance Instant Provisioning Auto optimized Centralized security management Quick results w/ Intuitive UX Auto Integrated Compliance built-in
Microsoft Fabric Anticipated users Data Engineers Data Scientists Data Analysts Expertise in integrating, Expertise in applying data transforming, and science and ML to centralizing system data implement and run into a schema suitable for machine learning Provides data via Provides analytical solutions workloads Experiences Data Factory Data Engineering warehouses and lakehouses Experiences Data Citizens Expertise in designing, creating, and deploying data analytics solutions transformed data Experiences Users who turn insights from data into business competitive advantages Providing insight through embedding Experiences Warehousing Real Time Analytics Data Stewards Data Engineering Data Science Warehousing Real Time Analytics Power BI Power BI Provides data via warehouses and lakehouses • Persons or departments responsible for the management and operation of data within the company Microsoft Purview
Microsoft Fabric The data platform for the era of AI Complete Analytics Platform Lake centric and open Empower Every Business User AI Powered Everything, unified OneLake Familiar and intuitive Copilot accelerated SaaS-ified One Copy Built into Microsoft 365 ChatGPT on your data Secured and governed Open at every tier Insight to action AI driven insights
Public Preview OneLake
Microsoft Fabric All data in OneLake OneDrive for data only OneDrive for Documents OneLake for Data OneLake provides a data lake as a service without having to build it yourself
OneLake “OneDrive for your data” A single SaaS lake for the whole organization Data Factory Synapse Data Engineering Synapse Data Science Synapse Data Warehousing Synapse Real Time Analytics Power BI Data Activator Provisioned automatically with the tenant All workloads automatically store their data in the OneLake workspace folders All the data is organized in an intuitive hierarchical namespace The data in OneLake is automatically indexed for discovery, MIP labels, lineage, PII scans, sharing, governance and compliance Intelligent data foundation
One Copy All computing engines can access the same data All the compute engines store their data automatically in OneLake Data Factory Spark Customer 360 Delta – Parquet Format Synapse Data Engineering T-SQL Synapse Data Science Synapse Data Warehousing Serverless Compute Synapse Real Time Analytics Power BI KQL Data Activator Analysis Services Finance Service Telemetry Business KPIs Delta – Parquet Format Delta – Parquet Format Delta – Parquet Format Delta – Parquet, an open standards format, is the storage format for all tabular data in Fabric Once data is stored in the lake, it is directly accessible by all the engines without needing any import/export All the compute engines have been fully optimized to work with Delta Parquet as their native format Shared universal security model is enforced across all the engines
One Security Apply a common security model to all computing engines All the compute engines store their data automatically in OneLake Data Factory Spark Synapse Data Engineering T-SQL Synapse Data Science Synapse Data Warehousing Serverless Compute Synapse Real Time Analytics Power BI KQL Data Activator Analysis Services One Security Customer 360 Finance Service Telemetry Business KPIs Delta – Parquet Format Delta – Parquet Format Delta – Parquet Format Delta – Parquet Format Delta – Parquet, an open standards format, is the storage format for all tabular data in Fabric Once data is stored in the lake, it is directly accessible by all the engines without needing any import/export All the compute engines have been fully optimized to work with Delta Parquet as their native format Shared universal security model is enforced across all the engines
OneLake Shortcuts Taking One Copy to the Next Level
Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehousing, Synapse Real Time Analytics, Power BI, Data Activator
Customer 360, Finance, Service Telemetry, Business KPIs: Delta – Parquet Format
• Sharing data in OneLake is as easy as sharing files in OneDrive, removing the need for data duplication
• With shortcuts, data throughout OneLake can be composed together without any data movement
• Shortcuts also allow instant linking of data already existing in Azure and in other clouds (Amazon, Google), without any data duplication or movement, making OneLake the first multi-cloud data lake
• With support for industry standard APIs, OneLake data can be directly accessed by any application or service
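As a rough illustration of access through industry standard APIs, the sketch below builds the ADLS Gen2 style addresses that, to my understanding of the OneLake documentation, point at a path inside a Fabric item. The workspace name, item name, and table path are hypothetical, and the helper functions are not part of any Microsoft SDK.

```python
# Host name used by OneLake's ADLS Gen2 compatible endpoint (per the OneLake docs,
# as I understand them).
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_abfss_uri(workspace: str, item: str, item_type: str, path: str) -> str:
    """Build an abfss:// URI for a path inside a Fabric item (illustrative helper)."""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}/{path}"


def onelake_https_url(workspace: str, item: str, item_type: str, path: str) -> str:
    """Build the equivalent https:// URL for the same path (illustrative helper)."""
    return f"https://{ONELAKE_HOST}/{workspace}/{item}.{item_type}/{path}"


# Hypothetical names: a workspace "SalesWorkspace" holding a lakehouse "Customer360".
print(onelake_abfss_uri("SalesWorkspace", "Customer360", "Lakehouse", "Tables/orders"))
print(onelake_https_url("SalesWorkspace", "Customer360", "Lakehouse", "Tables/orders"))
```

Any tool that already speaks the ADLS Gen2 protocol could then be pointed at such an address, subject to authentication, which this sketch does not cover.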
Parquet
Highly efficient column-oriented data format

StoreID  DateTime    ProductID  Amount
StoreA   2023-01-01  SKU001     10
StoreA   2023-01-02  SKU001     15
StoreA   2023-01-03  SKU001     12

[
  {
    "StoreID": "StoreA",
    "DateTime": "2023-01-01",
    "ProductID": "SKU001",
    "Amount": 10
  },
  {..}
]

<StoreData>
  <Record>
    <StoreID>StoreA</StoreID>
    <DateTime>2023-01-01</DateTime>
    <ProductID>SKU001</ProductID>
    <Amount>10</Amount>
  </Record>
  <Record>…..
</StoreData>

Header:
RowGroup1:
  StoreID: StoreA, StoreA, StoreA
  DateTime: 2023-01-01, 2023-01-02, 2023-01-03
  ProductID: SKU001, SKU001, SKU001
  Amount: 10, 15, 12
RowGroup2:
  …
Footer:
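The difference between the row-oriented records (table, JSON, XML) and the column-oriented row group on this slide can be sketched in a few lines. This is only a conceptual illustration of the layout using the slide's StoreA sample, not how a Parquet writer is implemented; `to_columns` is a hypothetical helper.

```python
# Row-oriented records, as in the JSON example on the slide.
rows = [
    {"StoreID": "StoreA", "DateTime": "2023-01-01", "ProductID": "SKU001", "Amount": 10},
    {"StoreID": "StoreA", "DateTime": "2023-01-02", "ProductID": "SKU001", "Amount": 15},
    {"StoreID": "StoreA", "DateTime": "2023-01-03", "ProductID": "SKU001", "Amount": 12},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists (one 'row group')."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


row_group = to_columns(rows)

# An aggregate over one column only needs to read that column's values,
# which is why columnar layouts tend to reduce I/O for analytical queries.
total = sum(row_group["Amount"])
print(row_group["Amount"], total)
```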
Parquet
Dictionary encoding

StoreID  DateTime    ProductID  Amount
StoreA   2023-01-01  SKU001     10
StoreA   2023-01-02  SKU001     15
StoreA   2023-01-03  SKU001     12

[
  {
    "StoreID": "StoreA",
    "DateTime": "2023-01-01",
    "ProductID": "SKU001",
    "Amount": 10
  },
  {..}
]

<StoreData>
  <Record>
    <StoreID>StoreA</StoreID>
    <DateTime>2023-01-01</DateTime>
    <ProductID>SKU001</ProductID>
    <Amount>10</Amount>
  </Record>
  <Record>…..
</StoreData>

Header:
RowGroup1:
  StoreID: 1, 1, 1
  DateTime: 1, 2, 3
  ProductID: 1, 1, 1
  Amount: 1, 2, 3
RowGroup2:
  …
Footer:
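The dictionary-encoded values on this slide (1, 1, 1 and 1, 2, 3) can be reproduced with a small sketch. It assumes 1-based codes assigned in first-seen order to match the slide; real Parquet writers use their own index conventions and layer further encodings on top, so treat this purely as the basic idea. `dictionary_encode` is a hypothetical helper.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): distinct values in first-seen order, and
    the 1-based code of each input value (1-based only to match the slide)."""
    dictionary = []
    index = {}
    codes = []
    for value in values:
        if value not in index:
            dictionary.append(value)
            index[value] = len(dictionary)
        codes.append(index[value])
    return dictionary, codes


columns = {
    "StoreID": ["StoreA", "StoreA", "StoreA"],
    "DateTime": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "ProductID": ["SKU001", "SKU001", "SKU001"],
    "Amount": [10, 15, 12],
}

encoded = {name: dictionary_encode(values) for name, values in columns.items()}
for name, (dictionary, codes) in encoded.items():
    print(name, dictionary, codes)
```

Columns with few distinct values, such as StoreID and ProductID here, benefit most: the repeated string is stored once and each row keeps only a small integer.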
Verti-Parquet: Power BI VertiPaq + Parquet
Less I/O and data can be queried more efficiently
Source data (Microsoft Sales, 162 tables): 880GB
To Parquet: 268GB
After V-Order optimization: 84GB
V-Order optimization: data size is further reduced to about 1/3 of Parquet's
※ Numbers are estimates
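The "about 1/3" claim can be checked against the slide's own estimates with a short calculation. The three sizes are taken from the slide; the variable and function names are illustrative.

```python
# Estimated sizes from the slide, in GB.
source_gb = 880    # source data
parquet_gb = 268   # after conversion to Parquet
v_order_gb = 84    # after V-Order optimization


def ratio(after: float, before: float) -> float:
    """Size after a step as a fraction of the size before it."""
    return after / before


print(f"Parquet vs source:  {ratio(parquet_gb, source_gb):.0%}")
print(f"V-Order vs Parquet: {ratio(v_order_gb, parquet_gb):.0%}")
print(f"V-Order vs source:  {ratio(v_order_gb, source_gb):.0%}")
```

With these figures, V-Order comes out at roughly 31% of the plain Parquet size, which matches the slide's "about 1/3", and at roughly a tenth of the source size.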
Storage Mode: Power BI only
“DirectQuery Mode”: slow, but real time
Reports → DAX queries → Power BI Analysis Services → SQL queries → Data Warehouse/Lakehouse → scan → tables in storage (database files)
“Import Mode”: latent & duplicative but fast
Reports → DAX queries → Power BI Analysis Services → scan → copy of tables, imported from the Data Warehouse/Lakehouse tables in storage (database files)
Storage Mode: with Fabric
“DirectQuery Mode”: slow, but real time
Reports → DAX queries → Power BI Analysis Services → SQL queries → Data Warehouse/Lakehouse → scan → tables in storage (database files)
“Import Mode”: latent & duplicative but fast
Reports → DAX queries → Power BI Analysis Services → scan → copy of tables, imported from the Data Warehouse/Lakehouse tables in storage (database files)
“Direct Lake Mode”: Perfect!
Reports → DAX queries → Power BI Analysis Services → scan → Parquet/Delta Lake tables in OneLake (Data Warehouse/Lakehouse)
Direct Lake Mode Performance comparison
Chris Webb's BI Blog: “Performance Testing Power BI Direct Lake Mode Datasets In Fabric”, “On-Demand Loading Of Direct Lake Power BI Datasets In Fabric”
Known issues
Reports → DAX queries → Power BI Analysis Services → scan → Parquet/Delta Lake tables in OneLake (Data Warehouse/Lakehouse)

                   DirectQuery  Import    Direct Lake
Data Latency       Best         Not Good  Good
Query Performance  Not Good     Best      Good

Direct Lake: the best of both
Microsoft Fabric The data platform for the era of AI Complete Analytics Platform Lake centric and open Empower Every Business User AI Powered Everything, unified OneLake Familiar and intuitive Copilot accelerated SaaS-ified One Copy Built into Microsoft 365 ChatGPT on your data Secured and governed Open at every tier Insight to action AI driven insights
Microsoft Fabric Empower Every Business User
① Be the undisputed leader in ease of use, depth of data visualization and analysis capabilities
② Ubiquitous through the Office Productivity Suite
③ Unified across the Intelligent Data Platform
④ Combine with AI-driven Copilot experience for faster time to insight
Microsoft Fabric What happens to Power BI with Fabric? • Nothing changes: Power BI is part of Fabric workloads • End-to-End (E2E) SaaS-ified solution to eliminate data silos (data fragmentation) • Fabric designed for simplicity just like Power BI • Provide opportunities for Power BI users to learn other workloads (e.g., data engineering, data science, etc.) • Fabric is the future of data analytics, including Power BI ※Call-out Although there is a Fabric license, Power BI is still available without having to use Fabric capacity.
Microsoft Fabric Change of licenses Microsoft Fabric (Public Preview) Power BI Only Power BI Pro Power BI • • per user part of M365 E5 Nothing changes, but access to Fabric items requires Fabric capacity Power BI Premium Per User • per user Power BI Embedded • Power BI Free • per user Power BI Premium Per Capacity • Nothing changes per capacity, pay-as-you go per capacity NEW Microsoft Fabric Fabric Free • per user Nothing changes, but access to Fabric items requires Fabric capacity Power BI Premium Per Capacity • per capacity Fabric capacity • Pay-as-you-go Either purchase method gives you access to the full functionality of Microsoft Fabric Azure Synapse Analytics Azure • • Pay-as-you-go Reserved instance, pre-purchase plan Azure Data Factory • • Pay-as-you-go Reserved instance Azure Data Explorer • • Pay-as-you-go Reserved instance Nothing changes
Microsoft Fabric Power BI Copilot demo Simplifying the most difficult content in Power BI DAX Query View
Microsoft Fabric DAX Query View Suggestion on your input DAX is an essential skill for building advanced business logic in Power BI Generate ad hoc queries
Microsoft Fabric Introduction to Data Activator Teams Power BI Synapse Data Activator No code user experiences Outlook Power Automate Data Warehouse Synapse Real Time Analytics … Custom Model Rule Trigger Real-time stream processing …
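The model, rule, and trigger flow on this slide can be pictured with a small conceptual sketch. This is not Data Activator's interface, which is a no-code experience; the `Rule` class, the `run_rules` function, the threshold, and the notification text are all hypothetical and only illustrate the idea of evaluating rules over a stream of events and firing an action when a condition holds.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Rule:
    """A hypothetical rule: when `condition` holds for an event, run `action`."""
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], str]


def run_rules(events: Iterable[dict], rules: List[Rule]) -> List[str]:
    """Evaluate every rule against every incoming event and collect fired actions."""
    fired = []
    for event in events:
        for rule in rules:
            if rule.condition(event):
                fired.append(rule.action(event))
    return fired


# Hypothetical rule: notify when a store sells fewer than 11 units in a day.
low_sales = Rule(
    name="low_sales",
    condition=lambda e: e["Amount"] < 11,
    action=lambda e: f"notify: {e['StoreID']} sold only {e['Amount']} of {e['ProductID']}",
)

events = [
    {"StoreID": "StoreA", "ProductID": "SKU001", "Amount": 10},
    {"StoreID": "StoreA", "ProductID": "SKU001", "Amount": 15},
    {"StoreID": "StoreA", "ProductID": "SKU001", "Amount": 12},
]

alerts = run_rules(events, [low_sales])
print(alerts)
```

In the slide's picture, the action side would be a Teams message, an Outlook email, a Power Automate flow, or a custom action rather than a returned string.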
Microsoft Fabric The data platform for the era of AI Complete Analytics Platform Lake centric and open Empower Every Business User AI Powered Everything, unified OneLake Familiar and intuitive Copilot accelerated SaaS-ified One Copy Built into Microsoft 365 ChatGPT on your data Secured and governed Open at every tier Insight to action AI driven insights
Private Preview Copilot in Microsoft Fabric
Microsoft Fabric Accelerate Productivity with Copilot Data Factory Synapse Data Engineering Synapse Data Science Synapse Data Warehousing OneLake Copilot Synapse Real Time Analytics Power BI Data Activator
Microsoft Fabric Accelerate Productivity with Copilot Your data is your own data Microsoft Cloud Runs on trust Your data will not be used to train or enrich the foundation AI models used by others Your data is protected by the most comprehensive enterprise compliance and security measures
Microsoft Fabric Functional Availability (as of September 2023) Generally available (1) Power BI Public preview (2) Private preview (3) Data Factory Data Activator Synapse Data Engineering Copilot for Microsoft Fabric Synapse Data Science Copilot for Power BI (full) Synapse Data Warehousing Synapse Real Time Analytics Copilot for Power BI (DAX) OneLake
(1) General availability (GA): service is available and has a 60-day free trial
(2) Public preview: free access to all features until GA
(3) Private preview: preview by a very small number of companies before public preview (2)
Microsoft Fabric Get started today
• Explore the product: https://www.microsoft.com/microsoft-fabric
• Preview Trial Getting Started: https://learn.microsoft.com/fabric/get-started/fabric-trial
• Microsoft Fabric Documentation: https://learn.microsoft.com/fabric/
• Microsoft Learn - Microsoft Fabric Overview: https://learn.microsoft.com/training/paths/get-started-fabric/
• Microsoft Fabric Licensing: https://learn.microsoft.com/fabric/enterprise/licenses
Microsoft Thank you! @marshal_dabao テクテク日記 (blog on Power BI and Fabric) https://marshal115.hatenablog.com/