February 25, 25
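Slide overview
As the number of connected cars grows, demand for better user experiences and value-added services is rising, and AI plays an increasingly important role in meeting it. Developing AI models requires efficient preprocessing of the large volume of sensor data obtained from these vehicles.
Oracle's MySQL HeatWave, and its Lakehouse feature in particular, loads data directly from object storage into memory, which strengthens data input/output operations and provides an efficient solution.
In this presentation, we evaluate how effective the Lakehouse feature is for efficiently loading large volumes of data from connected cars, present the results, and discuss implications for building a flexible data analysis platform.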
[Speaker]
Mr. Ryo Neyama
Toyota Motor Corporation, Social System PF Development Div., InfoTech-AS, Data Distribution Platform Group
Group Manager and Senior Researcher
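HeatWavejp is a user community where people can learn about the strengths of MySQL HeatWave and share know-how and knowledge with other participants. To deepen connections among participants, the community carries out the following activities:
• COMMUNICATION: providing a place for users to communicate through Slack and connpass
• EVENT: holding online/offline Meetup seminars and study sessions (about every two months)
• SHARING: sharing product information, the latest updates, and release information
• INTERACT: promoting participants' community networks and exchange among users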
HeatWavejp Meetup, Tokyo Evaluating Large Data Loads and Analysis with MySQL HeatWave Lakehouse at Toyota Ryo NEYAMA TOYOTA MOTOR CORPORATION
About Me: Ryo Neyama
Current Position: Group Manager & Senior Researcher, Social System PF Development Div., TOYOTA MOTOR CORPORATION
Professional Journey:
• 1999~2007: IBM Tokyo Research Laboratory
• 2007~2008: A startup company
• 2008~: Toyota's subsidiary → Toyota (in Tokyo)
Research Interests:
• Distributed Computing and Databases
• Advanced Mobility Platform
Technical Contributions & Productization:
• Web Services & Security (published two books)
• Transactional Cache for Java
• Partitioning Facility for J2EE App Server
• Data Collection for HD Map Generation (below)
And... father of two daughters, outdoor lover
[Photo: Remote Presentation in Oracle CloudWorld 2022 (The photo is edited for privacy protection)]

Agenda
1. Introduction
• The state of the automotive industry and Toyota's aim for a mobility society
• Challenges of building a data platform to support advanced mobility services
2. Evaluation of MySQL HeatWave and Lakehouse for Efficient and Cost-effective Analysis of Vehicle Big Data
• Performance comparison with other products (OCW '22 recap)
• HeatWave data load performance from Object Storage with Lakehouse
• HeatWave's data compression capability
3. Summary and Future Work

Automotive Industry: Once-in-a-century Transformation
• 1908~: Ford Model-T
• Today (2023): CASE*, SDGs**, etc.
*CASE: Connected, Autonomous, Shared & Services, Electric (advocated by Mercedes-Benz Group AG)
**SDGs: Sustainable Development Goals
(This photo is licensed under CC BY-SA.)

Transformation into a Mobility Company
From "Traditional" AUTOMOTIVE COMPANY to "All New" MOBILITY COMPANY
Toyota Mobility Concept: Electrification, Intelligence, Diversification
For details: "New Management Policy & Direction Announcement Presentation Message from Management Koji Sato", Apr. 07, 2023. https://global.toyota/en/newsroom/corporate/39013233.html

Why Does Toyota Mobility Concept Matter?
Because we aim to provide these values and services for society.
• Concept pillars: Electrification, Intelligence, Diversification
• Values and services: People Mobility, Goods Mobility, Service Mobility, Driving Efficiency, Community Design / Regional Energy Management, Safety, Safety Assurance

Carbon Neutrality: Driving Efficiency; Community Design / Regional Energy Management (★Source: ITS Japan)
Mobility for All: People Mobility; Goods Mobility; Service Mobility (★Source: ITS Japan)
Safety and Security: Safety; Safety Assurance (★Source: ITS Japan)
ML & Dev Ops for All People's Happiness: Connectivity and AI/ML play essential roles in this process.
We Use Connectivity to Make a Better World
CASE: Connected, Autonomous, Shared, Electric
Connected Company: a virtual company leading "Connected"

Data Volume Predictions for Connected Cars
The volume of data from cars keeps growing.
[Figure: overall volume of connected car data transfer (vertical axis in EB), driven by shifts in the total number of connected cars and shifts in the volume of data transfer per vehicle]
Time and scale of data on a per-vehicle basis:
• Location data: up to several dozen KB per month [constant]
• CAN data: up to a hundred MB per month [constant]
• Dynamic map generation: up to several GB per month [as needed]
• Peripheral sensing data: up to several dozen GB per month [as needed]

End-to-End Dataflow Overview
• In-vehicle: Data Communication Module sending location data, sensor data, video data, and LiDAR data
• Network: Cellular or Wi-Fi
• Data Platform (a.k.a. TBDC; Toyota Big Data Center): Data Ingestion, Stream Processing, Batch Processing, AI/ML
• End-user Services; Customers; Developers
AI/ML is a key enabler for value-added services.

Example) Traffic Dispersion in Abrupt Traffic Events
Question: For carbon neutrality, is it possible to reduce total travel time and fuel consumption by dispersing traffic when abrupt traffic events happen?
■ Motivation
• Normal traffic situation: [Usual route] the shortest travel time; [Detour route] non-shortest travel time
• Congested traffic situation: [Usual route] congestion happens; [Detour route] shortest travel time with low fuel consumption
■ Demonstration
AI-improved route from Haneda Airport IC to Misato-Chuo IC: usual route (40.1 km) vs. detour route (47.6 km)
Improvement summary:
● Distance: +7.5 km
● Travel time: -1 hour (see Figure-1)
● Fuel consumption: -25% (see Figure-2)
Figure-1: Travel time (usual route vs. detour route)
Figure-2: Fuel consumption (usual route vs. detour route)
We need to calculate average speed and fuel consumption per road link.

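The per-link aggregation named on this slide can be sketched in a few lines. This is only an illustration: the record layout `(road_link_id, speed_kmh, fuel_ml)` and the sample values are hypothetical and do not come from the talk, and the arithmetic mean is used simply because it matches what SQL `AVG` would compute.

```python
from collections import defaultdict

# Hypothetical probe records: (road_link_id, speed_kmh, fuel_ml).
# Field names and values are illustrative assumptions, not data from the talk.
records = [
    ("link-A", 40.0, 12.0),
    ("link-A", 60.0, 8.0),
    ("link-B", 20.0, 30.0),
]


def aggregate_per_link(rows):
    """Return {link_id: (avg_speed_kmh, avg_fuel_ml)} using arithmetic means."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # speed sum, fuel sum, row count
    for link_id, speed, fuel in rows:
        acc = sums[link_id]
        acc[0] += speed
        acc[1] += fuel
        acc[2] += 1
    return {link: (s / n, f / n) for link, (s, f, n) in sums.items()}


per_link = aggregate_per_link(records)
for link, (avg_speed, avg_fuel) in sorted(per_link.items()):
    print(link, avg_speed, avg_fuel)
```

At the data volumes discussed later in the deck, the same grouping would be expressed as a `GROUP BY` in the database rather than in application code.
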
Example) Cruising Area Prediction for Battery EV
Question: Given the current location and remaining battery level, how far can the battery EV go?
■ Motivation
• How far can we go?
• Where should we charge if the battery level runs short?
■ Demonstration
Combine the following three methods:
1. Route search on the road network
2. Predict energy consumption along the routes (physical model)
3. Draw contour lines for edge nodes (computational geometry)
Example results: [map showing the current location and the reachable area, colored by battery consumption from low to high]
We need to calculate average speed and gradient per road link in step 2.

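Steps 1 and 2 above can be illustrated together with a small sketch: a Dijkstra search whose edge cost is the predicted energy per road link, stopped at the remaining battery budget. The energy formula is a generic textbook model (rolling resistance, grade, and aerodynamic drag), assumed here for illustration; it is not the physical model used in the talk. The vehicle parameters, the toy graph, and the function names are all hypothetical. Step 3 (contour drawing) is not covered.

```python
import heapq
import math

G = 9.81  # gravitational acceleration, m/s^2


def link_energy_wh(length_m, avg_speed_ms, gradient, mass_kg=1800.0,
                   c_rr=0.01, rho=1.2, cd_a=0.7):
    """Assumed textbook model: (rolling + grade + aero force) x distance.

    All default parameters are illustrative, not values from the talk.
    """
    theta = math.atan(gradient)
    force = (mass_kg * G * (c_rr * math.cos(theta) + math.sin(theta))
             + 0.5 * rho * cd_a * avg_speed_ms ** 2)
    # Clamp at zero (no regeneration) so edge costs stay non-negative,
    # which Dijkstra's algorithm requires.
    return max(force, 0.0) * length_m / 3600.0  # joules -> watt-hours


def reachable(graph, start, battery_wh):
    """Return {node: energy used in Wh} for nodes reachable within the budget."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        used, node = heapq.heappop(heap)
        if used > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, length_m, speed_ms, grad in graph.get(node, []):
            cost = used + link_energy_wh(length_m, speed_ms, grad)
            if cost <= battery_wh and cost < best.get(nxt, math.inf):
                best[nxt] = cost
                heapq.heappush(heap, (cost, nxt))
    return best


# Toy road network: node -> [(next node, length m, avg speed m/s, gradient)].
graph = {
    "A": [("B", 1000.0, 15.0, 0.0)],
    "B": [("C", 1000.0, 15.0, 0.05)],
    "C": [("D", 50000.0, 25.0, 0.0)],
}
result = reachable(graph, "A", battery_wh=1000.0)
print(sorted(result))
```

The per-link average speed and gradient are exactly the inputs the slide says must be computed from vehicle data, which is why the preprocessing step matters for this use case.
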
A Challenge in AI Algorithm Development
[Diagram: the end-to-end dataflow, from the Data Communication Module over Cellular or Wi-Fi into the Data Platform (a.k.a. Toyota Big Data Center: Data Ingestion, Stream Processing, Batch Processing, AI/ML) and on to End-user Services, Customers, and Developers]
To develop new AI algorithms, agility in preprocessing, machine learning, and evaluation is one of the top priorities.
Steps shown across the two process diagrams (Development Process of AI Algorithm; Development Process of Platform):
• Discovering needs
• Formulating hypothesis
• Gathering data
• Reviewing algorithm
• Developing prototype infra
• Functional suitability evaluation
• Prototyping MVP
• Evaluation, Cost estimation, Performance analysis
• Adjustment, Review of use cases
• Demonstration, proposal of productization
• Evaluation, Cost estimation, Performance analysis
• Development of infrastructure
• Judgement of productization
Challenge: How to process large amounts of data efficiently and at an affordable cost

MySQL HeatWave & Lakehouse
We expect that HeatWave and Lakehouse can be used for preprocessing in the development of new AI algorithms.
Advantages of HeatWave and Lakehouse:
✓ Scalability: capable of loading large data sets from object storage, i.e. HeatWave Lakehouse*
✓ Efficiency and Affordability: HTAP-ready in-memory columnar database with multiple levels of parallelization, which is faster and cheaper
Flow: a large amount of data from connected cars (Data Communication Module) → preprocessing → Features → Machine Learning → AI Models
Example features:
✓ # of cars in each "mesh"
✓ Travel time per link
✓ ...
* GA in July 2023
** OCI: Oracle Cloud Infrastructure
Source: https://dev.mysql.com/doc/heatwave/en/heatwave-introduction.html

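As an illustration of the first example feature, the sketch below derives a 3rd-level grid square ("mesh") code from latitude and longitude following the Japanese standard grid square scheme (JIS X 0410: 40' x 1° primary, 5' x 7.5' secondary, 30" x 45" tertiary cells), then counts distinct vehicles per mesh. The hyphenated `5339-46-11` format is inferred from the `LIKE '5339-45%'` patterns in the reference queries; the function names, VINs, and sample points are hypothetical.

```python
from collections import defaultdict


def mesh3_code(lat, lon):
    """3rd-level grid square code for a point in Japan, e.g. '5339-46-11'."""
    p, a = divmod(lat * 60.0, 40.0)   # primary: 40 minutes of latitude
    q, b = divmod(a, 5.0)             # secondary: 5 minutes
    r = int(b * 60.0 // 30.0)         # tertiary: 30 seconds
    u, f = divmod(lon - 100.0, 1.0)   # primary: 1 degree of longitude
    v, g = divmod(f * 60.0, 7.5)      # secondary: 7.5 minutes
    w = int(g * 60.0 // 45.0)         # tertiary: 45 seconds
    return f"{int(p):02d}{int(u):02d}-{int(q)}{int(v)}-{r}{w}"


def cars_per_mesh(points):
    """Count distinct vehicles seen in each mesh: {mesh_id: number of VINs}."""
    vins = defaultdict(set)
    for vin, lat, lon in points:
        vins[mesh3_code(lat, lon)].add(vin)
    return {mesh: len(seen) for mesh, seen in vins.items()}


# Hypothetical probe points: (VIN, latitude, longitude).
points = [
    ("VIN1", 35.6810, 139.7670),
    ("VIN1", 35.6815, 139.7675),  # same car, same mesh: counted once
    ("VIN2", 35.6820, 139.7680),
    ("VIN3", 35.6586, 139.7454),
]
counts = cars_per_mesh(points)
print(counts)
```

Because the mesh code is a plain string prefix hierarchy, filters such as `LIKE '5339-45%'` select every 3rd-level cell inside one 2nd-level cell.
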
Future use cases
Use case 1: On-demand Cluster (Object Storage → HeatWave)
Create a HeatWave cluster and load data only when needed.
- Lower cost.
- Waiting time for cluster creation and data loads.
- Suitable for experimental analytics.
Use case 2: Dedicated Cluster (Object Storage: input data and output data ↔ HeatWave)
Share a HeatWave cluster and data in an organization.
- The common preprocessing task and resulting data can be shared.
- Suitable for design & development teams sharing domain-specific concerns.
- Higher cost.
Use case 3: Common Cluster (Stream Processing (data conversion etc.) → MDS → HeatWave: common data)
Keep ingesting common data into a HeatWave cluster and share data across organizations.
- Suitable for real-time data analytics for most recent data.
- Managing long-term data from numerous vehicles can incur significant expenses.
MDS: MySQL Database Service
We will explore how we can incorporate these use cases into our platform.

Performance Comparison with Other Products
At CloudWorld 2022, we presented these results in our talk*.
* "Data Collection and Management Platform for Advanced Mobility Services at Toyota"
Test cases (type; test; evaluation item; target of comparison; result):
• OLAP Benchmark; TPC-H; fundamental performance; vs. A; HeatWave is 2 to 30 times faster for most queries and 6+ times better in cost performance
• Dynamic Map; Range Search; fundamental performance; vs. Redis; HeatWave is thousands of times faster than Redis for key scans, which Redis is not good at
• Dynamic Map; Key Search; fundamental performance; vs. Redis; Redis is 3 times faster than HeatWave, but HeatWave is also fast enough
• Feature extraction (calculation of multivariate statistics); Aggregation (GROUP BY); performance of extracting features from data in the data store; vs. B; HeatWave is 2 to 60 times faster
• Pattern mining (trajectory data mining); Composition (JOIN); performance of a JOIN between a set of VINs (Vehicle Identification Numbers) and vehicle travel logs; vs. B, D; HeatWave is 2 to 5 times faster
Legend: A: DWH Service; B: Data Search Service; C: Data Processing Service; D: ETL Service
HeatWave showed its advantages in most test cases.

Assessment of HeatWave Updates since 2022
Assessment targets:
• HeatWave data load performance from Object Storage with Lakehouse*
• HeatWave's data compression capability**
* We evaluated HeatWave Lakehouse in the beta program.
** GA in March 2022

Evaluation #1: Large-data Load Performance using HeatWave Lakehouse
Objectives: Evaluate data-load performance:
① MDS vs. Lakehouse with CSV files.
② CSV files vs. native-format files, i.e. suspend and resume capability, with Lakehouse.
Input Data:
• 18-column CSV sensor data (1,280 files, ~11 TB in total), which corresponds to one day of data collected from 10 million vehicles.
Evaluation Environment:
• MySQL shape: MySQL.HeatWave.VM.Standard.E3
• MySQL version: 8.0.31-HeatWave-Preview
Results:
• Figure-1.1: Lakehouse vs. MDS with CSV files. MDS (input CSV files loaded through a MySQL client): >4 days. Lakehouse (input CSV files): 88 min.
• Figure-1.2: Data load time of native-format files in Lakehouse. Suspend: 7.5 min. Resume: 7.5 min.
Conclusions:
① Lakehouse was 65x faster than MDS: 11 HeatWave nodes loaded 11 TB of CSV-format* sensor data in 88 minutes, a load that took >4 days in MDS (see Figure-1.1).
② Native format was 11.7 times faster than CSV format**: 11 HeatWave nodes could load or store 11 TB of native-format sensor data in 7.5 minutes each (see Figure-1.2).
* We evaluated HeatWave Lakehouse in the beta program, and as a result, there was no support for the Parquet file format during our testing.
** Native format files are not accessible to users, so we cannot manipulate the native-format files ourselves.
Source: https://www.oracle.com/mysql/heatwave/#data-recovery (Note: the figures are edited by the author.)

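The ratios quoted in the conclusions can be re-derived from the reported timings. The sketch below is only a consistency check under stated assumptions: ">4 days" is taken as exactly 4 days (so the MDS speedup is a lower bound), and TB is treated as decimal (1 TB = 1,000 GB). The throughput and per-vehicle figures are derived here and are not stated on the slide.

```python
# Figures reported on the slide.
DATA_TB = 11.0
VEHICLES = 10_000_000
NODES = 11
csv_lakehouse_min = 88.0
native_min = 7.5

# Assumption: ">4 days" is treated as exactly 4 days, a lower bound.
mds_min_lower_bound = 4 * 24 * 60

speedup_vs_mds = mds_min_lower_bound / csv_lakehouse_min
speedup_native = csv_lakehouse_min / native_min

# Derived values (decimal units assumed: 1 TB = 1,000 GB = 1,000,000 MB).
throughput_gb_s = DATA_TB * 1000 / (csv_lakehouse_min * 60)
throughput_per_node_gb_s = throughput_gb_s / NODES
per_vehicle_mb_day = DATA_TB * 1_000_000 / VEHICLES

print(f"Lakehouse vs. MDS: at least {speedup_vs_mds:.1f}x")
print(f"Native vs. CSV:    {speedup_native:.1f}x")
print(f"CSV load rate:     {throughput_gb_s:.2f} GB/s "
      f"({throughput_per_node_gb_s:.2f} GB/s per node)")
print(f"Per vehicle:       {per_vehicle_mb_day:.1f} MB/day")
```

About 1.1 MB per vehicle per day is roughly 33 MB per month, which is in line with the "CAN data up to a hundred MB per month" estimate given earlier in the deck.
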
Evaluation #2: Data Compression Capability in HeatWave
Objectives: Evaluate:
① how much memory consumption differs with and without data compression.
② how much of a performance penalty is imposed by data compression.
Results:
Figure-2.1: Memory consumption (data size in GB)
• CSV files: 25.15
• MySQL: 19.19
• HeatWave (w/o compression): 16.27
• HeatWave (w/ compression): 5.77 (64.5% reduction)
Table-2.1: Queries and data size
• #1 Simple Aggregation, 100 GB: calculate average speed by geographical mesh.
• #2 Aggregation (GROUP BY), 100 GB: calculate average speed by geographical mesh and VIN (Vehicle Identification Number).
• #3 Composition (JOIN), 100 GB: JOIN between a set of VINs and vehicle travel logs.
• #4 Composition (JOIN) with more VINs, 250 GB: same as #3 except for a larger number of vehicles.
Figure-2.2: Processing time in seconds, w/o compression vs. w/ compression
• Queries #1 to #3 (the query-to-bar mapping is not recoverable from the extracted figure): 5.88 vs. 11.28 (91.8% penalty), 8.7 vs. 13.62 (56.6%), 7.33 vs. 12.4 (69.2%)
• Query #4: 165 vs. 187 (13.3% penalty)
Conclusions:
① Compression reduced memory consumption by 64.5%: With the original 25.15 GB of CSV files, HeatWave consumed 16.27 GB of memory without data compression and 5.77 GB with it (see Figure-2.1).
② The performance penalties varied between 50% and 100% for different queries. However, for more compute-intensive queries, the impact was less pronounced (see Table-2.1 and Figure-2.2).

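The percentages on this slide follow from the raw values in the two figures, as the short check below shows. The pairing of processing times into (without compression, with compression) is an assumption inferred from which pairs reproduce the printed percentages; only the 165 s and 187 s pair can be tied to a specific query (#4).

```python
# Memory consumption in GB, from Figure-2.1.
hw_plain_gb = 16.27  # HeatWave without compression
hw_comp_gb = 5.77    # HeatWave with compression

reduction = (hw_plain_gb - hw_comp_gb) / hw_plain_gb

# Processing times in seconds as (w/o compression, w/ compression).
# Assumption: pairs are inferred from the percentages printed in Figure-2.2.
time_pairs = [(5.88, 11.28), (8.7, 13.62), (7.33, 12.4), (165.0, 187.0)]
penalties = [(with_c - without_c) / without_c for without_c, with_c in time_pairs]

print(f"Memory reduction: {reduction:.1%}")
for (without_c, with_c), penalty in zip(time_pairs, penalties):
    print(f"{without_c:>7} s -> {with_c:>7} s: +{penalty:.1%}")
```

The smallest relative penalty belongs to the longest-running query, which is consistent with the conclusion that compute-intensive queries are affected less by decompression overhead.
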
Reference) SQL Queries

Query #1
-- Average speed per 3rd-level grid square (mesh).
SELECT
  COUNT(*),
  AVG(CAST(Speed_TypeA AS DOUBLE)) AS ps_speed_avg,
  gridsquarecode3rd AS mesh_id
FROM
  `random_csv_data`
WHERE
  -- Skip rows with an empty speed or a mesh code that is not 10 characters long.
  (Speed_TypeA != '') AND (LENGTH(gridsquarecode3rd) = 10)
GROUP BY
  mesh_id;

Query #2
-- Average speed per vehicle (VIN) and mesh for rows whose pt_dt starts with 202001.
SELECT
  /*+ SET_VAR(USE_SECONDARY_ENGINE=FORCED) */
  vin,
  gridsquarecode3rd,
  AVG(Speed_TypeA)
FROM
  probe
WHERE
  pt_dt LIKE '202001%'
  AND vin <> '0'
GROUP BY
  vin,
  gridsquarecode3rd;

Query #3 and #4
-- Step 1: rank vehicles by odometer distance (max minus min) within four 2nd-level meshes.
WITH top100vin AS (
  SELECT
    /*+ SET_VAR(USE_SECONDARY_ENGINE=FORCED) */
    vin,
    (MAX(Odometerkm) - MIN(Odometerkm)) AS Odometer_km_month
  FROM
    probe
  WHERE
    pt_dt LIKE '202001%'
    AND (gridsquarecode3rd LIKE '5339-45%'
      OR gridsquarecode3rd LIKE '5339-46%'
      OR gridsquarecode3rd LIKE '5339-35%'
      OR gridsquarecode3rd LIKE '5339-36%')
  GROUP BY vin
  ORDER BY Odometer_km_month DESC
  LIMIT 100 /* for #3, or 1000000 for #4 */
)
-- Step 2: join the selected VINs back to their probe records.
SELECT
  /*+ SET_VAR(USE_SECONDARY_ENGINE=FORCED) */
  a.vin,
  a.gps_timestamp,
  a.mmlongitude,
  a.gridsquarecode3rd,
  a.Speed_TypeA,
  b.Odometer_km_month
FROM
  top100vin AS b
  STRAIGHT_JOIN probe AS a ON b.vin = a.vin;

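The shape of Query #1 can be tried locally on a few rows. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a stand-in; it is not HeatWave, `CAST(... AS REAL)` replaces MySQL's `DOUBLE`, and the sample rows are hypothetical. It shows how the two filters drop rows with an empty speed and rows whose mesh code is not a 10-character 3rd-level code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE random_csv_data (Speed_TypeA TEXT, gridsquarecode3rd TEXT)"
)
# Hypothetical sample rows; speeds arrive as text, as in CSV-loaded data.
conn.executemany(
    "INSERT INTO random_csv_data VALUES (?, ?)",
    [
        ("40.0", "5339-46-11"),
        ("60.0", "5339-46-11"),
        ("", "5339-46-11"),    # empty speed: filtered out
        ("30.0", "5339-35"),   # not a 3rd-level code: filtered out
        ("20.0", "5339-35-99"),
    ],
)

rows = conn.execute(
    """
    SELECT COUNT(*),
           AVG(CAST(Speed_TypeA AS REAL)) AS ps_speed_avg,
           gridsquarecode3rd AS mesh_id
    FROM random_csv_data
    WHERE Speed_TypeA != '' AND length(gridsquarecode3rd) = 10
    GROUP BY mesh_id
    ORDER BY mesh_id
    """
).fetchall()

for count, avg_speed, mesh_id in rows:
    print(mesh_id, count, avg_speed)
conn.close()
```

An `ORDER BY` is added here only to make the output order deterministic; the original query does not sort its result.
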
Summary and Future Work
Summary
• Background
  • Toyota is evolving into a mobility company with the aim of creating a better world.
• Motivation
  • Extensive data gathered from connected cars must be preprocessed to enhance our services through AI/ML.
• Findings
  • Our research indicates that MySQL HeatWave and Lakehouse represent some of the most effective solutions for our specific use cases.
Future work
• Our feature requests to HeatWave
  • Lakehouse support in HeatWave on AWS
  • AWS S3 support as a data source
  • Load and store between HeatWave and object storage in various data formats, including native format
  • Predicate pushdown support for the SQL "WHERE" clause in Lakehouse, including filter optimization using partitions or metadata in Parquet files
• Collaboration with our product team to make HeatWave available on our data platform

Mission: Producing Happiness for All
We make the happiness of others our first priority. We make better products more affordable. We value every second and every cent. We give all our effort and offer all our ingenuity. We look forward, not backward. We believe the impossible is possible.
Vision: Creating Mobility for All
In a diverse and uncertain world, Toyota strives to raise the quality and availability of mobility. We wish to create new possibilities for all humankind, and support a sustainable relationship with our planet.
Value: The Toyota Way
Combining software, hardware and partnerships to create unique value that comes from the Toyota Way.
【 Software 】 Applying imagination to improve society through a people-first design philosophy. Practicing Genchi Genbutsu to understand operations at their essence.
【 Hardware 】 Creating a physical platform to enable the mobility of people and things. A flexible system that changes with the software.
【 Partnership 】 Expanding our abilities by uniting the strength of partners, communities, customers and employees to produce mobility and happiness for all.
Thank you