July 09, 24
Slide overview
Slides from the talk given by PingCAP CTO Ed Huang at TiUG on 2024-06-27.
The Journey from MySQL to TiDB: The Story about PingCAP
Ed Huang, Co-founder & CTO, PingCAP
The Beginning -> From OLTP to HTAP -> From Database to Database Cloud
The beginning...
● The story begins in 2015…
  ○ Max, Dylan and I were working for a fast-growing startup (Wandoujia.com)
    ■ The app store for Android phones
  ○ Max and I were working in the Storage/Database Infrastructure Team
    ■ We all love Golang… We built Codis (https://github.com/CodisLabs/codis; I believe u-next is a big user of Codis, and also of TiDB)
    ■ Fun facts:
      ● We started using Go even before its 1.0
      ● We had zero background in SQL databases
      ● No one had used Rust in production before 😂
  ○ We ran and maintained a huge number of MySQL instances & HBase (since 2013)
    ■ MySQL Sharding Cluster
      ● ~50 nodes
    ■ HBase
      ● >200 nodes
Why not MySQL?
The different requirements:
● Cost-based model
● Distributed Computing
● Online DDL
● Different storage API abstraction
● …
MySQL codebase is not easy to hack…
We love Golang! We want to use the programming language we like :)
The Beginning
● The Beginning: We just wanted to build a better solution than sharded MySQL
  ○ 100% Scalable OLTP
  ○ MySQL binlog listener (syncer, now DM), which helps users reduce migration cost
[Diagram labels: Binlog, dm (repeated); https://github.com/go-mysql-org/go-mysql]
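To make the pain of sharded MySQL concrete, here is a minimal sketch of application-level shard routing, the kind of logic a scalable OLTP database is meant to make unnecessary. The helper `shardFor` and the key format are hypothetical, chosen only for illustration; they are not part of TiDB, DM, or go-mysql.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a sharding key to one of n shards using an FNV-1a hash.
// Hypothetical helper for illustration only.
func shardFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

func main() {
	// Compare the shard chosen with 4 shards against the one chosen with 5.
	// Keys whose shard changes would have to be migrated by hand when the
	// cluster grows, which is one reason manual sharding is costly to operate.
	for _, k := range []string{"user:1", "user:2", "user:3"} {
		fmt.Printf("%s -> shard %d of 4, shard %d of 5\n", k, shardFor(k, 4), shardFor(k, 5))
	}
}
```

With modulo routing like this, every application that touches the data has to agree on the shard count, and changing it means moving rows between MySQL instances.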
The Journey
The Journey
OLTP Workload
insert into github_events values(...)
insert into github_events values(...)
insert into github_events values(...)
select … from github_events where actor='xxx'
…
[Diagram labels: Binlog, dm (repeated); OLAP Workload]
From OLTP to HTAP
● We introduced TiSpark to enhance TiDB's analytical capabilities (TiDB 3.0)
[Diagram labels: TiSpark; Binlog, dm (repeated)]
From OLTP to HTAP
● Is Spark enough?
  ○ Spark only solves the computing problem
● We introduced Columnar Storage (TiFlash) into the storage layer of TiDB 4.0
[Diagram labels: AdHoc, Reporting, TiKV (Row-based Storage), TiSpark, TiFlash (Columnar Storage)]
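The difference between row-based and columnar storage can be sketched in a few lines of Go. This is a toy in-memory illustration, not how TiKV or TiFlash store data; the `Event` fields other than the actor are assumptions made up for the example.

```go
package main

import "fmt"

// Event is one row of a hypothetical github_events table (row-based layout).
type Event struct {
	ID    int64
	Actor string
	Stars int64
}

// EventColumns holds the same data column by column (columnar layout).
type EventColumns struct {
	ID    []int64
	Actor []string
	Stars []int64
}

// toColumns converts row-oriented data into the column-oriented layout.
func toColumns(rows []Event) EventColumns {
	var c EventColumns
	for _, r := range rows {
		c.ID = append(c.ID, r.ID)
		c.Actor = append(c.Actor, r.Actor)
		c.Stars = append(c.Stars, r.Stars)
	}
	return c
}

// sumStarsRows walks whole rows even though only one field is needed.
func sumStarsRows(rows []Event) int64 {
	var sum int64
	for _, r := range rows {
		sum += r.Stars
	}
	return sum
}

// sumStarsColumns reads only the Stars column, one contiguous slice.
func sumStarsColumns(c EventColumns) int64 {
	var sum int64
	for _, s := range c.Stars {
		sum += s
	}
	return sum
}

func main() {
	rows := []Event{{1, "alice", 3}, {2, "bob", 5}, {3, "alice", 7}}
	fmt.Println(sumStarsRows(rows), sumStarsColumns(toColumns(rows))) // prints "15 15"
}
```

Both functions return the same answer; the columnar version simply touches less data for an aggregate over one column, which is the property analytical queries benefit from.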
From OLTP to HTAP
● TiFlash solves the storage problem for high-speed OLAP
● And then, we introduced MPP mode in TiDB 5.0
[Diagram labels: AdHoc, Reporting, TiKV (Row-based Storage), TiDB SQL, TiFlash, MPP Mode (Massively Parallel Processing)]
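The core idea of massively parallel processing, splitting the data, aggregating each part independently, then merging partial results, can be sketched with goroutines. This is a single-process simplification under assumed names (`countByActor`, `parallelCount`); a real MPP engine runs on multiple nodes and exchanges data between them, which is not modeled here.

```go
package main

import (
	"fmt"
	"sync"
)

// countByActor computes a partial "GROUP BY actor, COUNT(*)" over one partition.
func countByActor(part []string) map[string]int {
	m := make(map[string]int)
	for _, a := range part {
		m[a]++
	}
	return m
}

// parallelCount splits actors into one partition per worker, aggregates each
// partition in its own goroutine, and merges the partial results.
func parallelCount(actors []string, workers int) map[string]int {
	if workers < 1 {
		workers = 1
	}
	partials := make([]map[string]int, workers)
	chunk := (len(actors) + workers - 1) / workers // ceiling division
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		if start > len(actors) {
			start = len(actors)
		}
		end := start + chunk
		if end > len(actors) {
			end = len(actors)
		}
		wg.Add(1)
		go func(w int, part []string) {
			defer wg.Done()
			partials[w] = countByActor(part) // each worker writes only its own slot
		}(w, actors[start:end])
	}
	wg.Wait()

	// Merge step: combine the per-worker partial aggregates.
	total := make(map[string]int)
	for _, p := range partials {
		for k, v := range p {
			total[k] += v
		}
	}
	return total
}

func main() {
	actors := []string{"alice", "bob", "alice", "carol", "alice"}
	fmt.Println(parallelCount(actors, 2))
}
```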
What's next? Database Cloud!
TiDB Dedicated
TiDB Serverless
Is TiDB-Operator a cloud? No, not even close. It's just a deployment tool for TiDB on k8s.
Building a DBaaS is hard: the architecture overview of the first version of TiDB Cloud
Two different approaches to build DBaaS
Two different approaches to build DBaaS
[Diagram labels: Shared Gateway (Gateway, repeated); On Demand Pool; Virtual Cluster - Tenant 1, Tenant 2, Tenant n; Isolated SQL Layer; Shared Storage Layer; TiDB; Row Engine; Columnar Engine; MPP; MPP Worker (repeated); Shared Storage Pool; S3 Service; Shared Services; Compaction; Analyze; DDL; Remote Copr Service; Other Region]
Building a DBaaS actually gives us more
1. Multi-tenancy
   a. Resource Control
   b. Online/Concurrent DDL
   c. 1,000,000+ Table Support
   d. …
2. Cloud native architecture
   a. Compute/Storage separation TiFlash
   b. TiDB Serverless Auto-Scaler
   c. …
3. Performance improvement
4. New workload support
   a. Vector Search on TiDB Serverless Tier
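As an illustration of what per-tenant resource control can look like, here is a generic token-bucket sketch. It is one common way to cap each tenant's usage on shared infrastructure and is not a description of how TiDB implements Resource Control; the `Bucket` type, tenant names, and numbers are all hypothetical. Time is passed in explicitly to keep the example deterministic.

```go
package main

import "fmt"

// Bucket is a hypothetical per-tenant token bucket.
type Bucket struct {
	Capacity float64 // maximum tokens the tenant can accumulate
	Rate     float64 // tokens added per second
	Tokens   float64 // tokens currently available
}

// Refill adds tokens for the elapsed time, capped at Capacity.
func (b *Bucket) Refill(elapsedSec float64) {
	b.Tokens += b.Rate * elapsedSec
	if b.Tokens > b.Capacity {
		b.Tokens = b.Capacity
	}
}

// TryConsume deducts cost if enough tokens are available and reports success.
func (b *Bucket) TryConsume(cost float64) bool {
	if cost > b.Tokens {
		return false
	}
	b.Tokens -= cost
	return true
}

func main() {
	tenants := map[string]*Bucket{
		"tenant1": {Capacity: 10, Rate: 5, Tokens: 10},
		"tenant2": {Capacity: 100, Rate: 50, Tokens: 100},
	}
	fmt.Println(tenants["tenant1"].TryConsume(8)) // true: 10 tokens available
	fmt.Println(tenants["tenant1"].TryConsume(8)) // false: only 2 tokens left
	tenants["tenant1"].Refill(2)                  // 2 + 5*2 = 12, capped at 10
	fmt.Println(tenants["tenant1"].TryConsume(8)) // true again after refill
	fmt.Println(tenants["tenant2"].TryConsume(8)) // true: tenant2 is unaffected
}
```

The point of the sketch is isolation: one tenant exhausting its budget does not reduce what another tenant can consume.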
The philosophy of TiDB
● Cloud is the Future
  ○ Embrace Kubernetes & Cloud Infrastructure early
● Keep Neutral
  ○ Do not rely on specific hardware
  ○ Do not depend on a specific cloud vendor
● Be Open
  ○ Open-Minded & Open-Source
● Database is not only Database
  ○ Ecosystem also matters!
● Simplicity is the best weapon to defeat complexity