>100 Views
July 10, 19
スライド概要
Large scale migration fromHive to Presto at Yahoo! JAPAN. Slides for Presto Conference Tokyo 2019.
2023年10月からSpeaker Deckに移行しました。最新情報はこちらをご覧ください。 https://speakerdeck.com/lycorptech_jp
Large scale migration from Hive to Presto at Yahoo! JAPAN 2019年7月9日 Poon Yat Sing (Star) Query Engine, Grid Dept, Data Platform, Yahoo! JAPAN Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved.
Agenda About Us Motivation Applications and Usage of Presto Challenges resolved Future Plan Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 2
Yahoo! JAPAN Founded on 1996 40 million+ Monthly Login Users 100+ services Search, News, Shopping, Weather, Mail, Auctions... Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 3
About Us Grid Department, Data Platform Division Storage HDFS Dragon (S3 blob storage) Compute Hive Presto Spark Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 4
Analytics Platform (v1) We use Hadoop since 2009! logs, clickstream MapReduce / Tez / Hive Write Read HDFS Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 5
Problems Mixed of workloads Bad resource isolation Long waiting queue Scalability 完了通知メールあり テスト集計 確 認 email notification on query completion Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 6
Analytics Platform (v2) We create more Hadoop clusters! logs, clickstream Kafka General Purpose High SLA Report Generation Blob Storage Hive Hive Hive Dragon (S3) HDFS HDFS HDFS Data platform - Grid Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 7
Motivations of using Presto Speed Cost Reliability Minimum dependency Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 8
Analytics Platform (v3) Avg. latency: from min to sec 0 outage 2.5x Throughput logs, clickstream Kafka General Purpose High SLA Report Generation Blob Storage Presto Presto Presto Presto Hive Hive HDFS HDFS HDFS Dragon (S3) Data platform - Grid Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 9
Applications and Usage SQL on Hadoop & S3 600+ Presto Servers 250K queries / day Report generation, Ad-hoc query, Data exploration, BI tool Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 10
Report Generation Cluster (v1) 400+ QPS Full GC Workers Workers ... Workers HDFS Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 11
Report Generation Cluster (v2) 1. New query HAProxy 2. HTTP 307 Redirect 3. Connect to coordinator 4. Coordinator returns NextURL Coordinator Coordinator Coordinator Coordinator Workers ... Workers Workers ... Workers Workers ... Workers Workers ... Workers HDFS Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 12
General Purpose Cluster User Admin Internal Query Portal BI tools Grafana PagerDuty TLS Client Auth Impersonation LoadBalancer Staging Presto Cluster (Log Analysis) AlertManager Coordinator (active) Coordinator (Standby) Workers Workers Workers Workers event-listener Prometheus Kerberos Impersonation HDFS Dragon (S3) Auditing Metrics monitoring Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 13
On-demand Presto cluster RBAC service (Athenz) Set permission Query Presto Gateway (Traefik) Auth Plugin On-demand Presto cluster Auth Plugin On-demand Presto cluster ... Auth Plugin On-demand Presto cluster Data Sources Create cluster Cluster Management API Marathon Mesos Service Users Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 14
Challenges resolved Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved.
Problems HiveQL to ANSI SQL Performance issue on Nested data Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 16
HiveQL to ANSI SQL Why HiveQL is different from Presto SQL (ANSI SQL)[1] 10,000+ queries are written in HiveQL Better to let the expert (ie. we) to deal with the transformation Possible solutions Modify Presto to support HiveQL[2] Rewrite HiveQL to PrestoSQL => Need a semi-automated conversion tool [1] https://github.com/prestodb/presto/issues/7338 [2] http://armsword.com/2019/03/31/presto-compatible-hive-syntax/ Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 17
HiveQL to ANSI SQL What (not) to transform: Token !Foo => NOT FOO Foo == Bar => Foo = Bar Have clear transformation rules x RLIKE y => regexp_like(x, y) a[0] => element_at(a, 1) NO auto type casting Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 18
Why No auto type casting
Presto Blog About Presto
Optimizing the Casts Away
May 21, 2019 • Martin Traverso
The next release of Presto (version 312) will include a new optimization to remove unnecessary casts
which might have been added implicitly by the query planner or explicitly by users when they wrote
the query.
This is a long post explaining how the optimization works. If you're only interested in the results, skip
to the last section. For the full details, read on!
Like many programming languages, SQL allows certain operations between values of different types if
there are implicit conversions (a.k.a., implicit casts or coercions) between those types. This improves
usability, as it allows writing expressions like 1.5 > 2 without worrying too much whether the types
are compatible ( 1.5 is of type decimal(2,1) , while 2 is an integer ).
During query analysis and planning, Presto introduces explicit casts for any implicit conversion in the
original query as it translates it into the intermediate query plan representation the engine uses
internally for optimization and execution. This eliminates a layer of complexity for the optimizer, which,
as a result, doesn't need to reason about types (type inference) or worry about whether expressions
are properly typed.
More importantly, it simplifies the job of defining and implementing operators (e.g., > , < , = , etc).
Without implicit conversions, there would need to exist a variant of every operator for every
combination of compatible types. For example, it would be necessary to have an implementation of
https://prestodb.io/blog/2019
/05/21/optimizing-the-casts-
away.html
Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 19
HiveQL to ANSI SQL Try 1: Regular expression Newlines Comments String encapsulation Nested expression Order of applying the transformation Can't test => We need a SQL parser Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 20
HiveQL to ANSI SQL Try 2: queryparser uber / queryparser Watch 53 Star 688 Fork 71 <> Code Issues 18 Pull requests 0 Projects 0 Security Insights Join GitHub today GitHub is home to over 36 million developers working together to host and review code, manage projects, and build software together. Sign up Dismiss Parsing and analysis of Vertica, Hive, and Presto SQL. 39 commits 5 branches 1 release 9 contributors MIT Branch: master New pull request Find File Clone or download What language is this? Haskell 100.0% Java Scala Python JavaScript Shell Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 21
HiveQL to ANSI SQL Try 3: ANTLR! Parser generator used by Presto, Spark, Hive... Easy to use Written in Java! Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 22
HiveQL to ANSI SQL SqlBase.g4 Rules of Transformation LEXER Parser Expr 1 + 2 ParseTree ParseTreeWalker + TokenStreamRewriter Generated Query Parser Borrow the idea from Presto optimizer Individual testable rules ANTLR support token rewrite Warning on risky query (semantic different) Hive Queries HiveQL Transformer Presto SQL Presto HiveQL Hive Manual fix if not match (type casting...) Compare Result Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 23
HiveQL to ANSI SQL
Transform rule:
@Override
public void rewrite(TokenStreamRewriter rewriter, SqlBaseParser.PredicateContext ctx
{
// Foo REGEXP Bar
// to
// regexp_like(Foo, Bar)
//
// g4:
// | NOT? kind=(RLIKE | LIKE) pattern=valueExpression
SqlBaseParser.PredicateContext parent = (SqlBaseParser.PredicateContext) ctx.g
String out = String.format("regexp_like(%s, %s)",
rewriter.getText(Interval.of(parent.valueExpression().start.getTokenInde
rewriter.getText(Interval.of(ctx.pattern.start.getTokenIndex(), ctx.patt
rewriter.replace(parent.start, parent.stop, out);
}
Tests for transformation:
@Test
public void TestRewriteNotFire()
{
getRewriteTester().assertThat(new RewriteRLIKE())
.on("select * from a where s = 'str RLIKE ''abcde'''")
.doesNotFire();
}
@Test
public void TestNormalRewrite()
{
getRewriteTester().assertThat(new RewriteRLIKE())
.on("select * from a where str RLIKE 'abcde'")
.matches(pattern: "select * from a where regexp_like(str, 'abcde')");
getRewriteTester().assertThat(new RewriteRLIKE())
.on("select * from a where str REGEXP 'abcde'")
.matches(pattern: "select * from a where regexp_like(str, 'abcde')");
}
Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 24
HiveQL to ANSI SQL ~50% can be converted automatically with correct result ~40% require additional manual conversion ~10% produce incorrect result due to semantic difference Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 25
HiveQL to ANSI SQL Most of the conversion can be automated Beware of edge cases and semantic different Verification is a must Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 26
Performance issue on Nested data Nested Data Type Struct(x, y, z) Array(Struct(x, y, z)) Query SELECT a.x FROM table SELECT a.x FROM table CROSS JOIN UNNEST(arr) as a Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 27
Performance issue on Nested data Possible Solutions Buy more servers Not enough space for rack in data center... Not enough power supply... ETL the data Too much data Improve Presto Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 28
Speed up Presto Nested column pruning Improve the Unnest Operator Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 29
Problem queries EXPLAIN SELECT t.x, count(1) FROM nest_test CROSS JOIN UNNEST(a) AS t GROUP BY t.x; ...... Layout: [x:varchar] Project[] Layout: [x:varchar] Unnest[replicate=, unnest=a:array(row(x varchar, y varchar))] Layout: [x:varchar, y:varchar] TableScan[hive_nao:star_test:nest_test] Layout: [a:array(row(x varchar, y varchar))] a := a:array<struct<x:string,y:string>>:1:REGULAR 80% of the data got pruned 75% of the CPU time 15% of the CPU time Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 30
CPU Profiling Flame Graph Search Click to go back, hold to see history all java/lang/Thread.run java/util/concurrent/ThreadPoolExecutor$Worker.run java/util/concurrent/ThreadPoolExecutor.runWorker com/facebook/presto/$gen/Presto_0_215_20_g8766203__0_215__20190218_045431_1.run com/facebook/presto/execution/executor/TaskExecutor$TaskRunner.run com/facebook/presto/execution/executor/PrioritizedSplitRunner.process com/facebook/presto/execution/executor/TaskExecutor$SplitRunner.processFor com/facebook/presto/execution/SqlTaskExecution$DriverSplitRunner.processFor com/facebook/presto/operator/Driver$$Lambda$1068/407151781.get com/facebook/presto/operator/Driver.processFor$8 com/facebook/presto/operator/Driver.processInternal com/facebook/presto/operator/Driver.tryWithLock com/facebook/presto/operator/UnnestOperator.getOutput com/facebook/presto/spi/block/MapType.appendTo com/facebook/presto/spi/block/MapBlockBuilder.appendStructureInternal com/facebook/presto/spi/block/Dictionary.. com/facebook/presto/spi/block/AbstractVa.. com/facebook/presto/spi/block/Dictionary.. com/facebook/presto/spi/block/AbstractRowBlock.wri.. com/facebook/presto/spi/block/RowBlockBuilder.appe.. com/facebook/presto/spi/type/RowType.appendTo Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 31
Column Pruning CREATE TABLE nested( A int, B int, C int ) SELECT B FROM Table; Compute Presto Presto Column Pruning Storage (S3, HDFS) A1 B1 C1 A2 B2 C2 A3 B3 C3 Row format A1 A3 B1 B2 B3 C1 C3 Columnar format (Parquet/ORC) Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 32
Column Pruning CREATE TABLE nested( A int, B struct<B1:string,B2:int>, C int ) SELECT B.B1 FROM Table; Compute Presto Presto Storage (S3, HDFS) A1 A3 B1_1 B1_2 B1_3 B2_1 B2_2 B2_3 C1 C3 Without Nested Column Pruning A1 A3 B1_1 B1_2 B1_3 B2_1 B2_2 B2_3 C1 C3 With Nested Column Pruning Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 33
Problem queries
New Optimization Rule:
Project -> Unnest -> TableScan
Push the projection expression
into TableScan operator
Handle the pruning in ORC/Parquet
reader
EXPLAIN SELECT t.x, count(1) FROM nest_test
CROSS JOIN UNNEST(a) AS t
GROUP BY t.x;
......
Layout: [x:varchar]
Project[]
Layout: [x:varchar]
Unnest[replicate=, unnest=a:array(row(x varchar, y varchar))]
Layout: [x:varchar, y:varchar]
TableScan[hive_nao:star_test:nest_test]
Layout: [a:array(row(x varchar, y varchar))]
a := a:array<struct<x:string,y:string>>:1:REGULAR
Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 34
Performance improved 2x - 10x improvement on our production data CPU time spent Before After 0 500 1000 1500 2000 2500 3000 CPU Time (minutes) Others Project Read+Unzip Unnest Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 35
TODO Generalized the implementation Migrate to the new pushdown framework: https://github.com/prestosql/presto/wiki/Pushd own-of-complex-operations Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 36
Problem queries #2
SELECT *
FROM my_table
CROSS JOIN UNNEST(field) AS r
WHERE r.field1 = 'foo' <--- A very selective filter
LIMIT 10;
- Output[<......>]
- LimitPartial[<......>]
- Filter[filterPredicate = ...] => [<......>]
CPU: 2.01m (6.79%), Scheduled: 2.07m (5.71%), Input: 800152 rows (63.35GB), Output: 75222 rows (4.42GB),
Filtered: 90.60%
Input avg.: 247.73 rows, Input std.dev.: 63.96%
- Unnest [replicate=<......>,unnest=field] => [<......>]
CPU: 23.32m (78.52%), Scheduled: 23.75m (65.65%), Output: 800152 rows (63.35GB)
Input avg.: 3.33 rows, Input std.dev.: 96.75%
- TableScan[hive:my_table, grouped = false] => [<......>]
CPU: 4.20m (14.14%), Scheduled: 10.18m (28.14%), Output: 10740 rows (664.73MB)
Input avg.: 3.33 rows, Input std.dev.: 96.75%
Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 37
Improve Unnest Operator Avoid copy in replicate channel T1(name varchar, field array(string)) name field Alice [A1, A2] Bob [B1, B2, B3] SELECT * FROM T1 CROSS JOIN UNNEST(field) name field Alice A1 Alice A2 Bob B1 Bob B2 Bob B3 Use dictionary block to avoid duplication replicate channels unnest channels Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 38
Improve Unnest Operator Avoid copy in unnest channel name field Alice [A1, A2] Bob [B1, B2, B3] Charlie [C1] ArrayBlock Offset Value 0 A1 2 A2 5 B1 B2 B3 C1 SELECT * FROM T1 CROSS JOIN UNNEST(field) name field Alice A1 Alice A2 Bob B1 Bob B2 Bob B3 Same Data Memory representation Result of Unnest Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 39
Improve Unnest Operator issues/563 pull/901 => PrestoSQL 316 Thanks Pratham(@phd3)! CPU time comparison for Unnest Operator with a single unnest channel replicateTypes=[varchar], unnestTypes=[<varying>] 400 300 200 100 0 173 10 290 17 469 27 array(varchar) map(varchar, varchar) array(row(varchar, varchar, varchar)) Current Proposed in #901 Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 40
Suggestion on Performance Tuning Explain analysis / UI Live Plan Metrics (JMX/CPU/Memory/Network/Data source...) Orcfiledump / Parquet-Tools CPU Flame Graph Dive into the source code Ask by Slack / mail list Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 41
Future Plan Nested column pruning Java 11 Kubernetes Extend warning system to detect bad queries Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved. 42
Any questions? Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved.
Thank you! Copyright (C) 2019 Yahoo Japan Corporation. All Rights Reserved.