Cassandra @ Yahoo Japan | Cassandra Summit 2016


October 11, 2016

Slide Summary

Cassandra Summit 2016
Day2 Conference - Thursday, September 8, 2016
3:00 PM – 3:35 PM, Room: 212
Cassandra @ Yahoo Japan
Satoshi Konno: Yahoo Japan Corporation



Text of Each Slide
1.

Cassandra @ Yahoo! JAPAN

2.

About me: Satoshi Konno (http://www.cybergarage.org)
• Engineering Manager of NoSQL Team @ Yahoo! Japan
• Open Source Software Developer for Virtual Reality, IoT and Cloud Computing
• Doctoral Course Student @ JAIST, Défago Lab: The φ accrual failure detector

3.

Agenda
• Company Profile
• Summary of C* Clusters
• Issues and Solutions of C*
• Next Generation Infrastructures for C*

4.

Company Profile

5.

Company Profile
Founded: January 31, 1996
Businesses: Internet Advertising, e-Commerce, Members Services, etc.
Web Services: 100+
Smartphone Apps: 50+ (iOS), 50+ (Android)
Employees: 5,800+ (as of June 30, 2016)
Head Office: Chiyoda-ku, Tokyo, Japan

6.

Shareholder Composition (chart): ownership stakes of 35.5% and 42.9%; market caps of $29 billion (Japan), $60 billion (U.S.), and $22 billion. An independent and public company in the Japanese market.

7.

18th Largest Internet Company by Market Cap (chart; vertical axis: billion U.S. dollars, 0–600). Source: http://www.statista.com/statistics/277483/market-value-of-the-largest-internet-companies-worldwide/

8.

Continued Growth, Sustained for 19 Years (chart). Revenue ¥652B, Operating Income ¥171B (FY2015)

9.

Revenue Portfolio (FY2015, chart): Marketing Solutions 60%, Consumer 32%, Others 8%

10.

Extensive Reach to a Wide Range of Users: 80% of all Japanese Internet users use Yahoo! JAPAN. Source: Nielsen NetView, June 2015: Data by Brands. Access from home and work using PCs (excl. internet applications)

11.

Many Strong Services (logo grid): media/news, search (US/JP), video, mail, knowledge search (Q&A), membership (Premium), C2C payment (Wallet), C2C EC (YAHUOKU!), B2C EC, local (Loco), etc.

12.

Summary of C* Clusters

13.

Yahoo! JAPAN Database Platforms (diagram): the NoSQL Team supports 100+ services and 300+ systems.

14.

OSS Database Platforms (diagram): the RDB Team operates MySQL across hundreds of systems and databases; the NoSQL Team operates Cassandra across 300+ systems.

15.

Cassandra @ Yahoo! JAPAN (timeline, 2010–2016): service departments ran Cassandra 0.5 → 0.8 → 1.x; our NoSQL Team has run 0.8 → 1.x → 2.x → 3.x.

16.

Our Cassandra Clusters (2016): 3 DCs, 30 clusters, 1000+ nodes, 30TB, 300,000 reads/sec, 100,000 writes/sec. 1 shared cluster (50 systems, 10 nodes) and 30 special clusters (30 systems, up to 160 nodes per cluster).

17.

Our Use Case Summary on Cassandra (diagram, ~100 systems): user databases for advertising services (demographic data, preference data, browsing history, impression data, life log, behavior history, …) and service databases (metadata, generated data, aggregated data, session data, recommendation, …), plus database caching.

18.

Our Issues and Solutions

19.

ISSUE #1: C10k Problem – C* Proxy. 6.8 billion PV/month: PC + Tablet 3.36B PV, Smart Device 3.45B PV.

20.

ISSUE #1: C10k Problem – C* Proxy. Yahoo Japan services each run 10–200 front-end servers.

21.

ISSUE #1: C10k Problem – C* Proxy. PROBLEM: 200 front-end servers × 128 processes × 2 connections (C* request + C* heartbeat) = 51,200 connections per node.

22.

ISSUE #1: C10k Problem – C* Proxy. PROBLEM (cont.): each C* node must handle 51,200 connections.

23.

ISSUE #1: C10k Problem – C* Proxy. PROBLEM (cont.): under 51,200 connections per node, a C* process goes down.

24.

ISSUE #1: C10k Problem – C* Proxy. SOLUTION: run one proxy on each front-end server; its 128 processes connect to the local proxy, so each C* node sees 200 front-end servers × 1 proxy × 2 connections (C* request + C* heartbeat) = 400 connections per node.
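The connection arithmetic on this slide can be sketched as follows (a minimal illustration; the variable names are ours, not from the talk):

```python
# Connections per C* node, before and after adding a per-host proxy.
SERVERS = 200     # front-end servers for one service
PROCESSES = 128   # worker processes per front-end server
CHANNELS = 2      # C* request + C* heartbeat connections

direct = SERVERS * PROCESSES * CHANNELS   # every process connects directly
proxied = SERVERS * 1 * CHANNELS          # processes share one local proxy

print(direct)   # 51200
print(proxied)  # 400
```

Because the 128 processes multiplex through a single local proxy, the per-node connection count drops by a factor of 128.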

25.

ISSUE #2: Bootstrap Problem – Driver. Heavy services (3,000+ qps/node): C* cluster on real servers (SSD recommended); real CPU performs well. Light services (under 1,000 qps/node and under 3GB/node): C* cluster on virtual servers on OpenStack; vCPU is cheap.

26.

ISSUE #2: Bootstrap Problem – Driver. PROBLEM: when a new C* node is added to the cluster, every process on every front-end server tries to connect to it at the same time.

27.

ISSUE #2: Bootstrap Problem – Driver. PROBLEM: C*'s BCrypt-based authentication is heavy processing for vCPU nodes.

28.

ISSUE #2: Bootstrap Problem – Driver. PROBLEM: because of the authentication cost, most processes cannot connect to the C* clusters on OpenStack; they time out and retry immediately, without waiting, so all vCPU usage hits 100%.

29.

ISSUE #2: Bootstrap Problem – Driver. SOLUTION: improve the C* drivers so that clients do not all reconnect simultaneously when a connection fails.
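A common way to break up this thundering-herd reconnect is exponential backoff with random jitter, which is the shape of fix the slide describes. This is a generic sketch (the function and parameter names are ours, not the actual driver patch):

```python
import random

def reconnect_delays(base=1.0, cap=60.0, attempts=6, rng=random.random):
    """Return jittered, exponentially growing delays (seconds) between
    reconnection attempts, so clients do not retry in lockstep."""
    delays = []
    for attempt in range(attempts):
        exp = min(cap, base * (2 ** attempt))   # exponential growth, capped
        delays.append(rng() * exp)              # "full jitter": pick in [0, exp)
    return delays
```

Each client draws its own random delays, so reconnections to a bootstrapping node spread out over time instead of arriving all at once.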

30.

ISSUE #3: Multi-tenancy – Slow Query. Small services (under 500 qps and under 10GB per keyspace) share a C* cluster on real servers: 50 services on one shared cluster.

31.

ISSUE #3: Multi-tenancy – Slow Query. PROBLEM: we couldn't identify which service was causing the high-load queries in the multi-tenant cluster.

32.

ISSUE #3: Multi-tenancy – Slow Query. SOLUTION: CASSANDRA-12403, slow query detection. Once a slow query identifies the offending service, that service is removed from the shared cluster and moved to a special cluster.
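CASSANDRA-12403 shipped in Cassandra 3.10 as a cassandra.yaml threshold; queries slower than the timeout are logged so the offending keyspace (and hence service) can be found. A minimal config sketch:

```yaml
# cassandra.yaml (Cassandra 3.10+): log queries that exceed this
# threshold; the log entries reveal which keyspace issued them.
slow_query_log_timeout_in_ms: 500
```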

33.

ISSUE #4: Multi-racking – Inbound Params. PROBLEM: our C* clusters are built alongside other services in the same rack or under the same core switch.

34.

ISSUE #4: Multi-racking – Inbound Params. PROBLEM: C* streaming occurs whenever a node is added or removed, whether by our operations or by failure detection.

35.

ISSUE #4: Multi-racking – Inbound Params. PROBLEM: C* streaming generates heavy traffic that disturbs the other services, and only the sending side can be throttled (stream_throughput_outbound); a node receiving streams from many peers cannot limit its inbound traffic.

36.

ISSUE #4: Multi-racking – Inbound Params. SOLUTION: CASSANDRA-11303, new inbound throughput parameters for streaming, so each node can throttle both outbound (stream_throughput_outbound) and inbound (stream_throughput_inbound) streaming.
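With the patch, streaming can be capped in both directions. A sketch of the relevant cassandra.yaml knobs (the outbound name is the long-standing setting; the inbound name follows the patch's naming and may differ by Cassandra version):

```yaml
# cassandra.yaml: throttle streaming traffic in both directions so
# bootstrap/decommission streams cannot saturate a shared rack switch.
stream_throughput_outbound_megabits_per_sec: 200   # existing outbound cap
stream_throughput_inbound_megabits_per_sec: 200    # inbound cap from CASSANDRA-11303
```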

37.

Next Generation Infrastructures for C*

38.

OpenStack @ Yahoo! JAPAN. PURPOSE: abstract our data center resources using OpenStack; 50,000+ instances. (Diagram: Apps → APIs → Platforms → Infrastructures.)

39.

Trial #1: Special Hypervisor for C*. PROBLEM: our OpenStack hypervisors host C* VMs together with other services' VMs ("noisy neighbours").

40.

Trial #1: Special Hypervisor for C*. SOLUTION: offer special hypervisors that run only C* VMs, with optimal flavors for C*: 8+ vCPUs, 16GiB+ memory, 100GiB+ SSD, 10Gbps × 2 network.

41.

TRIAL #2: Bare Metal Clusters for C*. PROBLEM: OpenStack vCPUs are cheap but too weak to run a C* node in our demanding environments, such as services with many connections; BCrypt authentication is heavy on a vCPU.

42.

TRIAL #2: Bare Metal Clusters for C*. SOLUTION: offer special bare metal clusters that run only C*, provisioned with OpenStack Ironic: Xeon D-1541 2.1GHz (1 CPU), 32GB memory, 400GB SATA SSD, 10Gbps × 2 network.