Vespa - Tokyo Meetup #yjmu

March 20, 2018

#yjmu #Vespa #SearchPlatform #LargeScaleDataProcessing #MachineLearning #LowLatency

Slide overview

https://yj-meetup.connpass.com/event/79031/

Yahoo! Developer Network

@ydnjp

We moved to Speaker Deck in October 2023. For the latest updates, see https://speakerdeck.com/lycorptech_jp

Text of each slide

Vespa - Tokyo Meetup Kristian Aune | March 2018

Kristian Aune, Tech Product Manager, Vespa
Has worked in the Vespa team since 2000

Who?
Verizon: the largest mobile phone carrier in the US, with 150,000 employees in total
A subsidiary of Verizon that handles digital media

Vespa Team
26 developers in Trondheim, Norway
History:
● Fast Search & Transfer: 1998 (alltheweb.com)
● Overture: 2004
● Yahoo: 2004
● Oath: 2017
● Vespa Open Source: September 2017. We want committers from Japan! 😊

Topics
What is Vespa
● History
● The Vespa Team
Vespa features (highlights)
● Ease of use
● Scalability
● Advanced Ranking (tensors)
Vespa usage in Oath
● Select use cases

Vespa
A platform for low-latency computations over large, evolving data sets:
● Search and selection over structured and unstructured data
● Relevance scoring
● Query-time organization and aggregation of matching data
● Real-time writes
Typical use cases: text search, personalization/recommendation/targeting, real-time data display

Gemini Native Ads
Ads blend into streams “natively”
● Relevance
● Budget updates
● Diversity
● Match-phase

Taiwan & Hong Kong E-commerce
Auctions and search

Streams: Personalized Articles
Popular and personalized articles
● Relevance
● Newsroom

News Search
News Direct Display
Search Results
● Freshness

Image Search
Flickr search and navigation
● Machine-learned models
● Public and personal
● Massive updates after model training

Fantasy Sports
Backend for all player data
● Team rosters
● Results
● Vespa is a cornerstone of the serving architecture
● Grid batch updates
● Extremely low-cost serving

Recommendations
Recommendation engine
● Videos
● Contacts
● Questions / answers

Question-to-Answer Search
Direct Display: from question to answer
● In search, return the answer!

User-generated Content
Example: news.yahoo.com
● Top / recent comments

Installing
● RPM packages or Docker images
● All nodes have the same packages/image
● CentOS (runs on macOS and Windows inside Docker or VirtualBox)
● One config variable:
    echo "override VESPA_CONFIGSERVERS [config-server-hostnames]" >> $VESPA_HOME/conf/vespa/default-env.txt
http://docs.vespa.ai/documentation/vespa-quick-start.html
http://docs.vespa.ai/documentation/vespa-quick-start-centos.html
http://docs.vespa.ai/documentation/vespa-quick-start-multinode-aws.html
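The `echo ... >>` line above is the only per-node configuration step. Below is a minimal Python sketch of the same step. It assumes that several config servers are given as one comma-separated list, and it uses hypothetical hostnames and a throwaway directory in place of a real `$VESPA_HOME`. Unlike a bare `>>`, it also avoids appending the same override twice when the setup is re-run.

```python
import os
import tempfile


def set_config_servers(vespa_home, hostnames):
    """Append 'override VESPA_CONFIGSERVERS ...' to $VESPA_HOME/conf/vespa/default-env.txt.

    Assumption: multiple config servers are written as one comma-separated list.
    """
    if not hostnames:
        raise ValueError("at least one config server hostname is required")
    line = "override VESPA_CONFIGSERVERS " + ",".join(hostnames)
    path = os.path.join(vespa_home, "conf", "vespa", "default-env.txt")
    os.makedirs(os.path.dirname(path), exist_ok=True)

    existing = ""
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read()

    # Skip the append when the exact line is already present, so re-running
    # the setup does not stack duplicate overrides.
    if line not in existing.splitlines():
        with open(path, "a") as f:
            if existing and not existing.endswith("\n"):
                f.write("\n")
            f.write(line + "\n")
    return path


if __name__ == "__main__":
    # A temporary directory stands in for a real $VESPA_HOME.
    with tempfile.TemporaryDirectory() as home:
        hosts = ["cfg1.example.com", "cfg2.example.com"]  # hypothetical hostnames
        path = set_config_servers(home, hosts)
        set_config_servers(home, hosts)  # second call adds nothing
        with open(path) as f:
            print(f.read(), end="")
```

Running it prints a single `override VESPA_CONFIGSERVERS cfg1.example.com,cfg2.example.com` line even though the function is called twice.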

Ease of Use
./services.xml
<services version='1.0'>
  <container id='default' version='1.0'>
    <search/>
    <document-api/>
    <nodes>
      <node hostalias="node1"/>
    </nodes>
  </container>
  <content id='music' version='1.0'>
    <redundancy>2</redundancy>
    <documents>
      <document mode='index' type='music'/>
    </documents>
    <nodes>
      <node hostalias="node2" distribution-key="1"/>
      <node hostalias="node3" distribution-key="2"/>
    </nodes>
  </content>
</services>
./hosts.xml
<hosts>
  <host name="host1.domain.name">
    <alias>node1</alias>
  </host>
  <host name="host2.domain.name">
    <alias>node2</alias>
  </host>
  <host name="host3.domain.name">
    <alias>node3</alias>
  </host>
</hosts>
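The two files are linked only through the `hostalias` names: each `<node hostalias=...>` in services.xml must match an `<alias>` in hosts.xml. The Python sketch below resolves that mapping with the standard-library `xml.etree.ElementTree` and reports aliases that are used but never defined. It is a hypothetical pre-deploy check for illustration, not a Vespa tool; Vespa performs its own validation when the application package is deployed.

```python
import xml.etree.ElementTree as ET

SERVICES_XML = """<services version='1.0'>
  <container id='default' version='1.0'>
    <search/>
    <document-api/>
    <nodes><node hostalias="node1"/></nodes>
  </container>
  <content id='music' version='1.0'>
    <redundancy>2</redundancy>
    <documents><document mode='index' type='music'/></documents>
    <nodes>
      <node hostalias="node2" distribution-key="1"/>
      <node hostalias="node3" distribution-key="2"/>
    </nodes>
  </content>
</services>"""

HOSTS_XML = """<hosts>
  <host name="host1.domain.name"><alias>node1</alias></host>
  <host name="host2.domain.name"><alias>node2</alias></host>
  <host name="host3.domain.name"><alias>node3</alias></host>
</hosts>"""


def alias_to_host(hosts_xml):
    """Map every <alias> in hosts.xml to the name of its enclosing <host>."""
    root = ET.fromstring(hosts_xml)
    return {
        alias.text.strip(): host.get("name")
        for host in root.findall("host")
        for alias in host.findall("alias")
    }


def resolve_nodes(services_xml, hosts_xml):
    """Return {hostalias: hostname} for every <node> anywhere in services.xml.

    Raises ValueError if services.xml uses an alias that hosts.xml does not define.
    """
    aliases = alias_to_host(hosts_xml)
    used = {n.get("hostalias") for n in ET.fromstring(services_xml).iter("node")}
    missing = used - set(aliases)
    if missing:
        raise ValueError(f"undefined host aliases: {sorted(missing)}")
    return {a: aliases[a] for a in sorted(used)}


if __name__ == "__main__":
    for alias, host in resolve_nodes(SERVICES_XML, HOSTS_XML).items():
        print(alias, "->", host)
```

Keeping logical aliases in services.xml and physical hostnames in hosts.xml means the same services.xml can be deployed to different machines by swapping only hosts.xml.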

Application Packages
./searchdefinitions/music.sd
search music {
    document music {
        field artist type string {
            indexing: summary | index
        }
        field album type string {
            indexing: summary | index
        }
        field track type string {
            indexing: summary | index
        }
        field popularity type int {
            indexing: summary | attribute
            attribute: fast-search
        }
    }
    rank-profile song inherits default {
        first-phase {
            expression {
                0.7 * nativeRank(artist,album,track) + 0.3 * attribute(popularity)
            }
        }
    }
}

http://docs.vespa.ai/documentation/search-definitions.html
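The `song` rank profile scores every matching document with a weighted sum of a text-match feature and a stored attribute. The Python sketch below mirrors that first-phase expression so the weighting can be explored offline. It uses hypothetical documents whose `nativeRank` values are made up, because Vespa computes that feature from the query at ranking time.

```python
def song_first_phase(native_rank, popularity):
    """Mirror of the 'song' first-phase expression:
    0.7 * nativeRank(artist,album,track) + 0.3 * attribute(popularity)
    """
    return 0.7 * native_rank + 0.3 * popularity


# Hypothetical documents. In Vespa, nativeRank is computed at query time from
# the query terms; these values are made up for illustration.
DOCS = [
    {"track": "strong text match", "native_rank": 0.9, "popularity": 1},
    {"track": "weak text match", "native_rank": 0.2, "popularity": 5},
]


def rank(docs):
    """Sort documents by descending first-phase score, as the song profile would."""
    return sorted(
        docs,
        key=lambda d: song_first_phase(d["native_rank"], d["popularity"]),
        reverse=True,
    )


if __name__ == "__main__":
    for d in rank(DOCS):
        score = song_first_phase(d["native_rank"], d["popularity"])
        print(f'{d["track"]}: {score:.2f}')
```

With these inputs, the weak text match ranks first (about 1.64 vs 0.93). The reason is that `nativeRank` is a normalized score while `popularity` is an unbounded integer. This suggests scaling the attribute before weighting it, if text relevance should dominate.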

Scalability
Vespa distributes data over
● a set of nodes
● with a certain replication factor
● in a set of groups
When the nodes or the distribution config change, data is redistributed dynamically.
No need to partition data manually: no shards!
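Automatic redistribution is easy to handle when each data bucket's placement is a pure function of the bucket id and the current node list, so adding a node moves only the data that should land on it. The Python sketch below shows this property using rendezvous (highest-random-weight) hashing as a stand-in. It is not Vespa's own distribution code. The bucket ids, node names, and the SHA-256 scoring are assumptions for illustration, and `redundancy` plays the role of the replication factor.

```python
import hashlib


def score(bucket, node):
    """Deterministic pseudo-random weight for a (bucket, node) pair."""
    digest = hashlib.sha256(f"{bucket}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(bucket, nodes, redundancy):
    """Store the bucket's copies on the `redundancy` nodes with the highest weights."""
    if redundancy > len(nodes):
        raise ValueError("redundancy cannot exceed the number of nodes")
    return sorted(nodes, key=lambda n: score(bucket, n), reverse=True)[:redundancy]


def moved_copies(buckets, before, after, redundancy):
    """Count bucket copies that must be copied to a new node after a node-set change."""
    moved = 0
    for b in buckets:
        old = set(place(b, before, redundancy))
        new = set(place(b, after, redundancy))
        moved += len(new - old)
    return moved


if __name__ == "__main__":
    buckets = range(10_000)
    nodes = ["node1", "node2", "node3"]
    grown = nodes + ["node4"]
    moved = moved_copies(buckets, nodes, grown, redundancy=2)
    total = len(buckets) * 2
    print(f"{moved} of {total} copies move ({moved / total:.0%})")
```

Going from three to four nodes with two copies per bucket moves roughly a quarter of all copies, and every moved copy lands on the new node. No copy is shuffled between the existing nodes, which is the property that removes the need for manual sharding.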

Tensors
A data type in ranking expressions (in addition to double)
Makes it possible to deploy large and complex ML models to Vespa. Examples:
● Deep neural nets
● FTRL (regression models with millions of parameters)
● Word2vec models

http://docs.vespa.ai/documentation/tensor-intro.html
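An FTRL-style model with millions of parameters is mostly a sparse weight vector. Scoring a document then means multiplying the weights by the active features and summing the result. The Python sketch below models a tensor with one mapped dimension as a dict from label to value. It performs a join by multiplication followed by a sum reduction, roughly what a Vespa expression of the form `sum(a * b)` does when `a` and `b` share a mapped dimension. The weights, feature labels, and helper names are hypothetical.

```python
from typing import Dict


def join_multiply(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, float]:
    """Multiply two tensors that share one mapped dimension.

    Only labels present in both tensors produce a cell, so sparse inputs stay sparse.
    """
    return {label: a[label] * b[label] for label in a.keys() & b.keys()}


def reduce_sum(t: Dict[str, float]) -> float:
    """Sum over all cells, reducing the tensor to a scalar rank score."""
    return sum(t.values())


# Hypothetical FTRL-style weights; a production model could have millions of labels.
WEIGHTS = {"country=jp": 0.9, "country=no": 0.4, "device=mobile": -0.2, "hour=20": 0.1}

if __name__ == "__main__":
    # Hypothetical features active for one request, each with value 1.0.
    features = {"country=jp": 1.0, "device=mobile": 1.0, "lang=ja": 1.0}
    print(reduce_sum(join_multiply(WEIGHTS, features)))
```

Only the two shared labels contribute (0.9 and -0.2). The cost therefore grows with the number of active features rather than with the size of the model.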

What is a tensor?
Tensor: a multidimensional array that can be used for computation
Textual form: { {address}:double, ... } where address is {identifier:value},...
Examples:
● 0-dimensional: a scalar {{}:0.1}
● 1-dimensional: a vector {{x:0}:0.1, {x:1}:0.2}
● 2-dimensional: a matrix {{x:0,y:0}:0.1, {x:0,y:1}:0.2}
Indexed tensor dimensions: values addressed by numbers, contiguous from 0
Mapped tensor dimensions: values addressed by identifiers, sparse
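The textual form stores one cell per `{address}:value` pair, where an address is a set of `dimension:label` pairs. The Python sketch below parses exactly the simple, unquoted literals shown on the slide into a dict keyed by sorted address tuples. It is a hypothetical helper, not Vespa's parser, and it keeps every label as a string. Whether a dimension is indexed or mapped is declared by the tensor type in Vespa (for example `tensor(x[2])` vs `tensor(x{})`), not inferred from the literal.

```python
import re
from typing import Dict, Tuple

Address = Tuple[Tuple[str, str], ...]

# One cell: "{dim:label,dim:label}:value". Handles only the simple unquoted form.
CELL = re.compile(r"\{([^{}]*)\}\s*:\s*([-+0-9.eE]+)")


def parse_tensor(text: str) -> Dict[Address, float]:
    """Parse e.g. '{{x:0,y:1}:0.2}' into {(('x', '0'), ('y', '1')): 0.2}."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("tensor literal must be wrapped in braces")
    cells: Dict[Address, float] = {}
    for address, value in CELL.findall(body[1:-1]):
        pairs = []
        for part in address.split(","):
            if not part.strip():
                continue  # the empty address {} of a 0-dimensional tensor
            dim, label = (s.strip() for s in part.split(":", 1))
            pairs.append((dim, label))
        cells[tuple(sorted(pairs))] = float(value)
    return cells


def dimensions(cells: Dict[Address, float]) -> set:
    """Names of all dimensions used by the tensor's cell addresses."""
    return {dim for address in cells for dim, _ in address}


if __name__ == "__main__":
    for literal in ["{{}:0.1}", "{{x:0}:0.1, {x:1}:0.2}", "{{x:0,y:0}:0.1, {x:0,y:1}:0.2}"]:
        cells = parse_tensor(literal)
        print(len(dimensions(cells)), "dimension(s):", cells)
```

Storing cells in a dict keyed by address naturally models mapped (sparse) dimensions. An indexed dimension would usually be stored as a dense array instead.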

[beta]

TensorFlow import
Import machine-learned ranking models trained with TensorFlow directly.
Add the model files to the application package and refer to the model in the rank profile:

first-phase {
    expression: sum(tensorflow("my_model/saved"))
}

http://docs.vespa.ai/documentation/tensorflow.html
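The `sum(...)` wrapper matters because a first-phase expression must yield a single number, while a model's output is generally a tensor (for example batch × units). The Python sketch below uses one hypothetical dense layer with ReLU as a stand-in for whatever graph `my_model/saved` contains, then reduces its output over every dimension to get a rank score. The inputs, weights, and function names are assumptions for illustration only.

```python
from typing import List


def dense_relu(x: List[float], w: List[List[float]], b: List[float]) -> List[List[float]]:
    """A single dense layer with ReLU, returning a [batch=1][units] output tensor.

    Hypothetical stand-in for the graph inside "my_model/saved".
    """
    row = []
    for j in range(len(b)):
        z = sum(x[i] * w[i][j] for i in range(len(x))) + b[j]
        row.append(max(0.0, z))
    return [row]


def sum_all(t: List[List[float]]) -> float:
    """Reduce over every dimension, like sum(...) with no dimension argument."""
    return sum(sum(row) for row in t)


if __name__ == "__main__":
    x = [1.0, 2.0]                  # hypothetical input features
    w = [[0.5, -1.0], [0.25, 0.5]]  # hypothetical trained weights
    b = [0.0, 0.1]
    output = dense_relu(x, w, b)    # a 1x2 tensor, not yet a rank score
    print(output, "->", sum_all(output))
```

If the model already emits a single value per document, `sum` over a 1×1 output simply returns that value.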

TensorFlow import