March 09, 24
Slide overview
Lecture material for the Manufacturing Study Group, Supercomputing Technology Industrial Application Council_240308
Generative Ai Study Group Master
Cutting-Edge Insights: A Deep Dive into Gen-AI Tech and Trends. What's New and What's Next? 8th Mar 2024 Kunihiro Sugiyama, a.k.a. Generative AI Study Group host, AI Technology Consortium @ AIST
Agenda
https://www.linkedin.com/in/kunihiro-sugiyama-49b0372a/
Introduction • GASG brochure • http://tinyurl.com/ysjh5ua4 • Next session: Tuesday, March 12, 18:00~
Agenda
Theme •Title • Cutting-Edge Insights ▪ A Deep Dive into Gen-AI Tech and Trends. ▪ What's New and What's Next? •Contents • Tech trend, Use case, Issue
Theme • "Multimodal AI" and "small language models": key generative AI trends for 2024 (Forbes JAPAN) - Yahoo! News https://news.yahoo.co.jp/articles/6330194175339a101c2f6e8ad36b0c4109943939?page=1
Tech trend
Tech trend •Small model •Beyond Transformer •Related tech
Tech trend • Small model • Open LLM Leaderboard - a Hugging Face Space by HuggingFaceH4 https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard • Points ▪ Compute requirements ▪ Runtime memory footprint ▪ Inference speed ▪ Specialized customization
Tech trend • Small model • Pickup ▪ TinyLlama • jzhang38/TinyLlama: The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. https://github.com/jzhang38/TinyLlama • [A lightweight, fast LLM] A summary of TinyLlama #LLM - Qiita https://qiita.com/sergicalsix/items/7cd7665ab90b9f3b343c • Architecture and tokenizer fully compatible with Llama • The 1.1B-parameter model runs in roughly 550 MB of RAM with 4-bit quantization
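The roughly 550 MB figure for TinyLlama can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is only an estimate under a stated assumption: it counts weight storage alone and ignores activations, KV cache, and quantization metadata. The function name `weight_memory_mb` is hypothetical.

```python
def weight_memory_mb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes).

    Assumption: only the weights are counted; runtime overhead is ignored.
    """
    return num_params * bits_per_weight / 8 / 1e6


tinyllama_params = 1.1e9  # 1.1B parameters, as stated on the slide

for bits in (16, 8, 4):
    mb = weight_memory_mb(tinyllama_params, bits)
    print(f"{bits:>2}-bit weights: about {mb:,.0f} MB")
# The 4-bit case works out to about 550 MB, matching the slide.
```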
Tech trend • Small model • Pickup ▪ Phi-2 • Phi-2: The surprising power of small language models - Microsoft Research https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/ • [2306.11644] Textbooks Are All You Need https://arxiv.org/abs/2306.11644 • 2.7B-parameter model • Model quality secured through a high-quality training dataset
Tech trend • Small model • Pickup ▪ Orca 2 • Orca - Microsoft Research https://www.microsoft.com/en-us/research/project/orca/ • Microsoft's Orca 2 LLM Outperforms Models That Are 10x Larger https://www.infoq.com/news/2023/12/microsoft-orca-2-llm/ • 7B and 13B parameter models • Fine-tuned from Llama 2 • Trained on synthetic datasets that include reasoning
Tech trend • Small model • Pickup ▪ DeciLM • Deci/DeciLM-7B · Hugging Face https://huggingface.co/Deci/DeciLM-7B • [2305.13245] GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints https://arxiv.org/abs/2305.13245 • What is Grouped Query Attention (GQA)? — Klu https://klu.ai/glossary/grouped-query-attention • Adopts GQA, making it lightweight and high-performing • GQA is also used in, for example, Llama 2 70B
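To make the GQA idea concrete, here is a minimal pure-Python sketch of one decoding step in which several query heads share a single key/value head. It is an illustration only, not the DeciLM or Llama 2 implementation; the function names and the toy tensors are assumptions.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def grouped_query_attention(queries, keys, values):
    """One decoding step of grouped-query attention.

    queries: [n_q_heads][d]
    keys, values: [n_kv_heads][seq_len][d]
    Each block of n_q_heads // n_kv_heads consecutive query heads
    shares one key/value head, which shrinks the KV cache.
    """
    n_q_heads, n_kv_heads = len(queries), len(keys)
    assert n_q_heads % n_kv_heads == 0
    group_size = n_q_heads // n_kv_heads
    outputs = []
    for h, q in enumerate(queries):
        kv = h // group_size  # index of the shared K/V head
        scale = 1.0 / math.sqrt(len(q))
        weights = softmax([dot(q, k) * scale for k in keys[kv]])
        d = len(values[kv][0])
        out = [sum(w * v[i] for w, v in zip(weights, values[kv])) for i in range(d)]
        outputs.append(out)
    return outputs


# Toy data (hypothetical): 4 query heads, 2 K/V heads, 2 cached positions.
queries = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
keys = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
values = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
out = grouped_query_attention(queries, keys, values)
print(len(out), "query-head outputs from", len(keys), "K/V heads")
```

With 4 query heads and 2 K/V heads, the cache holds half as many key/value vectors as standard multi-head attention would; multi-query attention is the extreme case with a single K/V head.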
New!! Tech trend • Small model • Pickup ▪ Gemma • Gemma: Google introduces new state-of-the-art open models https://blog.google/technology/developers/gemma-open-models • 2B and 7B
New!! Tech trend • Small model • Pickup ▪ [2310.11453] BitNet: Scaling 1-bit Transformers for Large Language Models https://arxiv.org/abs/2310.11453 ▪ Advancing AI for humanity | Foundation of AI https://thegenerality.com/agi/
Tech trend • Beyond Transformer • Points ▪ Attempts to address the Transformer's limitations • Inference speed • Runtime memory footprint • Sequence length • Compute requirements ▪ Quadratic complexity, O(n^2) http://tinyurl.com/yqs8pwec
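The O(n^2) point can be illustrated by counting attention scores: self-attention compares every token with every other token, so the score matrix has n × n entries per head. The sketch below assumes the full matrix is materialized in 16-bit floats, which is a simplification; the function name is hypothetical.

```python
def attention_score_entries(seq_len: int) -> int:
    """Number of pairwise attention scores for one head: n * n."""
    return seq_len * seq_len


BYTES_PER_SCORE = 2  # assumption: scores stored as 16-bit floats

for n in (1_000, 10_000, 100_000):
    entries = attention_score_entries(n)
    gigabytes = entries * BYTES_PER_SCORE / 1e9
    print(f"n={n:>7,}: {entries:>15,} scores, about {gigabytes:g} GB per head")
```

Doubling the sequence length quadruples the number of scores, which is why sequence length, memory, and compute appear together in the list of limitations.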
Tech trend • Beyond Transformer • Pickup ▪ MoE (Mixture of Experts) Reference: http://tinyurl.com/ylxsvomj
Tech trend • Beyond Transformer • Pickup ▪ MoE (Mixture of Experts) • Paper walkthrough: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (MoE) - Deep Learning Blog https://deeplearning.hatenablog.com/entry/moe • Mixture of Experts Explained https://huggingface.co/blog/moe • Reading MixtralSparseMoeBlock https://zenn.dev/if001/articles/fcea9fe9f1bdb1 • Introducing Gemini 1.5, Google's next-generation AI model
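The core of a sparsely gated MoE layer is top-k routing: a router scores all experts, only the k best are run, and their outputs are mixed with softmax weights. The sketch below is a toy illustration under stated assumptions: the experts are plain scalar functions, and the router logits are passed in directly, whereas in a real model a learned router computes them from the token. All names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their logits."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    weights = softmax([router_logits[i] for i in top])
    return dict(zip(top, weights))


def moe_layer(x, router_logits, experts, k=2):
    """Run only the selected experts and mix their outputs."""
    gate = top_k_gate(router_logits, k)
    return sum(w * experts[i](x) for i, w in gate.items())


# Hypothetical toy experts; real experts are feed-forward networks.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
router_logits = [0.1, 2.0, -1.0, 2.0]  # assumed router output for one token

gate = top_k_gate(router_logits, k=2)
y = moe_layer(3.0, router_logits, experts, k=2)
print(gate, y)
```

Only 2 of the 4 experts run for this token, which is how MoE models keep per-token compute well below what their total parameter count suggests.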
Tech trend • Beyond Transformer • Pickup ▪ RWKV (Reinventing RNNs for the Transformer Era) • An explanation of RWKV | AGIRobots Blog https://developers.agirobots.com/jp/rwkv/ • Understanding RWKV from the paper and its implementation https://zenn.dev/jow/articles/f66d6403b9a509 • RWKV, which achieves Transformer-level performance with an RNN, is amazing https://zenn.dev/hikettei/articles/5d6c1318998411 • This is a shock! An ultra-high-performance LLM at 1.5B! RWKV-5-World-v2 | shi3z https://note.com/shi3zblog/n/nfc8dd1abf494
Tech trend • Beyond Transformer • Pickup ▪ Mamba • state-spaces/mamba https://github.com/state-spaces/mamba • [2312.00752] Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://arxiv.org/abs/2312.00752 • Mamba: Redefining Sequence Modeling and Outforming Transformers Architecture - Unite.AI https://www.unite.ai/mamba-redefining-sequence-modeling-and-outforming-transformers-architecture/ • Mamba: Linear-Time Sequence Modeling with Selective State Spaces — Arxiv Dives | by Oxen | Dec, 2023 | Medium https://medium.com/@oxenai/mamba-linear-time-sequence-modeling-with-selective-state-spaces-arxiv-dives-cf96518d7ec4 ▪ StripedHyena • Architectures for longer sequences and efficient inference: StripedHyena | hessian.AI https://hessian.ai/architectures-for-longer-sequences-and-efficient-inference-stripedhyena/ • [2302.10866] Hyena Hierarchy: Towards Larger Convolutional Language Models https://arxiv.org/abs/2302.10866
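The "linear-time" in the Mamba paper title comes from processing a sequence as a recurrence over a fixed-size state instead of comparing all token pairs. The sketch below shows only that recurrence in its simplest scalar form, with fixed parameters a, b, c chosen for illustration. It is not Mamba itself: Mamba makes these parameters depend on the input (the "selective" part) and uses a hardware-aware scan.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Scalar linear state space recurrence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    One pass over the sequence: O(n) time, O(1) state.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


# An impulse at t=0 decays geometrically through the state.
print(ssm_scan([1.0, 0.0, 0.0], a=0.5))  # → [1.0, 0.5, 0.25]
```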
Tech trend • Beyond Transformer • Pickup ▪ MoE-Mamba • [2401.04081] MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts https://arxiv.org/abs/2401.04081 ▪ Vision Mamba • [2401.09417] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model https://arxiv.org/abs/2401.09417 ▪ MambaByte • [2401.13660] MambaByte: Token-free Selective State Space Model https://arxiv.org/abs/2401.13660
New!! Tech trend • Beyond Transformer • Pickup ▪ [2402.13753] LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens https://arxiv.org/abs/2402.13753 ▪ LongRoPE extends LLM context windows to 2 million tokens, using non-uniform positional interpolation and a progressive context extension strategy to preserve model performance. ▪ Supports the recovery of short-context performance, demonstrating reduced complexity and effective retrieval of large content. ▪ With improved benchmark performance at extended context windows, it enables deeper text analysis and more accurate information extraction.
New!! Tech trend • Beyond Transformer • Pickup ▪ [2402.13753] LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens https://arxiv.org/abs/2402.13753 ▪ Exploitation of Non-uniform Positional Interpolation: Optimizes RoPE for extended contexts using evolutionary search to minimize interpolation loss. ▪ Progressive Extension Strategy: Extends context first to 256k, then to 2048k tokens, avoiding initial fine-tuning on long contexts. ▪ Adjustment for Shorter Context Recovery: Ensures sustained performance across context lengths by readjusting embeddings post-extension.
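The difference between uniform and non-uniform positional interpolation can be sketched on the RoPE rotation angles. In standard RoPE, dimension pair i rotates by position × base^(-2i/d). Uniform interpolation divides every pair by the same factor; a non-uniform scheme uses a separate factor per pair. The per-pair factors below are made-up placeholders: LongRoPE finds its factors by evolutionary search, and this simplified sketch also omits the paper's handling of the initial tokens.

```python
def rope_angles(position, head_dim, rescale=None, base=10000.0):
    """Rotation angle for each RoPE dimension pair at a given position.

    rescale[i] is the interpolation factor for pair i (1.0 = unchanged).
    """
    n_pairs = head_dim // 2
    if rescale is None:
        rescale = [1.0] * n_pairs
    return [position / rescale[i] * base ** (-2 * i / head_dim)
            for i in range(n_pairs)]


head_dim = 8
uniform = [8.0] * 4                 # same factor for every pair
non_uniform = [1.0, 2.0, 4.0, 8.0]  # hypothetical per-pair factors

print(rope_angles(16, head_dim))               # original RoPE
print(rope_angles(16, head_dim, uniform))      # uniform interpolation
print(rope_angles(16, head_dim, non_uniform))  # non-uniform interpolation
```

With the uniform factor 8, position 16 receives exactly the angles that position 2 had originally, so every frequency is compressed equally. The non-uniform variant leaves the first (highest-frequency) pair untouched and compresses the later pairs more.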
New!! Tech trend • Beyond Transformer • Pickup ▪ [2402.08268] World Model on Million-Length Video And Language With RingAttention https://arxiv.org/abs/2402.08268 ▪ LWM processes long video sequences and textual data, handling up to 1 million tokens using the RingAttention technique for scalable training. ▪ It tackles vision-language training challenges, enabling efficient training and the creation of a model-driven QA dataset for improved chat functionalities. ▪ Achieves notable outcomes in understanding long videos and fact retrieval, showcasing adaptability across different task contexts.
Tech trend •Related tech •RAG (Retrieval-Augmented Generation) •Agent •Synthetic data •Distributed
Tech trend • Related tech • RAG (Retrieval-Augmented Generation) ▪ Covered at the 6th GASG session on September 5, 2023 ▪ Evolving RAG architectures ▪ [2312.10997] Retrieval-Augmented Generation for Large Language Models: A Survey https://arxiv.org/abs/2312.10997
Theme • Tech Trend • Related tech ▪ RAG (Retrieval-Augmented Generation) • Figure 6: RAG compared with other model optimization methods. Reference: https://arxiv.org/pdf/2312.10997.pdf
Theme • Tech Trend • Related tech ▪ RAG (Retrieval-Augmented Generation) • Figure 2: A representative instance of the RAG process applied to question answering. Reference: https://arxiv.org/pdf/2312.10997.pdf
Theme • Tech Trend • Related tech ▪ RAG (Retrieval-Augmented Generation) • Figure 3: Comparison between the three paradigms of RAG. Reference: https://arxiv.org/pdf/2312.10997.pdf
Tech trend • Related tech • RAG (Retrieval-Augmented Generation) ▪ Advanced • Optimizing data indexing • Pre-retrieval process • Post-retrieval process
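The three Advanced RAG stages can be sketched as a toy pipeline: build an index, expand the query before retrieval, then rerank and trim the candidates afterwards. This is a deliberately simple stand-in under stated assumptions: token overlap replaces embedding similarity, a hand-written synonym table replaces query rewriting, and every function name and document here is hypothetical.

```python
def tokenize(text):
    return set(text.lower().split())


def build_index(docs):
    """Indexing: pre-tokenize each document once."""
    return [(tokenize(d), d) for d in docs]


def pre_retrieval(query, synonyms):
    """Pre-retrieval: expand the query with known synonyms."""
    tokens = tokenize(query)
    for t in list(tokens):
        tokens |= set(synonyms.get(t, []))
    return tokens


def retrieve(query_tokens, index):
    """Retrieval: score documents by token overlap, drop non-matches."""
    return [(len(query_tokens & toks), d)
            for toks, d in index if query_tokens & toks]


def post_retrieval(candidates, top_k):
    """Post-retrieval: rerank by score and keep only the best top_k."""
    ranked = sorted(candidates, key=lambda sd: sd[0], reverse=True)
    return [d for _, d in ranked[:top_k]]


docs = [
    "GQA shares key and value heads across query heads",
    "MoE routes each token to a few experts",
    "RAG retrieves documents before generation",
]
synonyms = {"retrieval": ["retrieves"], "docs": ["documents"]}

index = build_index(docs)
query_tokens = pre_retrieval("retrieval of docs", synonyms)
context = post_retrieval(retrieve(query_tokens, index), top_k=1)
print(context)
```

Without the pre-retrieval expansion, the query "retrieval of docs" shares no token with any document, so nothing would be retrieved; the selected context would then be passed to the LLM as part of the prompt.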
Tech trend • Related tech • RAG (Retrieval-Augmented Generation) ▪ Modular • Diverse functional modules • Pipelines tailored to needs
Tech trend • Related tech • RAG (Retrieval-Augmented Generation) ▪ A Cheat Sheet and Some Recipes For Building Advanced RAG | by Andrei | Jan, 2024 | LlamaIndex Blog https://blog.llamaindex.ai/a-cheat-sheet-and-some-recipes-for-building-advanced-rag-803a9d94c41b
Tech trend • Related tech • RAG (Retrieval-Augmented Generation) ▪ Scaling context window ▪ Robustness • Hallucination ▪ Hybrid (RAG+FT) ▪ Expanding LLM role ▪ Scaling law • Embedding model ▪ Production ready • Accuracy, reproducibility, security (access control) ▪ Multi modal • Image, Audio and video, Code
New!! Tech trend •Related tech •RAG (Retrieval-Augmented Generation) ▪RAG vs LLM extension
New!! Tech trend • Related tech • RAG (Retrieval-Augmented Generation) ▪ Impact of Gemini 1.5 with its context window of over 1M tokens • Details are unclear. ▪ Where RAG may be superior to Gemini 1.5 • Cost • Latency • Accuracy
New!! Tech trend •Related tech •RAG (Retrieval-Augmented Generation) ▪RAG Future is...?
Tech trend • Related tech • Agent ▪ Covered at the 11th GASG session on November 14, 2023 Reference: https://medium.com/scisharp/understand-the-llm-agent-orchestration-043ebfaead1f
Tech trend • Related tech • Agent ▪ 2024 AI Agent ▪ https://e2b.dev/blog/ai-agents-in-2024 ▪ Evaluation frameworks • THUDM/AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents https://github.com/THUDM/AgentBench • AutoGPT/benchmark at master · Significant-Gravitas/AutoGPT https://github.com/Significant-Gravitas/AutoGPT/tree/master/benchmark • Benchmarking Agent Tool Use https://blog.langchain.dev/benchmarking-agent-tool-use
Tech trend • Related tech • Synthetic data ▪ Synthetic data: Anthropic’s CAI, scaling, OpenAI’s Superalignment, tips, and open-source examples https://www.interconnects.ai/p/llm-synthetic-data ▪ [2305.15041] Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science https://arxiv.org/abs/2305.15041 ▪ [2310.07849] Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations https://arxiv.org/abs/2310.07849 ▪ [2401.00368] Improving Text Embeddings with Large Language Models https://arxiv.org/abs/2401.00368 ▪ [2312.17742] Learning Vision from Models Rivals Learning Vision from Data https://arxiv.org/abs/2312.17742
Tech trend • Related tech • Distributed ▪ Petals • Petals – Run LLMs at home, BitTorrent-style https://petals.dev/ • bigscience-workshop/petals: Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading https://github.com/bigscience-workshop/petals • [2209.01188] Petals: Collaborative Inference and Fine-tuning of Large Models https://arxiv.org/abs/2209.01188 • [2312.08361] Distributed Inference and Fine-tuning of Large Language Models Over The Internet https://arxiv.org/abs/2312.08361 ▪ AI Horde • AI Horde https://stablehorde.net/ • Haidra-Org/AI-Horde: A crowdsourced distributed cluster for AI art and text generation https://github.com/Haidra-Org/AI-Horde?tab=readme-ov-file
Use case
Use case • The future of generative AI in the enterprise: beyond ChatGPT | Gartner https://www.gartner.co.jp/ja/articles/beyond-chatgpt-the-future-of-generative-ai-for-enterprises • Top 100+ Generative AI Applications / Use Cases in 2024 https://research.aimultiple.com/generative-ai-applications/ • 2024 AI Predictions | NVIDIA Blog https://blogs.nvidia.com/blog/2024-ai-predictions/
Use case • Device ▪ Order Ai Pin Now https://hu.ma.ne/ ▪ rabbit — home https://www.rabbit.tech/ ▪ Brilliant Labs https://brilliant.xyz/ ▪ adamcohenhillel/ADeus: An open source AI wearable device that captures what you say and hear in the real world and then transcribes and stores it on your own server. You can then chat with Adeus using the app, and it will have all the right context about what you want to talk about - a truly personalized, personal AI. https://github.com/adamcohenhillel/ADeus
Issue
Issue •Security •Data contamination •Socialization
Issue • Security • Overview ▪ Safety and security risks of generative artificial intelligence to 2025 (Annex B) - GOV.UK https://www.gov.uk/government/publications/frontier-ai-capabilities-and-risks-discussion-paper/safety-and-security-risks-of-generative-artificial-intelligence-to-2025-annex-b ▪ OWASP Top 10 for Large Language Model Applications | OWASP Foundation https://owasp.org/www-project-top-10-for-large-language-model-applications/ ▪ https://owasp.org/www-project-top-10-for-large-language-model-applications/assets/PDF/OWASP-Top-10-for-LLMs-2023-slides-v1_1.pdf • Prompt hack ▪ Countermeasures against prompt leaking in GPTs | ぬこぬこ https://note.com/schroneko/n/n6d6c2e645119 ▪ Prompt Hacking | Learn Prompting: Your Guide to Communicating with AI https://learnprompting.org/docs/category/-prompt-hacking • Solution ▪ Introducing Purple Llama for Safe and Responsible AI Development | Meta https://about.fb.com/news/2023/12/purple-llama-safe-responsible-ai-development/ ▪ New generative AI-powered SaaS security expert from AppOmni | VentureBeat https://venturebeat.com/security/new-generative-ai-powered-saas-security-expert-from-appomni ▪ Cloudflare announces Firewall for AI https://blog.cloudflare.com/ja-jp/firewall-for-ai-ja-jp/
New!! Issue •Security • ComPromptMized https://sites.google.com/view/compromptmized
Issue • Data contamination • Why data contamination is a big issue for LLMs - TechTalks https://bdtechtalks.com/2023/07/17/llm-data-contamination/ • [2312.16337] Task Contamination: Language Models May Not Be Few-Shot Anymore https://arxiv.org/abs/2312.16337 • Socialization • Sotopia https://www.sotopia.world/ • [2310.11667] SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents https://arxiv.org/abs/2310.11667
Agenda
Introduction How to make AI do it is all you need!!
Introduction Yes, AI can!!
EOF https://www.linkedin.com/in/kunihiro-sugiyama-49b0372a/ https://www.ai-tech-c.jp/generative-ai-study-group-gasg/