TalkingData: Alluxio - Open Source Storage Orchestration Platform for AI and Big Data (36 pages).pdf

Alluxio - Open Source Storage Orchestration Platform for AI and Big Data

Gu Rong (顾荣), Alluxio PMC & Maintainer; Associate Researcher, Ph.D., Department of Computer Science, Nanjing University

Outline
1. Introduction to the Alluxio project and system
2. Overview of new features in Alluxio 2.0
3. A quick look at where Alluxio is heading
4. Summary

Four trends in data processing are driving the need for a new kind of infrastructure
- Separation of compute and storage
- Hybrid and multi-cloud environments
- Self-service data across the enterprise
- Rise of the object store

[Figure: "Data Ecosystem - Beta" and "Data Ecosystem 1.0", each showing COMPUTE and STORAGE]

The big data journey and the choices for enterprise innovation
- Co-located: compute and HDFS on the same cluster (MR / Hive on HDFS)
- Disaggregated: disaggregated compute & HDFS on the same cluster (Hive, HDFS)
- Deploying HDFS in a hybrid cloud: burst HDFS data in the cloud, public or private
- Supporting more compute frameworks: support Presto, Spark, and other computes without application changes
- Transitioning to object storage: enable and accelerate big data on object stores

Challenges in the technology transition

Deploying HDFS in a hybrid cloud
- Accessing data over the WAN is too slow.
- Copying data to the compute cloud is time consuming and complex.
- Using another storage system such as S3 means expensive application changes.
- Using S3 via the HDFS connector leads to extremely low performance.

Supporting more compute frameworks
- Copying data to multiple compute clouds is time consuming and error prone.
- Migrating applications to new storage systems is complex and time consuming.
- Storing and managing multiple copies of the data becomes expensive.

Transitioning to object storage
- Object store performance for big data workloads can be very poor.
- There is no native support for popular frameworks.
- Expensive metadata operations reduce performance even more.
- Hybrid environments are not supported directly.

Compute and storage scale independently

[Figure: "Unifying Data at Memory Speed". Labels: FUSE Compatible File System, Hadoop Compatible File System, Native Key-Value Interface, Native File System; GlusterFS Interface, Amazon S3 Interface, Swift Interface, HDFS Interface]

Alluxio: a Virtual Distributed File System (VDFS)

[Figure labels: Java File API, HDFS Interface, S3 Interface, REST API, FUSE Interface; HDFS Driver, S3 Driver, Swift Driver, NFS Driver]

Master-Worker architecture
- Master: manages all metadata and monitors the state of every Worker.
- Worker: manages the local MEM, SSD, and HDD.
- Client: exposes access interfaces to users and applications, and sends requests to the Master and the Workers.
- Under File System: generally used for backup.
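To make the Worker's role concrete, here is a minimal sketch of a worker that keeps blocks in MEM, SSD, and HDD tiers. It is a toy model under stated assumptions, not Alluxio's allocator or evictor: the class name `TieredWorker`, the per-tier block limits, the least-recently-used demotion, and the promote-on-read rule are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class TieredWorker:
    """Toy worker holding blocks in ordered tiers, fastest tier first (hypothetical)."""

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_blocks), e.g. [("MEM", 2), ("SSD", 4)]
        self.tiers = [(name, limit, OrderedDict()) for name, limit in capacities]

    def _place(self, block_id, data, level):
        if level >= len(self.tiers):
            return  # pushed past the slowest tier: the block leaves the cache
        _, limit, blocks = self.tiers[level]
        blocks[block_id] = data  # newest entry sits at the end
        if len(blocks) > limit:
            # Demote the least recently used block to the next tier down.
            victim_id, victim_data = blocks.popitem(last=False)
            self._place(victim_id, victim_data, level + 1)

    def _remove(self, block_id):
        for _, _, blocks in self.tiers:
            if block_id in blocks:
                return blocks.pop(block_id)
        return None

    def put(self, block_id, data):
        self._remove(block_id)          # avoid duplicates across tiers
        self._place(block_id, data, 0)  # new data always lands in the top tier

    def get(self, block_id):
        data = self._remove(block_id)
        if data is not None:
            self._place(block_id, data, 0)  # promote on read (assumption)
        return data

    def tier_of(self, block_id):
        for name, _, blocks in self.tiers:
            if block_id in blocks:
                return name
        return None
```

With one slot per tier, writing blocks a, b, c in that order leaves c in MEM, b on SSD, and a on HDD; reading a then moves it back to MEM and pushes the others down one tier.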

[Figure: overall internal architecture of Alluxio. Labels: Master, Client, Worker1, Worker2, and Worker3 on node 1, node 2, and node 3, each Worker with MEM, SSD, and HDD, and the Under File System]

Scenarios enabled by Alluxio data orchestration
- Burst big data workloads in hybrid cloud environments
- Accelerate big data frameworks on the public cloud
- Dramatically speed up big data on object stores on premise

[Figure labels: On premise; Same instance / container]

Advanced use cases
- Enable big data on object stores across single or multiple clouds
- Orchestrate data frameworks on the public cloud

Alluxio's core innovations
- Data elasticity with a unified namespace: abstract data silos and storage systems to scale data on demand, independently of compute.
- Data accessibility for popular APIs and API translation: run Spark, Hive, Presto, and ML workloads on your data located anywhere.
- Data locality with intelligent multi-tiering: accelerate big data workloads with transparent tiered local data.
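One way to picture the unified namespace is as a mount table that maps a single logical path tree onto several under stores. The sketch below is only an illustration under stated assumptions: the class `MountTable`, its methods, and the example URIs (`hdfs://nn1:8020/`, `s3://example-bucket/logs`) are hypothetical and are not Alluxio's API. It resolves a logical path by picking the longest matching mount point.

```python
class MountTable:
    """Toy unified namespace: logical path prefix -> under-store URI (hypothetical)."""

    def __init__(self):
        self._mounts = {}

    def mount(self, logical_path, ufs_uri):
        # Normalize so "/logs/" and "/logs" are the same mount point.
        key = logical_path.rstrip("/") or "/"
        self._mounts[key] = ufs_uri.rstrip("/")

    def resolve(self, path):
        """Translate a logical path into the URI of the under store that holds it."""
        best = None
        for prefix in self._mounts:
            base = prefix.rstrip("/")  # "" for the root mount
            if path == prefix or path.startswith(base + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix  # longest matching mount point wins
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        suffix = path[len(best.rstrip("/")):]
        return self._mounts[best] + suffix


table = MountTable()
table.mount("/", "hdfs://nn1:8020/")            # hypothetical HDFS cluster
table.mount("/logs", "s3://example-bucket/logs")  # hypothetical object store
```

An application that reads `/logs/2019/a.txt` and `/warehouse/t1` sees one tree, while the table sends the first path to the object store and the second to HDFS.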

Data locality through intelligent multi-tier caching
- Local performance from remote data using multi-tier storage

Data accessibility through popular APIs and API translation
- Convert from the client-side interface to the native storage interface

Data elasticity through a unified namespace
- Enables effective data management across different under stores
- Uses mounting with transparent naming

Unified Namespace
- Transparent access to under storage makes all enterprise data available locally.
- Supports: HDFS, NFS, OpenStack, Ceph, Amazon S3, Azure, Google Cloud
- IT ops friendly: storage is mounted into Alluxio by central IT; security in Alluxio mirrors the source data; authentication through LDAP/AD; wireline encryption.

[Figure labels: HDFS #1, Object Store, NFS, HDFS #2]

100+ known production deployments
Consumer; Travel & Transportation; Telco & Media; Technology; Financial Services; Retail & Entertainment; Data & Analytics Services

Incredible open source momentum with a growing community
- 1000+ contributors and growing
- 4278+ Git stars
- Apache 2.0 licensed
- Hundreds of thousands of downloads
- GitHub:
- Join the conversation on Slack: alluxio.org/slack

Finding high-fit use cases: example first projects
- Enterprise storage and big data teams: virtual data lakes; gradual transition to low-cost storage; unified hybrid-cloud storage
- Machine learning and data science teams: accelerate training; improve productivity

[Figure: a compute zone, standalone or managed with Mesos or Yarn, and storage in a different availability zone, either on-prem or cloud. Labels: Tensorflow, Presto, HDFS, Spark]

Alluxio is installed with or near compute to unify data stores, stage remote data, and improve system performance.

Analysis of scenarios suited to Alluxio

Alluxio 2.x new features: support for hyper-scale data workloads

- Support for more than 1 billion files. Alluxio 2.0 introduces tiered metadata management as a new option, to support single-cluster deployments holding more than one billion files. RocksDB is now used by default for off-heap storage. Metadata for hot data stays on heap in process memory, while the remaining metadata is managed by Alluxio outside process memory. alluxio.master.metastore can be configured to use on-heap storage only.

- Highly distributed data services. Alluxio 2.0 introduces the Alluxio Job Service, a distributed cluster service that carries out data operations such as replication, persistence, cross-storage moves, and distributed loads, delivering high performance at large scale.

- Adaptive replication for better data locality. This feature lets Alluxio automatically manage the number of stored replicas of the data within a configured range. alluxio.user.file.replication.max and alluxio.user.file.replication.min specify that range.

- Embedded journal for high availability. Alluxio 2.0 adds a new fault-tolerance and high-availability mode for file and object metadata called the embedded journal. The embedded journal uses the Raft consensus algorithm, and its implementation is independent of any external storage system. This is particularly useful for abstracting object stores.
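The replica range set by alluxio.user.file.replication.min and alluxio.user.file.replication.max can be read as a clamp on the number of copies. The following is a minimal sketch of that reading, not Alluxio's Job Service logic: the function name `replica_adjustment` is hypothetical, and treating a negative maximum as "no upper limit" is an assumption of this sketch.

```python
def replica_adjustment(current, replication_min, replication_max):
    """Return how many replicas to add (positive) or remove (negative).

    Hypothetical helper. A negative replication_max is treated as
    "no upper limit", which is an assumption of this sketch.
    """
    if current < 0 or replication_min < 0:
        raise ValueError("replica counts must be non-negative")
    if 0 <= replication_max < replication_min:
        raise ValueError("replication_max must not be below replication_min")
    if current < replication_min:
        return replication_min - current   # too few copies: replicate more
    if replication_max >= 0 and current > replication_max:
        return replication_max - current   # too many copies: evict the surplus
    return 0                               # already inside the configured range
```

For example, with a range of 1 to 2, a block held by three workers yields -1 (drop one copy), and a block with no cached copy yields 1 (create one).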

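[Figure: adaptive replication for data locality. Labels: Alluxio Workers holding copies of Block-1, Applications each with an Alluxio Client, and SetReplicaMax(2)]

Embedded journal for high availability: Alluxio 1.x HA depends on ZooKeeper and HDFS

How Alluxio HA runs
- ZooKeeper: elects the leading master.
- HDFS: stores the journal files and shares them among the masters.

Known problems
- The choice of journal storage is limited.
- Because the service depends on third-party components, debugging and recovering it are both difficult.
- Instability in the HDFS cluster itself raises the maintenance cost of the Alluxio cluster.

[Figure labels: Standby Master, Leading Master, Standby Master, Shared Storage, write journal, read journal, "Hello, leader"]

Embedded journal for high availability: Alluxio 2.x removes the ZooKeeper and HDFS dependency
- The three Alluxio masters reach a consensus state among themselves using the Raft algorithm.
- Only the leading master commits state changes; the standby masters stay in sync.
- Advantage: the journal can be stored on local disk, replicated across the master nodes.
- Challenge: performance tuning.

[Figure labels: Standby Master, Leading Master, Standby Master, Raft, State Change]

Alluxio 2.x new features: better storage abstraction for fully independent and elastic compute

- Support for HDFS clusters of different versions. Explosive data growth means enterprises typically own many data warehouses, including multiple Hadoop clusters running different versions. Unified access across these clusters is currently very difficult. With Alluxio 2.0, users can connect Alluxio to multiple HDFS clusters of various versions and get unified data access.

- Active sync with Hadoop. This new feature integrates with HDFS iNotify. It picks up any data or metadata change made to files stored in Hadoop, so applications that access data through Alluxio actively receive the latest updates.

Alluxio 2.x new features: stronger support for machine learning, data query, and similar systems

- Run machine learning and deep learning workloads on any storage. Machine learning and deep learning frameworks often need to pull large volumes of data from Hadoop or object stores, which is usually a manual and very time-consuming process. Alluxio's FUSE feature supports a POSIX-compatible API, so through Alluxio, frameworks such as TensorFlow and Caffe, as well as other Python-based models, can access data in any storage system directly, using conventional file system access.

- Deep integration with structured data management and query systems. A Catalog Service at the Alluxio layer provides an abstraction over structured data; adding a Hive MetaStore to Alluxio works like mounting a file system. Alluxio is aware of the storage structure and schema of files and objects, which lets it serve them better. It also provides the Alluxio Data Transformation service, for example automatically converting CSV files to Parquet, or merging many small table files into large ones to reduce query time.

Direction: Structured Data Service

Alluxio Catalog Service (target 2.1)
- Serves table metadata (like the Hive Meta Store)
- Highly efficient by using Apache Iceberg (e.g., no slow directory listing)
- Speeds up query planning, independently of the speed-up from caching files in the Alluxio file system

Alluxio Connector for Presto (target 2.1)
- Presto connects to Alluxio directly, without the Hive connector
- Enables push-downs to the Alluxio layer

Direction: Alluxio on Kubernetes

Call for community contribution!
- Productionize the Helm chart
- K8S csi-driver/provisioner
- Alluxio K8S Operator

Direction: File System and Cloud Integration

Automatic and transparent caching (target 2.1)
- Use Alluxio as a caching layer for Presto, Spark, or Hive without modifying HMS

AWS/GCP integration
- Improve the EMR bootstrap script
- Images on the AWS / GCP marketplace

Newly published Alluxio book in Chinese
Alluxio：大数据统一存储原理与实践 (Alluxio: Principles and Practice of Unified Big Data Storage), by Fan Bin (范斌) and Gu Rong (顾荣). Publisher: 电子工业出版社 (Publishing House of Electronics Industry). Published August 2019. ISBN: 978-7-121-36782-3. Length: 242,000 characters. The first book in China on the big data storage system Alluxio.

Welcome to the Alluxio open source community! www.alluxio.org
Scan the code to follow a rich collection of Chinese-language Alluxio technical material and case studies.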
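Returning to the embedded journal: the statement that three masters reach consensus with Raft rests on a majority rule, under which a journal entry counts as committed once more than half of the masters have stored it. The sketch below shows only that rule. It is not Alluxio's journal code: the function names are hypothetical, and real Raft adds further conditions (such as term checks) that are omitted here.

```python
def majority(cluster_size):
    """Smallest number of masters that forms a majority."""
    return cluster_size // 2 + 1


def committed_index(stored_indexes):
    """Highest journal index stored on a majority of masters.

    stored_indexes holds, for every master including the leader, the index of
    the last journal entry it has stored (hypothetical input format).
    """
    ordered = sorted(stored_indexes, reverse=True)
    # The k-th largest value is stored by at least k masters.
    return ordered[majority(len(ordered)) - 1]
```

With three masters whose journals end at entries 7, 5, and 3, entry 5 is the latest one present on two of the three, so everything up to entry 5 is committed. The same rule explains why a three-master cluster keeps working when one master is down.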
