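Latest Progress of MaxCompute Testing Against the BigBench Standard
Lu Lu, Technical Expert, Alibaba Cloud
MaxCompute 2.0

Disclaimer: the BigBench kit used by these performance tests is derived from TPCx-BigBench.


BigBench on MaxCompute 2.0: Daring to Be First, Leading the Trend

100 TB: the first engine to pass BigBench verification at 100 TB
7830 QPM: the first engine to score above 7000
The first BigBench verification run on public cloud services

Pricing model                  Price/performance
Pre-paid for 1 month           $12.3/QPM
Post-paid by usage             $2.1/QPM
Pre-paid for 3 years           $371.9/QPM


Introduction to TPCx-BB

An industry-leading, application-level benchmark for end-to-end big data analytics
Initiated, principally developed, and actively promoted by Intel
Reflects the characteristics of industrial big data applications: SLA, TCO
Multiple task types: SQL, MapReduce, Machine Learning, Streaming
Complete hardware and software evaluation criteria: Scale, BBQpm, Price/BBQpm

Current best results
10 TB                  current largest data scale
1491.23 BBQpm          current best performance
$589 Price/BBQpm       current best price/performance


TPCx-BB Workload Characteristics

Data Sources           Number of Queries
Structured             18
Semi-Structured        7
Unstructured           5

Analytic Techniques    Number of Queries
Statistical analysis   6
Data mining            17
Reporting              8

Query Types            Number of Queries
Pure SQL               13
Machine Learning       5
OpenNLP                5
Java MR                3
Streaming              4

Query  Input Data Type   Processing Model    Query  Input Data Type   Processing Model
#1     Structured        Java MR             #16    Structured        Pure SQL
#2     Semi-Structured   Java MR             #17    Structured        Pure SQL
#3     Semi-Structured   Streaming           #18    Unstructured      OpenNLP
#4     Semi-Structured   Streaming           #19    Unstructured      OpenNLP
#5     Semi-Structured   Machine Learning    #20    Structured        Machine Learning
#6     Structured        Pure SQL            #21    Structured        Pure SQL
#7     Structured        Pure SQL            #22    Structured        Pure SQL
#8     Semi-Structured   Pure SQL            #23    Structured        Pure SQL
#9     Structured        Pure SQL            #24    Structured        Pure SQL
#10    Unstructured      OpenNLP             #25    Structured        Machine Learning
#11    Unstructured      OpenNLP             #26    Structured        Machine Learning
#12    Semi-Structured   Pure SQL            #27    Unstructured      OpenNLP
#13    Structured        Pure SQL            #28    Unstructured      Machine Learning
#14    Structured        Pure SQL            #29    Structured        Streaming
#15    Structured        Java MR             #30    Semi-Structured   Streaming


TPCx-BB Performance Evaluation Criteria

Hardware and software performance (SLA)
Performance is measured by BBQpm@SF, which covers the data scale, the Load test phase, the Power test phase, and the Throughput test phase.
Load test phase: imports the test data
Power test phase: a single query stream runs the 30 queries sequentially
Throughput test phase: multiple query streams run the queries concurrently
M: the number of queries run by the benchmark (30)
SF (Scale Factor): the size of the benchmark data set; for example, 1000 means 1 TB

Hardware and software price/performance (TCO)
Price/performance is measured by $/BBQpm@SF.
C: the total price of the SUT (system under test) being evaluated


BigBench on MaxCompute Software Stack

BigBench on MaxCompute is a modified version of BigBench that stays compatible with all of its semantics.
The software stack is complete and covers every BigBench feature.
MaxCompute Hive-compatible mode: fully compatible with all data types and SQL syntax of open-source Hive
Tunnel: the MaxCompute data import/export tool, which can load data from other platforms into MaxCompute
PAI: Alibaba's one-stop machine learning platform, integrated with MaxCompute data

Query Type          Hive on Spark       MaxCompute
Pure SQL            Hive SQL            MaxCompute SQL
Machine Learning    Spark MLlib         PAI
OpenNLP             SQL+UDF             SQL+UDF
Java MR             SQL+UDF             SQL+UDF
Streaming           Python streaming    SQL+UDF


BigBench on MaxCompute Result Analysis

Massive data processing capability: total data volume at the EB level
Built on Apsara, the distributed operating system developed in-house by Alibaba Cloud; a single cluster scales to 10,000 machines
Fuxi: distributed resource management and scheduling system, holder of the 2015 GraySort record
Pangu: large-scale distributed file system that supports more than 1 billion files

New-generation MaxCompute execution engine, deeply optimized across the Compiler, Optimizer, and Runtime modules
Optimizations include Range partition, AutoMapjoin, and ShuffleRemove
Full cooperation with Intel on joint hardware and software optimization, taking full advantage of the Xeon Scalable processor architecture

The first benchmark run on a public cloud service
Complete TCO, covering hardware, software, and operations services
Flexible pricing: post-paid by usage, pre-paid by month, pre-paid by year, and so on

Scale and performance advantages
Data scale: 100 TB
Performance: 7830 Qpm
Price/performance: $2.1/Qpm


Open Testing

A one-month open test period with free MaxCompute test resources
The Test Kit is open-sourced together with an operation guide
Users can run and verify BigBench on the MaxCompute platform themselves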
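
The evaluation criteria slide names the inputs to the score (M, SF, the Load, Power, and Throughput phases, and the total price C) but the formula itself did not survive in this transcript. The sketch below shows how those inputs are usually combined, assuming the metric definition from the published TPCx-BB specification: the load time is weighted by 0.1, the Power phase is summarized by the geometric mean of the per-query times, and the Throughput phase is divided by the number of concurrent streams. Those weightings come from that specification, not from this deck, so check them against the specification version you are testing with. The function names and the timings in the usage example are hypothetical and are not measurements from the MaxCompute run.

```python
import math


def bbqpm(scale_factor, load_seconds, power_query_seconds,
          throughput_seconds, streams):
    """Sketch of BBQpm@SF, assuming the TPCx-BB specification's definition.

    scale_factor        -- SF, e.g. 1000 for a 1 TB data set
    load_seconds        -- elapsed time of the Load phase
    power_query_seconds -- elapsed time of each query in the Power phase
    throughput_seconds  -- elapsed time of the Throughput phase
    streams             -- number of concurrent query streams
    """
    m = len(power_query_seconds)  # M: number of queries (30 in BigBench)
    # Assumption from the specification: load time counts at 10 percent.
    t_load = 0.1 * load_seconds
    # Power phase: M times the geometric mean of the per-query times.
    geo_mean = math.exp(sum(math.log(t) for t in power_query_seconds) / m)
    t_power = m * geo_mean
    # Throughput phase: elapsed time normalized by the stream count.
    t_throughput = throughput_seconds / streams
    return scale_factor * 60 * m / (t_load + math.sqrt(t_power * t_throughput))


def price_per_bbqpm(total_price, score):
    """Price/performance: total SUT price C divided by the BBQpm@SF score."""
    return total_price / score


if __name__ == "__main__":
    # Hypothetical timings, chosen only to make the arithmetic easy to follow.
    score = bbqpm(
        scale_factor=1000,
        load_seconds=10_000,
        power_query_seconds=[100.0] * 30,
        throughput_seconds=6_000,
        streams=2,
    )
    print(f"BBQpm@1000 = {score:.1f}")
    print(f"$/BBQpm    = {price_per_bbqpm(900.0, score):.2f}")
```

Because the total price C sits in the numerator of the price/performance figure, the same benchmark score yields very different $/BBQpm values under different billing models, which is why the deck reports separate figures for post-paid, one-month, and three-year pricing.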