《超扩展设计用于处理大型扩展应用程序的新数据库_MichaelMonty+Widenius.pdf》由会员分享,可在线阅读,更多相关《超扩展设计用于处理大型扩展应用程序的新数据库_MichaelMonty+Widenius.pdf(29页珍藏版)》请在三个皮匠报告上搜索。
1、The state of databasesin the current landscapeJuly,2023Michael WideniusCreator of MySQL and MariaDB CTO MariaDBOverviewThe History of RDBMS and MariaDBBig Data is driven by Open SourceAI and databasesDatabases and the cloudWhat I have been working on latelyThe history of RDBMSWe Start HereHereNowWha
2、t made MySQL successful?When MySQL was started(1994)there were big players such as Microsoft,IBM,Oracle,all having a significant portion of the market.Internet was new and everyone needed a web-optimized DB However,the big players did not see the Internet as a viable business platform!MySQL was alre
3、ady proven stable before release(used for data warehousing and web)We created a“Virtual company”,which made it easy to find good people New“free”license scheme(this was before Open Source)Free for most,a few have to pay Second program(ghostscript was first)to use dual licensing.MySQL was first to do
4、 it with GPL.Very easy to install and use(15 minute rule)Released source and tested binaries for most platforms MySQL was a needed,stable and easy to use product with the right priceWhat made MySQL successful?MySQL started from the bottom-up,providing a cheap(or free)solution for the web industry.We
5、 were friendly and helpful towards community I personally wrote 30,000+emails the first 5 years to help people using MySQL MySQL then started growing in other industries(enterprise sector).MySQL followed an open source development model.1 customer in 1000 actually pays-Still enough to grow very fast
6、.The community provided testing,marketing and simple support needs.MySQL Ab provided full support for those that wanted/needed it.We(the MySQL founders)waited with investments until product was“good enough”Why MariaDB was created“Save the People,Save the Product”To keep the MySQL talent togetherTo e
7、nsure that a free version of MySQL always existsTo get one community developed and maintained branchWork with other MySQL forks/branches to share knowhow and codeAfter Oracle announced it wanting to buy Sun&MySQL this got to be even more important.How was MariaDB disruptive?MariaDB follows in the sa
8、me original footsteps of MySQL(which Oracle has not done):It is a true to Open Source project,following proper Free Software/Open Source practices.Development happens in the open,working together with the community.MariaDB FoundationMariaDB Foundation was created to ensure that MariaDB would always
9、be Open Source.This process made MariaDB stand out.MariaDB has been integrated into most major Linux and other free OS distributions as the default“MySQL”variant.MariaDB EcosystemMariaDB FoundationWorks with the communityBuilds and tests binaries-Develops MariaDB buildbotDrives adoption-Works with O
10、S to ensure MariaDB is included everywhereWorks with community developers-Reviews architecture and patches-Approves and pushes changesInsures that MariaDB is always free.Founded trough sponsorships.MariaDB CorporationWorks with customersProvides paid support and subscriptions forMariaDB(Enterprise a
11、nd Community)Employs most of the MariaDB developers-Main driver of MariaDB developmentMariaDB Enterprise-Longer End-of-life-Stable features are backported to earlier versions to minimize needs for upgrades.Provides NRE(paid development of new MariaDB features).MariaDB future plansMontys viewWork wit
12、h customers to help them migrate from commercial closed source database to MariaDBThe MariaDB Oracle compatibility layer makes this easyImprove optimizer to handle very complex queriesThis is important when moving complex application to MariaDBMariaDB 11.0 has a new cost model that greatly improves
13、handling of complex queries.Work closer with SAS database providers to make MariaDB work better in their environmentMariaDB multi-tenancy feature is part of this collaboration.Big data is driven byOpen SourceBig data driven by Open Source MySQL/MariaDB SQL database with flexible replication,ColumnSt
14、ore and Spider for scale out Apache Cassandra Tunable consistency Keys map to multiple values,which are grouped into column families CQL language MongoDB Stores structured data as JSON like documents CouchDB MVCC,ACID,eventually consistent Hbase(Hadoop+distributed file system)Written in Java.Compres
15、sion,in-memory operation and Bloom filters(Is data part of a set).HAWQ,Pivotal HDB Hadoop+SQL Redis In memory database(+snapshots to disk)Optional durability Clickhouse Analytical column oriented databaseWhy use NoSQL Faster replication Fast and easy key/value access Data is often stored in memory N
16、ote that with similar memory resources you can usually keep SQL data in memory too.Can handle unstructured data In traditional SQL,one cant easily implement a web store Most NoSQL vendors are considering adding SQL support.Like Atlas SQL for MongoDB(not open source)Why NOT to use NoSQL Lot of data d
17、uplicationRelational database are designed to keep only one copy of a relation(like a customers data)Cant(easily)join/combine data No standardized language;Hard to move data and applications between systems Complete lock in to one system Normally few connections to computer languages or existing app
18、lications that need a database.Allows one to initially ignore solving the database layout problem as one can store data as such Initially easier to use,MUCH harder later No sustainable scalable business models for most open source NoSQL solutions(as most of them rely on the BSD or Apache licenses)Pr
19、oblems to overcome with BIG data Storage(not normally the big problem anymore)Memory(Very expensive)SSD (Expensive but slower)Hard disk(Inexpensive.Slow when doing random reads,fast on sequential reads.The tapes of tomorrow)Access patterns Some access patterns are good for big data(scanning in paral
20、lel)while others are hard Even simple joins can take days on petabyte tables Get it up when things fail Recovery;Can take days or weeks Get the caches warm(you need 90%hit rate with buffers!)Replication and hot standby Must have,but makes big data more expensiveDefinition of big data is changing Whe
21、n I worked with data warehousing(1986-1993),big data was(all credit card transactions for one Swedens oil companies):1M users,4M transactions/month(30 byte/transaction)10 years of data=1.3G Handled by a Sun SparcStation,25Mz,32M memory.2G hard disk CPUs are now 700 times faster and have 10,000 times
22、 more memory.SSD/Hard disk seek speed is 6000 times faster than in 1986 While machines have got faster,most data is still small(except for social and behavior data)and can easily be handled by one machine.Why big data solutions are hype Most companies will not have as much real data as Facebook,Twit
23、ter,Bilibili,TikTok,etc.that needs to be accessed at once Solutions that they have to use are not applicable for others.In the near future one can run what most companies think is big data on a few machines:Memory now costs about 2100$/T;Most can afford 200G of RAM.SSD are now 50$/T(Read:560-3100MB/
24、sec 98K IOPS,Write:520-2250MB/sec write.You can get a 100Tb SSD from Nimbus and a 960Tb from Dell.16T can easily get bought.MariaDB today can easily handle 0.5T of active data.It is only after 1T when one needs to consider analytical databases.There are very few users that need more than 0.5T of dat
25、a.The most common database size for big enterprises are 100G100G for their largest production databases.90%of queries processes less than 100 MB100 MB of data.See https:/ details If you can store your production databasestore your production database on your desktopdesktop,its not big datanot big da
26、ta!AI and DatabasesAI and databases Personally I dont see AI helping optimizing queries inside the database(data is changing constantly and the optimizer is already using statistics and histograms to make decisions).Optimizer also has to be very fast and handle a lot of concurrency with a limited am
27、ount of memory.AI can be used to optimize SQL queries for applications.AI can be used to translate natural language to SQL,which is useful for interactive queries but too slow to be used in production for applications.When it comes to programming,I would only trust AI to:Initially set things up for
28、a common problem(write a script that does easy explanation)Try to find simple mistakes/bugs in code like possible buffer overflows,missing arguments,things that can easily simplified etc.Translating things from one language to another.Do a simple change that can be repeated in many parts of the code
29、.Add a parameter of this type to this function and ensure that all callers are fixed.The resulting code has to be carefully reviewed!AI and databases I would never even try to use AI for doing changes in a complex project,like adding catalogs to MariaDB.There are no similar patterns in existing soft
30、ware that it could be trained on.Too many things in MariaDB are interconnecting in not obvious ways(if one does not know the code)If the AI would produce hard to understand code,I could not use AI to explain why it did things that way.All MariaDB developers I am working with are experts and can be p
31、ut on solving very complex problem in their domain.I dont see anyone being replaced by AI during their or their childrens lifetimes.Databases and the CloudDatabase scaling for SaaS providers SaaS providers want to optimize the number of customers they can host on their hardware.When offering managed
32、 services on MySQL/MariaDB,SaaS providers have the following options:Create separate VMs for each customer.Large overhead-costly A typical“idle”MariaDB Server requires 1GB of memory Best user experience Use shared instance Force user restrictions,only grant limited database access.cPanel has this mo
33、del,many other database management systems share it Very limited overhead Poor user experience,artificial limitations enforced on users Affected by“noisy neighbour”,hard to trackNew catalogs feature for MariaDB Catalogs provide the best of both worlds.Shared instance A catalog looks like a normal Ma
34、riaDB Server No user limitations Each user has full control over their catalog Root access on the catalog Catalogs still have the problem of noisy neighbour However,statistics are now collected for each catalog So its trivial to detect the problematic catalog Move the problematic user to a bigger ma
35、chine,increase costs Catalogs can also enforce quotas Each catalog have their own configuration file with their own limits Force users to stick to certain performance limitations.Sysadmins have control over all catalogs.Catalogs featureCatalogs enable hyper-scaling for SaaS providers.For basic users
36、,one can now host up-to 100 x more users on the same machines.In MariaDB,this feature is still under developmentIf you would like to steer the roadmap for this,now is the time!ToolingPerformance optimizationsSpecific functionalityCareer path for programmersWhat I have been working on latelyI am stil
37、l actively coding!Improving the MariaDB optimizer to handle complex queries for big databasesThis is based on input from MariaDB customers and usersI spent almost one year on redoing and improving the cost model of the MariaDB optimizer for MariaDB 11.0.This was what I was working on during my 10 da
38、ys of Covid quarantine during my last trip to China!Initial test shows that most of the recent queries that has caused problems with MariaDB 10.6 optimizer works very good in MariaDB 11.0!The last 6 months I have been working on the multi-tenancy catalogs for MariaDB.This was based on input from sev
39、eral SAS provider at the last 2 CloudFest conferences.Career path for programmers I created MySQL and MariaDB.After the initial setups(where I handled everything),I have always hired people to do management,customers and leading the company so that I can continue to focus on architecture,development
40、focus on architecture,development,and leading the MySQL/MariaDB developer teams.A very common question I get in China is what is the best career path for a developer?Should I become a manager or start doing something else?Good coders are hard to find!It takes a long time for a coder to reach their p
41、eak(8-10)years of experience.Good programmers are valuable and can produce quality code for LONG time(70 years)Dont waste a good coder by“promoting”him to a manager.A good coder is not necessarily a good manager.Promote them by giving them more responsibility:Architecture-design bigger and more comp
42、lex systems Help others to work on their code(but not manage!)My advice to managers:Do notDo not follow the Peter PrinciplePeter Principle to promote people until they reach their level of incompetenceincompetence!The expected salary raise/year in China is not suitable for long term programmers(35 year rule!).Needs to be adjusted somehow.Thank you