您的当前位置：上海品茶 > 报告分类 > PDF报告下载

报告预览

PostgreSQL 9.6最新特性及未来展望（56页）.pdf

编号：91196

PDF 56页 5.97MB 下载积分：VIP专享

下载报告请您先登录！

PostgreSQL 9.6最新特性及未来展望（56页）.pdf

1、2016年云栖大会PostgreSQL专场技术交流群PostgreSQL 9.6New advances inFull Text SearchOleg BartunovPostgres Professional3PostgreSQL CORELocale supportPostgreSQL extendability:GiST(KNN),GIN,SP-GiSTFull Text Search(FTS)NoSQL(hstore,jsonb)Indexed regexp searchCustom AM&Generic WALPluggable table engines(WIP)Extension

2、s:Intarray,Hstore,LtreeMajor contributors to PostgreSQLCo-founders of Postgres ProfessionalAlexander Korotkov,Teodor Sigaev,Oleg Bartunov4AgendaInitial design of Postgres and innovationsHistory of some particular innovative features of PostgresFull Text Search in 9.6New RUM index5Original design of

3、PostgresThe main design goals of the new system are to:1)provide better support for complex objects,2)provide user extendibility for data types,operators and access methods,3)provide facilities for active databases(i.e.,alerters and triggers)and inferencing including forward-and backward-chaining,4)

4、simplify the DBMS code for crash recovery,5)produce a design that can take advantage of optical disks,workstations composed of multiple tightly-coupled processors,and custom designed VLSI chips,and6)make as few changes as possible(preferably none)to the relational model.*Stonebraker M.,Rowe L.A.The

5、design of Postgres.ACM,1986.15.2.340-355.6Original design of PostgresThe main design goals of the new system are to:1)provide better support for complex objects,2)provide user extendibility for data types,operators and access methods,3)provide facilities for active databases(i.e.,alerters and trigge

6、rs)and inferencing including forward-and backward-chaining,4)simplify the DBMS code for crash recovery,5)produce a design that can take advantage of optical disks,workstations composed of multiple tightly-coupled processors,and custom designed VLSI chips,and6)make as few changes as possible(preferab

7、ly none)to the relational model.*Stonebraker M.,Rowe L.A.The design of Postgres.ACM,1986.15.2.340-355.7Extendability.Is like a SHOPPING MALL8Rent a place in the mall(vs.having your own shop)ProUse all common facilities of mallUse existing buyers base of the mallConcentrate on your own contentConsHav

8、e to pay the rent9Writing extension to DBMS(vs.writing your own specific DBMS)ProUse all common features of DBMS:concurrency,recovery,transactions etc.Use existing users base of the DBMSConcentrate on your domain specific logicConsHave to pay some overhead10Extendability need APIs11What can we exten

9、d in the DBMS?Data typesHow we can operate with this data types?(functions,operators,aggregates etc.)How we can search this data types?(indexes)What could be the source of data?(FDW)How could we store the data?(table engines)(not yet delivered to Postgres)12New types of indexesare especially hard im

10、plement because we need to deal with:concurrency(low-level locking etc.),packing data into pages,WAL-logging,This is a very hard task.Only DBMS core developer could solve it.Application developer cant.13The solution:add nested API14The solution:add nested APIIndex access method is the template which

11、 could be applied to particular data type using operator class(opclass).btree is template for different linear orderingsGiST is template forbalanced treesSP-GIST is template for non-balanced treesGIN is template for inverted indexes of composite objects BRIN is template for bounding aggregates per b

12、lock ranges 15Propagation of improvementsIf you upgrade your camera to another compatible which have higher resolution,this improvement will apply to all the compatible lenses.In PostgreSQL 9.4 GIN got 2 major improvements:posting list compression and fast scan.Opclasses received these improvements

13、automatically.16ExtendabilityProvides fast feature developingHstore(first version)several hoursFTS(tsearch2)1 week(NY holidays)KNN-GiST 1 weekjsonb_path_ops several hours in restaurantJsonb(prototype)2-3 monthsJsquery 2-3 monthsQuadtree 360 locPostgreSQL 9.4+Open-sourceRelational databaseExtendable

14、databaseStrong support of NoSQLFuture of JSONBDictionary compression for jsonbDuplicate keys storage in jsonb is the problem.Pluggable compression mechanism(extendability!).Could be applied to any data type.Each jsonb column have own dictionary of keys.Conversion on the fly.Will be released soon by

15、Postgres Professional.Dictionary compression for jsonbCustomers reviews datasetcustomer_reviews_jsonb 307 MBcustomer_reviews_jsonbc 123 MBcustomer_reviews_array 139 MBLess space than array version because of numerics compression!Subscription for jsonbNew query syntax:UPDATE js SET jskey=value WHERE

16、jsid=1;Generic mecanism,extendable for any data type instead of hack for arrays we currently have.On commitfest by Postgres Professionalhttps:/commitfest.postgresql.org/11/793/K-Nearest Neighbors SearchK-Nearest Neighbors Search Traditional search algorithms are not effective Index doesnt helps,sinc

17、e there is no predicate Full table scan-sort-limit Ad-hoc solutions are not effective Postgres innovation Use special index scan strategy to get k-tuples in right order Several orders of magnitude speedup!Use ORDER BY distance to express KNN in SQL KNN-GiST,KNN-Btree,KNN-SPGiSTK-Nearest Neighbors Se

18、arch1,000,000 randomly distributed pointsFind K-closest points to the point(0,0)Scan&SortSELECT*FROM qq ORDER BY point_distance(p,(0,0)ASC LIMIT 10;Limit(actual time=291.524.291.526 rows=10 loops=1)-Sort(actual time=291.523.291.523 rows=10 loops=1)Sort Key:(point_distance(p,(0,0):point)Sort Method:t

19、op-N heapsort Memory:26kB-Seq Scan on qq(actual time=0.011.166.091 rows=1000000 loops=1)Planning time:0.048 msExecution time:291.542 ms(7 rows)K-Nearest Neighbors Search1,000,000 randomly distributed pointsFind K-closest points to the point(0,0)KNN-GiST(GiST index for points)SELECT*FROM qq ORDER BY(

20、p (0,0)ASC LIMIT 10;Limit(actual time=0.046.0.058 rows=10 loops=1)-Index Scan using qq_p_s_idx on qq(actual time=0.046.0.058 rows=10 loops=1)Order By:(p (0,0):point)Planning time:0.052 msExecution time:0.081 ms(5 rows)KNN is 3500 times faster!K-Nearest Neighbors SearchKNN-BtreeFind 10 closest events

21、 to the Sputnik launch Union of two selects(btree index on date)select *,date 1957-10-04:date as dt from(select*from(select id,date,event from events where date=1957-10-04:date order by date asc limit 10)t2)t3 order by dt asc limit 10;Execution time:0.146 msK-Nearest Neighbors SearchKNN-BtreeFind 10

22、 closest events to the Sputnik launch Parallel Btree iindex-scans in two directionsselect id,date,event from events order by date 1957-10-04:date asclimit 10;Limit(actual time=0.030.0.039 rows=10 loops=1)-Index Scan using btree_date_idx on events(actual time=0.030.0.036 rows=10 loops=1)Order By:(dat

23、e 1957-10-04:date)Planning time:0.101 msExecution time:0.070 ms(5 rows)KNN is 2 times faster!Full Text SearchWhat is a Full Text Search?Full text search Find documents,which match a querySort them in some order(optionally)Typical SearchFind documents with all words from queryReturn them sorted by re

24、levanceWhy FTS in Databases?Feed database content to external search enginesThey are fast!BUTThey cant index all documents-could be totally virtualThey dont have access to attributes-no complex queriesThey have to be maintained headache for DBASometimes they need to be certifiedThey dont provide ins

25、tant search(need time to download new data and reindex)They dont provide consistency search results can be already deleted from databaseFTS in DatabasesFTS requirementsFull integration with database engineTransactionsConcurrent accessRecoveryOnline indexConfigurability(parser,dictionary.)Scalability

26、Traditional text search operators(TEXT op TEXT,op-,*,LIKE,ILIKE)No linguisticsupportWhat is a word?What to index?Word normalization?Stop-words (noise-words)No ranking-all documents are equally similar to querySlow,documents should be seq.scanned9.3+index support of*(pg_trgm)select*from man_lines whe

27、re man_line*(?:(?:p(?:ostgres(?:ql)?|g?sql)|sql)(?:(?:(?:mak|us)e|do|is);One of(postgresql,sql,postgres,pgsql,psql)space One of(do,is,use,make)FTS in PostgreSQLOpenFTS 2000,Pg as a storageGiST index 2000,thanks RamblerTsearch 2001,contrib:no rankingTsearch2 2003,contrib:configGIN 2006,thanks,JFG Net

28、worksFTS 2006,in-core,thanks,EnterpriseDBFTS(ms)2012,some patches committedRUM 2016,Postgres ProfessionalFTS in PostgreSQLtsvector data type for document optimized for search Sorted array of lexemsPositional informationStructural information(importance)tsquery textual data type for query with boolea

29、n operators&|!()Full text search operator:tsvector tsqueryOperators ,Sort(actual time=476.104.476.104 rows=3 loops=1)Sort Key:(ts_rank(text_vector,titl:tsquery)DESCSort Method:top-N heapsort Memory:25kBBuffers:shared hit=149804 read=87416-Bitmap Heap Scan on ti2(actual time=6.894.469.215 rows=47855

30、loops=1)Recheck Cond:(text_vector titl:tsquery)Heap Blocks:exact=4913Buffers:shared hit=149804 read=87416-Bitmap Index Scan on ti2_index(actual time=6.117.6.117 rows=47855 loops=1)Index Cond:(text_vector titl:tsquery)Buffers:shared hit=1 read=12Planning time:0.255 msExecution time:476.171 ms(15 rows

31、)HEAP IS SLOW470 ms!Some FTS problems:#2No phrase search“A&B”is equivalent to“B&AThere are only 92 posts with person Tom Good,but FTS finds 34039 postsCombination of FTS+regular expression works,but slow and can be used only for simple queries.Some FTS problems:#3Combine FTS with ordering by timesta

32、mp SELECT sent,subject from pglist WHERE fts to_tsquery(english,tom&lane)ORDER BY abs(sent 2000-01-01:timestamp)ASC LIMIT 5;Limit(actual time=545.560.545.560 rows=5 loops=1)-Sort(actual time=545.559.545.559 rows=5 loops=1)Sort Key:(CASE WHEN(sent-2000-01-01 00:00:00:timestamp without time zone)Bitma

33、p Heap Scan on pglist(actual time=87.545.507.897 rows=222813 loops=1)Recheck Cond:(fts tom&lane:tsquery)Heap Blocks:exact=105992-Bitmap Index Scan on pglist_gin_idx(actual time=57.932.57.932 rows=222813 loops=1)Index Cond:(fts tom&lane:tsquery)Planning time:0.376 msExecution time:545.744 mssent|subj

34、ect-+-1999-12-31 13:52:55|Re:HACKERS LIKE fixed(?)for non-ASCII collation orders2000-01-01 11:33:10|Re:HACKERS dubious improvement in new psql1999-12-31 10:42:53|Re:HACKERS LIKE fixed(?)for non-ASCII collation orders2000-01-01 13:49:11|Re:HACKERS dubious improvement in new psql1999-12-31 09:58:53|Re

35、:HACKERS LIKE fixed(?)for non-ASCII collation orders(5 rows)Time:568.357 msInverted Index in PostgreSQLENTRYTREEPosting listPosting treeNo positions in index!Inproving GINImprove GIN indexStore additional information in posting tree,for example,lexemes positions or timestampsUse this information to

36、order resultsImproving GIN9.6 opens Pandora boxCreate access methods as extension!Lets call it RUMCREATE INDEX.USING RUMUse positions to calculate rank and order resultsIntroduce distance operator tsvector tsquery CREATE INDEX ti2_rum_fts_idx ON ti2 USING rum(text_vector rum_tsvector_ops);SELECT doc

37、id,ts_rank(text_vector,to_tsquery(english,title)AS rankFROM ti2WHERE text_vector to_tsquery(english,title)ORDER BYtext_vector plainto_tsquery(english,title)LIMIT 3;QUERY PLAN-L Limit(actual time=54.676.54.735 rows=3 loops=1)Buffers:shared hit=355-Index Scan using ti2_rum_fts_idx on ti2(actual time=5

38、4.675.54.733 rows=3 loops=1)Index Cond:(text_vector titl:tsquery)Order By:(text_vector titl:tsquery)Buffers:shared hit=355Planning time:0.225 msExecution time:54.775 ms VS476 ms!(8 rows)CREATE INDEX.USING RUMTop-10(out of 222813)postings with Tom LaneGIN index 1374.772 ms SELECT subject,ts_rank(fts,

39、plainto_tsquery(english,tom lane)AS rank FROM pglist WHERE fts plainto_tsquery(english,tom lane)ORDER BY rank DESC LIMIT 10;QUERY PLAN-Limit(actual time=1374.277.1374.278 rows=10 loops=1)-Sort(actual time=1374.276.1374.276 rows=10 loops=1)Sort Key:(ts_rank(fts,tom&lane:tsquery)DESCSort Method:top-N

40、heapsort Memory:25kB-Bitmap Heap Scan on pglist(actual time=98.413.1330.994 rows=222813 loops=1)Recheck Cond:(fts tom&lane:tsquery)Heap Blocks:exact=105992-Bitmap Index Scan on pglist_gin_idx(actual time=65.712.65.712 rows=222813 loops=1)Index Cond:(fts tom&lane:tsquery)Planning time:0.287 msExecuti

41、on time:1374.772 ms(11 rows)CREATE INDEX.USING RUMTop-10(out of 222813)postings with Tom LaneRUM index 216 ms vs 1374 ms!create index pglist_rum_fts_idx on pglist using rum(fts rum_tsvector_ops);SELECT subject FROM pglist WHERE fts plainto_tsquery(tom lane)ORDER BY fts plainto_tsquery(tom lane)LIMIT

42、 10;QUERY PLAN-Limit(actual time=215.115.215.185 rows=10 loops=1)-Index Scan using pglist_rum_fts_idx on pglist(actual time=215.113.215.183 rows=10 loops=1)Index Cond:(fts plainto_tsquery(tom lane:text)Order By:(fts plainto_tsquery(tom lane:text)Planning time:0.264 msExecution time:215.833 ms(6 rows

43、)Phrase Search(8 years old!)Queries A&B:tsquery and B&A:tsquery produce the same result Phrase search-preserve order of words in a queryResults for queries A&B and B&A should be different!Introduce new FOLLOWED BY ()operator:Guarantee an order of operands Distance between operandsa b=a&b&(i,j:pos(b)

44、i pos(a)j=n)Phrase search-definitionFOLLOWED BY operator returns:falsetrue and array of positions of the right operand,which satisfy distance condition FOLLOWED BY operator requires positions A B=AB A B matches the word with two different forms(infinitives)TSQUERY phraseto_tsquery(CFG,TEXT)Stop word

45、s are taken into account.select phraseto_tsquery(PostgreSQL can be extended by the user in many ways);phraseto_tsquery-postgresql extend user mani way(1 row)Phrase search-propertiesPrecendence of tsquery operators-!&|Use parenthesis to control nesting in tsqueryselect a&b c:tsquery;tsquery-a&b csele

46、ct b c&a:tsquery;tsquery-b c&aselect b (c&a):tsquery;tsquery-b c&b aPhrase search-Examples 1.1 mln postings(postgres mailing lists)There is overhead of phrase operatortomlane tom&laneSeqScan :2.6s 2.2 s GIN :1.2s 0.48 s need recheckRUM:0.5s 0.48 s use positions to filter Phrase search with RUM index

47、 has negligible overhead!select count(*)from pglist where fts to_tsquery(english,tom lane);count-222777(1 row)Some FTS problems:#3Combine FTS with ordering by timestampStore timestamps in additional information in timestamp order!create index pglist_fts_ts_order_rum_idx on pglist using rum(fts rum_t

48、svector_timestamp_ops,sent)WITH(attach=sent,to=fts,order_by_attach=t);select sent,subject from pglistwhere fts to_tsquery(tom&lane)order by sent 2000-01-01:timestamp limit 5;-L Limit(actual time=84.866.84.870 rows=5 loops=1)-Index Scan using pglist_fts_ts_order_rum_idx on pglist(actual time=84.865.8

49、4.869 rows=5 loops=1)Index Cond:(fts to_tsquery(tom&lane:text)Order By:(sent 2000-01-01 00:00:00:timestamp without time zone)Planning time:0.162 msExecution time:85.602 ms vs 645 ms!(6 rows)Some FTS problems:#3Combine FTS with ordering by timestampStore timestamps in additional information in timest

50、amp order!select sent,subject from pglistwhere fts to_tsquery(tom&lane)and sent 2000-01-01:timestamp order by sent desc limit 5;explain analyze select sent,subject from pglistwhere fts to_tsquery(tom&lane)order by sent=|2000-01-01:timestamp limit 5;Speedup 1x,since tom lane is popular filter-select

51、sent,subject from pglistwhere fts to_tsquery(server&crashed)and sent 2000-01-01:timestamp order by sent desc limit 5;select sent,subject from pglistwhere fts to_tsquery(server&crashed)order by sent=|2000-01-01:timestamp limit 5;Speedup 10 xRUM TodoAllow multiple additional infoadd opclasses for arra

52、y(similarity and as additional info)and int/floatimprove ranking function to support TF/IDFImprove insert time(pending list?)Improve GENERIC WAL to support shiftAvailability:9.6+only:https:/ details about new FTS featureshttp:/www.sai.msu.su/megera/postgres/talks/pgopen-2016-rum.pdfTHANKS FOR YOUR ATTENTION感谢大家！

友情提示

1、下载报告失败解决办法
2、PDF文件下载后，可能会被浏览器默认打开，此种情况可以点击浏览器菜单，保存网页到桌面，就可以正常下载了。
3、本站不支持迅雷下载，请使用电脑自带的IE浏览器，或者360浏览器、谷歌浏览器下载即可。
4、本站报告下载后的文档和图纸-无水印,预览文档经过压缩，下载后原文更清晰。

本文（PostgreSQL 9.6最新特性及未来展望（56页）.pdf）为本站（云闲）主动上传，三个皮匠报告文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若此文所含内容侵犯了您的版权或隐私，请立即通知三个皮匠报告文库（点击联系客服），我们立即给予删除！

温馨提示：如果因为网速或其他原因下载失败请重新下载，重复下载不扣分。