THE UNITED NATIONS GUIDE ON PRIVACY-ENHANCING TECHNOLOGIES FOR OFFICIAL STATISTICS
2023
THE PET GUIDE

MISSION STATEMENT OF UNITED NATIONS COMMITTEE OF EXPERTS ON BIG DATA AND DATA SCIENCE FOR OFFICIAL STATISTICS

The Committee will provide a strategic vision, direction and coordination for a global programme on Big Data and Data Science for official statistics, including for indicators of the 2030 Agenda for Sustainable Development; the Committee will promote practical use of Big Data sources, including cross-border data, while building on existing precedents and finding solutions for the many existing challenges, including:
- Methodological issues, covering quality concerns and data analytics,
- Legal and other issues in respect of access to data sources,
- Privacy issues, in particular those relevant to the use and reuse of data, data linking and re-identification,
- Security, information technology issues and management of data, including advanced means of data dissemination, assessment of cloud computing and storage, and cost-benefit analysis.

The Committee will also promote capacity-building, training and sharing of experience and will foster communication and advocacy of the use of Big Data and Data Science for policy applications, especially for the monitoring of the 2030 Agenda for Sustainable Development; and, finally, the Committee will strive to build public trust in the use of Big Data and Data Science for official statistics.

DISCLAIMER

The designations employed and the presentation of the material in the present publication do not imply the expression of any opinion whatsoever on the part of the United Nations concerning the legal status of any country or its authorities or the delimitations of its frontiers. The term "country" as used in this publication also refers, as appropriate, to territories or areas. The designations of country groups in the publication are intended solely for statistical or analytical convenience and do not necessarily express a judgment about the stage reached by a particular country, territory or area in the development process. Mention of the names of firms and commercial products does not imply endorsement by the United Nations. The views expressed in this publication are those of the authors and do not necessarily reflect those of the United Nations or its senior management, or of the experts whose contributions are acknowledged.

Copyright © United Nations, 2023. All rights reserved.

Suggested citation: United Nations, 2023, United Nations Guide on Privacy-Enhancing Technologies for Official Statistics, United Nations Committee of Experts on Big Data and Data Science for Official Statistics, New York.

Website: https://unstats.un.org/bigdata

CONTENTS

FOREWORD
ACKNOWLEDGEMENTS
ACRONYMS
EXECUTIVE SUMMARY
CHAPTER 1 INTRODUCTION TO PRIVACY-ENHANCING TECHNOLOGIES
1.1 Motivations for the use of Privacy-Enhancing Technologies
1.2 Challenges in using Privacy-Enhancing Technologies for Official Statistics
1.3 A New Approach to International Collaboration on PETs
CHAPTER 2 METHODOLOGIES AND APPROACHES
2.1 Secure Multi-party Computation
2.2 Homomorphic Encryption
2.3 Differential Privacy
2.4 Synthetic Data
2.5 Distributed Learning
2.6 Zero Knowledge Proofs
2.7 Trusted Execution Environments and Secure Enclaves
2.8 Practical Considerations of PETs
CHAPTER 3 CASE STUDIES
CASE STUDY 1 Boston Women's Workforce Council: Measuring salary disparity using secure multi-party computation
CASE STUDY 2 European Statistical System: Developing Trusted Smart Surveys
CASE STUDY 3 Eurostat: Processing of longitudinal mobile network operator data
CASE STUDY 4 Indonesia Ministry of Tourism: Confidentially sharing datasets between two mobile network operators via a trusted execution environment
CASE STUDY 5 Italian National Institute of Statistics and Bank of Italy: Enriching data analysis using privacy-preserving record linkage
CASE STUDY 6 Office for National Statistics: Trialling the use of synthetic data at the United Kingdom's national statistics institute
CASE STUDY 7 Samsung SDS (Korea): Data aggregation system
CASE STUDY 8 Statistics Canada: Measuring the coverage of a data source using a private set intersection
CASE STUDY 9 Statistics Canada: Training a machine learning model for private text classification using leveled homomorphic encryption
CASE STUDY 10 Statistics Canada: Trialling the use of synthetic data
CASE STUDY 11 Statistics Korea: Developing a privacy-preserving Statistical Data Hub Platform
CASE STUDY 12 Statistics Netherlands: Developing privacy-preserving cardiovascular risk prediction models from distributed clinical and socioeconomic data
CASE STUDY 13 Statistics Netherlands: Measuring effectiveness of an eHealth solution using private set intersection
CASE STUDY 14 Twitter and OpenMined: Enabling Third-party Audits and Research Reproducibility over Unreleased Digital Assets
CASE STUDY 15 United Nations Economic Commission for Europe: Trialling approaches to privacy-preserving federated machine learning
CASE STUDY 16 United Nations PET Lab: International Trade
CASE STUDY 17 United States Census Bureau: Deploying a differentially private Disclosure Avoidance System for the 2020 US Census
CASE STUDY 18 United States Department of Education: Analysing student financial aid data using privacy-preserving record linkage
CHAPTER 4 STANDARDS
4.1 Introduction
4.2 Key Standards
4.3 Related Standards
4.4 Standards Under Development
CHAPTER 5 LEGAL AND REGULATORY ISSUES
5.1 Introduction
5.2 The Legal and Regulatory Outlook
5.3 Challenges and Risks when using PETs
5.4 Opportunities
and Affordances of PETs
5.5 Challenges of Crossing Jurisdictions
5.6 Advice to Regulators

GOVERNMENTS UNDERSTAND THE GREAT SOCIETAL AND ECONOMIC VALUE THAT CAN BE UNLEASHED BY A MORE WIDE-SPREAD USE OF DATA ON TOPICS LIKE HEALTH, TAXES OR SOCIAL SECURITY. THIS PET GUIDE CAN PAVE THE WAY FOR A BETTER UNDERSTANDING OF AND GREATER CONFIDENCE IN USING PRIVACY-ENHANCING TECHNOLOGIES TO SAFELY UTILIZE SENSITIVE DATA.

FOREWORD

RELEVANCE OF OFFICIAL STATISTICS / ACCESS TO SENSITIVE DATA

In recent years almost every government has been faced with very serious challenges, such as the global health pandemic, increasing occurrences of severe weather causing flooding, drought or fires, environmental degradation, supply chain disruption, increasing numbers of refugees and migrants, rising energy and food prices, and economic stagnation. To handle these crises in the right way, our leaders need the right data at the right time. National statistical offices (NSOs), and other institutes of the national statistical system, are called upon by their governments to provide
these trusted, relevant, timely and high-quality data, which support evidence-based decision making.

In many cases, NSOs themselves collect sensitive data on persons and businesses through surveys and censuses, such as data from a population census or from household or business surveys. However, to act swiftly on emerging issues, NSOs are almost always obliged to supplement those data with additional secondary data sources such as administrative data (for example, tax records, social security data, health records or customs administration records) or private sector data (for example, mobile phone records or transactional credit card information). On the basis of a national Statistics Act, NSOs are often entrusted by society to have access to these kinds of sensitive data. In practice, however, the administrative authorities or private sector companies are very reluctant to "hand over" their raw data. Institutional arrangements are complicated, and require additional legal approvals and guidance which may take a long time to finalize. The difficulties mount further when more partners are involved in the processing and analysis of the data. In addition, the national data protection authority may want to provide input in cases of data sharing, since they want to make sure that the privacy of persons and businesses is protected.

THE COVID EXCEPTION

When COVID-19 hit as a global health pandemic in the early part of 2020, many governments wanted to limit the spread of the very contagious virus and therefore invoked measures to limit the mobility of people. To monitor if these measures were successful, mobile phone data proved to be very useful. With these data, it could be shown almost in real time if and where the population stayed mostly at home and in which part of the country movement was still happening. Getting access to the mobile phone data was possible, but by no means easy. In countries such as Ghana and the Gambia, negotiations with the mobile phone companies regarding data access had already been ongoing for a few years, so in March 2020 agreements to access the mobile phone records could be signed
very quickly. Access was still restricted and data would still remain on the premises of the company, but analyses could be done which were fit-for-purpose. In other countries, telecom companies were much more reluctant to come to agreements, and often access was only given to highly aggregated data, which would not allow for fine-grained analyses.

THE ROLE OF PETS IN OPENING A PATHWAY TO BETTER ACCESS TO DATA

Governments, companies and the public in general are worried that sensitive personal or business information could possibly be leaked and misused if data were accessed by external partners, including NSOs. However, if data could be accessed without revealing any sensitive information and without possibilities of re-identification, would that take away the privacy concerns? Encryption has already been widely used in banking and internet data transfer, and has proven to be highly reliable. Could privacy-enhancing technologies (PETs) also be highly reliable and be used in a similar way for accessing, for example, health records, tax records or credit card data by NSOs? This guide deals with exactly this issue: can PETs guarantee the safe sharing of data?

THE CONTEXT OF THE UNCEBD, THE PET TASK TEAM AND PUBLICATIONS

As the digital society emerged over the last 20 years or so, the global community of official statistics saw the increasing need to explore benefits and challenges of new data sources, new methods and new technologies. The United Nations Statistical Commission, which is the highest global governing body for official statistics, therefore established in 2014 the UN Committee of Experts on Big Data and Data Science for Official Statistics (UNCEBD). This committee explored benefits and challenges of the use of a variety of Big Data sources and their application to various statistical domains. It became clear very soon that getting access to these data was one of the main challenges. At the beginning of 2018, UNCEBD created a task team to look into the possibilities of using privacy-enhancing technologies. The objectives of this task team were to develop principles, policies and open standards for data sharing, taking full account of data privacy, confidentiality and security issues when designing methods and procedures for the collection, processing, storage and presentation of data. A first document on those issues
was released in 2019. Since 2019, many data sharing projects using PETs have been carried out, showing a diversity in data sets, project objectives and the kinds of PETs used. The PET task team prepared a second document, which is this guide on PETs. It contains several new techniques (synthetic data and distributed learning), new international collaboration initiatives (like the UN PET Lab), a review of standards and of legal and regulatory aspects of the use of PETs, and, especially, descriptions of 18 use cases.

FORWARD LOOKING

The current global crises need a coordinated international response, which demands timely access to often sensitive data shared with multiple partners, some of which are in other countries. For understandable privacy concerns, those partners cannot be given full access to all data. Going forward, we should develop smart ways to elicit the essential information from the original data to arrive at the appropriate responses to recover from existing global crises. Application of PETs will help us in designing those smart methods. There are specific characteristics of persons, businesses or locations which help us in formulating, driving and monitoring policies. Through the use of PETs, we can make sure that we extract those characteristics without identifying individual persons, businesses or locations.

ACKNOWLEDGEMENTS

The United Nations Guide on Privacy-Enhancing Technologies for Official Statistics was prepared by the Task Team on Privacy-Enhancing Technologies of the United Nations Committee of Experts on Big Data and Data Science for Official Statistics. We would like to acknowledge the valuable contributions of many experts, who voluntarily dedicated time and effort to the preparation of this document.

The overall guidance was given by the editorial board under the leadership of Matjaž Jug (Statistics Netherlands). The editorial board further consisted of Jess Stahl (OpenMined), Jack Fitzsimons (Oblivious AI, Ireland), Robert Pisarczyk (Oblivious AI, Ireland), Julian Padget (University of Bath, United Kingdom), Ronald Jansen (United Nations Statistics Division) and David Buckley (Centre for Data Ethics and Innovation (CDEI), United Kingdom). The editorial board was responsible for organizing work on chapters, drafting of the Foreword and the Executive Summary, and for reviewing all chapters. The editorial board wishes to thank external experts Hema Krishna Murty (OpenMined) and Fabio Ricciato (Eurostat) for useful and insightful suggestions during the editorial process, Augusto Cesar Fadel (Brazilian Institute of Geography and Statistics) for compiling the list of acronyms, and Adrian McLoughlin (Swerve, Ireland), who was responsible for style and formatting.

CHAPTER 1 INTRODUCTION TO PRIVACY-ENHANCING TECHNOLOGIES
Contributors were Maxime Agostini (Sarus Technologies, France), Jack Fitzsimons (Oblivious AI, Ireland), Ronald Jansen (United Nations Statistics Division), Matjaž Jug (Statistics Netherlands), Saeid Molladavoudi (Statistics Canada) and Monica Scannapieco (Italian National Institute of Statistics (ISTAT)).

CHAPTER 2 METHODOLOGIES AND APPROACHES
Contributors were Will Abramson (Edinburgh Napier University, United Kingdom), Maxime Agostini (Sarus Technologies, France), David Archer (Galois, Inc., United States), Jack Fitzsimons (Oblivious AI, Ireland), Nicolas Grislain (Sarus Technologies, France), Hema Krishna Murty (OpenMined), Saeid Molladavoudi (Statistics Canada), Julian Padget (University of Bath, United Kingdom), Robert Pisarczyk (Oblivious AI, Ireland), Wade Shen (Actuate, United States) and Andrew Trask (OpenMined, United States).

CHAPTER 3 CASE STUDIES
The overall guidance to and coordination of chapter 3 was given by David Buckley (CDEI, United Kingdom) and Matjaž Jug (Statistics Netherlands).

CASE STUDY 1 BOSTON WOMEN'S WORKFORCE COUNCIL: MEASURING SALARY DISPARITY USING SECURE MULTI-PARTY COMPUTATION
Specific contributors to case study 1 were Kinan Dak Albab (Brown University, United States), Azer Bestavros (Boston University, United States), Ran Canetti (Boston University, United States), Rawane Issa (Boston University, United States), Frederick Jansen (Nth party, United States), Andrei Lapets (Nth party, United States), Lucy Qin (Brown University, United States), Shannon Roberts (UMass Amherst, United States), Mayank Varia (Boston University, United States) and Nikolaj Volgushev (Elastic, Germany).

CASE STUDY 2 EUROPEAN STATISTICAL SYSTEM: DEVELOPING TRUSTED SMART SURVEYS
Specific contributors to case study 2 were Joeri van Etten (Statistics Netherlands), Sulaika Duijsings-Mahangi (Statistics Netherlands), Rob Warmerdam (Statistics Netherlands), Matjaž Jug (Statistics Netherlands) and Fabrizio De Fausti (ISTAT, Italy).

CASE STUDY 3 EUROSTAT: PROCESSING OF LONGITUDINAL MOBILE NETWORK OPERATOR DATA
Specific contributors to case study 3 were Fabio Ricciato (Eurostat) and Baldur Kubo (Cybernetica, Estonia).

CASE STUDY 4 INDONESIA MINISTRY OF TOURISM: CONFIDENTIALLY SHARING DATASETS BETWEEN TWO MOBILE NETWORK OPERATORS VIA A TRUSTED EXECUTION ENVIRONMENT
Specific contributors to case study 4 were Siim Esko (Positium, Estonia), Erki Saluveer (Positium, Estonia), Jaak Randmets (Cybernetica, Estonia), Angela Sahk (Cybernetica, Estonia), Baldur Kubo (Cybernetica, Estonia), Addin Maulana (Ministry of Tourism, Indonesia) and Norman Sasono (Ministry of Tourism, Indonesia).

CASE STUDY 5 ITALIAN NATIONAL INSTITUTE OF STATISTICS AND BANK OF ITALY: ENRICHING DATA ANALYSIS USING PRIVACY-PRESERVING RECORD LINKAGE
Specific contributors to case study 5 were Mauro Bruno (ISTAT, Italy), Massimo De Cubellis (ISTAT, Italy), Fabrizio De Fausti (ISTAT, Italy) and Monica Scannapieco (ISTAT, Italy).

CASE STUDY 6 OFFICE FOR NATIONAL STATISTICS: TRIALLING THE USE OF SYNTHETIC DATA AT THE UNITED KINGDOM'S NATIONAL STATISTICS INSTITUTE
Specific contributors to case study 6 were Owen Daniel (Office for National Statistics, United Kingdom) and Ioannis Kaloskampis (Office for National Statistics, United Kingdom).

CASE STUDY 7 SAMSUNG SDS (KOREA): DATA AGGREGATION SYSTEM
Specific contributors to case study 7 were Jihoon Cho (Samsung SDS, Republic of Korea), Hyojin Yoon (Samsung SDS, Republic of Korea) and Kyoohyung Han (Samsung SDS, Republic of Korea).

CASE STUDY 8 STATISTICS CANADA: MEASURING THE COVERAGE OF A DATA SOURCE USING A PRIVATE SET INTERSECTION
Specific contributors to case study 8 were Abel Dasylva (Statistics Canada) and Jean-François Beaumont (Statistics Canada).

CASE STUDY 9 STATISTICS CANADA: TRAINING A MACHINE LEARNING MODEL FOR PRIVATE TEXT CLASSIFICATION USING LEVELED HOMOMORPHIC ENCRYPTION
Specific contributors to case study 9 were Saeid Molladavoudi (Statistics Canada), Benjamin Santos (Statistics Canada) and Zachary Zanussi (Statistics Canada).

CASE STUDY 10 STATISTICS CANADA: TRIALLING THE USE OF SYNTHETIC DATA
Specific contributors to case study 10 were Héloïse Gauvin (Statistics Canada), Claude Girard (Statistics Canada), Isabelle Michaud (Statistics Canada), Kenza Sallier (Statistics Canada) and Steven Thomas (Statistics Canada).

CASE STUDY 11 STATISTICS KOREA: DEVELOPING A PRIVACY-PRESERVING STATISTICAL DATA HUB PLATFORM
Specific contributors to case study 11 were Kyeongwon Choo (Statistics Korea), Keunkwan Ryu (Seoul National University, Republic of Korea), Jung Hee Cheon (Seoul National University & Cryptolab, Republic of Korea) and Jaebeom An (Seoul National University, Republic of Korea).

CASE STUDY 12 STATISTICS NETHERLANDS: DEVELOPING PRIVACY-PRESERVING CARDIOVASCULAR RISK PREDICTION MODELS FROM DISTRIBUTED CLINICAL AND SOCIOECONOMIC DATA
Specific contributors to case study 12 were Andre Dekker (Maastricht University, Netherlands), Inigo Bermejo (Maastricht University, Netherlands), Florian van Daalen (Maastricht University, Netherlands), Anke Bruninx (Maastricht University, Netherlands), Paul Grooten (Statistics Netherlands), Johan van der Valk (Statistics Netherlands), Bart Scheenstra (MUMC+, Netherlands) and Arnoud van 't Hof (MUMC+, Netherlands).

CASE STUDY 13 STATISTICS NETHERLANDS: MEASURING EFFECTIVENESS OF AN EHEALTH SOLUTION USING PRIVATE SET INTERSECTION
Specific contributors to case study 13 were Tjerk Heijmens Visser (CZ, Netherlands), Martijn Antes (Zuyderland, Netherlands), Martine van de Gaar (Linksight, Netherlands), Ralph Schreijen (Statistics Netherlands) and Sulaika Duijsings-Mahangi (Statistics Netherlands).

CASE STUDY 14 TWITTER AND OPENMINED: ENABLING THIRD-PARTY AUDITS AND RESEARCH REPRODUCIBILITY OVER UNRELEASED DIGITAL ASSETS
Specific contributors to
case study 14 were the following members of OpenMined: Laura Ayre, Jack Bandy, Irina Bejan, Tudor Cebere, Phil Culliton, Kien Dang, Kyoko Eng, Ronnie Falcon, Bennett Farkas, Stephen Gabriel, Baye Gaspard, Shubham Gupta, Madhava Jay, Ionesio Junior, Yemissi Kifouly, Osam Kyemenu-Sarsah, Teo Milea, Ishan Mishra, Curtis Mitchell, George Muraru, Ivoline Ngong, Thiago Porto, Mark Rode, Rasswanth S., Jess Stahl, Kellye Trask, Andrew Trask and Gatha Varma, as well as the following staff of Twitter: Rumman Chowdhury, Vijaya Gadde, Aaron Gonzales, Kristian Lum, Nick Matheson, Nick Pickles, Tylea Richard, Jutta Williams and Patrick Woody.

CASE STUDY 15 UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE: TRIALLING APPROACHES TO PRIVACY-PRESERVING FEDERATED MACHINE LEARNING
Specific contributors to case study 15 were Saeid Molladavoudi (Statistics Canada), Benjamin Santos (Statistics Canada), Zachary Zanussi (Statistics Canada), Massimo De Cubellis (ISTAT, Italy), Fabrizio De Fausti (ISTAT, Italy), Matjaž Jug (Statistics Netherlands), Joeri van Etten (Statistics Netherlands) and Alex Noyvirt (Office for National Statistics, United Kingdom).

CASE STUDY 16 UNITED NATIONS PET LAB: INTERNATIONAL TRADE
Specific contributors to case study 16 were the following members of OpenMined: Laura Ayre, Jack Bandy, Irina Bejan, Tudor Cebere, Phil Culliton, Kien Dang, Kyoko Eng, Ronnie Falcon, Bennett Farkas, Stephen Gabriel, Baye Gaspard, Shubham Gupta, Madhava Jay, Ionesio Junior, Yemissi Kifouly, Osam Kyemenu-Sarsah, Teo Milea, Ishan Mishra, Curtis Mitchell, George Muraru, Ivoline Ngong, Thiago Porto, Mark Rode, Rasswanth S., Jess Stahl, Kellye Trask, Andrew Trask and Gatha Varma, together with Francesco Amato (ISTAT, Italy), Mauro Bruno (ISTAT, Italy), Massimo De Cubellis (ISTAT, Italy), J.A. van Etten (Statistics Netherlands), Jack Fitzsimons (Oblivious AI, Ireland), Ronald Jansen (United Nations Statistics Division), Matjaž Jug (Statistics Netherlands), Luke Keller (Census Bureau, United States), Karoly Kovacs (United Nations Statistics Division), Clarence Lio (United Nations Statistics Division), Sean Lovell (United Nations Statistics Division), Katelyn Mccall Kiley (Census Bureau, United States), Saeid Molladavoudi (Statistics Canada), Alex Noyvirt (Office for National Statistics, United Kingdom), Benjamin Santos (Statistics Canada), Rob Warmerdam (Statistics Netherlands), Peter Zandbergen (Statistics Netherlands) and Zachary Zanussi (Statistics Canada).

CASE STUDY 17 UNITED STATES CENSUS BUREAU: DEPLOYING A DIFFERENTIALLY PRIVATE DISCLOSURE AVOIDANCE SYSTEM FOR THE 2020 US CENSUS
Specific contributors to case study 17 were Amy O'Hara (Georgetown University, United States) and Wade Shen (Actuate, United States).

CASE STUDY 18 UNITED STATES DEPARTMENT OF EDUCATION: ANALYSING STUDENT FINANCIAL AID DATA USING PRIVACY-PRESERVING RECORD LINKAGE
Specific contributors to case study 18 were David Archer (Galois, Inc., United States), Amy O'Hara (Georgetown University, Massive Data Institute, United States), Rawane Issa (Galois, Inc., United States) and Stephanie Straus (Georgetown University, Massive Data Institute, United States).

CHAPTER 4 STANDARDS
Contributors: Julian Padget (University of Bath, United Kingdom) and Wo Chang (National Institute of Standards and Technology, United States). We thank the various British Standards Institute committees that reviewed and commented on a draft of the material presented in this document. We acknowledge the IEEE Standards Association (IEEE SA) for permission to reproduce extracts of Project Authorization Request (PAR) documents. We also acknowledge that permission to reproduce extracts from ISO standards is granted by BSI Standards Limited (BSI). No other use of this material is permitted. We are grateful to all the experts participating in the various standards bodies that have contributed to the standards, and the standards in development, cited in this document.

CHAPTER 5 LEGAL AND REGULATORY ISSUES
Contributors: Sulaika Duijsings-Mahangi (Statistics Netherlands), Kuan Hon (United Kingdom), Yoichiro Itakuri (Higari Sogoh Law Offices, Japan), Julian Padget (University of Bath, United Kingdom), Robert Pisarczyk (Oblivious AI, Ireland), Loretta Pugh (CMS, United Kingdom), Andrew Sellars (Boston University, United States), Mayank Varia (Boston University, United States) and Alexandra Wood (Harvard University, United States).

On behalf of the United Nations Committee of Experts on Big Data and Data Science for Official Statistics, we would like to thank all those who have contributed in smaller and larger ways to this guide on privacy-enhancing technologies for official statistics.

ACRONYMS

2PC: Secure Two-party Computation
ABUEA: Attribute-Based Unlinkable Entity Authentication
AI: Artificial Intelligence
API: Application Programming Interface
AWS: Amazon Web Services
BFV: Brakerski-Fan-Vercauteren (HE scheme)
BGV: Brakerski-Gentry-Vaikuntanathan (HE scheme)
BU: Boston University
BWWC: Boston Women's Workforce Council
CA: Central Authority
CARRIER: Coronary ARtery disease: Risk estimations and Interventions for prevention and EaRly detection
CART: Classification and Regression Tree
CBS: Centraal Bureau voor de Statistiek / Statistics Netherlands
CCPA: California Consumer Privacy Act
CDEI: Centre for Data Ethics and Innovation
CDR: Call Data Record
CEN: European Committee for Standardization
CENELEC: European Committee for Electrotechnical Standardization
CKKS: Cheon, Kim, Kim and Song (HE algorithm)
CPU: Central Processing Unit
CRS: Common Reference String
CSV: Comma Separated Values
DAS: Disclosure Avoidance System
DP: Differential Privacy
DP-SGD: Differentially Private Stochastic Gradient Descent
EDPB: European Data Protection Board
EDPS: European Data Protection Supervisor
EEA: European Economic Area
EHR: Electronic Health Record
ENISA: European Union Agency for Cybersecurity
FHE: Fully Homomorphic Encryption
GAN: Generative-Adversarial Network
GDPR: General Data Protection Regulation
GPS: Global Positioning System
GPU: Graphics Processing Unit
GSBPM: Generic Statistical Business Process Model
GUI: Graphical User Interface
HE: Homomorphic Encryption
HI: Hardware Isolation
HLG-MOS: High-Level Group for Modernization of Official Statistics
HTTPS: Hypertext Transfer Protocol Secure
IEC: International Electrotechnical Commission
IMDB: Internet Movie Database
IMSI: International Mobile Subscriber Identity
IND-CPA: Indistinguishability under Chosen Plaintext Attack
IND-CCA: Indistinguishability under Chosen Ciphertext Attack
IPP: UNECE Input Privacy Preservation Techniques project
ISI: International Statistical Institute
ISMS: Information Security Management System
ISO: International Organization for Standardization
ISTAT: Italian National Institute of Statistics
IT: Information Technology
JTC: Joint Technical Committee
LAN: Local Area Network
LSS: Linear Secret Sharing
LWE: Learning With Errors
META: Twitter's ML Ethics, Transparency, and Accountability team
ML: Machine Learning
MLP: Multi-Layer Perceptron
MNO: Mobile Network Operator
MOOC: Massively Online Open Courseware
MRI: Magnetic Resonance Imaging
MSS: Management System Standards
NIST: National Institute of Standards and Technology
NPSAS: National Postsecondary Student Aid Study group
NSLDS: National Student Loan Data System
NSO: National Statistical Office
NTTS: New Techniques and Technologies for Statistics
OECD: Organisation for Economic Co-operation and Development
ONS: Office for National Statistics
PAR: Project Authorization Request
PDSI: Private Data Sharing Interface
PET: Privacy Enhancing Technology
PIA: Privacy Impact Assessment
PII: Personally Identifiable Information
PIMS: Privacy Information Management System
PoC: Proof of Concept
PPML: Privacy Preserving Machine Learning
PROM: Patient-Reported Outcome Measures
PSI: Private Set Intersection
PUMF: Public-Use Microdata File
PWI: Preliminary Work Item
RLWE: Ring Learning With Errors
SA: Supervisory Authority
SDC: Statistical Disclosure Control
SFE: Secure Function Evaluation
SGX: Software Guard Extensions
sMPC: Secure Multi-Party Computation
SNARG: Succinct Non-Interactive Argument
SQL: Structured Query Language
SSN: Social Security Number
TDA: TopDown Algorithm
TEE: Trusted Execution Environment
TFHE: Fast Fully Homomorphic Encryption over the Torus
TLS: Transport Layer Security
TSS: Trusted Smart Survey
TTP: Trusted Third Party
UN: United Nations
UNCEBD: UN Committee of Experts on Big Data and Data Science for Official Statistics
UNECE: United Nations Economic Commission for Europe
VAE: Variational Auto-Encoders
VPN: Virtual Private Network
W3C: World Wide Web Consortium
WYSIWYG: What-You-See-Is-What-You-Get (model)
ZK: Zero Knowledge
ZKP: Zero Knowledge Proof
EXECUTIVE SUMMARY

This document presents methodologies and approaches to mitigate privacy risks when using sensitive or confidential data, which are collectively referred to as privacy-enhancing technologies (PETs). National Statistical Offices (NSOs) are entrusted with data that has the potential to drive innovation and improve national services, research, and social benefit. Yet there has been a rise in sustained cyber threats, complex networks of intermediaries motivated to procure sensitive data, and advances in methods to re-identify and link data to individuals and across multiple data sources. Data breaches erode public trust and can have serious negative consequences for individuals, groups, and communities. This document focuses on PETs that protect data during analysis and dissemination of sensitive information so that the benefits of using data for official statistics can be realized while minimizing privacy risks to those entrusting sensitive data to NSOs. This document explores current approaches to data protection (e.g., data de-identification, input party computation, contractual controls and agreements) and their associated limitations.

In order to facilitate experimentation on pilot projects and effective collaboration on "real world" use cases, the UN Privacy-Enhancing Technologies Task Team founded the UN PET Lab [a]. The team identified three core components to accelerate the adoption of PETs within the NSO community:
1. Experimentation (PET Lab): a series of active proofs-of-concept and pilot projects focused on the evaluation of PETs for real-world use cases in the official statistics community.
2. Outreach & Training: a focus on sharing learnings and insights from the use of PETs with the wider statistical community through training, public events, and
educational material.
3. Support Services: a mechanism to enable those utilizing PETs to engage with the committee and its collaborators for support and advice.
The goals and progress of these three pillars are discussed along with their respective plans for the future. Then, two broad categories of PETs (input privacy and output privacy) are introduced, covering secure multiparty computation, homomorphic encryption, differential privacy, synthetic data, distributed learning, zero-knowledge proofs, and trusted execution environments. Each section defines a problem that the respective technique can solve and offers an overview and history of the technique. With NSO professionals in mind, the primary security considerations and the cost of using each technology are presented, together with an example use case from an NSO domain and a discussion of practical considerations for choosing appropriate PETs. Detailed case studies are presented that comprise a diverse range of use cases across sectors, leverage combinations of PETs, and involve collaboration among parties (such as multiple NSOs working together, NSOs working with other government agencies, and NSOs working with private sector organizations). Fifteen of the case studies describe implementations in the concept or pilot stage, and three describe deployments in production environments. This document also provides an overview of standards-making activities and identifies several new standards relevant to the processing of datasets, including standards under development and some that are a product of the precautionary principle applied to standards-making for artificial intelligence (AI). There has been a significant increase in standards-related activity relevant to PETs and data in AI, and more specifically machine learning (ML), since the Privacy Preserving Techniques Handbook.1 In the case of AI/ML, earlier approaches to standardization sought to draw upon practice and experience collected over a period of time to benefit from hindsight, whereas the current driver is foresight, with the goal of preventing potential harms ("known-knowns" and "known-unknowns"). Given the expansion of activity dealing with PETs and the context in which they may be applied, standards are presented in two parts. The first identifies essential standards, with sections on encryption and security techniques. The second considers indirectly related standards that could affect the environment, technical and organizational, in which PETs may be deployed, with subtopics on cloud computing, big data, governance, AI, and data quality. For those interested in the "bigger picture", there is an additional section on Related Standards.

There is increasing awareness of PETs across governmental, commercial and private organizations. The security and privacy properties they offer clearly connect with the values that are increasingly being embedded in legislative and regulatory frameworks. However, because PETs are new and do not map cleanly onto existing laws and regulations, it can be problematic to determine whether they are acceptable to use in any specific scenario. Indeed, that very issue imposes a substantial barrier to the adoption of PETs. Therefore, the final chapter offers an introduction to some of the key issues and underscores the importance of timely integration of legal advice into NSO projects.
a. https://officialstatistics.org/petlab/
1. Archer, David W., Borja de Balle Pigem, Dan Bogdanov, Mark Craddock, Adria Gascon, Ronald Jansen, Matjaž Jug, Kim Laine, Robert McLellan, Olga Ohrimenko, Mariana Raykova, Andrew Trask and Simon Wardley (2023). UN Handbook on Privacy-Preserving Computation Techniques. doi:10.48550/arXiv.2301.06167.

1. INTRODUCTION TO PRIVACY-ENHANCING TECHNOLOGIES

1.1 MOTIVATIONS FOR THE USE OF PRIVACY-ENHANCING TECHNOLOGIES

Official statistics are a trusted source of information for governments around the world to make informed and data-driven decisions. As such, a breadth of information is collected from a range of data sources, such as household and business surveys; population, economic or agricultural censuses; a variety of administrative records; or even private sector data. Those data sources are the inputs for the compilation of statistics and indicators on the economy, the environment and society. In many ways, official statistics offer a snapshot of a country's development and rate of progress. Naturally, the more fine-grained the input data, the more nuanced the official statistics can be. However, the collection, processing, and dissemination of often sensitive data must protect the privacy of persons and businesses. Additionally, looking at National Statistical Offices (NSOs) as part of national and international data ecosystems, NSOs could potentially share much more data if they were able to protect the privacy of the data. This inevitable tradeoff is the focus of this document, or more concisely: how can we use technology to mitigate privacy risks and give provable privacy guarantees throughout the collection, processing, analysis and distribution life-cycle of potentially sensitive information?

DATA PRIVACY
Protecting data from unauthorized access, processing or distribution is the simple goal of privacy-enhancing technologies. With such broad applicability to the day-to-day processes of official statistics, it is in the interest of every NSO to have an adequate level of understanding of privacy-enhancing technologies. To highlight this point, let us sketch a simple example from census and survey information. Often governments will try to understand the income levels, job positions, education, race, and
religion of their citizens, and the places where they live. This allows the government to monitor the growth or decline of social inequality and injustice in the country and take action accordingly. However, as a citizen of the country, you may have valid questions and doubts about the security and privacy of the use and dissemination of the data. At first glance, one might assume that this hesitation comes from tax evaders or criminals, but in fact quite the opposite could be true. Honest citizens may also have a range of fears, from revealing their personal financial positions to neighbors and acquaintances, to the fear of persecution due to their ethnicity or religious beliefs. They may even fear that the information collected may be used by private corporations to target marketing campaigns at them without their consent. NSOs that actively promote and utilize privacy-enhancing technologies have the opportunity to build greater trust with the public and hence unlock new opportunities associated with more accurate and complete data collection.

KEEPING DATA PRIVATE & SECURE
There are many points at which the privacy and security of data used for official statistics may be compromised, from the point at which data is collected, to its transmission between parties, storage, processing, and ultimate sharing with decision-makers and the public. To mitigate potential risks at each of these stages of the data life cycle, different tools are available to NSOs. Some of the tools required may be very familiar, such as ensuring TLS channels (HTTPS) when transmitting data between entities, or ensuring that data is encrypted when it is stored in databases or as flat files on a server. An experienced IT security officer will be able to give many such examples of how encryption, authentication, authorization, and validation can be used to make sure data is not inadvertently exposed to inappropriate parties. These are mature domains and are not the focus of this guide. Yet despite the maturity of encryption in transit and at rest, there are still many areas in which data is left insecure and without guarantees of how it is used.
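As a concrete illustration of the familiar transport-layer controls mentioned above (a minimal sketch of standard practice, not tooling from this guide), Python's standard ssl module can enforce certificate validation and a modern protocol floor for data in transit:

```python
import ssl

# Secure-by-default client context: certificate validation and
# hostname checking are both enabled automatically.
ctx = ssl.create_default_context()

# Refuse legacy protocol versions; require TLS 1.2 or newer.
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# The context can then wrap sockets, e.g.:
#   ctx.wrap_socket(sock, server_hostname="example.org")
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname
```

Encryption at rest is handled analogously by the database or filesystem layer. Neither mechanism, however, constrains how data is used once decrypted, which is precisely the gap that PETs address.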
PRIVACY-ENHANCING TECHNOLOGIES
Privacy-enhancing technologies (PETs) are technologies designed to safely process and share sensitive data. As discussed in the next section, there are two broad categories of PETs, namely PETs for input privacy and PETs for output privacy. Input privacy focuses on how one or multiple parties can process data in a manner that guarantees the data is not used outside of that strict context. Output privacy focuses on modifying the results of a computation such that the output data cannot be used to reverse-engineer the original inputs. By using these technologies intelligently, safe data life cycles can be constructed, enabling collaboration and trust and providing confidence to data subjects.

THE FOCUS OF THE REPORT
There are, of course, many aspects of the data life cycle that pertain to the management and protection of personal or private information. However, this report focuses on the analysis and dissemination of sensitive information: How can we perform analysis and extract insights from data which should not be disseminated? How can we aggregate data between parties who may have conflicts of interest in sharing plaintext data with one another? How can we guarantee how data has been used? Equally important is what this report is not about. In some communities, privacy technology implies the means to track and map the usage of data against consent forms, cookie policies, and other legal restrictions. While these topics remain important, the scope of this report excludes them, as most of these problems can be addressed with traditional software development and do not require the advanced cryptographic and statistical constructs outlined in the following chapters.

1.2 CHALLENGES IN USING PRIVACY-ENHANCING TECHNOLOGIES FOR OFFICIAL STATISTICS

Privacy-enhancing technologies, also referred to as privacy-preserving techniques or simply privacy technology, encompass a broad range of technologies that endeavor to achieve the privacy goals set out in the previous section. In practice, this collection of technologies forms the intersection of two prominent fields: statistics and cryptography. Statisticians typically present the statistical methodologies and constructs that they would like to investigate within a set of privacy constraints, while cryptographers endeavor to craft a set of protocols and mechanisms which protect each party within the provided formal set of constraints. As can be imagined, this leads to an ongoing tug-of-war between the flexibility of the statistical analysis which can be performed and the enforceability of privacy constraints. Indeed, this is one of the age-old trade-offs seen under the guise of data governance: how to balance data usability with security and compliance. While in practice there is a never-ending list of possible scenarios in which privacy and statistics interface, for the sake of simplicity and
conciseness, we classify techniques into two broad categories: input and output privacy.

INPUT PRIVACY
Input privacy endeavors to allow two or more parties to submit data into a calculation without the other respective parties seeing that data in the clear. This is trickier to achieve in practice than it may first appear. An example of input privacy is the case where two or more NSOs wish to reconcile their cross-border trade statistics. For each pair of countries, import data compiled by one country can be compared with the export data of the partner country. Whereas neither country is allowed to share transaction-level trade information, it may be possible to exchange useful information regarding, for example, the number of transactions per traded product, the number of transactions per border crossing, or the number of transactions per mode of transport. If specific traded products show a large discrepancy, more targeted information could be shared, for example on the number of transactions of that product per month, or the aggregated trade value per month, maybe broken down by border crossing. Some information could be shared on the number of companies involved in the trade of a specific product, on the condition that a minimum number of companies (3 or more) trade in that product. Similarly, information could be made available on the average value per kilogram or the average value per unit of the product. What should not be revealed is the identity of a trading company, or the unit price of a product traded by a specific company. Broadly speaking, there are three popular approaches to input privacy:
1. Finding a trusted third party, or using a trusted legal entity such as a national court system to enforce pre-agreed contractual terms of use.
2. Using pure cryptographic-based approaches.
3. Leveraging
trusted execution environments.
The trusted third-party approach looks straightforward: the two parties that want to share data find a trustworthy third party, which receives the sensitive data and performs the calculations as desired. However, this approach does not work for most NSOs, since most NSOs are by law not allowed to share sensitive data with any third party, so it will not be discussed further in this guide. The use of pure cryptographic protocols is growing in popularity. In chapter 2, we describe Secure Multi-Party Computation (sMPC) and Homomorphic Encryption (HE) in detail. These approaches both use cryptographic primitives to perform calculations on sensitive input data through rounds of communication between the parties. Overall, they offer theoretical guarantees at the protocol level, which may be desirable in some settings. They can be computationally expensive, however; hence specific attention should be paid to this aspect when applying these PETs, especially when large datasets are involved in the protocols. Finally, Trusted Execution Environments (TEEs), namely secure enclaves, endeavor to mimic the behavior of a trusted third party by attesting the functionality performed by hardware or by a cloud provider. These approaches require trusting the hardware or cloud provider, but offer more flexible functionality than pure cryptography-based approaches. TEEs are likewise discussed in detail in chapter 2.
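To give some intuition for the purely cryptographic approaches, the following toy sketch shows additive secret sharing, a basic building block behind many sMPC protocols. It is illustrative only; the two-party setting and the trade totals are our own assumptions, not a protocol from this guide:

```python
import secrets

P = 2**61 - 1  # public prime modulus; all arithmetic is done mod P

def share(value: int) -> tuple[int, int]:
    """Split a value into two random-looking additive shares mod P."""
    r = secrets.randbelow(P)
    return r, (value - r) % P

# Each NSO splits its confidential trade total; an individual share
# is uniformly random and reveals nothing about the underlying value.
a1, a2 = share(1_250_000)  # hypothetical total for NSO A
b1, b2 = share(980_000)    # hypothetical total for NSO B

# Only the aggregate is reconstructed; neither party's input is ever
# visible in the clear.
total = (a1 + a2 + b1 + b2) % P
assert total == 2_230_000
```

Real sMPC deployments add structured communication rounds, more than two share-holders, and protections against malicious participants, as discussed in chapter 2.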
OUTPUT PRIVACY
Output privacy is a concept familiar to most official statisticians and is generally known as statistical disclosure control.1 Output privacy aims to prevent sensitive individual data from being identified or re-identified in the disseminated output. There are many approaches to output privacy, as can be seen from the rich literature on statistical disclosure control. Where these approaches meet the strict formalism of cryptographic research, however, is in the use of differential privacy, which offers a concise definition of output privacy that can be calculated for, and combined over, multiple diverse operations and disclosures. In chapter 2, section 2.3, we discuss differential privacy in more detail.

CHALLENGES FACED BY PRIVACY TECHNOLOGY
PETs are not yet widely used. We will discuss three major categories of challenges which currently limit the use of PETs, namely collaboration, the pace of cryptography development, and cost.

COLLABORATION
The first big challenge is that privacy technology is often required in domains with many stakeholders from multiple organizations. Despite the best of intentions, technology that requires cross- and inter-organizational collaboration can often take a long time to be scoped, built, and ultimately used in a production setting. Friction arises not just from different norms and processes between organizations, but very often from the differing language used by stakeholders from different communities, such as the technology, compliance, or legal communities.

THE PACE OF CRYPTOGRAPHY DEVELOPMENT
Privacy technology, at its core, is a form of data security. As such, it requires the scrutiny warranted by other areas of computer security and cryptography. This is a slow-moving field, similar to other critical science domains such as aviation and pharmaceuticals: small mistakes or overlooked circumstances can have large consequences. In the case of privacy technology, guarantees are made to data subjects that their data are being handled securely and privately. This also tends to lead to only a few standards being developed and commonly used, as is seen with Transport Layer Security (TLS) and some other security standards. The notoriously slow pace of R&D in computer security is determined by the time needed for core research performed by academics, all the way through to the engineers who carefully implement the software packages for use in production.

COST
Cost can be a major factor when it comes to newly popularized technologies. Like most things of economic value, as something becomes more widely used, its costs typically fall. Some privacy technologies are not yet widely adopted, and as a result a lot of security design and analysis must be performed before they can be used in production. This can make the one-time overhead of adopting these technologies very expensive. The hope is that as these tools continue to grow in popularity, some of them will become more widely available, cheaper to use, and easier to support.

CLASSICAL DATA PROTECTION APPROACHES
Although policy or statute often restricts sensitive data sharing among organizations, some sharing does take place, and attempts are made to assure the privacy of the shared data in various ways. Below, we explore current approaches and
their shortcomings.

DATA DE-IDENTIFICATION
Input privacy and output privacy are often supposedly protected by de-identifying or anonymizing data, that is, removing portions of the data that might be used to link the remainder to specific individuals, prior to sharing it. Unfortunately, de-identification can often be ineffective and insecure due to potential re-identification attacks.
1. Hundepool et al., Statistical Disclosure Control (2012).
A variety of techniques have been shown to be able to expose seemingly anonymized data from which personally identifiable information or key attributes have been removed. These include linkage attacks, which leverage joint information from external data sources, and homogeneity attacks, which exploit scarcity of data. In addition, de-identification impairs the usability of the data, lowering the value of statistical results to decision-makers. De-identification can further be expensive, because it is often human reviewers who must survey the data and decide which attributes to remove. Finally, de-identification is often specific to the intended computations to be performed, and so must be re-done prior to each distinct use of the data.

CONFIDENTIALITY
Confidentiality refers to the legal, ethical and practical obligations that bind NSOs not to disclose any sensitive information. Statistical agencies currently use various statistical disclosure mitigation techniques to protect the confidentiality and output privacy of their data subjects while disseminating information of analytical value to their users. There is a direct correlation between the analytical utility of statistical products and the disclosure risk pertaining to the data subjects. For instance, whereas disclosure risk and analytical value are low for global summary statistics, they are both fairly high for multi-dimensional tables with micro-data. The main goal of statistical disclosure methods is to balance the trade-off between data utility and disclosure risk. Various types of statistical disclosure risk can be identified, including identity, attribute, and inferential disclosures.2 Additionally, multiple factors can influence these disclosure risks, ranging from the data sources, such as census or survey data, to the analytical outputs, such as micro-data, analytical tables or graphs. Depending on the re-identification risk
measures, different disclosure control techniques may be applied, which can be classified into non-perturbative and perturbative approaches. For example, methods appropriate for micro-data dissemination include non-perturbative techniques, such as recoding and sub-sampling, as well as perturbative methods, such as data shuffling and injecting random noise into the data. Please refer to section 2.3 for more details on noise injection methods, in particular differential privacy. Other existing approaches include coarsening, post-randomization and suppression methods, depending on the data type and characteristics.3
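The noise-injection idea referenced above can be sketched with a Laplace mechanism, the basic primitive of differential privacy covered in section 2.3. This is a minimal illustration under our own assumptions (a counting query, which has sensitivity 1), not a production-ready mechanism:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise via inverse-transform sampling."""
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count: int, epsilon: float) -> float:
    """Release a count; a counting query has sensitivity 1, so scale = 1/epsilon."""
    return true_count + laplace_noise(1.0 / epsilon)

# Smaller epsilon means more noise and a stronger privacy guarantee.
released = noisy_count(1_000, epsilon=0.5)
```

In practice, the privacy budget epsilon must be tracked across all releases, since the guarantees compose over multiple disclosures.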
The common pitfall of all disclosure control methods is that they have a negative impact on the quality of the products. More explicitly, data suppression methods reduce the information provided to external users, and data perturbation methods modify the data before dissemination while retaining the information content as much as possible. Even when less information is accessible to the user, some disclosure risk is still present. In addition to statistical disclosure techniques, NSOs use non-statistical or physical disclosure controls to evaluate and reduce the risks. These approaches include imposing and regulating access control to the data, and using secure settings, license agreements (see below) and safe practices to reduce disclosure risks.

INPUT PARTY COMPUTATION
In some cases, input parties may perform computation on behalf of result parties directly, and then pass the results to those parties without the need for distinct compute parties. For example, a telecom company (input party) could perform on-demand computations for an NSO or for a research institute and pass only the results of the computations to them. While this approach provides strong input privacy guarantees, it also requires substantial effort on behalf of the input parties, which need to be willing to invest significant computational resources and may lack the expertise to perform complex computations. Input parties may also not have the scalability of resources to support analysis on behalf of multiple result parties. In addition, this approach requires that result parties provide the methodology and details of the analyses to be performed to the input parties, which result parties may be unwilling to do.

CONTRACTUAL CONTROLS AND AGREEMENTS
The most popular current approach to achieving privacy goals is to rely on legal terms and accountabilities. Input parties may require that compute and result parties contractually agree to keep input data private and to strictly control access to computation outputs. Such agreements are ineffective and unsafe if not also supported by implementation solutions relying on PETs, although they do allow for attribution of blame and potentially for assignment of financial responsibility. In many cases, the financial remedy is of no use to the individuals whose data are compromised, and it may be assigned to the organizations that collected the data rather than to those individuals. In addition, contractual control is ineffective against insider threats or compromise of systems by a cyber-attack.
2. Hundepool et al., Statistical Disclosure Control (2012).
3. Gartner, The State of Privacy and Personal Data Protection, 2020-2022 (2020).

A PROMISING HORIZON
Despite slow adoption, the future of privacy technology has never looked brighter. Today, there is an influx of investment by large technology firms and venture capitalists alike, funding new approaches and endeavors in this space. Chip manufacturers continue to increase the effectiveness of secure enclaves, all major cloud providers now support trusted execution environments, and open-source frameworks are becoming extremely popular. A recent whitepaper by Lunar Ventures amplified these points.4 In parallel, commercial and non-commercial entities alike are endeavoring to comply with the ever-increasing global regulation pertaining to privacy. By 2023, it is estimated that 65% of the world's population will have their personally identifiable information protected by modern privacy regulations and laws.5 Further, by 2024, this is
said to affect 80% of organizations worldwide. While these regulations do not yet strictly specify or recommend a specific technology to leverage, their enforcement leads to wider investigation and adoption of privacy technology by those looking to be best in class.
4. Lawrence Lundy-Bryan, Privacy Enhancing Technologies. Part 2 (2020).
5. Scannapieco et al., "Input Privacy" (2021).

1.3 A NEW APPROACH TO INTERNATIONAL COLLABORATION ON PETS

The 2019 UN Handbook on Privacy-Preserving Computation Techniques6 mostly gave insights on the concepts of privacy technology. The current document also provides information on the methodologies and approaches of these techniques, but in addition elaborates on a large number of use cases, which are described in chapter 3. The practical application of PETs shows their value: will we be able to share sensitive data while protecting privacy? If we manage to do so, we could potentially create value for our societies out of sensitive data, such as health records, population census data, mobile phone records or tax records. In addition to describing use cases which have been designed and conducted by others, the members of the task team on PETs also wanted to collaborate on actual use cases themselves. For this purpose, the task team created the UN PET Lab, which is described in this section. The objectives of the UN PET Lab are experimentation on pilot projects, learning by doing, and offering support services to those who want to be early adopters of PETs. The UN PET Lab was officially launched on 26 January 2022 at the EXPO 2020 event in Dubai.

UN PET LAB
Over the past months, the task team on PETs has increased its efforts across three core pillars to accelerate the adoption of PETs within the community of official statistics:
1. Experimentation: Experimentation is advanced through a series of active proofs-of-concept and pilot projects focused on the evaluation of PETs for real-world use cases in the official statistics community.
2. Outreach & Training: Outreach and training are promoted by sharing learnings and insights from the use of PETs with the wider statistical community through training, public talks, and educational collateral.
3. Support Services: Finally, support services are offered through a mechanism enabling those using or intending to use PETs to engage with the
UN PET Lab and its collaborators for support and advice.
The combination of people, processes, and systems in place to drive these three pillars is referred to as the United Nations Privacy-Enhancing Technologies Lab, or the UN PET Lab for short. In this section, we outline the goals and progress of these three pillars in more detail, along with their respective plans for the future.

EXPERIMENTATION
The mission of the first pillar of this international collaborative effort is to enable practitioners from the national and international official statistics communities to get hands-on experience with PETs. There is a wide range of benefits that come from practically trialing a technology within the context of a known problem space, including:
1. Proving value: While it is easy to describe hypothetical value creation from privacy technology at a high level, it is important to dig into the nuances of potential projects to understand and demonstrate the full value of these technologies in the context of real-world problems of current interest. Understanding all of the benefits involved helps the community to better evaluate the risk-to-reward calculations involved in kicking off fully-fledged projects.
2. Understanding practical challenges: Unfortunately, utilizing privacy technology in many scenarios brings unforeseen challenges, such as those presented in section 2.8 on practical considerations of PETs. By facilitating the usage of privacy technology within safe experimental environments, participants and collaborators can better understand potential issues, risks, and considerations before committing to using such technologies in production.
3. Engaging with stakeholders: There are many stakeholders involved in any domain of data governance, each weighing in with a different perspective, from technical feasibility to data security and legal considerations. By running experimental trials and projects within a safe environment, these stakeholders can express their concerns and views ahead of production usage. The learnings from these help us to mitigate such frictions where possible and ultimately reduce the number of unknown barriers to entry for production-level usage.
4. Building privacy technology literacy: Finally, and certainly not least, active usage of privacy technology builds a level of literacy within the community. Those involved, both directly and indirectly, develop holistic knowledge about the technology space and associated issues. This development of skills and knowledge will help to grow the community and act as an asset to the wider international statistics community.
6. Archer et al., UN Handbook on PPTs (2023).
In the above
benefits, one caveat that is emphasized is that trials are performed in a safe and flexible environment. Two ways in which the PET Lab creates such an environment are by initially leveraging non-sensitive data and by bootstrapping with general-purpose infrastructure. The first point is important in order to reduce the red tape in kicking off work in the first place, as well as to eliminate the risks associated with data leakage. One such example has been the use of COMTRADE7 data (see chapter 3) and other publicly available datasets. These datasets are desirable as they often represent data that is known at a more nuanced, and correspondingly more sensitive, level but is not currently used as such. The second point is important as it allows the group to spin up ad hoc servers and infrastructure to accommodate various privacy-enhancing technologies, especially those approaches which benefit from a third, semi-trusted party. Fortunately, the UN Global Platform8 for Official Statistics infrastructure is available for exactly such scenarios. These experiments are documented and reviewed by the PET Lab experts, building towards a shared repository of experience and use cases.

OUTREACH & TRAINING
The second focus of the collaboration is aimed at sharing knowledge more broadly within the global community of official statistics. This has been a long-time focus of the PETs task team and a motivation for this very document. However, the PET Lab formalizes these efforts in a structured fashion. There are four types of educational resources disseminated:
1. Official Guides & Overviews: These are documents that give formal collateral to those interested in learning about PETs. This document is an example of such material.
2. Talks & Presentations: This involves active efforts to speak to subgroups and communities within the international statistics community about PETs. The goal is to put the discipline on the radar of practitioners who may not be familiar with the space and who would benefit from insights and awareness. Over the past couple of years, the group has presented at the Eurostat Conference on New Techniques and Technologies for Statistics (NTTS), the 63rd ISI World Statistics Congress, and the Road to EXPO Dubai workshop series.
3. Use Case Repository: This is an online resource that gives details of use cases of PETs within the context of official statistics from around the world. This repository, or wiki, is regularly updated and welcomes contributions from any person or team who would like to contribute.
4. Collaborating with Massive Open Online Courses (MOOCs): Finally, in order to spread the knowledge to a broad audience, and in order to certify practitioners based on their learnings, the PETs task team and PET Lab collaborate on MOOCs to disseminate the accumulated knowledge widely.
These ongoing efforts have already brought tangible results to the community. For example, the collaboration with OpenMined's MOOC on Foundations of Private Computation9 has led to over
9,000 learners registering and participating in formalized training on privacy technology. This is one of the largest public disseminations of training on privacy technology at a global scale today.
a. https://comtrade.un.org/data
b. https://unstats.un.org/bigdata/un-global-platform.cshtml
c. https://courses.openmined.org
7. Scannapieco et al. (2021).

SUPPORT SERVICES
Lastly, the PET Lab has begun to open its doors for free consultation to institutes seeking to utilize PETs. The idea is to enable those who are actively engaged in the planning, development or deployment of privacy technology to have access to the team to ask questions, pose topics for debate and discussion, request collaboration, and other related activities. The first example is the collaboration with the UNECE Input Privacy Preservation (IPP) Techniques project, which has collected and investigated a number of statistical use cases that require protection on the input side. The IPP team initially developed a methodology and a template10 to document use cases and is working on practical experimentation with techniques such as Private Set Intersection and Private Machine Learning. During the experimentation phase, teams organized presentations and joint sessions, and some IPP project tracks used the UN PET Lab for practical testing (see use cases in chapter 3). To expand this to the wider statistical community, practitioners can fill in a web form to request such collaboration. Once the form has been submitted, it is automatically filed and will be reviewed by the PET Lab at its next meeting, typically within one month of submission. The reviewers will confirm the appropriateness of the request for support and assign one or two members to take an initial call with the applicant team. From there, the appropriate personnel will be asked to be involved ad hoc, depending on how the team can best support the applicant and the availability of the members. In order to provide such support, there must be clear limitations to the scope of the request. Given that the experts of the PET Lab all contribute on a voluntary basis, requests should not be highly time-consuming. Equally importantly, the group is unable to take on any project liabilities for the applicant party due to the nature of the support that can be provided. Nevertheless, the expectation is that this support will
be helpful to the wider statistical community.

FUTURE DIRECTIONS

The ultimate goal of these endeavors is to work toward the creation of a community of practitioners, in which members who are actively using PETs can support one another, organize conferences, and share knowledge and support at an international level. This model has been successful in the domain of data science, and it is believed that as the usage of PETs continues to increase, a self-supporting community becomes viable.

BIBLIOGRAPHY

Archer, David W., Borja de Balle Pigem, Dan Bogdanov, Mark Craddock, Adria Gascon, Ronald Jansen, Matjaž Jug, Kim Laine, Robert McLellan, Olga Ohrimenko, Mariana Raykova, Andrew Trask and Simon Wardley (2023). UN Handbook on Privacy-Preserving Computation Techniques. arXiv publication, 2023; originally published 2019. doi:10.48550/ARXIV.2301.06167.

Gartner (2020). The State of Privacy and Personal Data Protection, 2020-2022. url: https:/ Accessed 2022-06-13.

Hundepool, Anco, Josep Domingo-Ferrer, Luisa Franconi, Sarah Giessing, Eric Schulte Nordholt, Keith Spicer and Peter-Paul de Wolf (July 2012). Statistical Disclosure Control. ISBN: 978-1-118-34821-5. Wiley.

Lundy-Bryan, Lawrence (2020). Privacy Enhancing Technologies. Part 2 - The Coming Age of Collaborative Computing. Lunar Ventures. url: https:/ Accessed 2022-06-13.

Scannapieco, Monica, Fabrizio De Fausti, Massimo De Cubellis, Matjaž Jug, Saeid Molladavoudi and Dennis Ramondt (2021). "Input Privacy: Towards a Logical Framework for Defining Official Statistics Scenarios". In: 63rd ISI World Statistics Congress. url: https://www.isi-web.org/files/docs/papers-and-abstracts/56-day2-ips047-input-privacy-towards-a-logica.pdf. Accessed 2022-07-01.
2. METHODOLOGIES AND APPROACHES

2.1 SECURE MULTI-PARTY COMPUTATION

PROBLEM DEFINITION

Secure multi-party computation (also called sMPC) is a cryptographic technique that mitigates the problem of input privacy when two or more (mutually distrusting) parties wish to compute an agreed-on function on data that they (or possibly other parties) provide to that computation, but are unwilling to disclose to others. In other words, sMPC is a technology that allows computation over data while preventing any participant from learning anything about the data except what can be learned from the output of the computation. sMPC also mitigates the problem of code assurance when parties need to know what function is computed on their shared data. That is, sMPC assures (depending on the specific choice of protocol) that the function computed on the data is the same as that agreed on by the parties.

EXAMPLE USE CASE

sMPC has been applied to many use cases.1 An illustrative use case is that of sharing individually identifiable data among a group of several government agencies to compute statistics and make policy decisions based on those statistics. For example, a recent use case2 allowed five distinct agencies in a county government in the USA to share their data and compute the answers to queries such as, "How many persons who were incarcerated during a certain period had previously taken advantage of publicly provided mental health services or public housing?" The data provided by each agency included personal identifiers (for example, Social Security numbers), along with personal data such as mental health visit records and criminal records. sMPC was used to allow queries to be answered while keeping the input data strictly confidential to each party that provided it. In another use case, the Italian National Institute of Statistics (ISTAT) and the Bank of Italy ran a private set intersection protocol with analytics to enrich their statistics using information from both organisations, such as age and number of children from ISTAT and mortgage information from the Bank of Italy.a This enabled them to perform analytics on the joint subset of individuals, identified by a unique tax code, without directly sharing any of this sensitive data.
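The set-intersection step in a use case like this can be sketched with a toy Diffie-Hellman-style commutative-encryption PSI. This is an illustration only, not the protocol actually run by ISTAT and the Bank of Italy: the tax codes and party names are hypothetical, both parties run inside one process here for readability, and the group parameters are far too weak for real use.

```python
import hashlib
import math
import secrets

P = 2**127 - 1   # Mersenne prime; we work in the multiplicative group mod P (toy-sized)

def h(record):
    # Hash a record identifier into the group.
    return int.from_bytes(hashlib.sha256(record.encode()).digest(), "big") % P

def keygen():
    # Secret exponent coprime to the group order, so exponentiation is a bijection.
    while True:
        k = secrets.randbelow(P - 2) + 2
        if math.gcd(k, P - 1) == 1:
            return k

a_key, b_key = keygen(), keygen()          # each party's private exponent
istat = ["TAX001", "TAX002", "TAX003"]     # hypothetical tax codes
bank  = ["TAX002", "TAX003", "TAX004"]

# Round 1: each party exponentiates its own hashed identifiers and sends them over.
a_once = [pow(h(r), a_key, P) for r in istat]
b_once = [pow(h(r), b_key, P) for r in bank]

# Round 2: each party exponentiates the other's values. Because
# (h^a)^b == (h^b)^a, shared identifiers collide and nothing else matches.
a_twice = {pow(v, b_key, P) for v in a_once}
b_twice = {pow(v, a_key, P) for v in b_once}

assert len(a_twice & b_twice) == 2   # TAX002 and TAX003 are in both sets
```

Neither party ever sees the other's raw identifiers, only their doubly-exponentiated images; the size of the intersection (and, with bookkeeping, which of one's own records fall in it) is all that is revealed.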
OVERVIEW

sMPC computation is based on one of several technologies. The most common choices are circuit garbling and linear secret sharing. The former is typically used in the two-party case, while the latter may be used for groups of two to many parties. In both technologies, parties first agree on a function to be computed and express that function as a logic circuit. While many functions can be described as circuits, some cannot, so not all functions are practically computable in sMPC. While a given set of sMPC primitives can technically be Turing complete, typical sMPC protocols are non-branching, fixed-length programs in order to be reasonably efficient. This behavioural property may be likened to the difference between a sequencer, which does not support data-determined branch conditions and uses a fixed number of gates for processing, and a general-purpose computer.

Circuit garbling protocols typically involve two parties. After agreeing on the function to be computed, one party assumes the role of garbler, while the other assumes the role of evaluator. The garbler takes the agreed-on circuit and creates one or more encryptions of the circuit. A circuit encryption defines a randomly chosen value to represent the nominal logic values on each wire in the circuit. In addition, circuit encryption encrypts the functions of the logic gates in the circuit. The garbler can then communicate the encrypted circuit to the evaluator, but does not communicate the encryption keys. Thus the evaluator can evaluate the circuit without knowing the actual values on the circuit's wire signals. The garbler also sends encryptions of her input
data, using the same keys, to the evaluator. Through an additional cryptographic protocol, the evaluator can work with the garbler to encrypt the evaluator's inputs to the circuit in such a way that the garbler learns nothing about those inputs. The evaluator then evaluates the encrypted circuit on the encrypted inputs, producing an encrypted output. That output is returned to the garbler to be decrypted. There are several open source software libraries that implement garbled circuit technology. Some operate only on Boolean gates - gates that have only logical 0 or 1 as input and output - while some operate on arithmetic gates that may have many possible input and output values.

1. Archer et al., "Applications of SMPC" (2018).
2. Hart et al., Privacy-Preserved Data Sharing (2019).
a. See Case Study 5 in Chapter 3.
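The garbling idea can be made concrete by garbling a single AND gate. The sketch below is a simplification for illustration only: real implementations use techniques such as point-and-permute, whereas here the evaluator simply recognises the one table row it can decrypt by a block of zero padding, and all labels and parameters are arbitrary choices for the example.

```python
import hashlib
import random
import secrets

def H(a, b):
    # Row key derived from the two input-wire labels.
    return hashlib.sha256(a + b).digest()

def xor(x, y):
    return bytes(i ^ j for i, j in zip(x, y))

# One random 16-byte label per wire per logical value (0 and 1).
labels = {w: {0: secrets.token_bytes(16), 1: secrets.token_bytes(16)}
          for w in ("x", "y", "out")}

# Garbler: encrypt the AND truth table. The zero padding lets the
# evaluator recognise the single row it is able to decrypt.
table = []
for x in (0, 1):
    for y in (0, 1):
        payload = labels["out"][x & y] + bytes(16)
        table.append(xor(H(labels["x"][x], labels["y"][y]), payload))
random.shuffle(table)   # hide which row encodes which input pair

def evaluate(lx, ly):
    # Evaluator: holds exactly one label per input wire and learns only
    # the output label, never the plaintext bits behind the labels.
    for ct in table:
        pt = xor(H(lx, ly), ct)
        if pt[16:] == bytes(16):
            return pt[:16]

out = evaluate(labels["x"][1], labels["y"][1])   # evaluate with x=1, y=1
assert out == labels["out"][1]                   # the label encoding 1 AND 1
```

In a real two-party protocol the garbler would send `table` plus her own input's label, and the evaluator would obtain the label for his input via oblivious transfer rather than reading it from a shared dictionary.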
Linear secret sharing (LSS) protocols proceed by dividing each input from a party into secret shares that are themselves random but, when combined (for example, by addition), recover the original data. sMPC relies on dividing each data input item into two or more such shares and distributing these to compute parties. The homomorphic properties of addition and multiplication allow those parties to compute on the shares to obtain shared results which, when combined, produce the correct output of the computed function. To perform the shared computation required for sMPC, all participating compute parties follow a protocol: a set of instructions and intercommunications that, when followed by those parties, implements a distributed computer program. There are several open source software libraries that implement LSS technology. As with circuit garbling, these libraries may operate on Boolean values, arithmetic values, or both, including floating point values.

It should be noted that all sMPC protocols communicate frequently among the compute parties. In fact, estimates of run time for sMPC protocols can be quite accurate using communication cost as the only estimating factor (that is, ignoring estimates of computation delay at the compute parties entirely). Thus the complexity of a computation is most easily seen in sMPC through its impact on network communication cost.
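The share-splitting and recombination described above can be illustrated with a minimal additive secret sharing sketch. The prime modulus, the three-party setting and the two input values are arbitrary choices for the example; real protocols add communication, multiplication and integrity machinery on top of this core.

```python
import secrets

P = 2**61 - 1          # public prime modulus for the share arithmetic

def share(x, n=3):
    # Split x into n additive shares; each share alone is uniformly random.
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % P)
    return parts

def reconstruct(shares):
    # Only the combination of all shares reveals the secret.
    return sum(shares) % P

alice, bob = 120_000, 95_000
a_sh, b_sh = share(alice), share(bob)

# Each compute party adds the two shares it holds, locally and without
# communication; no single party ever sees either input.
sum_shares = [(a + b) % P for a, b in zip(a_sh, b_sh)]
assert reconstruct(sum_shares) == alice + bob
```

This is the "linear" part of LSS: because sharing is additive, any linear function of the inputs can be computed share-wise with no interaction, which is why addition-heavy workloads are cheap in sMPC.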
Many modern applications of multiparty computation endeavour to leverage the benefits of more than one sMPC approach, such as efficiency or functionality, by switching between linear secret sharing and garbled circuits. A popular open source library that supports this is the ABY framework by the Cryptography and Privacy Engineering Group at TU Darmstadt, together with the corresponding compiler for it, EZPC, by Microsoft Research.

Figure 2.1.1: An overview of some of the ways sMPC and related technologies can be leveraged to preserve privacy under different settings. (Panels: Standard Multiparty Computation; Federated Learning; Outsourced sMPC; Outsourced sMPC With Key Generator; Two Party Computation With Key Generator; Homomorphic Encryption. Legend: Active Compute; Active Compute & Data Owner; Data Owner; Key Generator.)

HISTORY

sMPC was first formally introduced as secure two-party computation (2PC) in 1982 (for the so-called Millionaires' Problem), and in more general form in 1986, by Andrew Yao.3,4 The area is also referred to as Secure Function Evaluation (SFE). The two-party case was followed by a generalization to the multi-party case by Goldreich, Micali and Wigderson.5 The heavy reliance on both available network bandwidth and network latency between parties kept sMPC mainly a theoretical curiosity until the mid-2000s, when major protocol improvements led to the realisation that sMPC was not only possible but could be performed for useful computations at internet latency scales. sMPC can now be considered a practical solution to carefully selected real-life problems (especially ones that require mostly local operations on the shares, without much interaction among the parties). Distributed voting, private bidding and auctions, sharing of signature or decryption functions, and private information retrieval are all applications that exhibit these properties. The first large-scale practical application of multiparty computation (demonstrated on an actual auction problem) took place in Denmark in January 2008.6

A characterisation of available commercial and government sMPC solutions would be almost immediately out of date, as would cataloguing the plethora of academic sMPC
research tools. Instead, for the purpose of providing some practical illustrations of the technology, we point towards some well-known open-source sMPC frameworks and use cases documented by private companies. As sMPC continues to grow in popularity, so too does the number of academically developed open source frameworks, which are typically used for experimental implementations and testing. One of the more popular of these is the ABY framework from TU Darmstadt,b a framework that supports mixed-primitive sMPC. Microsoft Research has also built a compiler for ABY called EZPC.c It is worth noting that while the compiler on top of ABY does not explicitly emphasize its experimental nature, any framework built on ABY will inherit its caveats for production environments.

There is also a growing number of public domain complete sMPC systems. These are either general libraries, general purpose systems or systems that solve a specific application problem. Across these three categories, we note:

- SCAPI (from Bar-Ilan University)d - an API over various sMPC primitives
- SCALE-MAMBA (from KU Leuven)e - a complete sMPC system
- swanky (from Galois Inc.)f - a set of Rust libraries for secure sMPC with garbled circuits, oblivious transfer and private set intersection protocols
- Motion (from TU Darmstadt, Aarhus University and the University of Hamburg)g - a mixed-protocol sMPC framework
- JIFF (from Boston University)h - a library allowing users to build JavaScript applications on top of sMPC protocols
- CrypTen (from Facebook)i - secure training and inference of machine learning models using sMPC

Examples of such systems in commercial settings include the Sharemind statistical analysis system by Cybernetica, and cryptographic key management systems from Sepior and Unbound Tech. Other companies offer design consultancies in specific areas based on sMPC technology. For example, Partisia helps design market mechanisms based on sMPC on a bespoke basis, and Oblivious deployed sMPC as part of the contact-tracing effort for COVID-19 in India.

3. Yao, "Protocols for secure computations" (1982).
4. Yao, "How to generate and exchange secrets" (1986).
5. Goldreich et al., "A Completeness Theorem for Protocols" (2019).
6. Bogetoft et al., "Secure Multiparty Computation Goes Live" (2009).
b. https:/
c. https:/
d. https://cyber.biu.ac.il/scapi/
e. https://homes.esat.kuleuven.be/nsmart/SCALE/
f. https:/
g. https:/
h. https:/
i. https://crypten.ai/
SECURITY MODEL

Because sMPC assumes the possibility of mutually distrusting parties, it also assumes a new class of adversary: one that controls one or more participants in the computation. Such an adversary might be an insider threat, or might be a Trojan or other penetrative, long-lived attack from outside an organization. This class of adversary is typically described in the literature in terms of several traits: degree of honesty, degree of mobility, and proportion of compromised compute parties.

Honesty. In the semi-honest or honest-but-curious adversary model, such control is limited to inspection of all data seen by the corrupted participants, together with unlimited knowledge of the computational program they jointly run. In the covert model, an adversary may extend that control to modifying or breaking the agreed-upon protocol, usually with the intent of learning more than can be learned from observation alone. However, in this model the adversary is motivated to keep its presence unobserved, limiting the actions it might take. In the malicious model, an adversary may also modify or break the agreed-upon protocol, but is not motivated to keep its presence hidden. As a result, a malicious adversary may take a broader range of actions than a covert adversary. When non-technical stakeholders consider encryption as a risk mitigator, they typically assume a covert or malicious security model. Thus the honesty model should ideally be clearly communicated to all stakeholders to confirm its suitability for purpose.

Mobility. A stationary adversary model assumes that the adversary chooses a priori which participants to affect. Such a model might represent, for example, a setting where one compute participant is compromised but others are not. Stronger versions of this trait allow an adversary to move from participant to participant during the computation. At present, a real-world analog of such an adversary is hard to imagine.

Proportion of compromised parties. sMPC adversary assumptions fall into one of two classes: honest majority and dishonest majority.

Just as there is a variety of participant adversary models for sMPC, there are also diverse sMPC protocols that provide security arguments protecting against those adversaries. Security is typically argued by showing that a real execution of an sMPC protocol is indistinguishable from an idealized simulacrum in which all compute parties send their private inputs to a trusted broker, who computes the agreed-upon function and returns the output. The number of parties that can be compromised is highly protocol-dependent, but as a general rule of thumb, the greater the proportion one wishes to defend against, the higher the protocol overheads.
COSTS OF USING THE TECHNOLOGY

sMPC performance depends heavily on the functions to be securely computed. A typical metric for sMPC performance is computational slowdown: the ratio of the latency of a computation done in sMPC to the latency of the same computation done without sMPC security. For general computation, such as the calculations needed to process typical relational database query operators, recent results show a slowdown of up to 10,000 times. Within the research community, the performance of sMPC is often benchmarked via a number of metrics, such as the number of rounds of communication, the volume of data communicated, and the complexity or latency of the computation involved. sMPC approaches such as linear additive protocols or garbled circuits are considered efficient when compared to homomorphic encryption. However, this is achieved at the cost of a greater number of parties being involved and, typically, a larger number of rounds of communication being required.

While it remains tricky to give guidance on where sMPC might be performant and where it might not, we can offer some general guidelines. Computations that rely heavily on addition, such as summations, are typically faster than general computation, while computations that rely on division or other more complex functions are typically much slower. sMPC is typically designed to operate on integers via Galois fields. This can easily be extended to fixed-point arithmetic, but floating point operations are much less easily represented, can require orders of magnitude more resources, and are therefore typically avoided. Computations that rely on generative functions such as random number generation are also typically slow. In contrast to homomorphic encryption, a technique discussed in the next section which currently supports only polynomial functions, general sMPC offers a broader set of possible operations.

2.2 HOMOMORPHIC ENCRYPTION

PROBLEM DEFINITION

Homomorphic encryption (HE) is a cryptographic technology that allows direct computation (addition and multiplication) on encrypted data. It enables a party that provides data to outsource a computation. That is, no party aside from the party providing the data learns anything about the data in homomorphic encryption computations. Furthermore, only the party providing the data has the key
with which to decrypt the output. Code assurance is not yet practical in homomorphic encryption, and code privacy is not possible in these technologies. Typically, the HE security model offers privacy to the input provider, but not from the algorithm provider. Under such a scenario, there may be an ahead-of-time agreement to apply a particular circuit function, but there is no mechanism to confirm that the agreed circuit was indeed applied.

EXAMPLE USE CASE

A commonly cited class of applications for homomorphic encryption is in the medical domain, where regulations enforce strict patient data privacy measures, but hospitals and medical clinics may nevertheless want to enable third-party service providers to analyze, evaluate, or compute on their data without directly sharing such data. For example, a service provider may offer an image analysis service for detecting tumors in MRI scans. A predictive model can be evaluated directly on homomorphically encrypted data, avoiding the issue of medical data leaking to the service provider.

For data storage providers, a potential application is in performing analytics on encrypted customer data. For example, a customer may want to store a large encrypted database using a cloud storage service without having to download the entire database for simple computational queries, as this creates unnecessary logistical challenges and potentially exposes the full dataset to a low-security computation environment. Instead, all possible aggregation of the data should be performed in encrypted form directly by the cloud storage provider, avoiding unnecessary exposure of the data to the client's machine. In a similar context, Statistics Canada has used homomorphic encryption to train a neural network to classify product descriptions from scanner data.j The data came from retailers whose brand names and product prices were sensitive. Using HE increased the security and privacy levels while allowing the cloud provider to be the compute party.
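The core behaviour such applications rely on, computing on data that stays encrypted, can be sketched with a textbook Paillier cryptosystem, an additively homomorphic scheme. This is an illustration only: the primes are toy-sized and insecure, and this is not one of the lattice-based schemes used by the production libraries this section goes on to describe.

```python
import math
import secrets

# Toy Paillier keypair (primes are far too small to be secure).
p, q = 104729, 104723
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)   # lambda = lcm(p-1, q-1)
mu = pow(lam, -1, n)           # mu = lambda^-1 mod n (valid for generator g = n+1)

def encrypt(m):
    # c = (n+1)^m * r^n mod n^2, with fresh randomness r for each encryption.
    r = secrets.randbelow(n - 2) + 2
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 2) + 2
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x-1)/n.
    x = pow(c, lam, n2)
    return ((x - 1) // n) * mu % n

a, b = 1234, 5678
ca, cb = encrypt(a), encrypt(b)
# Multiplying ciphertexts adds the underlying plaintexts -- the compute
# party never needs the secret key (lam, mu) to produce the encrypted sum.
assert decrypt(ca * cb % n2) == a + b
```

The compute party in the scenarios above would perform only the ciphertext arithmetic; decryption, and hence the result, stays with the key owner. (The modular inverse via three-argument `pow` requires Python 3.8+.)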
OVERVIEW

Homomorphic encryption refers to a family of encryption schemes with a special algebraic structure that allows computations to be performed directly on encrypted data. Homomorphic encryption offers post-quantum security, but can result in a high computational overhead and a large expansion of the data representation. Thus, ideal applications have a relatively small but critical encrypted computation component, include a persistent storage aspect, and are hard or impossible to implement using other methods. The most commonly used (fully) homomorphic encryption schemes at this time are the Brakerski-Gentry-Vaikuntanathan (BGV)7 and the Brakerski-Fan-Vercauteren (BFV)8,9 schemes. Both allow encrypted computation on vectors of finite field elements. The trade-offs between the different schemes are complicated and can be difficult to understand even for experts in the field. For very large and very small computations the BGV scheme has a performance advantage over the BFV scheme, but in many other cases the difference is negligible with modern optimisation techniques. On the other hand, the BGV scheme is more complicated and has a steeper learning curve than the BFV scheme. Other schemes have been proposed, but some have been shown to be insecure.

j. See Case Study 9 in Chapter 3.
7. Brakerski et al., "Leveled Fully Homomorphic Encryption" (2014).
8. Fan et al., Somewhat Practical Fully Homomorphic Encryption (2012).
9. Brakerski, "Fully Homomorphic Encryption" (2012).
10. Cheon et al., "HE for Arithmetic of Approximate Numbers" (2017).
11. B. Li et al., "Security of HE on Approximate Numbers" (2021).
12. Rivest et al., "Digital Signatures and Public-Key Cryptosystems" (1978).
13. Rivest et al., "On data banks and privacy homomorphisms" (1978).
14. Gentry, "A fully homomorphic encryption scheme" (2009).
15. Halevi et al., Design and implementation of HElib (2020).
k. https:/
l. https://palisade-crypto.org/
m. https://inpher.io/tfhe-library/
n. https:/

A recently popular approach is the CKKS algorithm,10 implemented in many open-source frameworks. It provides approximate arithmetic on real or complex numbers. This is a promising direction of research, but with the caveat that recent attacks have been shown to be devastating to the scheme unless mitigating nuances are correctly employed as part of the protocol.11 As always when it comes to cryptography, we should be extremely cautious about deploying relatively new mechanisms without thorough investigation and assessment.

While in principle fully homomorphic encryption schemes allow arbitrary computation on encrypted data, in practice almost all efficient implementations use a so-called levelled mode, where the encryption scheme is configured to support computations of only a specific or bounded size, typically resulting in significant performance improvements. For simplicity, in this handbook we freely use the term Homomorphic Encryption (HE) to refer to either Fully Homomorphic Encryption (FHE) or Levelled Fully Homomorphic Encryption.

HISTORY

Encryption schemes that support one single type of
arithmetic operation (addition or multiplication) have been known since the 1970s12 and are often said to be singly or partially homomorphic. The practical value of such a "homomorphic property" was first recognised and explored by Rivest, Adleman, and Dertouzos.13 In 2009 Craig Gentry described the first so-called fully homomorphic encryption scheme,14 which allows both additions and multiplications to be performed on encrypted data. This was a significant invention because, in principle, such an encryption scheme can allow arbitrary Boolean and arithmetic circuits to be computed on encrypted data without revealing the input data or the result to the party that performs the computation. Instead, the result would be decryptable only by a specific party that has access to the secret key, typically the owner of the input data. This functionality makes homomorphic encryption a powerful tool for cryptographically secure cloud storage and computation services, and also a building block for higher-level cryptographic primitives and protocols that rely on such functionality.

While theoretically powerful and academically interesting, the first homomorphic encryption schemes quickly turned out to be unusable in terms of performance and key size. A significant amount of work was done over the next few years on inventing and implementing both simpler and faster homomorphic encryption schemes. This work culminated in the release of the homomorphic encryption library HElib15 by IBM Research, which improved performance over prior homomorphic encryption implementations by several orders of magnitude.
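The single-operation ("partially homomorphic") property known since the 1970s can be seen directly in textbook RSA, where multiplying two ciphertexts multiplies the underlying plaintexts. The sketch below uses toy primes and unpadded RSA purely as an illustration of the algebra; unpadded RSA is never secure in practice.

```python
import math

# Textbook (unpadded) RSA with toy primes; illustration only, NOT secure.
p, q = 104729, 104723
n, e = p * q, 65537
d = pow(e, -1, math.lcm(p - 1, q - 1))   # private exponent (Python 3.8+)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 42, 17
# Multiplicative homomorphism: enc(a) * enc(b) = (a*b)^e mod n = enc(a*b).
assert dec(enc(a) * enc(b) % n) == a * b
```

A fully homomorphic scheme in Gentry's sense supports both addition and multiplication under encryption, which is what lets arbitrary circuits be evaluated rather than a single algebraic operation.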
Today there are multiple open source homomorphic encryption libraries available, implementing a variety of homomorphic encryption schemes suitable for different applications. These include:

- Microsoft SEALk - implementing both the BFV and CKKS schemes. For the latter, Microsoft has also released a Python compiler that takes charge of choosing appropriate encryption parameters, rescaling and relinearization operations.
- PALISADEl - supporting a range of different schemes and variants thereof, including BFV, BGV, CKKS, Levelled Somewhat HE and others.
- TFHE (from Inpher)m - an implementation of TFHE - Fast Fully Homomorphic Encryption over the Torus.
- Concrete (from Zama.ai)n - implementing a variant of TFHE.

SECURITY MODEL

Today, all homomorphic encryption schemes
with close to practical performance are based on the Learning With Errors16 (LWE) or Ring Learning With Errors17 (RLWE) problems. In other words, one can show that if these homomorphic encryption primitives can be efficiently broken, then either LWE or RLWE can be efficiently solved for specific parameterisations. As LWE and RLWE have been studied extensively and are believed to be very hard to solve, there is strong reason to believe that the corresponding homomorphic encryption schemes are secure.

As homomorphic encryption refers only to a type of encryption primitive and not a protocol, its security definition states merely that, given a ciphertext, an adversary without the secret key cannot obtain any information about the underlying plaintext. However, for secure uses of homomorphic encryption it is critical that no information about decrypted data is ever communicated back to the source of the corresponding encrypted data, unless that source is trusted not to misbehave; this includes seemingly innocuous information, such as a request to repeat a protocol execution, a refusal to pay for a service, or any change in behavior that can be expected to depend on the outcome of the encrypted computation. As a result, outsourced storage and computation involving a single data owner should be considered the primary use case of homomorphic encryption. After receiving the result, the secret key owner must not perform any action observable to the service provider that is based on the decrypted result, to avoid the attacks described above. In technical terms, this means that HE is typically proven secure under the indistinguishability under Chosen Plaintext Attack (IND-CPA) model, and not under the indistinguishability under Chosen Ciphertext Attack models (IND-CCA1 or IND-CCA2). In non-technical terms, HE does not give security guarantees if the adversary gets hold of decryptions of selected ciphertexts. The aforementioned attack on the CKKS scheme exploited this and showed that IND-CPA security in this approximate scheme is not sufficient in practical scenarios: an adversary with access to a decryption of a ciphertext can recover the client's private key.

Another subtlety is that most homomorphic encryption schemes do not provide input privacy for more than a single party, since there is only one secret (decryption) key: if a computation depends on the private encrypted input of two or more parties, the encryption scheme is not guaranteed to protect these inputs from the owner of the secret key. Homomorphic encryption is also malleable by nature, so anyone intercepting a ciphertext can modify the underlying plaintext unless, for example, the ciphertext is cryptographically signed by the sender. It is important to understand that homomorphic encryption is a low-level cryptographic primitive, and building secure protocols from it is not possible without the help of a cryptography expert. Even in the simplest cases, such protocols can result in unexpected or unintended security gaps. Most
homomorphic encryption based protocols can be proved secure only in a semi-honest security model, although there are exceptions where a stronger security model is achieved by combining homomorphic encryption with other primitives.18

16. Regev, "On Lattices and Cryptography" (2009).
17. Lyubashevsky et al., "On Ideal Lattices over Rings" (2013).
18. H. Chen et al., "Labeled Private Set Intersection" (2018).

COSTS OF USING THE TECHNOLOGY

The use of homomorphic encryption comes with at least three types of costs: message expansion, computational cost, and engineering cost. In HE systems, encrypted data is typically significantly larger than unencrypted data, due to encoding inefficiency (converting real data into plaintext elements that can be encrypted) and inherent expansion from the encryption scheme (the ratio of ciphertext size to plaintext size). Depending on the use case, encoding inefficiency can range from the ideal case (no expansion at all) to an expansion rate measured in the tens or hundreds of thousands when the encoding method is poorly chosen. Thus, in most cases one should not think of encrypting large amounts of data with homomorphic encryption, but instead carefully consider exactly what data will be
315、”(2018).THE UN GUIDE ON PRIVACY-ENHANCING TECHNOLOGIES FOR OFFICIAL STATISTICS35needed for the desired encrypted computations and encrypt only that.The computational cost of homomorphic encryption is significant compared to unencrypted computation.The exact cost depends strongly on the parameterisat
316、ion of the encryption scheme and whether throughput or latency is measured.Namely,most homomorphic encryption schemes support natively high-dimensional vectorized computations on encrypted data,and if this vectorisation can be fully utilised it can increase the throughput by a factor of up to 1,000
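The additive pattern behind encrypted aggregation can be illustrated with a textbook Paillier cryptosystem, which is additively homomorphic. This is a sketch only, not an example from the guide: the hard-coded primes are far too small to be secure, and real deployments use randomly generated primes of well over a thousand bits each.

```python
from math import gcd

# Toy Paillier keypair. The primes are absurdly small and hard-coded
# purely for illustration.
p, q = 251, 263
n = p * q                      # public modulus
n2 = n * n
g = n + 1                      # standard generator choice
lam = (p - 1) * (q - 1)        # phi(n); works in place of Carmichael's lambda(n) here
mu = pow(lam, -1, n)           # decryption constant for g = n + 1

def encrypt(m, r):
    # c = g^m * r^n mod n^2, with blinding factor r coprime to n
    assert 0 <= m < n and gcd(r, n) == 1
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n
    u = pow(c, lam, n2)
    return (((u - 1) // n) * mu) % n

# Homomorphic addition: multiplying ciphertexts adds the plaintexts.
c1 = encrypt(12, 17)
c2 = encrypt(30, 23)
print(decrypt((c1 * c2) % n2))   # 42

# Malleability, as discussed above: anyone can shift the plaintext of an
# intercepted ciphertext without knowing the secret key.
tampered = (c1 * encrypt(5, 29)) % n2
print(decrypt(tampered))         # 17
```

Note how the same multiplication that enables useful encrypted computation is also what makes the scheme malleable, which is why deployed protocols layer integrity protection on top.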
Developing complex systems with homomorphic encryption can be challenging and should always be done with the help of an expert, making the initial cost of such solutions potentially high. There are two reasons for this: the security model, as discussed earlier, can be hard to comprehend and evaluate without special expertise, and the available homomorphic encryption libraries can be hard to use to their full potential without a deep understanding of how they work. It should also be noted that homomorphic encryption can be hard or impossible to integrate with existing systems. Instead, sophisticated applications of this technology can require substantial changes to existing data pipelines, data manipulation procedures and algorithms, and data access policies.

PROBLEM DEFINITION

Differential privacy (DP) provides an information-theoretic notion of Output Privacy. Its goal is to quantify the maximum amount of information about individual records in a database that could be leaked by releasing the result of any computation on that database. Keeping this amount small ensures that individuals are protected irrespective of
any side knowledge or post-processing by an attacker. DP provides a more general notion of privacy, as it covers any type of information derived from a database, in contrast to other specialised definitions such as k-anonymity19 or l-diversity20, which apply only to the release of aggregates. Furthermore, DP was designed to avoid pitfalls incurred by previous attempts to define privacy loss, especially in the context of multiple releases or of adversaries with access to side knowledge. We note that such pitfalls also affect less sophisticated attempts at privacy preservation, such as aggregation alone or ad hoc noise addition to aggregate results.

EXAMPLE USE CASES

Differential privacy is just over 15 years old as of this report, and it is being implemented in more and more industrial applications in database analysis, statistics, and machine learning. In recent years, some generic DP systems have
been open-sourced or made commercially available, providing the first production-ready implementations. The interest generated by the solid principles behind DP, together with growing concerns about online privacy, has led to a number of real-world deployments, typically using ad hoc algorithms for specific applications. Two well-known applications of DP are its use in Google Chrome and in Apple's iOS/OS X to collect usage statistics in a privacy-preserving way. These applications follow the local model of DP, in which each individual user privatises their own data before sending it to a centralised server for analysis. For example, Chrome used this approach to discover frequently visited pages in order to improve its caching and pre-fetch features, while iOS uses it to discover words and emojis frequently used in a texting application in order to improve the language models used in typing assistance. Additionally, Microsoft announced that they employ DP in the local model to collect telemetry data from devices running their operating systems. The most well-known usage of the curator model is by the U.S. Census Bureau, which released the results of the 2020 Census with differential privacy controls.O This was motivated by research showing that, without the kind of protection provided by differential privacy, it is sometimes possible to recover accurate information about individuals from Census data through aggregate statistics at different levels of granularity alone.

OVERVIEW

Differential privacy specifies a
property that a data analysis algorithm must satisfy in order to protect the privacy of its inputs. In this sense, DP is a privacy standard rather than a single tool or algorithm. The DP property is stated in terms of an alternate world in which the input of a particular individual has been removed from, or added to, a database. DP requires that the outputs produced by the algorithm in the real and alternate worlds be statistically indistinguishable. Because this is a property of the algorithm, the indistinguishability must hold regardless of what the database is and which individual we choose to remove or add. DP is therefore not a property of the output, and it cannot be measured directly by looking at the output of the algorithm on a given input database. Another crucial remark about the definition of DP is that the indistinguishability requirement is too strong to be satisfied by any deterministic algorithm.
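Formally, an algorithm M is epsilon-differentially private if, for every pair of databases D and D' differing in one individual's record and every set S of outputs, Pr[M(D) in S] <= e^epsilon * Pr[M(D') in S]. A minimal sketch of such a randomised release is the textbook Laplace mechanism applied to a counting query; the dataset, predicate, and epsilon below are made-up illustrations, not part of the guide.

```python
import random

def laplace_count(records, predicate, epsilon):
    """Release a count with Laplace noise calibrated to the query's
    sensitivity: adding or removing one record changes a count by at
    most 1, so the noise scale is 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    # A Laplace(0, 1/epsilon) sample: an exponential magnitude with a random sign.
    noise = random.choice([-1, 1]) * random.expovariate(epsilon)
    return true_count + noise

# Hypothetical toy data: ages of survey respondents.
ages = [34, 51, 29, 62, 47, 38, 55]
release = laplace_count(ages, lambda a: a >= 40, epsilon=0.5)
print(round(release, 2))  # true count 4, perturbed by noise of scale 1/0.5 = 2
```

Each run returns a different value, which is exactly the point: the randomness is what makes the real and alternate worlds statistically indistinguishable.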
Randomness is therefore an indispensable ingredient in the design of any differentially private algorithm. The need for a robust definition of privacy becomes more

2.3 DIFFERENTIAL PRIVACY

19. Sweeney, "k-Anonymity: A Model for Protecting Privacy" (2002).
20. Machanavajjhala et al., "L-diversity: Privacy beyond k-anonymity" (2007).
O. https://www.census.gov/library/fact-sheets/2021/comparing-differential-privacy-with-older-disclosure-avoidance-methods.html
21. Narayanan et al., "Robust De-anonymization of Large Sparse Datasets" (2008).
22. Wood et al., "Differential Privacy: A Primer" (2018).
23. Dwork et al., "The Algorithmic Foundat