DRAFT FOR REVIEW

Assurance of Third-Party AI Systems for UK National Security

Rosamund Powell and Marion Oswald
January 2024

Contents
About CETaS
Acknowledgements
Executive Summary
Introduction
1. Understanding Third-Party AI Systems: Origins, Benefits, and Risks
1.1 Defining third-party AI
1.2 Cross-cutting risks and benefits
1.3 AI supply chain risks
2. Why Assurance?
2.1 What is AI assurance?
2.2 Challenges to effective AI assurance
3. A Model of AI Assurance for UK National Security
3.1 Co-creation of the assurance case
3.1.1 Existing methods for documenting AI system properties
3.1.2 Companion guidance on evidentiary standards
3.2 System card template for UK national security
3.2.1 Summary information
3.2.2 Mission properties and legal compliance
3.2.3 The supply chain
3.2.4 Performance and security
3.2.5 Ethical considerations
3.2.6 Iterative requirements
3.3 Assessing evidence
3.3.1 Skills to review system cards
3.3.2 The process of evidence assessment
3.3.3 Contractual protections
4. Implementation Considerations for Hypothetical Case Studies
5. Recommendations for Implementing AI Assurance
6. Conclusion
Appendix 1: Compiled System Card Template
Appendix 2: Glossary of Key Terms
About the Authors

About CETaS
The Centre for Emerging Technology and Security (CETaS) is a research centre based at The Alan Turing Institute, the UK's national institute for data science and artificial intelligence. The Centre's mission is to inform UK security policy through evidence-based, interdisciplinary research on emerging technology issues. Connect with CETaS at cetas.turing.ac.uk. This research was supported by The Alan Turing Institute's Defence and Security Programme. All views expressed in this report are those of the authors, and do not necessarily represent the views of The Alan Turing Institute or any other organisation.

Acknowledgements
The authors are grateful to all those who took part in a research interview or workshop for this project, without whom the research would not have been possible. The authors are also very grateful to Ian Brown, Elisabeth Mackay and Amy Harland for their valuable feedback on an earlier version of this report. Design for this report was led by Michelle Wronski.
Executive Summary
This CETaS Research Report explores the potential for assurance processes to aid in the responsible adoption of AI across UK national security, with a particular focus on addressing the challenges which arise when industry suppliers contribute to the AI lifecycle. Involving industry in the design and development of AI capabilities for UK national security brings many benefits, including access to cutting-edge capabilities and the potential to save government time and money. But there are also risks, such as opaque supply chains, insufficient ethical due diligence, and a lack of robust testing for AI security. While these risks apply to any government use of third-party AI, they are amplified in the national security context, where tolerance for error is low and systems must meet a higher threshold of robustness, security, and compliance. And, while much research has focused on AI assurance in the public sector, including in defence, no concrete proposal has yet been made to account for the specific requirements of UK national security.

We identify three cross-cutting governance challenges which are preventing national security bodies from determining which third-party AI systems to deploy. These are:
1. Disparate access to information and skills, with government organisations often lagging behind industry in their understanding of the third-party technologies they plan to deploy.
2. Divergent business models and motivations, with stronger incentives needed to improve transparency from suppliers on the features of their AI systems.
3. Distributed responsibility for the introduction of safeguards, with clearer consensus needed on who should conduct which aspects of the assurance process.

Our report sets out how national security bodies and industry suppliers can tackle these challenges using a tailored framework for AI assurance. Throughout this paper, AI assurance will be defined as:

The portfolio of processes required to evaluate and communicate, iteratively throughout the AI lifecycle, the extent to which a given AI system:
a) Does everything it says it is going to do, and nothing it shouldn't do.
b) Complies with the values of the deploying organisation.
c) Is appropriate to the specific use case and envisioned deployment context.

Our framework addresses these assurance components through four core pillars:
1. Robust documentation protocols in the form of a sector-specific system card template. When complete, this system card constitutes the AI assurance case: the central document of compiled evidence that an AI system meets requirements.
2. Companion guidance to clarify what constitutes sufficiently robust evidence to include in the system card. This includes the recommendation for national security bodies to curate a modular portfolio of assurance techniques (e.g. international standards, impact assessments, performance metrics, red teaming protocols) that have been approved for use in the high-stakes context of national security.
3. Investment in skills for evidence review to enable national security decision-makers to make thorough assessments of system cards.
4. Contractual protections to mandate further transparency from suppliers of third-party AI, where relevant.

While the bulk of this report focuses on detailing this assurance framework, we close by making recommendations for its implementation in both the near and long term. In the immediate term, we recommend that both industry suppliers and national security bodies trial the framework on specific AI use cases, in place of current model cards, to assess its applicability and to identify ways in which the assurance requirements set out here may be adapted to fit the risk profile of each use case. In the longer term, we recommend national security bodies take the following actions to support implementation of this assurance framework:

Recommendations for implementing AI assurance
- Build infrastructure for a sustainable assurance ecosystem, including further investment in platforms to host assurance cases and the curation of companion guidance, including a tailored national security portfolio of assurance techniques.
- Invest in skills for reviewing assurance cases (technical, ethical, and legal). We recommend government centres of AI expertise (e.g. the AI Safety Institute and Centre for Data Ethics and Innovation) support national security departments in AI assurance.
- Connect academic work on assurance to practitioner challenges, to increase the availability of practicable assurance techniques that fill persistent gaps, e.g. on AI security or data provenance.
- Develop exemplar assurance cases across a range of case studies to further specify how recommendations apply in context (such as for LLMs in intelligence analysis or autonomous agents for cyber defence).
- Draft bespoke contractual clauses to aid national security customers in ensuring suppliers are transparent about the properties of their AI systems.
Introduction
Designing, developing, and deploying an artificial intelligence (AI) system1 for UK national security already presents a challenge for policymakers, who must ensure adequate scrutiny of the system at each stage of the AI lifecycle2 to avoid unintended outcomes in high-stakes contexts. This challenge becomes even harder when some of these stages instead occur within third-party organisations, potentially constraining government oversight of aspects of the development process. Nevertheless, involving third parties, particularly industry, is increasingly essential if national security bodies are to take full advantage of the powerful AI systems now available. Rapidly accelerating capabilities in the private sector, alongside skills shortages within government, mean industry suppliers can in some circumstances be the only option available to national security decision-makers wishing to deploy the most advanced AI systems.

In this report, we provide guidance on how to assess specific third-party AI systems against suitability criteria for national security deployment. We propose a step-by-step AI assurance framework which guides policymakers and industry suppliers through these decisions. Prior to the deployment of a third-party AI system, the national security customer must address a three-part challenge. They must:
1. Establish a robust understanding of what properties they want the AI system to possess, and how these properties might be guaranteed (e.g. through testing, evidence sharing, and contracting).
2. Develop a strategy to maximise the evidence available to them (e.g. through industry collaboration).
3. Design clear protocols for assessing evidence, such that risk is minimised and ongoing checks are in place (e.g. through investments in staff expertise and sufficient people resourcing).

The AI assurance framework for UK national security presented here accounts for the three-part challenge described above. This framework centres on a proposal for a tailored system card template for UK national security, with the system card serving as the assurance case once populated: the central document containing all relevant evidence that an AI system meets requirements. The report also recommends longer-term actions from UK national security policymakers to help foster responsible AI innovation and thus overcome persistent assurance challenges.

This report is directed at industry suppliers as well as national security bodies. Without industry contributions, the challenges of assuring third-party AI become significantly harder. Suppliers are often in a unique position, with control over the project lifecycle, access to commercially sensitive information about the AI system, and significant bargaining power during contractual negotiations. Furthermore, industry bodies also face challenges when it comes to AI assurance. Currently, they lack sufficiently specific guidance on government requirements, leading to uncertainty as to which safeguards should be incorporated into their project lifecycles and communicated to government customers.

Co-creation of the AI assurance case by supplier and customer is presented as the ideal method to facilitate robust assessment. Nevertheless, this framework is also adaptable for circumstances where national security bodies have no relationship with the supplier and consequently must collect and assess all assurance evidence themselves, as is illustrated here through reference to hypothetical case studies.

1 Since the term was coined in 1955, the parameters of what constitutes artificial intelligence have often been vaguely drawn. For the purposes of this study, our focus is primarily on machine learning technologies, defined throughout as technologies which use patterns in data to make predictions and thus improve performance over time. Please see Appendix 2 (Glossary of key terms) for a full definition.
2 To include design, development and deployment of AI systems, and to incorporate both technical and sociotechnical processes which occur in the AI lifecycle.
Research methodology and limitations
This report addresses the following research questions:
RQ1: What benefits and risks come from the deployment of third-party AI systems?
RQ2: What are the trade-offs regarding using third-party systems versus in-house development?
RQ3: What guidance can help UK national security decision-makers to interpret and interrogate third-party AI systems to ensure due diligence?
RQ4: What should be included in assurance guidance for suppliers developing AI capabilities for use in UK national security?

Data collection for this study was conducted over a four-month period from June to September 2023, including three core research activities:
1. A literature review covering academic and policy literature on topics such as responsible development practices, AI supply chains, AI security, AI assurance, and AI procurement.
2. Semi-structured interviews with 28 participants from government, industry and academia.
3. A research workshop attended by 11 industry representatives with expertise in AI assurance in the national security sector.
The focus of this study was broad, covering the assurance of third-party AI systems across the whole UK national security landscape, with a particular focus on industry partnerships. It is beyond the scope of this report to lay out in-depth recommendations for dealing with specific technologies (e.g. biometrics, LLMs, computer vision) or for sectors outside of national security. Further work is needed to investigate the applicability of this framework to real-world use cases and to develop the technical aspects of AI assurance laid out here, in particular regarding AI security. Further work is also needed to examine the extent to which national security bodies should favour third-party or in-house design and development of AI systems in general.

Our recommendations also do not address specific legal frameworks, such as the information acquisition and disclosure requirements in the Security Service Act 1989 and Intelligence Services Act 1994; the warrantry and authorisation requirements, data safeguards and notices regime in the Investigatory Powers Act 2016; the Data Protection Act 2018; and the proportionality test in human rights law. However, legality and compliance with warrantry and authorisation conditions will be key aims that assurance can address, thus informing the properties that the AI system must possess. For example, assurance processes can assist in obtaining the dataset and output information needed to determine levels of intrusiveness in relation to a proportionality assessment.3
Structure of this report
The remainder of this report is structured as follows. Section 1 outlines the third-party AI landscape, exploring the risks and benefits these technologies bring across the national security sector. Section 2 focuses on the need to address these risks on a case-by-case basis, introducing AI assurance as the means to do this. Section 3 forms the substantive analysis, where we present a framework for AI assurance in the national security context. This assurance framework is further specified in Section 4 through a discussion of its implementation in the context of hypothetical case studies. Finally, Section 5 summarises core recommendations. Appendix 1 presents a compiled system card template for documenting AI system properties, while Appendix 2 offers definitions of key terms used throughout this report.

3 Ardi Janjeva, Muffy Calder and Marion Oswald, "Privacy Intrusion and National Security in the Age of AI: Assessing proportionality of automated analysis," CETaS Research Report (May 2023).
1. Understanding Third-Party AI Systems: Origins, Benefits, and Risks

1.1 Defining third-party AI
This report focuses on procurement and deployment decisions surrounding specific AI systems. Nevertheless, it is first necessary to define third-party AI systems, and to understand the benefits and risks they bring to national security in general. Third-party AI systems come in many forms, with third parties contributing at any stage of the AI lifecycle, ranging from data collection and annotation to model training and validation. Increasingly, the modularity of AI systems means there are often multiple actors working together as part of an algorithmic supply chain, each contributing to distinct aspects of the system's functionality.4

Throughout this report, third-party AI systems are defined as any AI system where at least one stage of the AI lifecycle (design, development, deployment) occurs partially or wholly outside of the organisation that will deploy the system.

Three factors can be used to roughly map this landscape of third-party AI:
A. The type and number of third parties involved: this could include academic institutions, private companies (start-ups, multinational technology companies, defence primes), public sector bodies (including another national security agency), or some combination of these.
B. The nature of the third-party relationship: this could include AI systems designed in partnership with companies where formal relationships are established, but can also include AI systems made commercially available by multinational tech companies. Even where the relationship with the prime contractor is strong, there may be other firms contributing further down the supply chain.
C. The extent of third-party involvement: third-party suppliers may become involved in any one stage of the AI lifecycle or may have full control over every stage, subsequently impacting how much control the national security body has over each AI lifecycle stage.

Section 4 offers guidance on how the assurance framework set out here might be applied across this varied third-party AI landscape, using three hypothetical case studies to structure discussion.

4 Jennifer Cobbe, Michael Veale and Jatinder Singh, "Understanding accountability in algorithmic supply chains," in FAccT '23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2023), 1186-1197.
1.2 Cross-cutting risks and benefits
Each AI product raises distinct concerns for national security decision-makers. Nevertheless, several benefits, risks and governance challenges recur across a range of third-party AI systems (Figure 1).5

Figure 1: Benefits, risks and governance challenges associated with third-party AI

5 Benefits and risks of third-party AI included in Figure 1 were identified during research interviews and the literature review.
1.3 AI supply chain risks
Complex supply chains present one of the most common and challenging risks of using third-party AI in the national security context, as their complexity makes it difficult to guarantee their security,6 while further concerns exist around legal compliance and ethical practice all the way down a supply chain.7 The machine learning lifecycle of systems with complex supply chains is highly sociotechnical,8 meaning technical, legal, and ethical supply chain risks become intertwined. For example, data provenance has been described as 'the biggest issue' for third-party AI,9 due to technical, policy and compliance concerns. On the technical side, 'there is a risk of poisoned data, bias, the data misbehaving, risk of attacks',10 while on the policy side, concerns around copyright and intellectual property are preventing suppliers from being fully transparent.11

Beyond data provenance, instability in the supply chain with regard to compute and hardware sourcing can present a particular risk for load-bearing AI systems, while national security decision-makers also expressed concerns over model provenance, in particular in contexts where they do not have oversight over who else may be using an AI system.12 As noted by one interviewee, verifying the big data supply chain for compliance and security 'is not easy, but that is the cost of doing things in defence and security'.13

6 Nii Simmonds and Alice Lynch, "Mitigating supply chain threats: building resilience through AI-enabled early warning systems," CETaS Expert Analysis (January 2023).
7 Interview with industry expert, 4 August 2023.
8 Jennifer Cobbe, Michael Veale and Jatinder Singh, "Understanding accountability in algorithmic supply chains," in FAccT '23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2023), 1186-1197.
9 Interview with government representative, 5 July 2023.
10 Interview with government representative, 19 July 2023; for more detailed analysis of these technical risks see Section 3.2.4, Performance and security.
11 Interview with government representative, 5 July 2023.
12 Interview with government representative, 5 July 2023.
13 Interview with industry expert, 4 August 2023.
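The narrowly technical slice of these provenance risks is at least partially checkable in code. The sketch below is a minimal illustration, not a control proposed in this report: it pins each supplied dataset artefact to a SHA-256 digest recorded in a manifest, so that silent substitution or tampering upstream of delivery is detectable at ingest. The manifest path and file layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path

def sha256_digest(path: Path) -> str:
    """Stream a file through SHA-256 so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(manifest_path: Path) -> list[str]:
    """Compare each artefact against the digest the supplier committed to.

    The manifest is a JSON mapping of relative file path -> expected hex digest,
    e.g. {"train/images.tar": "9f86d08...", "labels.csv": "2c26b46..."}.
    Returns the artefacts that are missing or have drifted.
    """
    manifest = json.loads(manifest_path.read_text())
    failures = []
    for rel_path, expected in manifest.items():
        artefact = manifest_path.parent / rel_path
        if not artefact.exists() or sha256_digest(artefact) != expected:
            failures.append(rel_path)
    return failures

if __name__ == "__main__":
    failures = verify_manifest(Path("supplier_delivery/manifest.json"))
    if failures:
        raise SystemExit(f"Provenance check failed for: {failures}")
    print("All supplied artefacts match the recorded digests.")
```

A digest match only shows the bytes are the ones the supplier committed to; it says nothing about whether the data was lawfully sourced or free of poisoning at origin, which is why the report treats provenance as a sociotechnical rather than purely technical problem.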
2. Why Assurance?

2.1 What is AI assurance?
The term 'AI assurance' is used in a variety of ways in the UK and internationally, contributing to confusion among experts.14 During our engagements, consensus emerged on what assurance does and does not involve. Assurance does not involve eliminating risk from AI systems,15 setting rigid rules for developers,16 or quantitatively ranking AI systems against one another.17 Assurance is more nuanced than this and must at times be adaptable to the needs of stakeholders with divergent priorities. For instance, for some interviewees, issues of AI security and performance were seen as most central to assurance,18 while for others the core focus was legal compliance and ethical due diligence.19

We define AI assurance as the portfolio of processes required to evaluate and communicate, iteratively throughout the AI lifecycle, the extent to which a given AI system:
a) Does everything it says it is going to do, and nothing it shouldn't do.
b) Complies with the values of the deploying organisation and upholds established ethical principles.
c) Is legally compliant and appropriate to the specific deployment context.

2.2 Challenges to effective AI assurance
Much progress has been made by other parts of the public sector towards successful AI assurance. The UK's Centre for Data Ethics and Innovation (CDEI) has spearheaded this work,20 while Dstl has outlined how assurance might be implemented in the defence context.21

14 Across interviews, the scope of assurance varied, both regarding the processes involved and the properties seen as important to assure for. A government representative cited international variation as a contributing factor, with the US in particular favouring the term 'risk management' over 'assurance'. Interview with government expert, 21 June 2023.
15 Interview with academic expert, 21 June 2023.
16 Interview with government representative, 21 June 2023.
17 Interview with academic expert, 6 July 2023.
18 Interview with government representative, 5 July 2023.
19 Interview (2) with law enforcement member of staff, 6 July 2023.
20 HM Government, CDEI portfolio of AI assurance techniques (Centre for Data Ethics and Innovation: 2023), https://www.gov.uk/guidance/cdei-portfolio-of-ai-assurance-techniques; HM Government, The roadmap to an effective AI assurance ecosystem (Centre for Data Ethics and Innovation: 2021), https://www.gov.uk/government/publications/the-roadmap-to-an-effective-ai-assurance-ecosystem.
Despite this progress, the challenges below illustrate why further work is needed for AI assurance to be deployed effectively in the UK national security context:
1. Existing frameworks have not specifically addressed national security needs: existing work on AI assurance (e.g. by CDEI) does not adequately address factors such as protection against adversarial attack.22
2. Crowded landscape: techniques for trustworthy AI are proliferating. Without structured ways to choose between all the standards, impact assessments and performance metrics on offer, developers and policymakers are left confused and overwhelmed.23
3. Separation of technical and ethical assessment, and a lack of intersecting skills: currently, AI assurance tools tend to be either technical or ethical. Ethical and technical assessments need to occur in tandem, which requires a multidisciplinary team.24
4. Accommodating start-ups: assurance entails a resourcing requirement, which is likely to favour larger companies.25 If entry costs are too high, and start-ups are left behind, there is potential for stifled innovation and competition.26
5. Convoluted, theoretical frameworks: practitioners expressed frustration at assurance frameworks which fail to specify requirements in terms they understood.27 One participant claimed that too much investment had been placed in academic work, resulting in assurance frameworks that are 'very confusing' for most developers.28
6. Added bureaucracy: procurement is already slow, and assurance could slow it down further. Additional safeguards are needed but should be balanced with efficiency.29
7. Divergent business models hamper communication: industry suppliers are often reluctant to communicate transparently, for example due to concerns around trade secrecy and commercial IP. This can limit the evidence available to government as part of an AI assurance case.30
8. Complex supply chains are poorly understood: existing assurance frameworks struggle to account for disparate information access across complex supply chains.31 In addition, increasingly agile development cycles mean assurance must be able to account for continuous testing and iterative revision after deployment.32
9. Risk of a false sense of security: the success of AI assurance is ultimately limited by the capability and diligence of the people assessing assurance cases. It can easily become a rubber-stamping exercise, and lead to a false sense of security.33 In the national security context, this has even turned decision-makers away from the term 'assurance', as it can give false confidence if residual risk is not appropriately communicated.34

21 HM Government, Assurance of Artificial Intelligence and Autonomous Systems: A Dstl Biscuit Book (Dstl: 2021), https://www.gov.uk/government/publications/assurance-of-ai-and-autonomous-systems-a-dstl-biscuit-book.
22 Interview with government representative, 5 July 2023.
23 Interview with government representative, 21 June 2023.
24 Interview with government representative, 21 June 2023.
25 Interview with industry expert, 21 July 2023.
26 Interview with industry expert, 4 August 2023.
27 Interview with industry experts, 28 July 2023.
28 Interview with academic expert, 26 July 2023.
29 Interview with academic expert, 4 July 2023.
30 Interview with government representative, 21 June 2023.
31 Interview with academic expert, 6 July 2023.
32 Jennifer Cobbe, Michael Veale and Jatinder Singh, "Understanding accountability in algorithmic supply chains," in FAccT '23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2023), 1186-1197.
33 Interview with academic expert, 4 July 2023.
34 Interview with government representative, 22 June 2023.
3. A Model of AI Assurance for UK National Security
Our framework for AI assurance addresses the above challenges. It consists of two core stages, each with two constituent pillars (Figure 2). First, the assurance case must be created, ideally through cooperation between suppliers and national security bodies. To support this process, we propose:
a) A template for documenting AI system properties.
b) The creation of companion guidance to support those filling out the template.
Second, the assurance case must be reviewed to assess whether evidence is sufficient. To support this process, we propose:
a) Clarity on responsibilities for evidence review, and investment in internal skills.
b) Contractual clauses to mandate transparent sharing of evidence with reviewers.

Figure 2: Two-stage model for AI assurance

3.1 Co-creation of the assurance case

3.1.1 Existing methods for documenting AI system properties
The assurance case is the central document containing all relevant evidence that an AI system meets requirements, structured into a logical argument supporting some end goal or collection of desired properties.35 The choice of method for documenting these desired AI system properties is a crucial component of the assurance framework, as it sets the standard on which properties are included.36 Selecting a documentation method also presents a particular challenge for third-party systems, as evidence is typically generated by multiple actors, necessitating co-creation of the assurance case.

Numerous proposals have been made for how to document AI system properties (as illustrated by Figure 3).37 We focus on three methods which have shown significant promise, comparing their strengths and weaknesses.38 Our proposal will build on the strengths of each.

Figure 3: Options for documenting AI properties, as illustrated by Hugging Face

35 HM Government, Assurance of Artificial Intelligence and Autonomous Systems: A Dstl Biscuit Book (Dstl: 2021), https://www.gov.uk/government/publications/assurance-of-ai-and-autonomous-systems-a-dstl-biscuit-book.
36 Mona Sloane et al., AI and procurement: a primer (New York University: Summer 2021), https://archive.nyu.edu/handle/2451/62255.
37 Hugging Face, "Model Card Guidebook," https://huggingface.co/docs/hub/model-card-guidebook.
38 In selecting these methods for comparison, we acknowledge the existence of further documentation methods such as data sheets and explainability factsheets.
1. Model cards

Table 1: Strengths, weaknesses and examples of model cards

Description: Model cards are defined as 'files that provide information about a model's purpose and details about its provenance, the data used for training, any known limitations and bias', or simply as 'files that accompany the models and provide handy information'.39 They were proposed by Mitchell et al. (2018) to increase transparency through accessible information sharing.40

Examples:
Hugging Face:41 Hugging Face model cards are widely adopted across the private sector. Their template is shared publicly and is designed to be filled in with descriptions of the model, its intended uses, limitations, biases and ethical considerations, the training parameters and experimental information, which datasets were used, and evaluation results. These model cards require input from developers, sociotechnical experts, and project organisers.
Bailo:42 A system introduced by GCHQ to ensure model cards are uploaded to a central repository for easy review. For each model card, two stages of review are required (a. technical assessment, b. policy assessment). The aim is to manage the AI project lifecycle and enable compliance with organisational requirements.
Algorithmic transparency recording standard:43 While not designed as a model card or to be incorporated into an approvals process, the algorithmic transparency recording standard (developed by CDDO and CDEI) offers many useful insights on the sorts of properties that must be included in a comprehensive overview of a model. It is aimed at public sector bodies rather than industry, requiring them to input clear information about the algorithmic tools they use, and why they are using them. It requires public sector bodies to provide information including a system overview, contact details for the responsible team, details on how the tool will be used, mechanisms for review, details on the datasets used, and more. So far, it has been piloted across a range of public sector bodies, from healthcare to policing and the Cabinet Office, with updates made in response to user feedback.

Strengths:
- Succinct, with the potential to act as 'boundary objects': a single artefact that is accessible to users who have different backgrounds and goals when interacting with model cards.44
- Ease of completion by developers. Existing uptake from developers indicates familiarity with this approach,45 and Bailo indicates similar familiarity among the national security community.46
- Existing versions promote breaking down performance criteria into results for individual demographic, cultural or domain-relevant conditions.47
- Includes room for ethical considerations, most often highlighting the importance of fairness.48
- High degree of standardisation allows for ease of comparison between cards.49
- Adaptations can be introduced to allow for context-specific factors to be included on a model card.50
- Many completed versions are publicly available, offering a starting point for developers.51

Weaknesses:
- Frequently tailored for interpretation by individuals with AI or NLP expertise, offering insufficient context for non-experts.52
- A focus on the model can be oversimplistic, given that safeguards are often built into the broader system, for instance covering the front-end graphical user interface as well as the model behind it.53
- The integrity of the model card is highly reliant on the integrity of its creator(s),54 and this documentation method does not enforce transparency from developers.
- Typically, there is no distinction between the space for claims versus evidence, making them subjective.55 Suppliers tend to pitch their product rather than lay out limitations, meaning clear distinctions are needed.56
- Often lack interactivity,57 for example offering no means to further question what is written on the model card.58
- May not capture complex supply chains or the chaining of multiple machine learning models in sequence.
- Software packages can automate the completion of model cards, but this takes responsibility away from people to consider whether there is more that needs to be communicated.59
- Insufficient guidance is given on ethical components: 'it is hard to think about whether you have created a fair system, a sustainable one, an explainable one'.60

39 Hugging Face, "Model Cards," https://huggingface.co/docs/hub/model-cards.
40 Margaret Mitchell et al., "Model cards for model reporting," in FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2019), 220-229.
41 Hugging Face, "Model Cards," https://huggingface.co/docs/hub/model-cards.
42 GCHQ/Bailo, "Bailo: managing the lifecycle of machine learning to support scalability, impact, collaboration, compliance and sharing," GitHub.
43 HM Government, Algorithmic Transparency Recording Standard Hub (CDDO and CDEI: January 2023), https://www.gov.uk/government/collections/algorithmic-transparency-recording-standard-hub.
44 Hugging Face, "Model Card Guidebook," https://huggingface.co/docs/hub/model-card-guidebook.
45 Hugging Face, "User Studies," https://huggingface.co/docs/hub/model-cards-user-studies.
46 GCHQ/Bailo, "Bailo: managing the lifecycle of machine learning to support scalability, impact, collaboration, compliance and sharing," GitHub.
47 Margaret Mitchell et al., "Model cards for model reporting," in FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2019), 220-229.
48 Margaret Mitchell et al., "Model cards for model reporting," in FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2019), 220-229.
49 Ibid.
50 Evidently AI, "A simple way to create ML model cards in Python," Evidently AI Tutorials, 15 June 2023.
51 See HM Government, Collection: Algorithmic Transparency Reports (CDDO and CDEI: 2023), https://www.gov.uk/government/collections/algorithmic-transparency-reports.
52 Anamaria Crisan et al., "Interactive Model Cards: A Human-Centred Approach to Model Documentation," in FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2022), 427-439.
53 Interview with industry expert, 4 August 2023.
54 Margaret Mitchell et al., "Model cards for model reporting," in FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2019), 220-229.
55 Interview with government representative, 5 July 2023.
56 Christopher Burr and Rosamund Powell, Trustworthy Assurance of Digital Mental Healthcare (Alan Turing Institute: 2022), https://zenodo.org/records/7107200.
57 Interview with government representative, 19 July 2023.
58 Anamaria Crisan et al., "Interactive Model Cards: A Human-Centred Approach to Model Documentation," in FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2022), 427-439.
59 Interview with government representative, 8 June 2023.
60 Interview with government representative, 8 June 2023.
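Before turning to system cards, a short sketch may help make the model card format concrete. It assembles a minimal card programmatically, assuming the huggingface_hub Python library and its ModelCard/ModelCardData utilities (current at the time of writing); the model name, metrics and wording are invented for illustration, not drawn from any real system.

```python
from huggingface_hub import ModelCard, ModelCardData

# Structured metadata rendered as the YAML front matter of the card.
card_data = ModelCardData(
    language="en",
    license="other",
    library_name="transformers",
    tags=["text-classification", "illustrative-example"],
)

# Free-text sections mirror the fields discussed above: intended use,
# limitations, training data, and disaggregated evaluation results.
content = f"""---
{card_data.to_yaml()}
---

# example-org/report-triage (hypothetical model)

## Intended uses
Ranking incoming reports by priority. Not suitable for fully automated decisions.

## Limitations and biases
Trained only on English-language reports from 2019-2022; performance on other
dialects and more recent reporting styles is unverified.

## Training data
Internal corpus of 120k labelled reports (details withheld; see supply chain section).

## Evaluation (disaggregated)
Accuracy overall: 0.91 | short documents: 0.94 | documents over 2,000 words: 0.78
"""

card = ModelCard(content)
card.save("MODEL_CARD.md")  # In practice, pushed to a central repository such as Bailo.
```

Note how the card interleaves claims ('performance on other dialects is unverified') with evidence (the disaggregated accuracy figures) in the same free-text fields: the template offers no structural way to separate the two, which is exactly the weakness identified above.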
2. System cards

Table 2: Strengths, weaknesses and examples of system cards

Description: Some have argued in favour of a shift from model cards to system cards.61 Currently there is little consensus on what should comprise a system card, with the only common feature being that instead of documenting a single model, system cards aim to document the features of all the models and other components which make up the final AI system.62 Meta, for instance, propose that the system card approach is crucial to document how components interact in complex AI systems,63 applying their system card methodology to their Instagram Feed technology.64

Example:
GPT-4 System Card:65 OpenAI's system card for GPT-4 aims to detail the testing and safeguards put in place to address safety challenges. It spans model- and system-level interventions, discussing adversarial testing, red teaming, and expert consultations. It is a long, free-form document (as compared to model cards, e.g. Hugging Face's).

Strengths:
- Uptake from big tech shows a willingness to adopt this approach.66
- Accounts for features beyond the model, including safeguards which can be introduced at different points in the AI lifecycle and/or as part of the final interface.67 This is particularly useful in a context where 'software development now often involves, to various degrees, integrating pre-built modular components provided as services and controlled by others into a complete product: not simply a system, but a system-of-systems'.68

Weaknesses:
- Examples so far (see: ChatGPT) offer insufficient structure. This makes it easy to be convinced by the evidence that is there, but difficult to see what is missing, especially for non-technical audiences.69
- As with model cards, there can be insufficient distinction between evidence and claims.70
- Current examples focus on technical rather than sociotechnical assessments (e.g. on properties such as fairness and explainability), and little attention is paid to the importance of legal compliance.71

61 Meta, "System cards, a new resource for understanding how AI systems work," Meta Blog, 23 February 2022; Interview with industry expert, 4 August 2023.
62 Meta, "System cards, a new resource for understanding how AI systems work," Meta Blog, 23 February 2022; Interview with industry expert, 4 August 2023; OpenAI, "GPT-4 System Card," 23 March 2023.
63 Chaves Procope et al., "System-level transparency of machine learning," Meta Research, 22 February 2022.
64 Meta, "What is the Instagram Feed?," Meta Tools, 23 February 2022.
65 OpenAI, "GPT-4 System Card," 23 March 2023.
66 Meta, "System cards, a new resource for understanding how AI systems work," Meta Blog, 23 February 2022; Interview with industry expert, 4 August 2023; OpenAI, "GPT-4 System Card," 23 March 2023.
67 Interview with industry expert, 4 August 2023.
68 Jennifer Cobbe, Michael Veale and Jatinder Singh, "Understanding accountability in algorithmic supply chains," in FAccT '23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2023), 1186-1197.
69 Interview with industry expert, 4 August 2023.
70 OpenAI, "GPT-4 System Card," 23 March 2023.
71 Ibid.

3. Argument-based assurance

Table 3: Strengths, weaknesses and examples of argument-based assurance
Description: 'A process of using structured argumentation to provide assurance to another party (or parties) that a particular claim (or set of related claims) about a property of a system is warranted given the available evidence.'72 This has been widely deployed in safety-critical domains to assure features of complex engineering systems, and has since been expanded to software and AI.73 Most recently, it has been extended to address AI ethics.74

Example:
Argument pattern for explainability:75 The conceptual gap between ethical AI principles, for example fairness or explainability, and concrete evidence is large. Argument-based assurance uses structured flowcharts to break these broad goals down into sub-goals which are then each supported by multiple pieces of evidence. For a single goal, such as explainability, a highly complex flowchart is needed to fully justify and communicate how each piece of evidence comes together to support the stated goal. Due to the complexity of these assurance cases, it would be infeasible to expect developers to start from scratch for each new AI system they wish to assure. It has instead been suggested that argument patterns for common goals like explainability should be developed to offer a starting point for developers, who then simply need to adapt them for the specific AI use case they have in mind.

Strengths:
- The structure of arguments places more pressure on suppliers to back up claims with evidence.76
- Offers options to involve impacted groups in determining what should be included as a top-level goal for an AI assurance argument.77
- Provides clarity on how evidence relates to claims about system properties, in particular in more complex scenarios where multiple pieces of evidence back up a single claim.78
- Argument patterns can be repurposed for multiple AI systems, allowing exemplars to be adapted for future AI systems.79

Weaknesses:
- Can be complex and onerous to complete, and challenging to understand for those who need to interpret assurance cases.80
- Limited uptake by suppliers will be a key limitation for this approach, as will the lack of sufficient internal skills and resources to review complex assurance cases.
- Suggested concepts (e.g. fairness, trustworthiness) can be uncertain and difficult to evidence, due to both subjectivity and complexity.81
- Would need to be adapted to reflect the different priorities and principles which are relevant in a national security context (e.g. the specific definition of proportionality used in this context).82
- The structure of each assurance case is bespoke, to the point where comparing AI systems becomes challenging.

72 Christopher Burr and David Leslie, "Ethical Assurance: A Practical Approach to the Responsible Design, Development, and Deployment of Data-Driven Technologies," AI and Ethics 3 (2023): 73-98.
73 John McDermid, Yan Jia and Ibrahim Habli, "Towards a Framework for Safety Assurance of Autonomous Systems," in Proceedings of the Workshop on Artificial Intelligence Safety 2019 (CEUR: 2019), 1-7.
74 Christopher Burr and David Leslie, "Ethical Assurance: A Practical Approach to the Responsible Design, Development, and Deployment of Data-Driven Technologies," AI and Ethics 3 (2023): 73-98; Christopher Burr and Rosamund Powell, Trustworthy Assurance of Digital Mental Healthcare (Alan Turing Institute: 2022), https://zenodo.org/records/7107200; Zoe Porter, Ibrahim Habli and John McDermid, "A principle-based ethical assurance argument for AI and Autonomous systems," arXiv (March 2022).
75 Christopher Burr and Rosamund Powell, Trustworthy Assurance of Digital Mental Healthcare (Alan Turing Institute: 2022), https://zenodo.org/records/7107200.
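The flowchart structure described above can be captured in a few lines of data. The sketch below is our own minimal rendering rather than any published notation: it represents a goal as a tree whose leaf claims must each carry at least one evidentiary artefact, so unsupported claims can be surfaced mechanically. The explainability sub-goals and evidence names are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Goal:
    """A claim in the assurance argument; leaf goals must cite evidence."""
    claim: str
    evidence: list[str] = field(default_factory=list)  # references to artefacts
    subgoals: list["Goal"] = field(default_factory=list)

def unsupported(goal: Goal, path: str = "") -> list[str]:
    """Walk the argument tree and return leaf claims with no evidence attached."""
    here = f"{path}/{goal.claim}"
    if not goal.subgoals:
        return [] if goal.evidence else [here]
    gaps = []
    for sub in goal.subgoals:
        gaps.extend(unsupported(sub, here))
    return gaps

# A fragment of an argument pattern for explainability (illustrative only).
argument = Goal(
    claim="System outputs are explainable to analysts",
    subgoals=[
        Goal(
            claim="Feature attributions are available per output",
            evidence=["shap_report_v3.pdf"],
        ),
        Goal(claim="Analysts can interrogate individual decisions"),  # no evidence yet
    ],
)

print(unsupported(argument))
# ['/System outputs are explainable to analysts/Analysts can interrogate individual decisions']
```

Even this toy version shows both the strength (claims without evidence surface immediately) and the weakness (each tree is bespoke, so two systems' arguments are hard to compare side by side).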
Each of these three documentation methods offers key insights into best practice for UK national security, in particular highlighting the need to:
1. Balance accessible, concise documentation with detail to aid interpretation by non-experts.
2. Ensure documentation structure is consistent and facilitates easy comparison and identification of gaps by reviewers.
3. Accommodate flexibility to adapt AI assurance to emerging technologies and specific use contexts, while establishing consensus on properties which must always be documented.
4. Implement a framework which builds on industry practices so that uptake across the sector is maximised, while also putting sufficient pressure on industry to increase transparency.
5. Clarify how assurance builds on related processes (e.g. legal compliance and procurement).

We take forward the strengths of each documentation method in our proposal for a tailored system card template for the UK national security context, with our system card incorporating features from each of the above examples (see Section 3.2).

76 Christopher Burr and Rosamund Powell, Trustworthy Assurance of Digital Mental Healthcare (Alan Turing Institute: 2022), https://zenodo.org/records/7107200.
77 Christopher Burr and Rosamund Powell, Trustworthy Assurance of Digital Mental Healthcare (Alan Turing Institute: 2022), https://zenodo.org/records/7107200.
78 Christopher Burr and David Leslie, "Ethical Assurance: A Practical Approach to the Responsible Design, Development, and Deployment of Data-Driven Technologies," AI and Ethics 3 (2023): 73-98.
79 Interview with academic expert, 8 June 2023.
80 Interview with academic expert, 8 June 2023.
81 Zoe Porter, Ibrahim Habli and John McDermid, "A principle-based ethical assurance argument for AI and Autonomous systems," arXiv (March 2022).
82 Interview with academic expert, 8 June 2023.

3.1.2 Companion guidance on evidentiary standards
Documentation methods such as model and system cards would not go far towards solving the challenges associated with third-party AI if they were simply filled in with descriptive claims. This is arguably the most significant limitation of model cards in their current form. As noted by one interviewee, 'it is possible for people to just write down an opinion on the model card. It is very subjective still.'83 This is especially problematic when subjective input is communicated across multiple organisations where underlying assumptions differ.84 Consequently, in proposing a tailored system card template for national security, we must be more stringent about the sorts of evidence that may be used to support claims set out within it, providing companion guidance to users of the system card on how to fill it out.

Table 4 below, reproduced from CDEI's work on AI assurance,85 summarises some of the key assurance techniques which may be used to generate evidentiary artefacts to support claims made within an AI system card.

83 Interview with government representative, 5 July 2023.
84 Christopher Burr and Rosamund Powell, Trustworthy Assurance of Digital Mental Healthcare (Alan Turing Institute: 2022), https://zenodo.org/records/7107200.
85 CDEI, "Techniques for assuring AI systems," https://cdeiuk.github.io/ai-assurance-guide/techniques.
156、Institute:2022),https:/zenodo.org/records/7107200.85 CDEI,“Techniques for assuring AI systems,”https:/cdeiuk.github.io/ai-assurance-guide/techniques.Assurance of Third-Party AI Systems for UK National Security 26 Table 4:Examples of assurance techniques(reproduced from CDEI AI assurance guide)Assura
157、nce technique Description Impact assessment Used to anticipate the effect of a system on environmental,equality,human rights,data protection,or other outcomes.Risk assessment Similar to impact assessments but are conducted after a system has been implemented in a retrospective manner.Bias audit Asse
158、ssing the inputs and outputs of algorithmic systems to determine if there is unfair bias in the input data,the outcome of a decision or classification made by the system.Compliance audit A review of a companys adherence to internal policies and procedures,or external regulations or legal requirement
159、s.Specialised types of compliance audit include system and process audits and regulatory inspection.Certification A process where an independent body attests that a product,service,organisation or individual has been tested against,and met,objective standards of quality or performance.Conformity ass
160、essment Provides assurance that a product,service or system being supplied meets the expectations specified or claimed,prior to it entering the market.Conformity assessment includes activities such as testing,inspection and certification.Performance testing Used to assess the performance of a system
161、 with respect to predetermined quantitative requirements or benchmarks.Formal verification Establishes whether a system satisfies some requirements using the formal methods of mathematics.However,it is rarely easy to choose exactly which assurance technique should be applied and how they should be c
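Performance testing is among the most readily automated of these techniques. The short sketch below is a minimal illustration, assuming scikit-learn and an invented 'document length' slice label; it shows the disaggregated style of reporting favoured by the model card literature cited earlier: one score per condition rather than a single headline figure.

```python
from sklearn.metrics import accuracy_score, f1_score

# Invented evaluation outputs: true labels, model predictions, and a
# condition tag per example (e.g. a demographic, language or domain slice).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
slices = ["short", "short", "long", "long", "short", "long", "long", "short"]

def disaggregated_report(y_true, y_pred, slices):
    """Report metrics per slice as well as overall, so weak conditions stay visible."""
    results = {"overall": {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }}
    for name in sorted(set(slices)):
        idx = [i for i, s in enumerate(slices) if s == name]
        yt = [y_true[i] for i in idx]
        yp = [y_pred[i] for i in idx]
        results[name] = {
            "accuracy": accuracy_score(yt, yp),
            "f1": f1_score(yt, yp, zero_division=0),
        }
    return results

for condition, metrics in disaggregated_report(y_true, y_pred, slices).items():
    print(condition, metrics)
```

On this toy data, the headline accuracy of 0.75 conceals a much weaker 0.5 on the 'long' slice: precisely the kind of gap a single benchmark score would hide.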
However, it is rarely easy to choose exactly which assurance techniques should be applied and how they should be combined, especially given that the above list is not exhaustive. Additional techniques such as red teaming and international AI standards may be used as evidence in the system card. And, for each category, there will be many specific examples available (e.g. AI impact assessments focus on overlapping but varied priorities, for instance human rights, fairness, data protection, security, safety, privacy, and sustainability). Furthermore, national security-specific techniques will also be needed, such as structured frameworks to assess the proportionality of AI systems.86

Two opposing approaches currently exist on how such techniques for evidence generation should be combined as part of an end-to-end assurance process:
1. A pipeline of standardised benchmark protocols and evaluations to embed and assess each of the features you want to document. In this scenario, there would be a single checklist of techniques to implement for every AI system, assessing each property within the system card in turn.
2. A modular portfolio of assurance techniques87 selected in a context-specific manner depending on the AI use case. In this scenario, for each property you wish to assure for (e.g. fairness) there will be multiple options for how to evidence it, with distinct assurance techniques selected depending on the use case.

Each approach comes with advantages and weaknesses. A pipeline of standardised metrics facilitates easy comparison between the different third-party AI systems on offer. This is particularly useful for technical properties such as performance, where quantitative techniques are available to rank AI systems against one another.88 This approach also allows more resources to be dedicated towards verifying whether a smaller number of evaluation metrics are truly robust, resulting in more confidence in the techniques which do get used.

However, it is harder to standardise tests for qualitative properties such as fairness and explainability. A preoccupation with standardised benchmarks could even lead to an overreliance on technical evaluation as opposed to sociotechnical and qualitative tests, something cited as a problem by numerous interviewees,89 as the latter do not produce such clear-cut results in the form of scores or rankings.90 Furthermore, a pipeline of standardised evaluations offers little flexibility for evidentiary standards to be adjusted to specific use cases. Finally, even for technical properties, standardised benchmarks risk overconfidence based on tools which are not fully interpretable, and which do not sufficiently disaggregate results for distinct tasks.91 Overall, reliance on a pipeline of standardised tests can lead to assurance being viewed as a tick-box exercise, without sufficient room for reflection on whether these are the right tests for the specific technology under consideration.

86 Ardi Janjeva, Muffy Calder and Marion Oswald, "Privacy Intrusion and National Security in the Age of AI: Assessing proportionality of automated analysis," CETaS Research Report (May 2023).
87 HM Government, CDEI portfolio of AI assurance techniques (Centre for Data Ethics and Innovation: 2023), https://www.gov.uk/guidance/cdei-portfolio-of-ai-assurance-techniques.
88 Hugging Face, "Open LLM Leaderboard," https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard.
89 Interview with government expert, 21 June 2023; Interview with academic expert, 8 June 2023; Interview with academic expert, 6 July 2023; Interview with academic expert, 21 June 2023.
90 Interview with academic expert, 6 July 2023.
91 Ryan Burnell et al., "Rethink reporting of evaluation results in AI," Science 380, no. 6641 (April 2023): 136-138.
In contrast, a modular portfolio of techniques allows national security bodies to adapt their risk appetite for high-stakes versus low-stakes use cases, and to accept a broader range of evidence submissions for qualitative system properties where multiple methodologies may be available.92 A modular repository of AI assessment techniques can also more easily accommodate regular revision as technologies evolve and the techniques to assess them are rendered outdated.93

This approach is particularly suited to third-party AI for two reasons. First, suppliers are developing their own assurance techniques. Palantir, for example, have released 'AI on RAILS',94 a responsible AI lifecycle framework, while Microsoft have an AI Fairness Checklist.95 But each organisation does assurance differently. This variation in industry approaches is further illustrated by 'frontier AI' labs' published policies.96 National security bodies need to be flexible enough to accept evidence submitted from suppliers where it is provably robust, even if the evidence is not submitted in the form the agency would have deemed ideal had they developed the AI system themselves.

Second, for each third-party AI system, the approach to AI assurance will need to be adjusted depending on which stage of the AI lifecycle the national security body is overseeing. For instance, adversarial testing may offer useful evidence of AI security if the national security body only has oversight of the deployment phase. But, if they instead have control over the design phase, they can promote security-by-design, perhaps adopting a security standard such as ISO/IEC AWI 27090 on cybersecurity for artificial intelligence.97 A repository of ex-ante and ex-post assurance techniques gives the flexibility to test for AI system properties in different ways, depending on the specificity of the use case and the relationship with the third-party supplier.

92 Jacqui Ayling and Adriane Chapman, "Putting AI ethics to work: are the tools fit for purpose?," AI and Ethics 2 (2022): 405-429.
93 Jakob Mökander et al., "Auditing large language models: a three-layered approach," arXiv (June 2023).
94 Palantir, "AI on RAILs: A responsible AI lifecycle framework," Palantir Whitepaper, 2023.
95 Michael Madaio et al., "Co-designing checklists to understand organizational challenges and opportunities around fairness in AI," in CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (New York: Association for Computing Machinery, 2020), 1-14.
96 Department for Science, Innovation and Technology, "Emerging processes for frontier AI safety," October 2023, https://assets.publishing.service.gov.uk/media/653aabbd80884d000df71bdc/emerging-processes-frontier-ai-safety.pdf.
97 ENISA, Cybersecurity of AI and standardisation (ENISA: March 2023).
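One way to operationalise such a modular portfolio is as a curated registry keyed by system-card property, from which techniques are filtered by context. The sketch below is purely illustrative: the technique names are invented, and the crude two-level risk tier stands in for the much richer selection guidance proposed in the companion guidance below.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Technique:
    name: str
    property_assured: str   # which system-card property it evidences
    lifecycle_stage: str    # where in the lifecycle it can be applied
    min_risk_tier: str      # "standard" or "high" stakes

# A curated (and here entirely invented) repository of approved techniques.
PORTFOLIO = [
    Technique("Adversarial red-team protocol RT-1", "security", "deployment", "high"),
    Technique("Secure-by-design standards checklist", "security", "design", "standard"),
    Technique("Demographic bias audit BA-2", "fairness", "validation", "standard"),
    Technique("Proportionality impact assessment", "legal compliance", "design", "high"),
]

def select(property_assured: str, lifecycle_stage: str, risk_tier: str):
    """Return approved techniques matching the use case's property, stage and stakes."""
    tiers = {"standard": 0, "high": 1}
    return [
        t for t in PORTFOLIO
        if t.property_assured == property_assured
        and t.lifecycle_stage == lifecycle_stage
        and tiers[t.min_risk_tier] <= tiers[risk_tier]
    ]

# A customer who only oversees deployment of a high-stakes system gets
# ex-post options; one who controls design gets security-by-design options.
print([t.name for t in select("security", "deployment", "high")])
print([t.name for t in select("security", "design", "standard")])
```

The point of the registry is the constraint it encodes: the same property (here, security) is evidenced by different approved techniques depending on which lifecycle stage the national security body actually oversees, mirroring the adversarial-testing versus security-by-design distinction drawn above.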
There is a tension between the need to provide more prescriptive and practical guidance and the need to remain adaptable to evidence submitted in a variety of forms. We propose a compromise between a narrow pipeline of standardised metrics and the existing extensive portfolios of generalist assurance techniques. Specifically, we propose companion guidance for AI assurance in the national security context. This guidance should include:

A. A narrow, curated repository of assurance techniques which are appropriate for the national security context. This repository should cover the full range of AI system properties laid out in the below system card. It should include specific examples of impact assessments, audit methodologies, performance metrics, red teaming protocols, and more.

B. Comprehensive guidance on choosing between techniques where multiple may be available, to help suppliers and/or national security staff to choose evidence that is most appropriate to their circumstances. This should build on CSET Georgetown’s matrix for selecting responsible AI frameworks.98 For example, which audit methodologies are best suited to generative AI? Which impact assessments are best for high-stakes AI applications? Which performance metrics are appropriate for particular use domains?

C. Exemplar assurance cases for specific AI technologies, to ensure recommendations are grounded in the specific and distinct challenges raised by LLMs as opposed to computer vision, or AI for intelligence analysis as opposed to AI for business operations.

A sketch of how such a repository and the accompanying selection guidance might be organised is given below.
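To make the shape of recommendations A and B concrete, the following minimal sketch indexes assurance techniques by the system property they assess, the lifecycle stage at which they yield evidence, and the system types they suit. Every entry, name, and category here is an illustrative placeholder of our own, not a proposed or endorsed catalogue.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AssuranceTechnique:
    """One entry in the curated repository (all values illustrative)."""
    name: str
    property_assessed: str   # e.g. "security", "fairness", "performance"
    lifecycle_stages: tuple  # stages at which the technique yields evidence
    suited_to: tuple         # system types, e.g. "generative", "classifier"

# A hypothetical, deliberately tiny repository. A real one would be
# curated by national security bodies, as recommendation A proposes.
REPOSITORY = [
    AssuranceTechnique("Adversarial red teaming", "security",
                       ("deployment",), ("generative", "classifier")),
    AssuranceTechnique("Security-by-design review (e.g. ISO/IEC AWI 27090)",
                       "security", ("design", "development"), ("classifier",)),
    AssuranceTechnique("Disaggregated performance evaluation", "performance",
                       ("development", "deployment"), ("classifier",)),
]

def candidate_techniques(property_assessed: str, stage: str, system_type: str):
    """Recommendation B as a filter: which techniques fit these circumstances?"""
    return [t for t in REPOSITORY
            if t.property_assessed == property_assessed
            and stage in t.lifecycle_stages
            and system_type in t.suited_to]

# A body with oversight only of deployment would be steered towards red
# teaming rather than security-by-design, echoing the example given earlier.
print([t.name for t in candidate_techniques("security", "deployment", "generative")])
```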
This companion guidance should be shared with suppliers alongside the system card template to allow them to evidence their claims more easily. It should also be shared with internal national security staff to aid them in filling out system cards. It is beyond the scope of this report to produce this companion guidance, and we recommend it as future work. However, we recommend policymakers look to the example set by the CDAO Responsible AI Toolkit released by the US Department of Defense.99 This interactive toolkit guides users through tailorable and modular assessments, tools, and artifacts throughout the AI product lifecycle, and offers guidance to current and future DoD industry partners. It is a living document which will be regularly updated.100 A similar approach is needed in a national security context to support industry partners and direct them towards sufficiently thorough assurance techniques.

98 Mina Narayanan and Christian Schoeberl, A matrix for selecting responsible AI frameworks (CSET Georgetown: June 2023).
99 US Department of Defense, “CDAO Releases Responsible AI (RAI) Toolkit for Ensuring Alignment With RAI Best Practices,” US Department of Defense Press Release, 14 November 2023, https://www.defense.gov/News/Releases/Release/Article/3588743/cdao-releases-responsible-ai-rai-toolkit-for-ensuring-alignment-with-rai-best-p/.
100 Ibid.
3.2 System card template for UK national security

Drawing on the above analysis, alongside insights from research engagements, we propose a tailored approach to system cards for third-party AI systems in the national security context. The framework presented here should be viewed as a starting point, with regular iterations needed to keep pace with developments in AI. Six sections form the core of the system card, with industry collaboration advantageous for the completion of three of these. Each section will be covered in depth below as we detail the rationale behind the structure of the system card before presenting instructions for filling it out, directed at both industry suppliers and national security customers. A compiled system card template can be found in Appendix 1, which illustrates how these sections would be presented to users in practice.

Figure 4: System card structure
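As a rough structural sketch only, the six sections could be represented as a single machine-readable record, as below. The section names follow the parts described in this chapter; the field types, and the assumption that Parts 3 to 5 are the three sections completed with industry, are our own illustrative choices rather than a schema taken from the compiled template in Appendix 1.

```python
from dataclasses import dataclass
from typing import Optional

# Minimal sketch of the six-section system card structure described in
# this chapter. Field choices are illustrative assumptions, not a schema
# taken from the report's compiled template.

@dataclass
class SystemCard:
    summary_information: dict        # Part 1: headline details, RAG summary
    mission_and_legal: dict          # Part 2: scope of use, legal basis
    supply_chain: dict               # Part 3: mapping, provenance, risk assessment
    performance_and_security: dict   # Part 4: metrics, security evidence
    ethical_considerations: dict     # Part 5: fairness, transparency, etc.
    iterative_requirements: dict     # Part 6: monitoring, review timelines
    # Assumed here, for illustration, to be the three co-completed sections:
    completed_with_industry: tuple = ("supply_chain",
                                      "performance_and_security",
                                      "ethical_considerations")
    next_review_date: Optional[str] = None
```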
Before detailing the specifics of this system card template, it is necessary to clarify the status of legal compliance within this framework. Ethical and legal issues are often addressed in the same document (the model card), but this raises tricky questions,101 as ethics and compliance play distinct roles in assurance. As noted by one expert participant, assurance “shouldn’t just be about setting the ground of what is legally acceptable,” but should instead encourage people to go beyond this: “Just because it is legal, doesn’t mean it is moral.”102 It is crucial for system cards to address both legal and ethical concerns, but their roles must be carefully distinguished. Ethical due diligence must build upon the foundations of legal compliance, not the other way around. Internal review teams will need to be meticulous in ensuring legal compliance and go beyond this to actively promote ethical best practice. To enable this, legal compliance is separated from ethical due diligence in this system card. It should be borne in mind that the completed system card itself may have more general legal and compliance relevance, for example as information or evidence in a subsequent inquiry into the use of technology, or in relation to compliance with authorisations or warrants in respect of data handling or analysis.

101 Interview with government representative, 19 October 2023.
102 Interview with academic expert, 8 June 2023.

3.2.1 Summary information
To enable successful communication about AI systems, including among non-technical users, assurance cases need to be accessible to a range of stakeholders. To enable this, our system card begins with summary information before delving into more in-depth analysis of system properties. As with the Algorithmic Transparency Recording Standard, the existence of this summary information should not be taken as an excuse for the remainder of the system card to become inaccessible to non-expert audiences.103 Nevertheless, providing summary information is important to encourage senior decision-makers who are pressed for time to engage with the assurance process. As noted by one senior decision-maker, “what you need is headline points and the assurance that people who have deep expertise in a trusted role have had a close look.”104 This summary information is therefore intended as a supplement to, rather than an alternative to, the detailed evidence set out below.

Table 5: System card section one

Part 1: Summary Information (instructions for each field):

- System details: Please provide the AI system name, a 1-2 sentence description of the system and its constituent components, the version, and implementation so far.105
- Mission objectives fulfilled and use cases across the organisation: Please summarise the positive contributions made by the system towards the organisation's goals and give an account of how load-bearing the AI system may be across the organisation.106
- Internal roles and responsibilities: Detail the key internal decision-makers responsible for filling out and reviewing this system card, including policy, legal, and technical expertise, with clear separation between the roles of filling out the system card with relevant evidence and assessing the completed system card.
- Supply chain summary: Please summarise the information given in Part 3, including a list of organisations/departments responsible for design, development, and deployment, and at least one contact for each organisation/department.
- License: If applicable, details of the licensing/procurement arrangement are to be provided here.
- Summary and key take-aways: Please summarise key take-aways from the following sections (mission properties & legal compliance, performance & security, ethics). A red/amber/green scale may be used to highlight sections of concern.
- Iterative review summary: Provide dates for any anticipated updates to the AI system and for the next review and update of this system card.

103 HM Government, Algorithmic Transparency Recording Standard Hub (CDDO and CDEI: January 2023), https://www.gov.uk/government/collections/algorithmic-transparency-recording-standard-hub.
104 Interview with government representative, 8 August 2023.
105 Hugging Face, “Annotated Model Card Template,” https://huggingface.co/docs/hub/model-card-annotated.
106 Interview with government representative, 8 August 2023.
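For illustration only, a Part 1 entry might be populated as below. Every value, including the system name, supplier, and dates, is hypothetical and invented for the sketch.

```python
# Hypothetical Part 1 (Summary Information) entry; all values are invented
# for illustration and not drawn from any real system or supplier.
summary_information = {
    "system_details": {
        "name": "Example Triage Classifier",  # hypothetical
        "description": "Ranks incoming reports for analyst review; "
                       "combines a supplier model with an in-house filter.",
        "version": "2.1",
        "implementation_so_far": "Pilot with one analyst team",
    },
    "mission_objectives": "Reduces analyst triage time; moderately load-bearing.",
    "roles_and_responsibilities": {
        "completed_by": ["policy lead", "technical lead"],
        "assessed_by": ["legal adviser", "senior responsible officer"],
    },
    "supply_chain_summary": "Design: in-house; development: Supplier A "
                            "(contact named in Part 3).",
    "license": "12-month licence with transparency clauses (see Part 2).",
    "key_takeaways": {  # red/amber/green scale, as the template suggests
        "mission_and_legal": "green",
        "performance_and_security": "amber",
        "ethics": "green",
    },
    "iterative_review": {"next_system_update": "2024-06",
                         "next_card_review": "2024-09"},
}
```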
3.2.2 Mission properties and legal compliance

This system card section covers internal foundational checks that must be conducted by the deploying organisation (i.e. the national security body), without industry contributions. Evidence set out here should focus on a) the positive contribution the AI system can bring to the organisation and b) the legal status of its application in its stated use context.

Table 6: System card section two

Part 2: Mission properties and legal compliance (instructions for each field):

- Context and scope of use:107
  A) Delineate clear parameters for AI system use: Set out who in the organisation will be using this AI system, how often, and for what purpose. If the AI system in question is being repurposed by the national security body from the purpose for which it was designed, this should be flagged here. This section should also set out any prohibited uses that have been identified as risky.
  B) Account for how the AI system will impact existing organisational processes and existing workers: Set out the extent of integration of this system, both with existing human decision-making processes108 and with existing technology systems. Where relevant, this may include reference to an assessment of the impact the AI system will have on employees' working conditions, for example through a Good Work Algorithmic Impact Assessment.109
  C) Non-algorithmic options considered:110 Please detail why the AI system in question is preferable to the non-algorithmic options available, including a comparison to the current method for completing this task if relevant.
- Legal basis: The legal basis, requirements and powers for the development and use of the AI system, alongside other legal compliance requirements that the assurance process will help to support, should be set out. This section may include but is not limited to:
  - The overarching statutory or legal functions for which the AI system is being developed.
  - Any limitations, restrictions, or constraints on the exercise of data acquisition and/or analysis for the purposes of national security or other purposes, including those within the Investigatory Powers Act and associated warrants and authorisations.
  - Consideration of the human rights principle of necessity and proportionality111 in relation to the development and use of the AI system.
  - Any requirement for the tool's output to be used evidentially or in legal proceedings.
- Licensing/model acquisition: Provide details of the model/software licensing agreement (or other procurement structure such as bespoke development), including details of contractual transparency requirements and other protections. Links to contracts should be included here for further detail, and details of the invitation to tender (ITT) process should be set out if this took place (including any assurances which were requested in the ITT process).

107 HM Government, Algorithmic Transparency Recording Standard Hub (CDDO and CDEI: January 2023), https://www.gov.uk/government/collections/algorithmic-transparency-recording-standard-hub.
108 HM Government, Algorithmic Transparency Recording Standard Hub (CDDO and CDEI: January 2023), https://www.gov.uk/government/collections/algorithmic-transparency-recording-standard-hub.
109 Institute for the Future of Work, “Good Work Algorithmic Impact Assessment,” IFOW Guidance (March 2023), https://www.ifow.org/publications/good-work-algorithmic-impact-assessment-an-approach-for-worker-involvement.
110 HM Government, Algorithmic Transparency Recording Standard Hub (CDDO and CDEI: January 2023), https://www.gov.uk/government/collections/algorithmic-transparency-recording-standard-hub.
111 Ardi Janjeva, Muffy Calder and Marion Oswald, “Privacy Intrusion and National Security in the Age of AI: Assessing proportionality of automated analysis,” CETaS Research Reports (May 2023).

3.2.3 The supply chain
System cards can help address the lack of visibility over the supply chain by increasing transparency. However, completing this section will present a challenge for government and suppliers alike. Co-completion of the system card by government and supplier is crucial to getting a full picture, ensuring system cards are “not just a one-and-done affair but a place where collaborators are working together to collectively address the problem.”112

Three system card sub-sections are proposed to account for AI supply chains. First, the system card template must be filled out with a mapping of the supply chain, focused on the organisations and individuals who contribute in some way to the final AI system. It is essential to identify relevant contributors to the lifecycle as far as is possible, both to enable communication across multiple organisations about the AI system and to enable future responses to algorithmic harms. Ideally, at least one vetted and cleared individual will contribute to the system card for sensitive use cases, to facilitate fully open discussions.113

Next, users of the system card must address questions of provenance. This section offers space for suppliers to link to further details on data provenance, and also includes considerations of model provenance and system provenance, in addition to the sourcing of compute and hardware.114 As set out by Dstl, a layered approach ensures adequate attention is paid to the granular components of a system (data, hardware, compute) in addition to the final products (models, systems, even systems-of-systems).115 Where industry suppliers are unwilling or unable to supply this information, contractual protections may be used to enforce transparency (as accounted for in Section 3.3). If not, it will be up to national security customers to fill this section in as far as possible before deciding whether they can accept the residual risk.

Finally, the system card gives room for additional evidence to be submitted in the form of a supply chain risk assessment. This can help provide a fuller picture where the evidence gathered on provenance is deemed insufficient. And it can account for broader supply chain concerns, for example issues of ethical due diligence, legal compliance, and secure practices down the supply chain.

112 Ian Brown, “Expert Explainer: allocating accountability in AI supply chains,” Ada Lovelace Institute Paper (June 2023), https://www.adalovelaceinstitute.org/resource/ai-supply-chains/.
113 Interview with government representative, 19 July 2023.
114 Interview with industry expert, 21 July 2023.
115 HM Government, Assurance of Artificial Intelligence and Autonomous Systems: A Dstl Biscuit Book (Dstl: 2021), https://www.gov.uk/government/publications/assurance-of-ai-and-autonomous-systems-a-dstl-biscuit-book.
Table 7: System card section three

Part 3: The Supply Chain (instructions for each field):

- Supply chain mapping & industry contributors: Please identify whether the following stages of the AI lifecycle116 were government-led or industry-led. Please also attribute each stage to a specific organisation or, for organisations over 100 people, to a specific department. Additionally, please nominate a point of contact at each relevant organisation, or at each department at larger organisations. Their role should be described, both regarding the project lifecycle itself and the co-completion of this system card. Any vetted and cleared contributors from industry should be identified as potential collaborators on this system card.
  Source: Model of the AI lifecycle, reproduced from Burr & Leslie, 2023.
- Provenance: Provenance here is defined as “the chronology of the ownership, custody or location of a historical object,”117 and should be accounted for with regard to:
  A) Data: What training data was used? Where was it sourced? Please link to full datasets if possible and provide details of any updates to datasets through the AI lifecycle. Please link to audits of relevant datasets where available (for instance through the data provenance initiative).118
  B) Hardware: Please detail the hardware feeding into this system, including details of how it was sourced.
  C) Compute: Please detail the source of compute for this system and how ongoing compute requirements will be met.
  D) Model: Please provide details of each of the models which feed into this system, including any prior iterations of these models.
  E) System: Please account for how the above components were combined to create the final system, including details of any further components not accounted for above.
- Supply chain risk assessment: Various forms of evidence may be submitted here, including:
  - Reports from government site visits to assess suppliers.119
  - Evidence of compliance with established frameworks for supply chain security, e.g. MITRE's System of Trust framework or the Australian Government's Critical Technology Supply Chain Principles.120
  - Completed questionnaires from suppliers which detail how their data collection process was a) legally compliant and b) ethical.121
  - Assessments of whether a supplier's other customers may raise security concerns.122

116 This model of the AI lifecycle was developed by The Alan Turing Institute and accounts for the highly sociotechnical nature of AI design, development and deployment. See Christopher Burr and David Leslie, “Ethical Assurance: A Practical Approach to the Responsible Design, Development, and Deployment of Data-Driven Technologies,” AI and Ethics 3 (2023): 73-98.
117 Kiran Karkera, “Why is provenance important for AI,” Kiran Karkera Medium, 10 July 2020.
118 Edd Gent, “Public AI Training Datasets Are Rife With Licensing Errors,” IEEE Spectrum, 8 November 2023, https://spectrum.ieee.org/data-ai.
119 Interview with government representative (2), 19 July 2023.
120 Australian Government, Critical Technology Supply Chain Principles (Government of Australia: 2021); MITRE, “System of Trust Framework,” https://sot.mitre.org/framework/system_of_trust.html.
121 Interview with industry expert, 4 August 2023.
122 Interview with industry expert, 21 July 2023.
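A rough sketch of how the Part 3 mapping and provenance fields could be captured in structured form is shown below. The three coarse stage names are a simplification of the Burr & Leslie lifecycle model cited above, and every organisation, contact, and provenance entry is a hypothetical placeholder.

```python
# Sketch of a structured Part 3 record. Stage names are a coarse
# simplification of the Burr & Leslie lifecycle model; organisations,
# contacts and provenance entries are hypothetical placeholders.
supply_chain = {
    "mapping": [
        {"stage": "design", "led_by": "government",
         "organisation": "NS body, data science team",
         "contact": "J. Smith (cleared)"},
        {"stage": "development", "led_by": "industry",
         "organisation": "Supplier A, ML department",
         "contact": "A. Jones (vetted)"},
        {"stage": "deployment", "led_by": "government",
         "organisation": "NS body, operations team",
         "contact": "J. Smith (cleared)"},
    ],
    "provenance": {
        "data": "Training corpus v3; sourced under licence; audit linked.",
        "hardware": "Accredited UK data centre; sourcing records attached.",
        "compute": "Supplier A cloud during training; sovereign hosting in use.",
        "model": "Fine-tuned from Supplier A base model v2 (prior iterations listed).",
        "system": "Base model + in-house filter + audit logging layer.",
    },
    "risk_assessment": [
        "Site visit report (June)",
        "MITRE System of Trust self-assessment",
        "Supplier questionnaire on data collection (legal + ethical)",
    ],
}
```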
3.2.4 Performance and security

Performance has been central to model cards since their inception and is a cross-sector priority for AI.123 As a result, it is one of the best accounted for features in existing model cards. In contrast, deep consideration of AI security is often lacking. As noted by one government expert, “I would hope in the real world most companies are good at the other testing; what they might be missing is this security stuff.”124 As a result, we stick closely to Hugging Face's proposals for documenting AI performance, adapting them for a national security context. We then make initial recommendations for including AI security on a system card, proposing that further work is needed to explore best practice for documenting AI security features.

Performance metrics should be highly tailored to the use context.125 The risk is not poor performance in general, but rather that products are “the jack of all trades, but not necessarily the master of what you need.”126 The system card should not simply detail results from performance evaluations, but the rationale for choosing a particular metric for a particular context.127 In most cases, the national security body will need to conduct its own performance tests to supplement supplier assessments, which cannot fully replicate the final use context. Performance should be disaggregated across a variety of factors, with careful consideration given to the foreseeable salient factors for which model performance may vary.128 Users of the system card should justify the way in which performance has been disaggregated.129

Performance metrics should be taken as just one part of a much larger picture. A precision, accuracy, recall, or F1 score delivered without context can give the appearance that performance has been robustly assessed, but without an explanation of its results in the wider system this can be illusory.130 Existing model cards “only tell you so much and they don't tell you how you defended the data against poisoning or other more specific things.”131 In line with this, participants wanted system cards to include more detail on AI security, but recognised this would require longer-term research: “I honestly think the tool that we need is way, way more research in this area.”132 For instance, AI standards were cited as offering useful evidence that suppliers have done due diligence on AI security. However, with many standards left in draft, suppliers are left in a tricky position where ground rules are not fully established and evidence becomes less clear cut.133 Below, we account for how features of AI security may be evidenced in the short term, with further research needed to offer more robust evidence in this area.

123 Margaret Mitchell et al., “Model cards for model reporting,” in FAT* ’19: Proceedings of the Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2019), 220-229.
124 Interview with government representative, 5 July 2023.
125 Margaret Mitchell et al., “Model cards for model reporting,” in FAT* ’19: Proceedings of the Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2019), 220-229.
126 Interview with law enforcement lawyer, 4 July 2023.
127 Margaret Mitchell et al., “Model cards for model reporting,” in FAT* ’19: Proceedings of the Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2019), 220-229.
128 Ibid.
129 Ibid.
130 CETaS research workshop, 25 September 2023.
131 Interview with government representative, 5 July 2023.
132 Interview with government representative, 5 July 2023.
133 CETaS research workshop, 25 September 2023.
Table 8: System card section four

Part 4: Performance and Security (instructions for each field):

- Performance: Please provide results from context-specific performance metrics and detail the rationale for selecting these metrics. This section should include details on precision and recall at different classification thresholds, the classification thresholds that have been used, robustness to out-of-sample inputs, live incident rates, and, where relevant, an account of error likelihood.134 For each result given, the rationale for selecting the specific metric should be given alongside the rationale for disaggregating results in the way that has been chosen (e.g. according to gender, ethnicity, or other relevant considerations).
- Security: Please detail all available evidence that AI security has been considered throughout the project lifecycle. Evidence presented here may include:
  - Compliance with international standards on AI security, for example ISO/IEC 42001 alongside other relevant ISO and IEEE standards.135
  - Evidence of compliance with NCSC principles on the security of AI or guidelines for secure AI system development.136
  - Reports from red teaming exercises and adversarial testing.137
  - Details of data hosting/management plans.138
  - Description of the implementation of AI security protocols laid out by MITRE ATLAS or OWASP.139
  Where possible, please provide details of residual security risks to facilitate ongoing monitoring.

134 CETaS research workshop, 25 September 2023.
135 CETaS research workshop, 25 September 2023.
136 Interview with government representative (2), 19 July 2023; NCSC, “Principles for the security of machine learning,” NCSC Guidance, 31 August 2022, https://www.ncsc.gov.uk/collection/machine-learning; NCSC, “Guidelines for secure AI system development,” November 2023, https://www.ncsc.gov.uk/files/Guidelines-for-secure-AI-system-development.pdf.
137 CETaS research workshop, 25 September 2023.
138 CETaS research workshop, 25 September 2023.
139 MITRE, “MITRE ATLAS (Adversarial Threat Landscape for Artificial Intelligence Systems),” https://atlas.mitre.org; OWASP, “AI Security and Privacy Guide,” https://owasp.org/www-project-ai-security-and-privacy-guide/.
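As one way of generating the performance evidence requested above, the sketch below computes precision and recall at several classification thresholds, disaggregated by one salient factor. The use of scikit-learn is an assumption for illustration (the report does not mandate any tooling), and the data are random stand-ins rather than results from any real system.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)

# Random stand-in data: true labels, model scores, and one salient
# factor for disaggregation (here an anonymised group label).
y_true = rng.integers(0, 2, size=1000)
scores = np.clip(0.35 * y_true + rng.normal(0.3, 0.25, size=1000), 0, 1)
group = rng.choice(["group_a", "group_b"], size=1000)

# Report precision/recall at each threshold, overall and per group, so
# the system card can show *why* a threshold was chosen, rather than
# presenting a single context-free score.
for threshold in (0.3, 0.5, 0.7):
    y_pred = (scores >= threshold).astype(int)
    rows = [("overall", y_true, y_pred)] + [
        (g, y_true[group == g], y_pred[group == g])
        for g in ("group_a", "group_b")
    ]
    for label, yt, yp in rows:
        p = precision_score(yt, yp, zero_division=0)
        r = recall_score(yt, yp, zero_division=0)
        print(f"threshold={threshold} {label:8s} precision={p:.2f} recall={r:.2f}")
```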
3.2.5 Ethical considerations

Compared to other properties documented through model and system cards, ethical considerations can be highly abstract, making it even harder for those responsible for AI system design and development, particularly those from a technical background, to understand and implement them. The companion guidance discussed in Section 3.1 will therefore be particularly important to support users of this system card section. In particular, such guidance will enable future iterations of this system card template to point those filling it out towards national security vetted AI assurance techniques, rather than the more generalist methodologies listed by the OECD and CDEI.

However, in the case of AI ethics there is one further gap that must be filled by national security bodies. Specifically, they must agree on common definitions of the core principles they wish to prioritise. This has been done in defence,140 as well as in the context of broader government AI policy, as set out in A pro-innovation approach to AI regulation.141 GCHQ have already defined what they consider to be the major challenges to ethical AI, listing fairness, transparency and accountability, empowerment, and privacy. They have also gone further in defining these challenges, often in practical terms. For example, fairness has been defined with reference to three key obstacles which must be overcome for AI systems to be considered fair, namely data fairness, design fairness, and outcome fairness.142 Already, this can help to guide users of this system card template towards which ethical considerations they should address.

However, the national security community is yet to commit to a final set of principles. We recommend that they do so in order to support this assurance process. In doing so, we propose they prioritise principles that are grounded in the real-world impacts of AI systems, that they translate principles such that they correspond directly to the needs of the teams who are responsible for assurance, and that they complement and bolster principles which have already been defined in the legal context (e.g. necessity and proportionality). Ultimately, such a list of principles and their precise definitions should be included in the below system card section in place of GCHQ's stated ethical AI challenges.

140 HM Government, Ambitious, safe, responsible: our approach to the delivery of AI-enabled capability in Defence (Ministry of Defence: June 2022), https://www.gov.uk/government/publications/ambitious-safe-responsible-our-approach-to-the-delivery-of-ai-enabled-capability-in-defence/ambitious-safe-responsible-our-approach-to-the-delivery-of-ai-enabled-capability-in-defence#annex-a-ethical-principles-for-ai-in-defence.
141 HM Government, A pro-innovation approach to AI regulation (DSIT and Office for AI: August 2023), https://www.gov.uk/government/publications/ai-regulation-a-pro-innovation-approach/white-paper.
142 GCHQ, “Pioneering a New National Security: The Ethics of AI,” 2021, https://www.gchq.gov.uk/artificial-intelligence/index.html.
Table 9: System card section five

Part 5: Ethical considerations (instructions):

Please detail how the below set of ethical challenges have been addressed by the project team throughout the AI lifecycle:
- Fairness
- Transparency and accountability
- Empowerment
- Privacy

In doing so, you should consider drawing on the techniques for responsible AI set out in CDEI's portfolio of assurance techniques and the OECD's tools for trustworthy AI, both of which include reference to a range of assurance techniques from external audits to technical fairness assessments, AI standards, and impact assessments.143 Please note that it will often be relevant to include multiple pieces of evidence to evidence a single ethical principle, and to make clear how your evidence supports the stated end goal.

143 HM Government, CDEI portfolio of AI assurance techniques (Centre for Data Ethics and Innovation: 2023), https://www.gov.uk/guidance/cdei-portfolio-of-ai-assurance-techniques; OECD, “Catalogue of Tools and Metrics for Responsible AI,” https://oecd.ai/en/catalogue/tools.
3.2.6 Iterative requirements

The final section of the system card sets out plans for ongoing monitoring and future assessment. Model and system cards in their current form have been critiqued for being too static.144 Simply setting a timeline for future review is insufficient. This system card should be easily updated to account for changes to governance processes, real-world impacts, and system updates from suppliers. For example, one participant noted that the system card should assess “how people are using the model and the downstream impacts of these interactions.”145 Others repeatedly emphasised the need to create a process which can eventually accommodate AI systems that are constantly learning and updating.146 If the national security body wishes to re-use a previously supplied algorithm in a new context, or combine it with other AI systems, they will need at the very least to revisit this system card, updating it with new evidence. They may even need to start a new system card if a large quantity of the existing evidence has been rendered outdated or irrelevant.

144 Interview with government representative, 19 July 2023.
145 Interview (2) with law enforcement member of staff, 6 July 2023.
146 CETaS research workshop, 25 September 2023.
Table 10: System card section six

Part 6: Iterative requirements (instructions for each field):

- Evidence of internal skills base to effectively use the system: AI literacy needs to improve if third-party AI tools are to be effectively assessed and monitored.147 National security teams should justify that they have plans in place to upskill internal teams to become effective users of new AI systems. This could include descriptions of training to be conducted prior to deployment, or of data science and AI policy representation within the team.
- Ongoing monitoring provision, protections against accidental misuse & impact mitigation plan: What tests have been put in place to monitor the impacts of the system as it is deployed? Are mechanisms in place to allow users to report errors? How do these feed into decisions about any updates or potential model retirement? It may be relevant to include a link to an internal plan for impact monitoring and mitigation which sets out in-depth protocols for dealing with pre-identified potential adverse impacts.148 The necessity of this should be determined by national security bodies depending on how high-risk they judge the use of an AI system to be.
- Details of timelines:
  A) Timeline for system updates: This system card should account for future updates to AI systems, being updated with each supplier update or retraining cycle. In the future, this system card should be trialled on online-learning AI systems to assess the extent to which it can become a living document.149
  B) Timeline for system card review: Set a timeline for review of the system card. It may be relevant to review a system even if it has not been updated, for example in response to impact monitoring or to changes in the scope of use, or when approaching the end of an authorised data retention period. National security bodies should commit to timelines in advance while also remaining flexible to bring reviews forward when needed.

147 CETaS research workshop, 25 September 2023.
148 David Leslie et al., Artificial Intelligence, Human Rights, Democracy, and the Rule of Law: A primer (Council of Europe: 2021), https://edoc.coe.int/en/artificial-intelligence/10206-artificial-intelligence-human-rights-democracy-and-the-rule-of-law-a-primer.html.
149 Interview with government representative, 19 July 2023.
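As a minimal sketch of how the Part 6 review triggers could be operationalised, assuming a simple event log kept by the deploying team, the following shows a review falling due either on the agreed timeline or earlier when a trigger event occurs. The event names and the review interval are our own illustrative choices, not values from the template.

```python
from datetime import date

# Illustrative review triggers for Part 6. Event names and the default
# review interval are assumptions for the sketch, not report values.
TRIGGER_EVENTS = {"supplier_update", "retraining_cycle",
                  "adverse_impact_report", "scope_of_use_change",
                  "data_retention_period_ending"}

def card_review_due(last_review: date, today: date,
                    events: set, interval_days: int = 180) -> bool:
    """A review falls due on the agreed timeline, or earlier if any
    trigger event has occurred since the last review."""
    overdue = (today - last_review).days >= interval_days
    return overdue or bool(events & TRIGGER_EVENTS)

# A scope-of-use change brings the review forward, even though the
# scheduled interval has not yet elapsed.
print(card_review_due(date(2024, 1, 10), date(2024, 3, 1),
                      {"scope_of_use_change"}))  # True
```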
3.3 Assessing evidence

The fitness-for-purpose of assurance processes depends on the existence of tangible evidence that can justify the claims made, and on the ability of decision-makers to assess that evidence in the context of their own risk acceptance threshold. The question then shifts to the evaluation of that evidence, and to who within an organisation (or potentially externally, within a regulatory/audit/oversight structure) will assess the validity and robustness of the evidence provided, and thus provide approval for a project to proceed or require mitigations. In other words, the process of deciding what the evidence reveals about the system being assured.

3.3.1 Skills to review system cards

The interconnected and interdependent nature of many models makes this assessment role a particular challenge. As Cobbe et al. point out, “Statistical guarantees may not hold when systems are composed together, and it is not straightforward to evaluate a whole system when each individual component may have been evaluated under different threat models (or other criteria).”150 According to Brown, the issues that must be considered relate not only to performance, but to copyright, data protection, product liability/negligence, equalities/bias and human rights (including those of workers involved in developing AI).151 Therefore, assessment of evidence put forward to address such broad issues is not likely to be a one-person or one-discipline job. Much will depend upon transparency mechanisms that enable a flow of critical information, in particular from the supplier(s) to the customer, and on the knowledge, understanding and critical skills of the persons carrying out the evaluation task.

Our interviewees generally agreed that a range of people should be involved in the reasoning and evaluation process, proposing a range of ways in which this could be operationalised in practice:

150 Jennifer Cobbe, Michael Veale and Jatinder Singh, “Understanding accountability in algorithmic supply chains,” in FAccT ’23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2023), 1186-1197.
151 Ian Brown, “Expert Explainer: allocating accountability in AI supply chains,” Ada Lovelace Institute Paper (June 2023), https://www.adalovelaceinstitute.org/resource/ai-supply-chains/.
- One suggested a multi-disciplinary board which would inform a senior responsible officer on technical, community, legal and wider ethical questions (e.g. the source of training data) in order to understand the risks.152
- Another commented: “I would see it as a sociotechnical thing. It is not just the model itself, checking the accuracy of the model, but also how people are using the model and the downstream impacts of these interactions.”153
- It was further suggested that, due to the mission requirements that a model will be designed to achieve, people who are close to the use case itself will be best placed to make a lot of these context-specific judgments.154

Concern was expressed, however, that the required multidisciplinary in-house expertise was in limited supply, and that ethics or multidisciplinary boards may not have the capacity to review all systems.155 Furthermore, approvers or oversight boards may be reluctant to take ultimate accountability for approving a model, especially in circumstances where no one person may have sufficient understanding or visibility of the whole system.156 Ultimately, the oversight of third-party AI must become integrated with wider institutional processes for assessing risk, and therefore part of senior management responsibility, as other risks are.

Interviewees had mixed opinions regarding community engagement with assurance. Many were supportive of the idea in principle but uncertain of the feasibility or appropriate process. The danger of “participation-washing” was mentioned if community engagement was surface-level only or lacking influence. Within a national security context where specific operational information could not be shared, it was suggested that a scenario or hypothetical context could inspire deliberation on the benefits and harms of technologies, with the outcomes feeding into the assurance process: “It is easier to involve impacted groups in higher level decisions and steering or policy rather than specifics.”157

152 Interview with law enforcement lawyer, 4 July 2023.
153 Interview with law enforcement expert, 6 July 2023.
154 Interview with industry expert, 21 July 2023.
155 Interview with government representative, 5 July 2023.
156 Jennifer Cobbe, Michael Veale and Jatinder Singh, “Understanding accountability in algorithmic supply chains,” in FAccT ’23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (New York: Association for Computing Machinery, 2023), 1186-1197; David Gray Widder and Dawn Nafus, “Dislocated accountabilities in the AI supply chain: Modularity and developers’ notions of responsibility,” Big Data and Society (2023).
157 Interview with academic expert, 6 July 2023.
3.3.2 The process of evidence assessment

Evaluation of evidence was not regarded by our interviewees as a one-off task, but as a process of rolling review against set timelines or factors (as acknowledged in GCHQ's Bailo process).158 This process must cover deployment into a real system159 and further review if errors or concerns are detected or reported. This view reflects the conclusion in the literature that assurance should be seen as an ongoing process to improve practices.160

In terms of evidence presentation, observability, combined with live monitoring/audit and structured, accessible information, was regarded as essential. It was suggested that old-fashioned site visits should be included in the assurance process.161 Having access to cleared/vetted individuals within suppliers was said to be important,162 as was the ability to obtain independent operational testing results.163

However, a key issue for the process of evaluation is determining “where you set the threshold for risk. At what point do you sign off on capability and determine that this risk is acceptable.”164 It will be necessary in the overall assurance process to clarify the response to certain evidence presented (such as accuracy levels, performance against specified standards and data provenance) and therefore where the red lines will fall. These levels of tolerance will differ across use cases and depending upon potential harm and urgency. For instance, should a provider failing to reveal the source of their training data be a red flag in defence and national security?165

158 GCHQ/Bailo, “Bailo: managing the lifecycle of machine learning to support scalability, impact, collaboration, compliance and sharing,” GitHub.
159 Interview with academic expert, 4 July 2023.
160 Jessica Morley et al., “From what to how: an initial review of publicly available AI ethics tools, methods and research to translate principles into practices,” Sci Eng Ethics (August 2020).
161 Interview with government representative, 19 July 2023.
162 Interview with government representative, 19 July 2023.
163 Interview with law enforcement lawyer, 4 July 2023.
164 Interview with industry expert, 21 July 2023.
165 Interview with government representative, 26 June 2023.
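The sketch below shows one way the red-lines idea could be made explicit: the risk tier of a use case determines which evidence gaps block sign-off. The tiers, evidence-gap labels, and rules are all illustrative assumptions of our own, not thresholds proposed by this report.

```python
# Illustrative red-line gate: whether an evidence gap blocks sign-off
# depends on the risk tier of the use case. All rules are assumptions
# for the sketch, not thresholds proposed by the report.
RED_LINES = {
    "high": {"training_data_source_undisclosed",
             "accuracy_below_agreed_level",
             "no_provenance_record"},
    "medium": {"accuracy_below_agreed_level"},
    "low": set(),
}

def sign_off(risk_tier: str, evidence_gaps: set) -> tuple:
    """Return (approved, blocking_gaps) for a given tier and its gaps."""
    blocking = evidence_gaps & RED_LINES[risk_tier]
    return (not blocking, sorted(blocking))

# An undisclosed training data source blocks a high-risk use case but
# not a low-risk one, echoing the question posed above.
print(sign_off("high", {"training_data_source_undisclosed"}))
print(sign_off("low", {"training_data_source_undisclosed"}))
```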
ALARP (as low as reasonably practicable) was mentioned by one interviewee, but in the wider context of whether society would regard the decision made as tolerable or acceptable. ALARP is a concept used in safety-critical industries such as aviation as a goal for the management of health and safety risks.166 Taddeo et al. recommend the ALARP framework as a way of tackling the AI risk predictability problem, with a greater duty of care being required for higher-stakes decisions.167 However, it may be an oversimplistic way of assuring the use of third-party models within the national security context, bearing in mind the categories of information that we recommend are included in the system card approach above, and the importance of the overarching statutory framework, in particular the legal concepts of necessity and proportionality.168

Generally, our interviewees did not consider independent oversight bodies as integral to assurance processes, although it was noted that Commissioners and other regulators may require visibility of assurance. It was, nevertheless, regarded as deserving of further consideration, provided the oversight body had the appropriate expertise, particularly on the technical side.169 Establishing assessor functions internally risks “marking your own homework” favourably. Checks and balances within the public sector architecture should be established to mitigate this risk, including through industry secondments.170

166 Health and Safety Executive, “Risk management: Expert guidance: ALARP at a glance,” https://www.hse.gov.uk/enforce/expert/alarpglance.htm.
167 Mariarosaria Taddeo et al., “Artificial Intelligence for national security: the predictability problem,” CETaS Research Reports (September 2022).
168 Ardi Janjeva, Muffy Calder and Marion Oswald, “Privacy Intrusion and National Security in the Age of AI: Assessing proportionality of automated analysis,” CETaS Research Reports (May 2023).
169 Interview with government representative, 22 June 2023.
170 Interview with industry expert, 21 July 2023.
3.3.3 Contractual protections

It was clear from our research interviews that the integration of contractual requirements and protections is essential to the success of any assurance process. As one interviewee put it, “contract is king.”171 Contractual warranties might in some circumstances mitigate a lack of disclosure due to trade secrecy concerns, although incomplete knowledge may not be sufficient in a national security context. Contractual clauses may cover specific requirements such as:

- The ability to conduct audits and spot-checks (including by an independent third party), and to access summary information.172 Such audits may require the supplier to provide access to their intellectual property via trusted safe harbour or escrow settings.
- Data provenance, including evidence trails on the sourcing of datasets, particularly in complex supply chains.
- Requirements to report and remedy faults, and mechanisms to preserve a snapshot of a system when harm occurs (see the sketch after this list). As noted by Brown, AI systems can change with new inputs or tweaks to their architecture: “This means saving time-stamped versions of systems so that the cause of harms can be examined later, as happens already with self-driving vehicles.”173

171 Interview with law enforcement lawyer, 4 July 2023.
172 Interview with academic expert, 4 July 2023.
173 Ian Brown, “Expert Explainer: allocating accountability in AI supply chains,” Ada Lovelace Institute Paper (June 2023), https://www.adalovelaceinstitute.org/resource/ai-supply-chains/.
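A minimal sketch of the time-stamped snapshot idea: record a hash-based manifest of system artefacts at deployment, so the exact version can be identified later if harm occurs. The file paths and manifest layout are illustrative assumptions, not a prescribed format.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def snapshot_manifest(artefact_paths: list, out_path: str) -> dict:
    """Write a time-stamped manifest of SHA-256 hashes for the listed
    system artefacts (model weights, configs, dataset references), so a
    deployed version can be pinned down later if harm occurs."""
    manifest = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "artefacts": {p: hashlib.sha256(Path(p).read_bytes()).hexdigest()
                      for p in artefact_paths},
    }
    Path(out_path).write_text(json.dumps(manifest, indent=2))
    return manifest

# Hypothetical usage: hash the deployed model and config at go-live.
# snapshot_manifest(["model.bin", "config.yaml"], "snapshot_2024-01.json")
```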
In national security contexts, consideration should be given to additional or specific transparency and liability requirements in contracts around the use of foundation models,174 open-source, off-the-shelf or generic components, and particular training data or other material that may cause security or misuse concern. This could include restrictions or limitations on use, plus mandating extra compliance and transparency responsibilities. Research participants also raised the need for boilerplate confidentiality and transparency conditions to be enhanced to include transparency around other customers of the supplier which may cause security concerns,175 and limitations around improvements to the system stemming from the customer's datasets or other information.

Contracts and procurement processes also provide the opportunity to set out definitions of key assurance terms, and to specify concrete obligations such as disclosure of bias and accuracy testing results. For examples of draft standard clauses addressing the above issues, see the example clauses drafted by the City of Amsterdam on: technical transparency (including technical specifications/source code and data inputs); procedural transparency (including the choices and assumptions made); and explainability (including, where necessary and appropriate, requirements on the supplier to be able to explain on an individual level why the tool has come to a particular decision, and provision of any information required for legal proceedings).176

174 Defined here as AI models designed to produce a wide and general variety of outputs, and capable of a range of possible tasks and applications, such as text, image or audio generation. See: Elliot Jones, “Explainer: what is a foundation model?,” Ada Lovelace Institute Paper (July 2023), https://www.adalovelaceinstitute.org/resource/foundation-models-explainer/.
175 The National Security and Investment Act 2021 permits scrutiny of corporate acquisitions and mergers that may cause a national security risk. A similar concept could be reflected in contractual terms for changes in the provider that might raise similar risks.
176 Government of Amsterdam, “Contractual terms for algorithms,” https://www.amsterdam.nl/innovation/digitalisation-technology/algorithms-ai/contractual-terms-for-algorithms/.

Standard contractual clauses for AI procurement have also been published by the EU to support trustworthy, fair, and secure