Whitepaper | Implementing Automated Secrets Detection for Application Security

Implementing Automated Secrets Detection for Application Security

How do we secure the new way of building software?

Applications are no longer standalone monoliths; they now rely on thousands of building blocks: cloud infrastructure, databases, and SaaS components such as Stripe, Slack and HubSpot. This is a significant shift in software development. Dev and Ops teams in large organizations use thousands of secrets, such as API keys and other credentials, to connect these components. As a result, they now have access to more sensitive information than companies can keep track of.

The risk is that these secrets are now spreading everywhere. We call “secrets sprawl” the unwanted distribution of secrets across all the systems developers use. Secrets sprawl is even more difficult to control with growing development teams, sometimes spread over multiple geographies, and with developers under heavy pressure from a growing number of technologies to master and shortened release cycles.

In this whitepaper, we look at the implications of secrets sprawl and present solutions for Application Security teams to further secure the SDLC by implementing automated secrets detection in their DevOps pipeline.

What developers call a secret is anything that allows access to a system, often programmatically. API keys, private keys, database credentials and security certificates are perfect examples. Secrets are keys
to the kingdom: they give access to cloud infrastructure, SaaS components, databases, internal portals or microservices.

Contents
- Understanding the benefits of mitigating secrets sprawl
- What are the threats associated with secrets sprawl?
- A focus on secrets in source code: why are they so bad?
- Challenges associated with secrets sprawl
- GitGuardian: automated secrets detection throughout the SDLC
- Why is it hard to detect secrets?
- Remediating exposed secrets
- About GitGuardian

01 Understanding the benefits of mitigating secrets sprawl

What are the threats associated with secrets sprawl?

No company wants credit card numbers in plaintext in databases, PII in application logs, or bank account credentials
in a Google Doc. Secrets deserve the same kind of protective measures. As a general security principle, where feasible, data should remain safe even if it leaves the devices, systems, infrastructure or networks that are under the organization's control, or if they are compromised.

It is no surprise that credential stealing is a well-known adversary technique described in the MITRE ATT&CK framework.

MITRE ATT&CK T1081: Credential Access / Credentials in Files
“Adversaries may search local file systems and remote file shares for files containing passwords. These can be files created by users to store their own credentials, shared credential stores for a group of individuals, configuration files containing passwords for a system or service, or source code/binary files containing embedded passwords.”

Of course, the term “passwords” must be taken in the broadest sense, and Application Security professionals
prefer to talk about secrets.

Secrets accessed by malicious threat actors can lead to information leakage and allow lateral movement or privilege escalation, as secrets very often lead to other secrets. Furthermore, once an attacker has the credentials to operate like a valid user, it is extremely difficult to detect the abuse, and the threat can become persistent.

A focus on secrets in source code: why are they so bad?

Surprisingly, storing secrets in source code is the current state of the world, although this is admittedly a bad thing. Source code is made to be duplicated and distributed, and therefore lives in multiple places.

Source code is a leaky asset and you never know where it is going to end up: it can be cloned to a compromised workstation or server, intentionally or accidentally published in whole or in part, uploaded to your website, released to a customer, pasted in Slack, or end up in your package manager or mobile application.

Additionally, it would take just one compromised developer account to compromise all the secrets that developer has access to. Hardcoded credentials make it very difficult to know what secrets a developer accessed, and almost impossible to roll keys after they leave.

The shift to “everything-as-code” (containers, infrastructure-as-code, policy-as-code, etc.) is making secrets even more ubiquitous in the tech stack, and harder to keep under control.

Spotlight on Codecov
The Codecov case is an interesting textbook case. A favorite of many open-source projects, Codecov is a code coverage tool. Between January 31st and April 1st, 2021, it was compromised by attackers who were able to extract all of the environment variables of Codecov's customers. They gained initial access by extracting a static GCP service account credential from one layer of Codecov's Docker image, which enabled them to tamper with a downstream CI script. Attackers were thus able to piggyback on Codecov to enter its users' private code repositories, exposing many more secrets.

“How bad can it git?”: the NCSU study that reports thousands of credentials leaked on public GitHub per day. Here are some key takeaways:
- Independent study.
- Large-scale study: millions of repositories and billions of files scanned, with over 200k credentials detected.
- Keys leaked at a rate of thousands per day.
- Conservative approach, targeting only 15 different types of API keys and 4 asymmetric private key types. “Consequently, our work is not exhaustive but rather demonstrates a lower bound on the problem of secret leakage on GitHub. The full extent of the problem is likely much worse than we report.”
- Secrets are often leaked accidentally, not intentionally.
- High confidence that most of these secrets are indeed sensitive.
- Developer inexperience (measured as a small number of repos with few contributions on GitHub) is not strongly correlated with leakage.

In 2021, GitGuardian detected over 6 million secrets pushed to GitHub, twice as many as in 2020. On average, 3 commits out of 1,000 exposed at least one secret, a fifty percent increase compared to the previous year.

02 Challenges associated with secrets sprawl

The git history makes it more complicated than first thought.
Most vulnerabilities, like cryptographic weaknesses or SQL injection, only express themselves the moment the code is deployed. Exposed secrets are unlike these vulnerabilities, because any secret reaching a version control system must be considered compromised and requires immediate attention. This is true even if the code is never deployed.

Implementing secrets detection is not only about scanning the latest version of your
master branch before deployment. It is also about scanning through every single commit of your git history, covering every branch, even development or test ones.

Why do code reviews fail at secrets detection?
Reviewers are only concerned with the difference between the current and proposed states of the code, not with the entire history of the project. If a commit adds a secret and a later one deletes it, the net effect is zero and of no interest to reviewers. But the vulnerability is there! Reviewers prefer to focus on errors that cannot be automatically detected, like design flaws. As a general principle, security automation should be implemented wherever it can be, so that humans focus on where they bring the most value.

Enforcing good security practices at the organization level is hard.
Difficulty increases with the size of the organization, the number of repositories, and the number of development teams and their geographies. Best practices to prevent secrets sprawl include:
- Educating developers on why they must not hardcode secrets in code or ticketing systems, or share them through messaging systems, a Dropbox or a wiki.
- Educating developers on how to safely store, share and retrieve secrets.
- Implementing automated secrets detection.
Of course, educating developers not to hardcode secrets in source code is a great starting point, but it hardly scales and still leaves too much room for human error. In addition, code reviews notably fail at detecting secrets, and take time and energy that would be better spent where developers deliver the most value.

Homegrown tools and scripts are hard to build, maintain and keep up to date.
Some companies have built internal tools, often derived from open source. There are many open-source tools that help you find leaked secrets, like truffleHog. Build vs. buy is an old dichotomy and you probably already have an opinion about it. An enterprise-grade solution is expected to provide precision, coverage and ease-of-use guarantees that come with tight integration into your workflows, without the burden of having to maintain it and keep it up to date.

A lot of source code lives in the history
This Django contributions graph is very common. There are as many additions as there are deletions! Deletion does not mean that the code cannot be accessed anymore. Deleted only means buried!
Figure: Django contributions graph. Source: https:/

03 GitGuardian: automated secrets detection throughout the SDLC

Like SAST, DAST, dependency scanning or container scanning, secrets detection takes hard work and is an Application Security category in itself. But it's even more than that. Let us guide you through some of the key principles to automate secrets detection throughout your SDLC.

WHERE IN THE SDLC TO IMPLEMENT AUTOMATED SECRETS DETECTION?

Git uses hooks to trigger certain actions at certain points in the software development process. There are client-side hooks, which execute locally on developers' workstations, and server-side hooks, which execute on the centralized version control system.

Here are some general principles about
fitting security into your DevOps pipeline:
- The earlier a security vulnerability is uncovered, the less costly it is to correct. Hardcoded secrets are no exception. If a secret is uncovered after it reaches the centralized version control server, it must be considered compromised, which requires rotating (revoking and redistributing) the exposed credential. This operation can be complex and involve multiple stakeholders.
- People bend the rules, often in an effort to collaborate better and do their job. Security must not be a blocker. It should allow flexibility and enable information
to flow, yet provide visibility and control. On the one hand, security measures will be bypassed, sometimes for the worse. On the other hand, it is sometimes good that a developer can take responsibility for bypassing them.
- Secrets detection is probabilistic: algorithms strike a tradeoff between not raising false alerts and not missing keys. This means that even the best algorithms can fail and need human judgement.

These principles advocate for the following:
- Client-side secrets detection early in the software development process is a nice-to-have: implement pre-commit or pre-push hooks when possible. The good thing with pre-commit is that the secret is never added to the local repository. This comes in handy since removing a secret from the git history can be very tricky, even client-side (server-side is even harder and requires a force push).
- Server-side secrets detection is a must-have: depending on the size of your organization, enforcing client-side secrets detection might not be an easy task, as it requires access to your developers' workstations. We've heard many times from Application Security professionals that this is not something they felt confident doing. In any case, keep in mind that client-side hooks can (and must, secrets detection being probabilistic) be easy to bypass, hence the absolute necessity for server-side checks where the ultimate threat lies: the remote centralized repository.

Client-side detection (pre-commit hooks)
Pros: Pre-commit hooks put the onus on developers to keep their code free from secrets before contributing to the team's codebase. Low cost of remediation.
Cons: ggshield CLI and pre-commit hooks need to be set up individually for each developer.

Server-side detection (monitoring at the VCS level)
Pros: Easy two-click integration of GitGuardian with GitHub, GitLab and Bitbucket provides complete visibility over your repositories.
Cons: Monitoring at the VCS level is not preventive. Security teams need to stay on high alert when new incidents are raised. Remediation cost is high.

You can also integrate GitGuardian anywhere in your SDLC using our API, which can be self-hosted on premises. For example, the API can be used to create a pre-push check or be integrated seamlessly in a CI pipeline.

Secrets detection is probabilistic: some secrets are easier to find than others. There is a tradeoff between a low number of false alerts and a low number of missed credentials. Good secrets detection is a two-step process: harvest presumed credentials first, then get rid of your worst candidates. Each step can be achieved through a variety of methods, but it is really the subtle combination of all these methods that achieves the best performance!

How to get started implementing secrets detection?
With the nature of git comes a unique challenge. Most security vulnerabilities only express themselves in the current version of the source code, once used in production. But old commits can contain valid secrets.
- First, scan the existing code history (all commits from all branches in all projects) to start from a clean basis.
- Then continuously scan all incremental changes, every time a new commit is
pushed to any branch of any project.

Why is it hard to detect secrets?

STEP 1: HARVEST CANDIDATES

Method: Entropy: look for strings that appear random.
Pros: Good for penetration testing, open sourcing a project or bug bounties because it brings a lot of results. These results must be reviewed manually.
Cons:
- Lots of false alerts (it is very frequent to see URLs, file paths, database IDs or other hashes with high entropy), which makes it impossible to use this method alone in an automated pipeline.
- Some keys are inevitably missed because the entropy threshold to be applied depends on the charset used to generate the key and its length.

Method: Regular expressions: match known, distinct patterns.
Pros:
- Low number of false alerts.
- Known patterns make it easier to later check whether the secret is valid or whether it is an example or test key (see Step 2).
Cons:
- Unknown key types will be missed.
- Credentials without a distinct pattern will be missed, which means lots of missed credentials! Think about passwords that can be virtually any string in many possible contexts, APIs that don't have a distinct format, etc.

STEP 2: FILTER BAD CANDIDATES

Method: Look for known sensitive patterns in the context of the candidate. The idea is to aggregate weak signals. For example, a sensitive filename, combined with an assignment to a variable whose name contains the word “key”, and the import of a Python wrapper for the Datadog API.
Pros: Often makes it possible to associate a presumed credential with a given service depending on the code surrounding it. This is helpful for validating the candidate with an API call (see the next method).
Cons: The notion of “context” is difficult to define (think of a large commit patch or file for example, or a variable declared in one location and used somewhere else in the repository).

Method: Validate the candidate by doing an API call against the associated service.
Pros: There can be no more doubt, your candidate is valid! Plus you can use the opportunity to gather information about the permissions associated with the key and the account owner. This information is useful for prioritization and remediation purposes.
Cons:
- You need to know the associated service, or at least come up with a list of potential services.
- Not all credentials can be easily checked programmatically. Think about OAuth strings, private keys, usernames and passwords, etc.
- Some services are not accessible from everywhere (for example, from outside a given private network), so the credential might be considered invalid despite still posing a threat.

Method: Use a dictionary of anti-patterns to get rid of certain example or test keys. The presumed credential should not contain linguistic sequences of characters.
Pros: Filters out certain credentials, like those containing EXAMPLE, TEST or XXXX, or those found in test files or directories.
Cons: There is no real con; this method is always good to implement, but it won't be able to filter all example or test keys.

STEP 3: GITGUARDIAN'S SECRET SAUCE!

Which is not
a secret anymore (as can be seen on our blog!).

We've raised hundreds of thousands of alerts already, including pro bono alerts on public GitHub. When raising alerts, we gather both explicit and implicit feedback:
- Explicit feedback when a developer or security team marks an alert as a false alert.
- Implicit feedback when a developer takes down a public repository or deletes a public commit a few minutes after we sent an alert.
This feedback is then injected into our algorithms.

Finding a secret in source code is like finding a needle in a haystack
There are a lot more sticks than there are needles, and you don't know how many needles might be in the haystack. In the case of secrets detection, you don't even know what all the needles look like!

As a cybersecurity vendor, we are often asked by customers about the precision of our secrets detection algorithms: “What percentage of the secrets you detect are actual secrets?” This question is perfectly legitimate, especially in the context of security teams being overwhelmed with too many alerts.

Alarm fatigue is not the only pain. Considering the impact that a single undetected credential leak can have for an organization, we're also often asked: “How many secrets do you miss?”

Ideally, you want your detection system to achieve at the same time:
- A low number of false alerts raised, and
- A low number of secrets missed.
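
The two goals above correspond to what is usually called precision (few false alerts) and recall (few missed secrets). The following is a minimal sketch, not GitGuardian's implementation, of how the two move against each other when the only detector is the entropy method from Step 1. The sample strings, their labels and the function names `shannon_entropy` and `evaluate` are all invented for illustration; none of the strings is a real credential.

```python
import math
from collections import Counter
from typing import Tuple


def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical labeled sample: (candidate string, is it really a secret?).
# Every value is made up for this example.
SAMPLE = [
    ("q8Zr2LxV0pTn5WmK7dYc", True),          # random-looking token
    ("hunter2hunter2", True),                # low-entropy password
    ("/usr/local/lib/python3/site", False),  # file path
    ("3f2b9c1e7a4d6058", False),             # database ID or hash fragment
    ("aaaaaaaaaaaaaaaa", False),             # placeholder value
]


def evaluate(threshold: float) -> Tuple[float, float]:
    """Return (precision, recall) when flagging strings with entropy >= threshold."""
    flagged = [label for s, label in SAMPLE if shannon_entropy(s) >= threshold]
    true_positives = sum(1 for label in flagged if label)
    total_secrets = sum(1 for _, label in SAMPLE if label)
    # With nothing flagged there are no false alerts, so report precision as 1.0.
    precision = true_positives / len(flagged) if flagged else 1.0
    recall = true_positives / total_secrets
    return precision, recall


if __name__ == "__main__":
    for threshold in (0.0, 3.0, 4.1):
        precision, recall = evaluate(threshold)
        print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy sample, a threshold of zero flags everything (no secret is missed, but most alerts are false), while the higher thresholds drop the low-entropy password. No single threshold satisfies both goals, which is why entropy is combined with other methods rather than used alone.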
Balancing the equation so that the algorithm captures as many secrets as possible without flagging too many false results is an intricate and extremely difficult challenge that takes a dedicated team.

What about the programming language that is analyzed?
This is the easy part of secrets detection, which, for the most part, is not language specific. Of course, there are some subtleties to take into account, like the way variables are assigned in a given programming language. But there is no need to support all the different syntaxes in full detail. The same algorithms can be applied to any project, in any programming language.

A few other aspects to consider:
- Algorithms built for probabilistic scenarios will change over time. There is no
perfect solution that can remain the same: trends will change, secrets will change, data will change, formats will change and, therefore, your algorithm will need to change.
- You might want to be able to implement custom detectors, for example to detect API keys giving access to internal microservices specific to your company.

Remediating exposed secrets

Every time a secret is pushed to the git server, it must be considered compromised and revoked. The remediation process is a big part of implementing a secrets detection program for an organization. It should be conceived as a shared responsibility between Development, Operations and Application Security teams. Revoking a secret might require special rights or approvals, some secrets might be harder to revoke than others, and secrets need to be renewed and redistributed without impacting production systems and development work.

There are two key aspects to implementing a successful remediation process at scale:
- Use playbooks: process flows running a series of automated jobs without the manual intervention of your security engineers.
- Involve the developers. At GitGuardian, we believe developers should be front and center when it comes to remediating incidents they are responsible for. For this, you have the choice between providing external access to incidents (we call this feature Developer-In-The-Loop) or inviting them to join the workspace to view their incidents in-app.

Once you have put everything together to halt the progress of secrets sprawl with the help of developers, it will be time to tackle historical incidents (secrets found after running the historical scan of your repositories). It is common to see thousands of incidents in this backlog, which means a long-term strategy will be required. If you are interested in knowing more about the remediation strategies we have implemented in the past, don't hesitate to contact GitGuardian.

04 About GitGuardian

GitGuardian is a global cybersecurity startup focusing on code security solutions for the DevOps generation. Founded in 2017 by Jérémy Thomas and Eric Fourrier, GitGuardian has emerged as the leader in secrets detection and is now focused on enabling the Shared Responsibility Model of AppSec, starting with getting the developer experience right. Widely adopted by developer communities, GitGuardian is used by more than 200 thousand developers and is the #1 app in the security category on the GitHub Marketplace. Its enterprise-grade features enable AppSec and Development teams to collaborate on delivering secret-free code. Its detection engine is based on 350 detectors able to catch secrets in both public and private repositories and containers at every step of the CI/CD pipeline. GitGuardian has raised $56M to date and
is backed by prominent investors including Scott Chacon, co-founder of GitHub, and Solomon Hykes, founder of Docker.

It provides two tools aimed at securing two different perimeters.

The first product, GitGuardian Public Monitoring, scans all of public GitHub, at scale, in real time. The product links developers with their companies, and then monitors these developers, especially on their personal repositories, where 80% of the corporate leaks on GitHub occur. Companies often don't know that these repositories exist and don't have visibility on them, let alone the authority to enforce security measures there. The product comes in the form of a SaaS dashboard used by Incident Response, Threat Intelligence and Application Security teams to find leaked credentials, investigate and remediate quickly.

The second product, GitGuardian Internal Repositories Monitoring, scans corporate repositories, private or open source. The product is natively integrated with GitHub, GitLab and Bitbucket. It also includes an API to integrate anywhere in your SDLC and with the tools used by your developers. The product comes in the form of a dashboard used by Application Security teams to detect credentials and collaborate with developers to
remediate quickly. Available in SaaS and On Prem.

How we sell cybersecurity software at GitGuardian: our Manifesto

As you may have noticed, most of the resources on our website are ungated. We believe our expertise on this application security topic is unique and should be accessible to the widest possible audience. If you need more information regarding secrets detection and remediation, including on the technical aspects, we encourage you to contact us. Our technical sales team will seek to understand your
needs and recommend the appropriate solution.

Our commitments:
- We help first: if you don't want to jump directly into a call with our sales reps, we are happy to share materials with you upfront. This way you can evaluate whether or not a conversation with our reps is worth your time.
- Consultative approach: early in the sales process, we will provide a structured, straightforward questionnaire to help you evaluate your needs and requirements and weigh them, so that you can compare us with your alternatives.
- Radical transparency: we are always keen on sharing the technical details of what we do with your technical teams. Even our secret sauce is, well, not a secret anymore!
- Directness: if we feel we are not a good fit for your needs, we will let you know early in the process, and suggest relevant alternatives.
- Products that are easy to test:
  - For GitGuardian Public Monitoring: we've been monitoring all public GitHub activity and detecting secrets leaked there for over three years now. During the sales process, if you allow us to do so, we will show your security team the GitGuardian dashboard populated with actual data from your company's perimeter.
  - For GitGuardian Internal Monitoring: you will be given access to a free trial with unlimited features to test the product in real conditions before potentially buying.
- Simple, predictable pricing. You won't need an advanced degree to understand our pricing!

© 2022 GitGuardian. All Rights Reserved.