GitGuardian: State of Secrets Sprawl Report 2021 (English version, 23 pages)
The State of Secrets Sprawl on GitHub
HOW LEAKY CAN IT GIT
GITGUARDIAN STATE OF SECRETS SPRAWL ON GITHUB

Contents
Secrets Sprawl
Findings
Where leaks come from
Why
What type of secrets do we find
File extensions that cause data breaches
Pro bono alerting
What happens after a leak
Recommendations
To conclude

GitHub is, more than ever, "the place to be" for developers when it comes to innovating, collaborating and networking. This amazing "octoverse" gathers more than 50 million developers working on their personal and/or professional projects.
So when 60 million repositories are created in a year and nearly 2 billion contributions* are added, some mistakes can happen, such as leaked secrets, intellectual property or PII (personally identifiable information).

Some companies may think: "I don't really care about public GitHub. We are not open sourcing our code; everything is stored in our private repositories." But what about these companies' developers? They most likely have open source repositories of their own, and they can leak secrets there.

*State of the Octoverse 2020

Secrets Sprawl

Let's now focus on secrets. You might say that storing secrets in internal version control systems is a very bad practice, but it is in fact much more frequent than you would think. Why is that?

API keys, database connection strings, private keys, certificates, usernames and passwords: as organizations move to cloud architectures, SaaS platforms and microservices, developers handle more sensitive information than ever before. On top of that, companies are pushing for shorter release cycles, developers have many technologies to master, and enforcing good security practices gets harder as the organization grows in size, in number of repositories, in number of developer teams and in geographical spread.

As a result, secrets are spreading across organizations, particularly within source code. This pain is so widespread that it even has a name. Let us introduce you to the concept of "secrets sprawl" and how it can lead to public exposure of some of your most sensitive assets.

At GitGuardian, we've been monitoring every single commit pushed to public GitHub since July 2017. Three and a half years later, we've uncovered millions of secrets and sent nearly 1 million pro bono alerts to developers in 2020 alone.
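The habit described above is hardcoding API keys or database connection strings into tracked files. Its simplest counterpart is to have the code look the value up at runtime. The Python sketch below assumes the secret is injected as an environment variable. The helper `get_secret`, the exception `MissingSecretError` and the variable name `DATABASE_URL` are hypothetical names chosen for illustration; they are not part of any GitGuardian tooling.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret has not been provided to the process."""


def get_secret(name: str) -> str:
    """Return the secret stored in the environment variable `name`.

    The value is supplied at runtime (by a shell, a CI system or a secrets
    manager), so no literal credential has to be written into a tracked file.
    """
    value = os.environ.get(name)
    if not value:
        # Fail loudly instead of falling back to a hardcoded default,
        # which would put the secret back into the source code.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


# Hypothetical usage; DATABASE_URL is an assumed variable name:
# db_url = get_secret("DATABASE_URL")
```

Environment variables are only one delivery mechanism, and a centralized secrets manager can fill the same role. In either case, a local `.env` file used to populate the variables must itself stay out of the repository, because environment files rank among the top three file types where this report finds secrets.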
Keeping secrets encrypted and tightly wrapped makes it harder for developers to both access and distribute them. This can lead developers to take the path of least resistance when handling them: hardcoding them into source code, distributing them through email or messaging systems like Slack, saving them directly into config files and storing them inside internal wikis.

Once secrets start to enter different systems:
- Attackers can move laterally through infrastructure.
- You lose visibility over where secrets end up.

SECRETS
A secret can be any sensitive data that we want to keep private. When discussing secrets in the context of software development, secrets generally refer to digital authentication credentials that grant access to services, systems and data. These are most commonly API keys, usernames and passwords, or security certificates. Secrets tie together the different building blocks of a single application by creating a secure connection between each component. Secrets grant access to the most sensitive systems. Learn more about secrets on our blog.

COMMIT
A commit is an incremental change made to an individual file or a set of files. Each commit records a new version of the files, and the difference (or diff) from the previous version, including data that was removed, can always be retrieved from the history.

So here is a deep dive into what we find.

WHAT ARE WE LOOKING AT
AND THE VOLUME IS GROWING
- 2.5M public commits scanned per day
- Almost 1B public commits scanned per year
- 35% more repositories created last year*
- 25% more contributions to open source projects*
*State of the Octoverse 2020

WHERE DO WE FIND THE SECRETS
- 85% of the leaks occur on developers' personal repositories.
- 15% of leaks on GitHub occur within public repositories owned by organizations.
Secrets present in all these repositories can be either personal or corporate. This is where the risk lies for organizations: some of their corporate secrets are exposed publicly through the personal repositories of their current or former developers.

WHAT DO WE FIND
A GROWING NUMBER
- More than 5K secrets detected per day
- Over 2M secrets detected in 2020
- +20% compared to the previous year

"We launched this audit, and several leaked secrets were brought to our attention. What was very interesting, and what we didn't anticipate, was that most of the alerts came from the personal code repositories of our developers."
Anne Hardy, CISO

Where leaks come from: TOP 10
01 India
02 Brazil
03 United States
04 Nigeria
05 France
06 Russia
07 UK
08 Canada
09 Bangladesh
10 Indonesia

Why
Usually these leaks are unintentional, not malevolent. They happen because:
- Developers typically have one GitHub account that they use for both personal and professional purposes, sometimes mixing the repositories.
- It is easy to misconfigure git and push the wrong data.
- It is easy to forget that the entire git history is still publicly visible, even if sensitive data has since been deleted from the current version of the source code.

"Human error exists, but the key is to be alerted and be able to take appropriate action when a leak is found."
Anne Hardy, CISO

"Human error is nothing you can avoid and prevent, especially if it is not an error but just laziness, or even provoked. Implement a risk-based approach and simply add many layers to prevent it in your whole lifecycle."
David Dos Neves, Munich Re
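The Why list notes that git history stays visible after sensitive data is deleted. The sketch below shows this with Python's standard `difflib`, used as a stand-in for what `git show` would display for a deletion commit. Git itself stores snapshots and computes diffs on demand, so this models the displayed diff rather than git's storage. The file name `app.py`, its contents and the helper `commit_diff` are hypothetical. The key is the placeholder `AKIAIOSFODNN7EXAMPLE` used in AWS documentation, not a real credential.

```python
import difflib

# Hypothetical file contents before and after a "remove secrets from repo" commit.
# AKIAIOSFODNN7EXAMPLE is the placeholder access key ID used in AWS documentation.
before = [
    "import boto3\n",
    'AWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"\n',
    'client = boto3.client("s3")\n',
]
after = [
    "import boto3\n",
    'client = boto3.client("s3")\n',
]


def commit_diff(old: list[str], new: list[str]) -> list[str]:
    """Unified diff between two versions of a file, similar to what `git show` displays."""
    return list(difflib.unified_diff(old, new, fromfile="a/app.py", tofile="b/app.py"))


diff = commit_diff(before, after)
print("".join(diff), end="")

# The key is gone from the current file, yet the deletion itself records it verbatim.
removed = [line for line in diff if line.startswith("-") and not line.startswith("---")]
print("still visible in history:", removed[0].strip())
```

The only `-` line of the diff still spells out the key. One commit earlier, the same key appears on a `+` line. Anyone who clones the repository can therefore recover it until the credential itself is revoked.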
What type of secrets do we find
Secrets are digital authentication credentials that grant access to services, systems and data (API keys, usernames and passwords, or security certificates). The volume and diversity of these credentials are growing fast as architectures move to the cloud and rely on more and more components and apps.

All these categories of secrets expose companies to easy and direct attacks:
- Cloud provider and data storage secrets expose companies to data loss, but also allow attackers to delete infrastructure.
- Identity provider and messaging system secrets allow attackers to act under legitimate identities.

[Chart: categories of secrets detected]
- Social network
- Cloud provider: AWS, Azure, Google, Tencent, Alibaba
- Data storage: MySQL, Mongo, Postgres
- Other: including CRM, cryptos, identity providers, payment systems, monitoring
- Development tools: Django, RapidAPI, Okta
- Private keys
- Messaging systems: Discord, Sendgrid, Mailgun, Slack, Telegram, Twilio
- Version control platform: GitHub, GitLab
- Google keys
- Collaboration tools: Asana, Atlassian, Jira, Trello, Zendesk

"Our larger customers, with 2,000 or more employees, deploy an average of 175 apps per customer, while our smaller customers, with 1,999 or fewer employees, deploy an average of 73 apps per customer."*
*Okta

As you might expect, with the many programming languages, frameworks and coding practices adopted throughout the world, there is a very long list of file extensions that can contain secrets. Here is a view of the top 10:
- The top 10 file extensions account for 81% of all results.
- The top 3 account for over 56% of the results.

File extensions can be grouped into 3 categories:
- Programming languages: Python, JavaScript, PHP, TypeScript
- Data serialization files: JSON, XML, YAML, .properties
- Forbidden or sensitive files: .env, .pem
Learn more about how secrets leak through file extensions on our blog.

File extensions that cause data breaches on GitHub (TOP 10)
- Python: 27.9%
- JavaScript: 18.8%
- Environment: 9.7%
- JSON: 7.5%
- Properties: 4%
- PEM: 3.6%
- PHP: 3.2%
- XML: 2.2%
- YAML: 2.1%
- TypeScript: 2%
- All others: 19.1%
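The three file-extension categories listed above can be turned into a simple triage rule, for example to decide which changed files deserve a closer look. The Python sketch below is an illustration under stated assumptions. The category names come from this report, but the mapping of each language or format to concrete extensions (such as `.py`, or `.yml`/`.yaml`) and the helper names `extension_of` and `categorize` are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Category names from the report; the extension sets are illustrative assumptions.
CATEGORIES = {
    "Programming languages": {".py", ".js", ".php", ".ts"},
    "Data serialization files": {".json", ".xml", ".yml", ".yaml", ".properties"},
    "Forbidden or sensitive files": {".env", ".pem"},
}


def extension_of(path: str) -> str:
    """Lower-cased extension of `path`, treating dotfiles such as '.env' as their own extension."""
    name = PurePosixPath(path).name.lower()
    suffix = PurePosixPath(name).suffix
    # pathlib reports no suffix for a bare dotfile like ".env", so fall back to the whole name.
    if not suffix and name.startswith("."):
        return name
    return suffix


def categorize(path: str) -> Optional[str]:
    """Return the report category for `path`, or None if its extension is not listed."""
    ext = extension_of(path)
    for category, extensions in CATEGORIES.items():
        if ext in extensions:
            return category
    return None
```

For example, `categorize("deploy/.env")` returns `"Forbidden or sensitive files"`, while an unlisted file such as `README.md` returns `None`. Files in the third group rarely belong in a repository at all, so they are natural candidates for blocking rather than merely reviewing.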
EXAMPLES OF SECRETS LEAKS
Publicly disclosed examples of recent data breaches through leaked credentials:
- UN data breach (January 2021): a .git-credentials file in a public repository gave hackers access to private repositories with sensitive information.
- Starbucks data breach (January 2020): a JumpCloud API key was found in a GitHub repository.
- Equifax data breach (April 2020): leaked secrets in a personal GitHub account granted access to sensitive data for Equifax customers.
- Uber data breach (May 2014): hackers discovered credentials in a personal public repository on GitHub that granted access to a database containing private information of thousands of Uber drivers.

WHAT USUALLY HAPPENS
A user first writes code with credentials in it because it is easier to write and debug, then forgets to remove them from all files once the work is done. He then commits and pushes his changes. When he realizes he made a mistake, he either makes a deletion commit or a force push so that the secrets no longer appear in the current version. Most of the time, he forgets that git and the internet are not forgiving. Secrets can still be accessed in the git history even if they are no longer in the current version of the code, and public data hosted on GitHub can be duplicated and cloned into multiple locations.

WHEN IT REALLY GOES WRONG
A user pushes professional work to a personal repository without really understanding git/GitHub. In the repository, we find shell command history, environment files and copyrighted content. When the developer realizes the mistake, he only adds a deletion commit (or several, if he doesn't find all the leaks at once) with a message such as "remove secrets from repo". The leaked credentials are not revoked and remain public in the git history.

"That's blazing fast, GitGuardian. Recently, I Pushed my Flask app with Postgres URI of Heroku Database. And within 5 minutes or so, I received a warning about that. TBH, Fall in love with this"
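The deletion-commit scenario described under WHAT USUALLY HAPPENS is exactly what a scan of the full history catches. This minimal sketch assumes its input is text shaped like the output of `git log -p`, which prints each commit followed by its patch. It checks a single example pattern, the documented shape of AWS access key IDs ("AKIA" followed by 16 upper-case letters or digits). The names `Finding`, `scan_patch` and `AWS_KEY_ID` are hypothetical, and this is not GitGuardian's detection engine.

```python
import re
from dataclasses import dataclass

# Shape of a long-term AWS access key ID; one example pattern, not a full detector set.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


@dataclass
class Finding:
    commit: str
    change: str  # "added" or "removed"
    secret: str


def scan_patch(patch: str) -> list[Finding]:
    """Scan `git log -p`-style text, checking removed lines as well as added ones."""
    findings: list[Finding] = []
    commit = "?"
    for line in patch.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith(("+++", "---")):
            # File headers. Simplification: this also skips a removed line whose text starts with "--".
            continue
        elif line.startswith(("+", "-")):
            change = "added" if line[0] == "+" else "removed"
            for match in AWS_KEY_ID.finditer(line):
                findings.append(Finding(commit, change, match.group()))
    return findings
```

In practice the input could come from running `git log -p --all` in a clone. Because removed lines are scanned too, a key deleted by a "remove secrets from repo" commit is still reported, together with the earlier commit that introduced it. The appropriate response to such a finding is to revoke the key, not only to rewrite history.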
"GitGuardian hey folks, I owe you another beer. Today I've committed another DO API key to the public repository. Thanks for your service"
Pro bono alerting
Such knowledge of leaked credentials comes with a great responsibility. We alert developers in a pro bono manner. Here is an idea of the volume of alerts we sent in 2020:
- 937,539 alerts were sent pro bono
- 558,085 developers were alerted pro bono

IT REPRESENTED
- 700,000 unique repositories
- 860,000 unique commits

"I try to never rewrite git history but when GitGuardian mailed me today about leaked keys I was on that rewrite like there is no tomorrow! Thanks GitGuardian!"
What happens after a leak
GitGuardian's algorithm reacts to a leak in 4 seconds (Mean Time To Detect), and the alert is sent right away. When a secrets detection solution is in place, security teams also receive dual alerts so they can easily follow up on, remediate and report security incidents.

25 minutes: Median Time To React. The developer is on the front line of the issue. If the developer takes immediate action after the alert, most of the potential damage can be neutralized very quickly.

"If you leave your keys to your house in the lock and you notice they are gone, then you change the locks."
Allan Alford

REMINDER: Gitignore is not a Vault!
.gitignore tells git which files you don't want to commit. The files containing your secrets should be listed in your .gitignore file, but the secrets themselves should never be written in plain text in your .gitignore file. Hundreds of developers committed this mistake in 2020.

REMINDER: Don't bury the secret
If you search GitHub for "removed AWS key", you will see thousands of results. Removing a hardcoded secret and pushing a new commit only buries the secret in the history. That makes it harder for you to find, but it stays accessible to attackers.

Recommendations
Companies can't avoid the risk of secrets exposure even if they put centralized secrets management systems in place. These systems are typically not deployed across the whole perimeter. They are also not coercive: they do not prevent developers from hardcoding credentials that are stored in the vault. Solutions are available to automate secrets detection and put proper remediation in place, but the market is far from mature on this subject. Companies need to scan not only public repositories but also private repositories to prevent lateral movement by malicious actors.

Some best practices can be followed to limit the risk of secrets exposure or the impact of a leaked credential:
- Never store unencrypted secrets in .git repositories.
- Don't share your secrets unencrypted in messaging systems like Slack.
- Store secrets safely.
- Restrict API access and permissions.

Following best practices is not sufficient: companies need to secure the SDLC with automated secrets detection. When choosing a secrets detection solution, they need to take into account:
- Capacity to monitor developers' personal repositories
- Secrets detection performance* (accuracy, precision and recall)
- Real-time alerting
- Integration with remediation workflows
- Easy collaboration between Developers, Threat Response and Ops teams
*Learn more about detection performance.

Developer training programs should be put in place, although they do not eradicate the risk of leaked credentials.

To conclude
There are millions of commits per day on public GitHub. How can organizations look through the noise and focus exclusively on the information that is of direct interest to them? How can they make sure their secrets are not ending up in their developers' personal repositories on GitHub? They can't prevent developers from having personal repositories, so they need automated detection and efficient remediation tools.

In this State of Secrets Sprawl on GitHub analysis, we focused on secrets, although they are not the only sensitive information that can end up publicly exposed: intellectual property, personal data and medical data are also at risk. But that is for another State of report!

ABOUT GG DETECTION ENGINE, DATA GATHERING & METHODOLOGY
GitGuardian's secrets detection engine has been running in production since 2017, analyzing billions of commits coming from GitHub. Since day one, we have trained and benchmarked our algorithms against open source code. This allowed GitGuardian to build a language-agnostic secrets detection engine that integrates new types of secrets, or new ways of declaring secrets, very quickly while keeping the number of false positives very low. We have developed the largest library of specific detectors, able to detect more than 200 different types of secrets.

We also collect feedback from the alerts we send, including the pro bono alerts:
- Explicit feedback, when a developer or security team marks an alert as a false alert.
- Implicit feedback, when a developer takes down a public repository or deletes a public commit a few minutes after we sent an alert.

Our secrets detection engine is:
- High precision: we want to keep the number of false positives low to avoid alert fatigue.
- High recall: we want to keep the number of missed secrets low to keep our customers safe.
- Fast: while speed is less important than recall and precision, our secrets detection engine is designed to be fast and to scan the history of a typical git repository in under a minute.
- Community and customer driven: our engine is constantly trained and improved by the feedback of the hundreds of thousands of developers using our applications, and by the feedback of our customers.

GitGuardian is solving the issue of secrets sprawling through source code, a widespread problem that leads to some credentials ending up in compromised places or even in the public space. The company solves this issue by automating secrets detection for Application Security and Data Loss Prevention purposes. GitGuardian helps developers, ops, security and compliance professionals secure software development, and define and enforce policies consistently and globally across all their systems. GitGuardian solutions monitor public and private repositories in real time, detect secrets and alert to allow investigation and quick