There’s no way around it: bugs happen. And sometimes, those bugs expose you to bad actors who have less-than-pure intentions for you and your customers. At Cedar we have the responsibility to protect the healthcare data of millions of patients; that’s not something we take lightly. Everyone has a right to the privacy of their healthcare data, and proactively addressing security threats before they have customer impact is a top priority.
Security vulnerabilities and risks snowball in complexity as they move through the development process. The good news? Fixing an authorization vulnerability during a pair coding session takes far less effort than fixing the same vulnerability once it is in production.
At Cedar we believe in moving security to the left, collaborating with developer teams to introduce security as early as possible. At the far left of the development lifecycle, we fix vulnerabilities at the architecture review level, before a single line of code is written. At the far right is production, where vulnerabilities are exposed to real attackers. Fixing issues in the architecture and design phase is always the goal, but in practice it is difficult to automate and relies on human processes.
If (or when) humans fail and vulnerabilities sneak through the architecture and design review process, the next best place for your team to catch them is during implementation. Unlike architecture and design documents, code is relatively easy to analyze with tools. Automation to catch vulnerabilities during implementation is an opportunity for a security team to have an outsized impact on risk across the organization.
Static code analysis is a staple of security-focused CI/CD: it is an automated process that reviews an entire codebase for security issues, catching bugs before deployment and reducing the risk of humans missing crucial issues. Although there are many benefits to static code analysis, we were repeatedly running into results that were outdated or shuffled off into a data store without ever being actioned.
We addressed these issues by building code scanning that runs on every commit and presents findings to developers in real time. Any vulnerability found via static application security testing (SAST) automatically triggers a Slack message to the security team. This provides opportunities to collaborate and educate developers about security vulnerabilities.
Identifying the vulnerabilities
Static code analysis parses our codebase and looks for known vulnerable code patterns. We use Semgrep as our scanner of choice, as it has rulesets populated by the community for all major languages and libraries. When we first implemented static code analysis at Cedar, we ran Semgrep across our codebase on a nightly basis and piped the results into Kondukto as our vulnerability management system.
There were a few issues associated with this pattern.
- Because the scan ran only on a nightly basis, the findings were outdated: the vulnerable code had already been merged into master by the time the scan flagged it.
- There wasn’t automation to ensure that findings were reviewed and actioned.
- Critical findings did not have automated alerting, which diminished our ability to meet SLAs.
- Engagement with developers occurred on a manual basis, and it was difficult for developers to remember all the details of old code.
Embarking on a GitHub journey
Because of the shortcomings we identified in the nightly scanning process, we turned to GitHub. We already use GitHub Actions to perform linting and end-to-end workflow testing on each commit as a requirement for merging. Our new GitHub Action scans the code in the pull request and leaves a comment if any vulnerabilities are found.
Initially we scanned every file modified within the pull request. However, this flagged a lot of vulnerabilities that were not introduced by the author, and the lack of relevance irritated our developers. The comments we were leaving on pull requests got ignored. Once we started scanning only the individual lines added in the pull request, we got much more engagement and cooperation from our teams.
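One way to sketch this line-level filtering in Python is shown here. It is an illustration under stated assumptions, not the Action described in this post: it assumes the diff comes from `git diff` in unified format and that findings follow the shape of Semgrep's JSON output (`path`, `start.line`, `end.line`). The function names `added_lines` and `findings_on_added_lines` are hypothetical.

```python
import re

# Matches a unified-diff hunk header such as "@@ -10,2 +10,3 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines(diff_text):
    """Map each changed file to the set of line numbers added by the diff.

    Assumes `git diff` style output, where every file section starts
    with a "diff --git" line.
    """
    added = {}
    path = None
    new_line = None  # None while we are in a file header, not in a hunk
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            path, new_line = None, None
        elif line.startswith("@@"):
            match = HUNK_HEADER.match(line)
            new_line = int(match.group(1)) if match else None
        elif new_line is None:
            # File header: pick up the post-change path from "+++ b/<path>".
            if line.startswith("+++ "):
                target = line[4:].strip()
                if target == "/dev/null":
                    path = None  # the file was deleted
                else:
                    path = target[2:] if target.startswith("b/") else target
        elif line.startswith("+"):
            if path is not None:
                added.setdefault(path, set()).add(new_line)
            new_line += 1
        elif line.startswith("-") or line.startswith("\\"):
            continue  # removed lines and "\ No newline" markers do not advance
        else:
            new_line += 1  # context line
    return added


def findings_on_added_lines(findings, added):
    """Keep only findings whose line range overlaps a line added in the diff."""
    kept = []
    for finding in findings:
        lines = added.get(finding["path"], set())
        start, end = finding["start"]["line"], finding["end"]["line"]
        if any(number in lines for number in range(start, end + 1)):
            kept.append(finding)
    return kept
```

Filtering after the scan, rather than scanning fragments of files, lets the scanner still see each file in full while the pull request comment mentions only what the author actually changed.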
Once we were able to reliably catch findings and surface them to our developer teams, we needed a way to alert ourselves as the security team. Finding live vulnerabilities as they are introduced is an opportunity to educate developers and foster a security-minded culture. So, we had the GitHub Action ping our internal security Slack channel whenever new findings were detected.
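A notification step like this could look roughly as follows. This is a sketch that assumes a Slack incoming webhook, which accepts a JSON body with a `text` field; the post does not say which Slack integration is actually used. The function names, the message wording, and the finding fields (modeled on Semgrep's JSON output) are illustrative.

```python
import json
import urllib.request


def build_slack_message(repo, pr_number, findings):
    """Build an incoming-webhook payload summarizing the new findings."""
    lines = [
        f"{len(findings)} new static analysis finding(s) in {repo} PR #{pr_number}:"
    ]
    for finding in findings:
        location = f"{finding['path']}:{finding['start']['line']}"
        lines.append(f"- {finding['check_id']} at {location}")
    return {"text": "\n".join(lines)}


def notify_security_channel(webhook_url, message):
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,  # assumption: supplied to the workflow as a secret
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the message builder separate from the network call makes the wording easy to test and adjust without posting to a real channel.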
Meaningfully reducing our risk
We identified three features that our static code analysis workflow needed to meaningfully reduce our risk:
- Scanning must occur on a timely basis, ideally on every commit;
- Developers need to be alerted about findings in real time, with information that allows them to remediate the issue;
- The security team should be engaged on any new findings, which implies that our scanning needs to produce high-signal, low-noise output.
Based on our security team’s bandwidth, we defined “high signal” as fewer than five findings in any given week. We decided on five findings based on the amount of time they took to triage. For a typical finding, it took approximately fifteen minutes to examine the finding and reply to the developer. When we accounted for the bandwidth cost of context switching, the time per finding could realistically double, so eliminating false positives was crucial.
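The arithmetic behind that threshold can be made explicit with a few lines of Python. The figures come from the paragraph above; the function name and the idea of modeling context switching as a simple multiplier are assumptions made for illustration.

```python
def weekly_triage_minutes(findings, minutes_per_finding=15, context_switch_factor=1):
    """Estimate the weekly triage time for a given number of findings."""
    return findings * minutes_per_finding * context_switch_factor


# At the five-finding threshold, triage alone takes 75 minutes a week.
baseline = weekly_triage_minutes(5)

# If context switching doubles the cost per finding, that becomes 150 minutes.
with_context_switching = weekly_triage_minutes(5, context_switch_factor=2)
```

Under these assumptions every false positive costs between fifteen and thirty minutes, which is why the threshold is so sensitive to noise.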
The largest source of false positives was third-party libraries. Semgrep would almost always flag something in new external libraries. Static code analysis of third-party code can be useful, but at Cedar we’ve found more effective strategies to reduce the risk of vulnerable third-party libraries, such as version pinning and research into known vulnerabilities. Once we eliminated scans for external libraries, we found that there were recurring false positives specific to design patterns that we used in our test code. These were only vulnerabilities if there was the potential for malicious input to be passed to the test code. Eliminating scanning of test directories was a huge win in tuning our system for high-signal findings.
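A minimal sketch of this kind of path-based tuning is shown here, written as a filter over findings. The directory names are hypothetical examples rather than Cedar's actual layout, and in practice the same effect can be achieved with Semgrep's own path exclusion (for example a `.semgrepignore` file) so that these paths are never scanned at all.

```python
from pathlib import PurePosixPath

# Hypothetical directory names for third-party and test code.
EXCLUDED_DIRS = {"node_modules", "vendor", "site-packages", "tests", "test"}


def is_excluded(path):
    """Return True if any directory component of the path is excluded."""
    directories = PurePosixPath(path).parts[:-1]  # drop the file name itself
    return any(part in EXCLUDED_DIRS for part in directories)


def drop_excluded(findings):
    """Remove findings located in third-party or test directories."""
    return [finding for finding in findings if not is_excluded(finding["path"])]
```

Matching on directory components rather than substrings avoids accidentally dropping a file such as `app/contest/views.py` just because its path contains the word "test".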
Keeping the false positive level low by tuning our alerting gives us the space to engage thoughtfully with every vulnerability that the static code analysis flags. One of the common pitfalls in static code analysis workflows is to rely on the reported severity of findings. With Semgrep in particular, the rated severity is only broadly grouped into “Info”, “Error”, and “Critical”. Each finding's severity is assigned by whoever initially wrote the detection rule, so there is no established framework behind this risk determination.
By keeping our reports high signal, we are able to investigate each finding regardless of severity. This prevents us from being at the mercy of inconsistent severity ratings and potentially missing critical findings.
Once the static code analysis system was up and running, we were able to see results right away: three XSS bugs, two local file inclusions, and one deserialization issue were identified and fixed within the first 90 days. Each of those bugs provided us with the opportunity to engage with our developer teams, educate them about the vulnerability and let them know that we’re here to help.
The power of catching bugs early
It's important that you understand your users' workflows so you don't introduce additional friction when developing new tools. If we had launched a beta testing group with our first iteration of scanning, then we would have identified the issue with whole-file scanning before our tool irritated developers across the company.
One of our security philosophies at Cedar is to explain “why” with empathy. Engaging with developer teams as early as possible via familiar communication channels provides the opportunity to explain security vulnerabilities in a collaborative setting. Our role as a security team is not to act as gatekeepers, but to empower teams to make smart decisions about security risks through education. This project hammered home the importance of respecting existing workflows and engaging personally to explain with empathy. These communication principles have proven much more effective than spamming developers with indecipherable vulnerability descriptions!
Static code analysis has the potential to reduce risk across your organization, supplementing manual code reviews and reducing the risk of humans missing critical vulnerabilities. GitHub Actions makes it easy to integrate Semgrep into your existing CI/CD workflows, so get out there and start hacking! If you have any questions, feel free to reach out to our team at security@cedar.com. We are always happy to help.
Max Chen is a Sr. Security Engineer at Cedar. To learn more about Max, visit his personal website here.