Ghosts of Log4j: Open-source vulnerabilities confound software developers
Illustration: Si Weon Kim
Most of the code in typical applications comes from open-source projects, importing dozens — and often, hundreds — of components created by volunteers. As the Log4j incident shows, those deep dependencies can carry critical vulnerabilities.
January was a busy month for Max Thauer.
As a consultant for cyber incident response firm Mandiant, Thauer found himself responding to a massive influx of calls from companies that needed to investigate potential attacks and breaches. The culprit? A series of vulnerabilities in an unassuming open-source component, Log4j 2. The vulnerabilities allowed attackers to scan for and exploit software that included the popular logging library, a population of code estimated at anywhere from 8 percent to 15 percent of Java-based software applications, according to experts.
The vulnerabilities — disclosed in early December — led to published exploit code, and soon after, to attackers scanning the internet for vulnerable applications and services.
“My January was pretty consumed with Log4j — just back-to-back-to-back investigations,” Thauer told README. “All of them specifically revolved around it.”
Log4j was not a solitary event. The world of application security is haunted by the unseen dependencies of open-source software (OSS).
In late March, for example, vulnerabilities in specific components of the widely used Spring web application framework left some Java-based applications vulnerable. And JavaScript has an even greater propensity to be affected by unseen dependencies: The popular JavaScript NPM package ecosystem has millions of software components that have whole classes of flaws, such as hidden property abusing and prototype pollution, which affect any software relying on those components.
In fact, almost all applications contained some open-source software in 2022: OSS made up 78 percent of the average codebase, and half of those codebases had at least one high-risk vulnerability, according to a recent review of more than 2,400 commercial app codebases by software management firm Synopsys. At the annual Black Hat security conference in 2021, Tim Mackey, a security strategist for the firm, put together a simple JavaScript application and declared eight dependencies, which expanded into a web of 133 imported libraries that went eight levels deep.
“This was stuff that was end of life, with explicit deprecation statements, and people were still happily using it,” he said at the time. “That represents a real problem, especially with dependencies that form the basis of our infrastructure. Maybe it’s time for a reset on some of these projects to get a handle on the security implications of this legacy code.”
Dependency creep becomes a security problem
The discovery of the original Log4j vulnerability (CVE-2021-44228) is typical of the bubbling up of dependency issues. The Apache Software Foundation, whose logging-services working group maintains Log4j, was first notified about the vulnerability during the Thanksgiving holiday last year. Gary Gregory, a member of the Apache Software Foundation and a member of the Apache Logging project management committee, worked on the effort to fix the flaw.
“It became apparent within the first 24 hours that this was a big deal,” he told README. “This was a ‘drop everything you are doing and take care of this’ issue.”
Unfortunately, information about the vulnerability had leaked out to at least one Chinese forum by early December, and the Apache Logging group had to rush out a fix on December 10. The result: The first patch required its own patch to fully fix the flaw. By early January, three related Log4j updates had been released.
Releasing a series of patches is not ideal, but the urgency of the response to the Log4j vulnerability was warranted. In mid-December, Google researchers analyzed the Java ecosystem and found that 35,000 packages, or 8 percent of those in Maven Central, the largest Java package repository, had been impacted by the core vulnerability. In its 2022 Open Source Security and Risk Analysis report, Synopsys found that 15 percent of the codebases audited by the firm had a vulnerable Log4j component.
In addition, the Google research found that developers were not including Log4j themselves, but the package was included as part of another dependency. The average application included the Log4j package as a dependency of a dependency of a dependency of a dependency of a dependency — that is, a developer would have to dig down five levels to find the code that imported Log4j. In some cases, the dependency tree extended to nine levels.
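To make the "five levels down" figure concrete, here is a minimal sketch of how a developer might trace the chain of imports that pulls a package into an application. The dependency graph and every package name in it other than log4j-core are hypothetical, invented for illustration; a real project would build the graph from its build tool's dependency report rather than by hand.

```python
from collections import deque

# Hypothetical dependency graph: each key imports the packages in its list.
# Every name except "log4j-core" is made up for this example.
DEPENDENCIES = {
    "my-app": ["web-framework", "metrics-client"],
    "web-framework": ["http-server"],
    "http-server": ["connection-pool"],
    "connection-pool": ["audit-helper"],
    "audit-helper": ["log4j-core"],
    "metrics-client": [],
    "log4j-core": [],
}


def shortest_path_to(graph, root, target):
    """Breadth-first search for the shortest import chain from root to target.

    Returns the chain as a list of package names, or None if target is
    not reachable from root.
    """
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


path = shortest_path_to(DEPENDENCIES, "my-app", "log4j-core")
depth = len(path) - 1  # number of levels below the application itself
print(" -> ".join(path))
print(f"levels below the application: {depth}")
```

In this toy graph the application never names the logging library in its own dependency list, yet the search finds it five levels down, which is the situation the Google researchers described as average.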
That sort of transitive tree requires a change in the developer mindset, says Brian Behlendorf, general manager for the Open Source Security Foundation (OpenSSF). “I’m old enough to remember a time when the Apache Web server depended on libc and the OpenSSL libraries and very few other things, and we didn’t have to worry much about what sorts of vulnerabilities that dependencies we were delivering,” he told README.
The threat of dependencies
Non-JavaScript software tends to have scores of dependencies — 70 in the case of PHP and 68 for Ruby, according to Synopsys. The median Java application uses 100 libraries, according to an analysis by software security firm Contrast Security.
JavaScript, however, is in a class by itself. Owing to the atomic architecture of applications that use the NPM registry, the average JavaScript application may only import 10 libraries directly, but those libraries rely on dozens of libraries, which rely on dozens more. In the end, an average JavaScript application has a massive dependency tree consisting of more than 680 components. In one egregious case, importing a single library — Gatsby.js — results in more than 19,000 additional components being added to a software project.
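The gap between what a developer declares and what actually gets installed can be sketched with a short graph traversal. The example below is an illustration under stated assumptions, not how NPM resolves packages: the graph is a hand-written toy, all package names are hypothetical, and version ranges and duplicate installs are ignored.

```python
# Hypothetical JavaScript-style dependency graph; all names are invented.
# Packages that do not appear as keys are treated as having no dependencies.
PACKAGE_GRAPH = {
    "app": ["ui-kit", "http-client"],
    "ui-kit": ["color-utils", "string-pad"],
    "http-client": ["url-parse", "string-pad"],
    "color-utils": ["number-clamp"],
    "url-parse": ["query-string"],
    "query-string": ["string-pad"],
}


def transitive_dependencies(graph, root):
    """Return the set of every package reachable from root, excluding root."""
    reachable = set()
    stack = list(graph.get(root, []))
    while stack:
        package = stack.pop()
        if package in reachable:
            continue  # already counted; also guards against cycles
        reachable.add(package)
        stack.extend(graph.get(package, []))
    reachable.discard(root)  # a cycle back to root should not count root
    return reachable


direct = PACKAGE_GRAPH["app"]
everything = transitive_dependencies(PACKAGE_GRAPH, "app")
print(f"declared directly: {len(direct)}")
print(f"installed in total: {len(everything)}")
```

Even in this tiny graph, two declared dependencies grow to seven installed packages. The same kind of multiplication, repeated across many more levels and packages, is what turns roughly 10 direct imports into hundreds of components.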
The volume of dependencies represents a problem in its own right, according to OpenSSF’s Behlendorf.
“There is a risk in the very highly fragmented, very atomic ecosystems out there like NPM, where — if each module only has a single set of eyeballs on it, and someone who in the course of solving a problem made a library and put it on repo, but doesn’t really maintain it — I think there is a risk to that,” he said. “I don’t feel comfortable dictating to NPM — that thou shalt change — but I do think end users of these components should be aware of the contributory risks that [the extensive dependency tree] brings.”
OpenSSF has embarked on projects to identify the most critical dependencies. In conjunction with the Harvard Laboratory for Innovation Science, OpenSSF released the Census II report, taking stock of the most used and popular open-source packages. Because of JavaScript’s unique and massive dependency issue, the lists are broken down into four top-500 lists for non-JavaScript packages and four top-500 lists for JavaScript packages.
While it did not merit a spot on previous, albeit smaller, lists, Log4j took the No. 38 spot on the Census II rankings for non-JavaScript components.
No easy fix
Solving the problem of dependencies in open-source software requires not just significant investment in the developers who are maintaining projects, but also better tools and processes for the maintainers of those projects. It’s a hard sell — programmers volunteer their time for open-source projects because they are interested in creating something interesting and functional, not to chase down bugs and enforce security controls.
Log4j, for example, is a popular project with a moderate level of support, but trying to force its volunteers to focus on security may not work, says Apache’s Gregory.
“We have a non-trivial number of people — eight — working on Log4j,” he said. “We are all volunteers, and we all work on what we want to work on — there is no overlord. But we take pride in our work.”
While automation and static analysis tools can help find bugs, they still tend to generate false positives and require a lot of work to tune. Moreover, Gregory said he doubts that a static checker would have caught the Log4j issue — or the way to exploit the vulnerability, frequently called Log4Shell.
“I’m pretty sure none of these would reveal the underlying cause of Log4Shell — at least not yet,” he said. “It is a very unfortunate combination of features that were put in separately a long time ago, along with some unexpected behavior that we didn’t realize happened.”
OpenSSF has taken a wider view, making sure that projects have the information and resources that they need to understand which dependencies are critical and the processes needed to secure software. In addition to the open-source software census, the group has created a best-practices badge for projects to display if they are taking certain baseline security steps.
“We need more projects to fill that out,” said OpenSSF’s Behlendorf. “It is a great checklist for maintainers of a project to go through.”
The problem is that architectural changes to software projects tend to happen slowly, and the threat actors move fast, he says.
“Sometimes it feels like new threats or new kinds of attacks emerge more quickly than we are making progress on, so it is still a very dynamic environment, and I don’t know if there is a simple rubric,” he said, adding: “A lot of good news happening in this space, but the adversaries, the threats are only growing over time.”