Attackers see developers as low-hanging fruit
Illustration: Si Weon Kim
Developers must be increasingly wary of actively malicious code that makes its way into their software supply chains.
An unknown threat actor uploaded hundreds of malicious packages to the Python Package Index (PyPI) in April to steal data from developers’ systems, redirect cryptocurrency from legitimate recipients to their own coffers, and undermine the security of software made with those packages. The attacker initially relied on basic “typosquatting,” which means claiming a name that resembles a trusted resource but contains a few misplaced or misleading characters meant to ensnare unwitting developers. The attacker then quickly ratcheted up the sophistication of their stealth and data-mining techniques.
The campaign was at least a partial success: the collection of malicious packages was downloaded more than 75,000 times, and the attacker received at least $100,000 worth of cryptocurrency, according to an analysis of the attack published last month by application-security firm Checkmarx.
"In this attack, the end-users consuming the malicious packages [the developers] were targeted," said Yehuda Gelb, a security researcher at Checkmarx. "The attackers aimed to deploy these packages to infiltrate systems, steal sensitive data, manipulate cryptocurrency transactions, and gain unauthorized access to various applications and platforms used by these package consumers."
Cyberattacks attempting to sneak onto developers' systems through open-source components have skyrocketed ever since a malicious code injection attack on the popular “event-stream” project rattled the open-source community in 2018. In the last year, for example, more than 150,000 malicious packages were discovered in various ecosystems, bringing the total over the past five years to 245,000, according to software security and management firm Sonatype.
The trend should be a warning to companies. Attackers aren’t limiting their efforts to finding vulnerabilities in finished software but are increasingly focused on compromising developers and the development pipeline. Defenders’ current approach of focusing on the integrity of the software, and not taking the security of their systems into account, resembles manufacturers preventing defects in cars while leaving the factory itself open to attack, said Brian Fox, CTO of Sonatype.
"A lot of teams are used to not worrying about what happens in the developer machine, as long as they catch vulnerabilities before they ship them to production," Fox said. "But this is about people trying to blow up the factory, not trying to create dangerous cars. And the way you defend against that [threat] has to be different."
From typosquatting to dependency confusion
Techniques for inserting malicious code into the software supply chain have evolved over time from manual attacks to mass typosquatting to more targeted dependency confusion.
An early example of a targeted attack occurred in 2018, when an attacker posing as a developer proposed an update to the event-stream project hosted on the Node Package Manager (npm) repository popular among JavaScript developers. After the proposal was accepted, the attacker pushed a final update to their code that turned the Bitcoin wallet application Copay, which depended on event-stream, into a trojan horse that harvested users’ cryptocurrency and private keys.
Such targeted attacks require significant effort, so in many cases they have been supplanted by mass typosquatting campaigns. In a demonstration of how effective this technique can be, HashiCorp senior director of security engineering William Bengston created 1,131 fake projects with names similar to those of popular packages; those fakes were downloaded more than half a million times in a two-year period. "If these typosquat packages were written with malicious intent and we assume one attempt per install, that would mean 530,950 machines could have been compromised over the two year period," he said.
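To make the typosquatting idea concrete, here is a minimal sketch of how a developer might flag a package name that closely resembles, but does not match, a name they trust. The allowlist, the function name `possible_typosquats`, and the 0.85 similarity threshold are all illustrative assumptions, not anything used by Bengston, PyPI or npm.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist; in practice this would come from your own dependency manifest.
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "colorama"]


def possible_typosquats(name, known=KNOWN_PACKAGES, threshold=0.85):
    """Return trusted names that `name` closely resembles without matching exactly."""
    candidate = name.lower().replace("_", "-")
    matches = []
    for trusted in known:
        if candidate == trusted:
            return []  # exact match: this is the trusted package itself
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge
        if SequenceMatcher(None, candidate, trusted).ratio() >= threshold:
            matches.append(trusted)
    return matches


if __name__ == "__main__":
    for requested in ["requests", "reqeusts", "numpyy", "flask"]:
        print(requested, "->", possible_typosquats(requested))
```

A check like this only catches near-miss spellings. It would not catch a convincing but entirely new name, so it is best treated as one signal among several.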
Attackers are finding other methods of inserting malicious code into the software supply chain. Since many developers seek answers from coding websites, such as Stack Overflow, just leaving vulnerable code in an answer to a query could lead to compromises, said Henrik Plate, a security researcher with Endor Labs, a software supply-chain security firm. Another form of attack, "dependency confusion," works by giving a malicious package the same name as a software component a project already uses, often a private internal one, but with a higher version number, so that automated package updates will download the "newer" package.
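The version-number trick behind dependency confusion can be sketched in a few lines. The example below is a deliberately simplified model, not how pip or npm actually resolve packages: the index names, the package `acme-billing` and both resolver functions are hypothetical. It shows why "highest version wins" is risky when more than one index is consulted, and how restricting a name to one trusted index avoids the problem.

```python
def parse_version(text):
    """Turn a simple dotted version such as '1.4.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def naive_resolve(name, indexes):
    """Pick the highest version of `name` across every index, ignoring its origin."""
    candidates = [
        (parse_version(version), index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        return None
    version, index_name = max(candidates)
    return ".".join(str(part) for part in version), index_name


def pinned_resolve(name, indexes, trusted_index):
    """Consider only the one index that is supposed to own this package name."""
    return naive_resolve(name, {trusted_index: indexes.get(trusted_index, {})})


# Hypothetical data: an internal package and an attacker's public upload of the same name.
INDEXES = {
    "internal": {"acme-billing": ["1.2.0", "1.3.0"]},
    "public": {"acme-billing": ["99.0.0"]},
}

if __name__ == "__main__":
    print(naive_resolve("acme-billing", INDEXES))
    print(pinned_resolve("acme-billing", INDEXES, "internal"))
```

In this toy model the naive resolver selects the attacker's 99.0.0 from the public index, while the pinned resolver stays with the internal 1.3.0.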
All of these attacks cost attackers very little effort, which is a significant benefit, Plate said.
"These kinds of attacks, where the attacker just has to come up with a name, or they try to trick developers into downloading something, are very cheap for attackers to conduct," Plate said. "The techniques are very easy to scale and the marginal costs for yet another malicious package is close to zero. Even if only a few developers just download and install that malicious package, it may be still worthwhile because it didn't cost [the attacker] much at all."
Change the import process
A significant change that could protect developers is to prevent code from executing when it is installed. Some software ecosystems, such as Go, prevent install scripts from running when a component is imported. Others, such as Python, will run a setup or initialization script, putting developers at risk of allowing untrusted code to run, Shachar Menashe, senior director of security research for JFrog, a DevOps security platform, told README.
Preventing that execution wouldn’t stop malicious packages from targeting end users, but at least developers’ systems would be better protected.
"The main cases we are seeing today are novice attackers that are just using auto-install scripts, so when the developer installs a package, it immediately runs code," Menashe said. "So the safest — and I believe in the future, both NPM and PyPI will move to this type of installation — is one that doesn't cause automatic code execution."
Other defensive measures, such as requiring multi-factor authentication for developers and preventing new projects from using names similar to existing packages, could also help mitigate some threats.
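For Python developers, one way to reduce the install-time execution described above is to accept only prebuilt wheels, which pip unpacks rather than builds, instead of source distributions, whose build step can run project code. The sketch below is a hedged illustration: the helper names are hypothetical, and it assumes pip's `--only-binary=:all:` option and a requirements file supplied by the caller. It only builds the command and does not run it.

```python
import sys


def may_run_code_at_install(filename):
    """Wheels (.whl) are unpacked; other archives need a build step that can run code."""
    return not filename.endswith(".whl")


def wheel_only_install_command(requirements_file):
    """Build a pip command that refuses source distributions for every package."""
    return [
        sys.executable, "-m", "pip", "install",
        "--only-binary=:all:",  # fail instead of falling back to building from source
        "-r", requirements_file,
    ]


if __name__ == "__main__":
    print(may_run_code_at_install("example-1.0.tar.gz"))
    print(may_run_code_at_install("example-1.0-py3-none-any.whl"))
    print(" ".join(wheel_only_install_command("requirements.txt")))
```

This closes only the install-time path. Code inside a malicious package still runs once the package is imported, so the restriction complements vetting rather than replacing it.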
Shifting mindsets
Defending against malicious attacks targeting the software supply chain also requires developers to change how they think about the vulnerable (and malicious) components they use in their software. Sonatype’s Fox told README that when programmers scan applications and find vulnerabilities imported from open-source modules, they often see them as latent security issues that haven’t been triggered. But, he said, the presence of these components is still a problem—and in some cases could be considered a successful attack.
"People are conditioned to think about vulnerabilities as almost like a passive thing, right? 'Yeah, there's a vulnerability, but it might not be exploitable in my application,'" Fox said. "I find that I have to point out, however, that a vulnerable component is sitting in your cache, or sitting on your developer machine. That's not a potential threat. That is an attack that was successfully delivered."
Developers need to harden their own environments, starting with using tools to verify what software they are including in their project, the vulnerabilities that software may have, and whether a patched version is available. While organizations need to adopt more secure processes, individual developers can adopt a more suspicious mindset, Fox said.
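One small piece of verifying what goes into a project is checking that a downloaded artifact matches a hash recorded when the dependency was first reviewed. The sketch below assumes the developer already holds a trusted SHA-256 value, for example from a lock file; the function names are hypothetical and this is not a description of any specific vendor's tooling.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path, expected_hex):
    """Return True only if the file's SHA-256 equals the pinned value."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A matching hash shows only that the file is the same one that was pinned, not that it is safe. pip offers a comparable check through its `--require-hashes` mode.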
"Just like you might be suspicious these days to not just go and randomly download some application from the internet and run it — remember, there was a time when that was thought to be more or less safe — I think most people understand that's not safe anymore," Fox said. "Developers need to be thinking about [their] dependencies that way as well."
While open-source ecosystems provide a common infrastructure on which to base software development, the open nature of those ecosystems makes curbing malicious activity difficult, said Checkmarx's Gelb.
"Implementing both automated and manual security measures to scan for malicious code is a step in the right direction, although it may require substantial financial resources which not all platforms have," he said. "Much of the responsibility lies on the consumer's [developer's] end, it's imperative to thoroughly vet open-source packages before incorporating them into their systems to minimize the risk of compromise."