The internet is hooked on packages. Hackers have noticed

Illustration: Si Weon Kim

Cyberattacks targeting the “packages” that underpin global software programs have rattled the open-source community and exposed gaps in developers’ supply chain security practices.

Modern life relies on packages — and not just ones with Amazon’s logo printed on them.

Developers around the world depend on various software packages, prewritten codebases that can be easily incorporated into new projects to carry out certain functions. They’re the programming equivalent of having a tire delivered instead of reinventing the wheel.

But some mail-order tires arrive with holes: Recent cybersecurity breaches have made it increasingly clear that relying on packages and the indexes that provide them can leave developers susceptible to supply chain attacks and introduce vulnerabilities in their code.

On Thursday, a hack of the popular “coa” library on the NPM package index broke other JavaScript projects that rely on it. The attack came less than two weeks after another NPM package, “ua-parser-js,” was similarly hijacked to distribute malware, prompting government and industry warnings. According to NPM statistics, the two packages were most recently downloaded about 9 million and 8 million times a week, respectively.

Those aren’t the first times hackers have used packages to infect developers’ systems. Software company JFrog Inc. revealed this summer that Python developers were being targeted by malicious packages distributed via the widely used Python Package Index (PyPI). JFrog senior director of security research Shachar Menashe told README that the company discovered similar problems affecting the RubyGems package manager used by Ruby programmers, the Apache Maven tool used by Java developers, and NPM.

“Some people didn’t realize this was possible and were surprised” when JFrog made its first disclosure after using an automated scanning tool the company is developing, Menashe said. “But other developers said ‘this is well known, but nobody’s doing anything about it.’ Someone should definitely do something about it.”

How packages can make or break the internet

Packages are typically created by developers to share their solution to a coding problem, saving time for their peers. The shortcuts are listed on a central platform such as NPM, allowing other developers to use them with the help of a package manager that handles everything from installation to updates. Packages can count on other packages to work — and it’s the package manager’s job to keep track of that web of dependencies.
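To make that bookkeeping concrete, here is a minimal sketch of how a package manager might work out an install order from a dependency map. It is an illustration only, not how NPM or any real package manager is implemented; the package names and the DEPENDENCIES map are hypothetical, and the sketch assumes Python 3.9 or later for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each package lists the packages it needs directly.
DEPENDENCIES = {
    "my-app": {"web-framework", "date-utils"},
    "web-framework": {"http-parser", "string-pad"},
    "date-utils": {"string-pad"},
    "http-parser": set(),
    "string-pad": set(),
}


def install_order(dependencies):
    """Return an order in which every package comes after the packages it needs."""
    # TopologicalSorter treats each value as the set of predecessors of its key,
    # so dependencies are always emitted before the packages that rely on them.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(install_order(DEPENDENCIES))
```

Even in this toy example, installing one application pulls in four other packages, two of which the developer never asked for by name.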

NPM and PyPI are two of the most popular package indexes, offering crucial resources to millions of developers worldwide.

Both have had their share of recent problems, but the risks of relying on open-source indexes were perhaps best demonstrated in 2016 when a developer “broke the internet,” as Quartz put it, by deleting 11 lines of code in a package called “left-pad” because of a trademark dispute. The code didn’t do much — it simply added characters to the beginning of a line of text — but so many packages depended on “left-pad” that NPM took the extraordinary step of reinstating the deleted JavaScript package over the objections of its author, effectively “un-un-publishing” it to un-break the internet.
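For a sense of how small that functionality is, the sketch below reimplements the same idea in Python. It is not the original JavaScript package's code; the function name and parameters are chosen here for illustration.

```python
def left_pad(text, length, fill=" "):
    """Pad text on the left with fill until it is at least length characters long."""
    text = str(text)
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    # Work out how many characters are missing; never pad by a negative amount.
    missing = max(length - len(text), 0)
    return fill * missing + text


if __name__ == "__main__":
    print(left_pad("42", 5, "0"))
```

Python ships an equivalent built in (str.rjust), which is part of why the incident became shorthand for how much can hang on very little code.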

The left-pad saga highlighted the risks of relying on packages maintained by people who can remove their code at will. But it didn’t stop developers from continuing to use packages.

Malicious snippets

A popular package can make a compelling target for malware distributors. In the case of ua-parser-js, an unknown attacker released three malicious versions of the package on October 22 containing a script that attempted to steal credentials from Linux and Windows systems. It also tried to install cryptocurrency mining software on Windows.

“Any computer that has this package installed or running should be considered fully compromised,” GitHub said in a security advisory last month, adding “there is no guarantee that removing the package will remove all malicious software resulting from installing it.”

But many developers may not even know they’re using the package. NPM statistics show that over 1,200 other packages depend on ua-parser-js. That list includes “fbjs,” a package maintained by Facebook that was downloaded over 5 million times in the past week. That package has more than 1,200 of its own dependents, many of which have their own dependents. It’s no wonder many packages have millions of weekly downloads.
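The way a compromise spreads through dependents can be sketched as a walk over the dependency graph in reverse. The example below is a simplified illustration: the fbjs link to ua-parser-js comes from the NPM data described above, while every other package name is hypothetical, and a real index would hold vastly more entries.

```python
from collections import defaultdict, deque

# Package -> packages it depends on directly.
# Only the fbjs -> ua-parser-js link is taken from the article; the rest is hypothetical.
DEPENDS_ON = {
    "ua-parser-js": [],
    "fbjs": ["ua-parser-js"],
    "ui-toolkit": ["fbjs"],
    "storefront-app": ["ui-toolkit"],
    "logger": [],
}


def affected_by(compromised, depends_on):
    """Return every package that depends on `compromised`, directly or indirectly."""
    # Invert the map so we can look up who depends on a given package.
    dependents = defaultdict(set)
    for package, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(package)

    affected = set()
    queue = deque([compromised])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


if __name__ == "__main__":
    print(sorted(affected_by("ua-parser-js", DEPENDS_ON)))
```

In this sketch the developer of the hypothetical "storefront-app" never installed ua-parser-js directly, yet the project still lands on the affected list.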

Not all malicious packages have such high download counts, however. The malicious PyPI packages that JFrog revealed this summer were downloaded an estimated 30,000 times.

The Sunnyvale, Calif.-based company’s automated scanning system can be used to detect malicious packages across a range of indexes.

JFrog has disclosed eight malicious packages from PyPI that were used to steal Discord authentication tokens as well as autocomplete information managed via Google Chrome and Microsoft Edge, among other targets.

Menashe at JFrog urged developers to make sure they trust a package’s maintainer before using the software in their own projects and, at the very least, to opt for packages that many other developers have starred on GitHub.

But Menashe acknowledged this isn’t necessarily a problem for individual developers to solve, and he called for indexes like PyPI to take steps to catch malicious packages before they’re distributed.

“Some developers know about this and are concerned about it,” he said, “but the way to solve this is with safe defaults” on the part of the indexes. Those could include automated scanning, or ensuring malicious packages can’t co-opt the names of established software products such as Discord.
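One form such a default could take is a name check at upload time. The sketch below is a rough illustration, not a description of what PyPI or NPM actually do: the PROTECTED_NAMES list, the 0.8 similarity cutoff, and the function names are all assumptions made for the example.

```python
import difflib

# Hypothetical list of established names an index might choose to protect.
PROTECTED_NAMES = ["discord", "requests", "numpy"]


def normalize(name):
    """Lowercase a package name and drop common separator characters."""
    return name.lower().replace("-", "").replace("_", "").replace(".", "")


def lookalikes(candidate, protected=PROTECTED_NAMES, cutoff=0.8):
    """Return protected names that a proposed package name closely resembles."""
    cand = normalize(candidate)
    flagged = []
    for name in protected:
        target = normalize(name)
        if cand == target:
            # Identical names are a plain duplicate, not a lookalike.
            continue
        similarity = difflib.SequenceMatcher(None, cand, target).ratio()
        # Flag names that embed a protected name or are nearly identical to one.
        if target in cand or similarity >= cutoff:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    for proposed in ["discord-safety", "reqeusts", "left-pad"]:
        print(proposed, lookalikes(proposed))
```

A check like this would produce false positives for legitimate add-ons, so an index would more plausibly route flagged names to human review than reject them outright.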

Problematic packages

Packages can also introduce flaws in key software products. In May, HTTP Toolkit developer Tim Perry reported a severe vulnerability in a package called “pac-resolver.” The flaw, which enabled remote code execution via the JavaScript module, was addressed with the release of an updated version of the tool on July 12 and was publicly revealed on Aug. 22.

NPM’s statistics indicate that pac-resolver is downloaded a few million times per week — meaning countless projects that feature it as a dependency were at risk before the patch was released.

How is counting GitHub stars supposed to fix that?

It’s not. “This isn’t really a new problem,” Menashe said, comparing the issue to bugs in popular software like Windows or Linux. The best thing many developers can do is make sure they keep packages up to date, so attackers looking to exploit publicly disclosed flaws can’t find purchase on a target system. “Most of these attacks are not zero-day attacks,” he said, “they are one-day attacks… you just gotta monitor it and be very proactive.”
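That kind of monitoring boils down to comparing what is installed against what has been fixed. The following is a minimal sketch under stated assumptions: the package names, version numbers, and the ADVISORIES and INSTALLED tables are invented for illustration, and the version parser only understands plain dotted numbers such as 4.2.0.

```python
# Hypothetical advisory data: package -> first version that contains the fix.
ADVISORIES = {"example-resolver": "5.0.0"}

# Hypothetical snapshot of what a project currently has installed.
INSTALLED = {"example-resolver": "4.2.0", "example-logger": "1.3.1"}


def parse_version(version):
    """Turn a plain dotted version such as '4.2.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def outdated(installed, advisories):
    """List (name, installed_version, fixed_version) for packages still unpatched."""
    findings = []
    for name, version in sorted(installed.items()):
        fixed = advisories.get(name)
        if fixed and parse_version(version) < parse_version(fixed):
            findings.append((name, version, fixed))
    return findings


if __name__ == "__main__":
    for name, version, fixed in outdated(INSTALLED, ADVISORIES):
        print(f"{name} {version} is vulnerable; upgrade to {fixed} or later")
```

Real projects would lean on maintained advisory databases and tooling such as the npm audit command rather than a hand-kept table, but the comparison at the core is the same.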

An attendee’s laptop is shown at a December 2019 Node.js conference in Montreal. Source: Linux Foundation/Flickr

Roundabout security practices

The Python Software Foundation (PSF) and NPM, Inc. did not respond to multiple requests for comment.

Learning more about PyPI’s security measures requires one to take a serpentine path through multiple websites for several organizations, working groups, and projects. PyPI’s day-to-day management falls to a series of working groups that have largely gone silent during the coronavirus pandemic; the clearest information about its approach to security arrives via the Warehouse project, a web application that aims to improve the package index.

Warehouse’s documentation shows that some form of malware scanning is run before packages are made available on PyPI. That scan missed the packages JFrog detected with an early version of its own automatic scanning tool, however — and the Warehouse repository on GitHub shows the malware scanning tool’s last significant updates were made about two years ago. (More recent updates have centered on naming conventions, not additional protections against attackers.)

In NPM’s case, the company is owned by Microsoft Corp. subsidiary GitHub. The index’s security policy states “we proactively pen-test and audit software.” It also says there’s a dedicated security point of contact and that some measure of automatic scanning is taking place.

The policy also says audit documentation is available “and can be provided to customers when requested,” but NPM didn’t respond to requests for additional details or an example report.

Neither GitHub nor Microsoft responded to requests for comment.

Less money, more problems

Developing open-source software can already be a labor of love — one for which developers often get nothing but sharply worded bug reports as thanks. That makes it a tall order to convince package developers and maintainers to consider the security of their code before it’s submitted to an index.

Donald Fischer, CEO of Tidelift, a Boston-based software company focused on open-source maintenance and support, said there’s an important distinction between application-level projects made by a small group of independent developers and the larger, systems-level projects that attract the attention and financial backing of large organizations. The most prominent example of the latter may be the Linux kernel used in billions of devices around the globe.

“If somebody is maintaining that software, they usually haven’t been doing that as part of an income stream or a business,” Fischer said. “They love doing it for the technical challenge and they know there are millions of people, in many of these cases, using the software they work on.”

But that also means developers may not pay as much attention to security as they would in professional contexts. “There’s a lot of process and paperwork that’s involved in shipping and maintaining commercial-grade software to security standards,” Fischer said. Companies like Red Hat, where Fischer has worked in the past, pay developers to work on systems-level projects like Linux, he said, but that isn’t true for many other projects.

“It’s more like the boring part of software creation,” Fischer said. “All of this documentation and putting the information in machine readable formats and so on — it’s fairly tedious.”

Tidelift was founded to make it easier for organizations to find the people creating and maintaining these open-source projects so they can pay them to heed those details, he said. GitHub has also worked to bring more financial support to open source devs via its Sponsors program.

The model could be useful for securing the package ecosystem. But experts warn it may not stop developers from taking the “free” in “free and open source software” to mean something that doesn’t require any investment to remain operational.

Raising awareness might help, if only to ensure developers know that supply chain attacks conducted via package indexes are a possibility. “People realize that supply chain attacks are much more viable now,” Menashe said, adding that JFrog’s first disclosure covered simple packages that relied on “extremely novice” techniques to evade detection and compromise target information.

JFrog said that some of the attacks evaded detection by using publicly available Python code obfuscation tools such as PyArmor and python-obfuscator. The attackers didn’t have to use sophisticated techniques: they simply had to tweak their packages so they did not broadcast how they were going to access sensitive files.

“The low-hanging fruit is these really novice attackers trying to get into all kinds of machines that — in a different scenario — they wouldn’t be able to,” Menashe said.