The Package Analysis Project: Scalable detection of malicious open source packages

Despite open source software’s essential role in all software built today, it’s far too easy for bad actors to circulate malicious packages that attack the systems and users running that software. Unlike mobile app stores that can scan for and reject malicious contributions, package repositories have limited resources to review the thousands of daily updates and must maintain an open model where anyone can freely contribute. As a result, malicious packages like ua-parser-js, and node-ipc are regularly uploaded to popular repositories despite their best efforts, with sometimes devastating consequences for users.

Google, a member of the Open Source Security Foundation (OpenSSF), is proud to support the OpenSSF’s Package Analysis project, which is a welcome step toward helping secure the open source packages we all depend on. The Package Analysis program performs dynamic analysis of all packages uploaded to popular open source repositories and catalogs the results in a BigQuery table. By detecting malicious activities and alerting consumers to suspicious behavior before they select packages, this program contributes to a more secure software supply chain and greater trust in open source software. The program also gives insight into the types of malicious packages that are most common at any given time, which can guide decisions about how to better protect the ecosystem.

To better understand how the Package Analysis program is contributing to supply chain security, we analyzed the nearly 200 malicious packages it captured over a one-month period. Here’s what we discovered: 

Results

All signals collected are published in our BigQuery table. Using simple queries on this table, we found around 200 meaningful results from the packages uploaded to NPM and PyPI in a period of just over a month. Here are some notable examples, with more available in the repository.

PyPI: discordcmd
This Python package will attack the desktop client for Discord on Windows. It was found by spotting the unusual requests to raw.githubusercontent.com, Discord API, and ipinfo.io.

First, it downloaded a backdoor from GitHub and installed it into the Discord electron client.

Next, it looked through various local databases for the user's Discord token.


Finally, it grabbed the data associated with the token from the Discord API and exfiltrated it back to a Discord server controlled by the attacker.

NPM: @roku-web-core/ajax

During install, this NPM package exfiltrates details of the machine it is running on and then opens a reverse shell, allowing the remote execution of commands.
This package was discovered from its requests to an attacker-controlled address.

Dependency Confusion / Typosquatting

The vast majority of the malicious packages we detected are dependency confusion and typosquatting attacks.


The packages we found usually contain a simple script that runs during an install and calls home with a few details about the host. These packages are most likely the work of security researchers looking for bug bounties, since most are not exfiltrating meaningful data except the name of the machine or a username, and they make no attempt to disguise their behavior.


These dependency confusion attacks were discovered through the domains they used, such as burpcollaborator.net, pipedream.com, interact.sh, which are commonly used for reporting back attacks. The same domains appear across unrelated packages and have no apparent connection to the packages themselves. Many packages also used unusual version numbers that were high (e.g. v5.0.0, v99.10.9) for a package with no previous versions.Conclusions

The short time frame and low sophistication needed for finding the results above underscore the challenge facing open source package repositories. While many of the results above were likely the work of security researchers, any one of these packages could have done far more to hurt the unfortunate victims who installed them.

These results show the clear need for more investment in vetting packages being published in order to keep users safe. This is a growing space, and having an open standard for reporting would help centralize analysis results and offer consumers a trusted place to assess the packages they’re considering using. Creating an open standard should also foster healthy competition, promote integration, and raise the overall security of open source packages.
 
Over time we hope that the Package Analysis program will offer comprehensive knowledge about the behavior and capabilities of packages across open source software, and help guide the future efforts needed to make the ecosystem more secure for everyone. To get involved, please check out the GitHub Project and Milestones for opportunities to contribute.