Supply chain security for Go, Part 2: Compromised dependencies

“Secure your dependencies”—it’s the new supply chain mantra. With attacks targeting software supply chains sharply rising, open source developers need to monitor and judge the risks of the projects they rely on. Our previous installment of the Supply chain security for Go series shared the ecosystem tools available to Go developers to manage their dependencies and vulnerabilities. This second installment describes the ways that Go helps you trust the integrity of a Go package. 

Go has built-in protections against three major ways packages can be compromised before reaching you: 

  • A new, malicious version of your dependency is published

  • A package is withdrawn from the ecosystem

  • A malicious file is substituted for a currently used version of your dependency

In this blog post we look at real-world scenarios of each situation and show how Go helps protect you from similar attacks.

Reproducible builds and malicious new versions

In 2018, control of the JavaScript package event-stream passed from the original maintainer to a project contributor. The new owner deliberately published version 3.3.6 with a new dependency named flatmap-stream, which was found to run malicious code that stole cryptocurrency. In the two months that the compromised version was available, it was downloaded 8 million times. This raises the question: how many users were unaware that they had adopted a new indirect dependency?

Go ensures reproducible builds by automatically fixing each dependency to a specific version (“pinning”). A newly released version of a dependency will not affect a Go build until the author of the package that depends on it explicitly chooses to upgrade. This means that every update to the dependency tree is an explicit change that must pass code review. In a situation like the event-stream attack, developers would have the opportunity to investigate their new indirect dependency.
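To make the pinning behavior concrete, here is a minimal sketch of the idea behind version selection: the build uses the newest version that some go.mod file explicitly requires, and a version that merely exists upstream never enters the calculation. This is a simplified illustration, not the go command's actual algorithm; the function names and the version numbers are hypothetical, and pre-release versions are ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "v1.2.3" into its three numeric parts.
// Pre-release and build suffixes are out of scope for this sketch.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unsupported version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("unsupported version %q: %v", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// newer reports whether version a is strictly newer than version b.
func newer(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return false
}

// selectVersion returns the newest version among those explicitly required.
// Versions that were published upstream but that no go.mod file requires
// are never passed in, so they cannot be selected.
func selectVersion(required []string) (string, error) {
	if len(required) == 0 {
		return "", fmt.Errorf("no requirements")
	}
	best := required[0]
	bestParsed, err := parseVersion(best)
	if err != nil {
		return "", err
	}
	for _, v := range required[1:] {
		p, err := parseVersion(v)
		if err != nil {
			return "", err
		}
		if newer(p, bestParsed) {
			best, bestParsed = v, p
		}
	}
	return best, nil
}

func main() {
	// Hypothetical: two modules in the build require the same dependency.
	// A later upstream release (say v1.5.0) is not listed by anyone,
	// so it plays no part in the result.
	required := []string{"v1.2.0", "v1.4.1"}
	selected, err := selectVersion(required)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected:", selected)
}
```

The point of the design is that the inputs come only from files under version control, so the same inputs give the same build tomorrow, whatever has been published in the meantime.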

Go Module Mirror and package availability

In 2016, an open-source developer pulled his projects from npm after a disagreement with npm and patent lawyers over the name of one of his open-source libraries. One of these pulled projects, left-pad, seemed small but was used indirectly by some of the largest projects in the npm ecosystem. Left-pad had 2.5 million downloads in the month before it was withdrawn, and its disappearance left developers around the world scrambling to diagnose and fix broken builds. Within a few hours, npm took the unprecedented step of restoring the package. The event was a wake-up call to the community about what can happen when packages go missing.

Go guarantees the availability of packages. The Go Module Mirror serves the packages requested by the go command, so the command does not have to go to the origin servers (such as GitHub). The first time any Go developer requests a given module, it’s fetched from upstream sources and cached within the module mirror. When a module has been made available under a standard open source license, all future requests for that module simply return the cached copy, even if the module is deleted upstream.
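As a rough illustration of what a request to the mirror looks like, the sketch below builds the URLs under which a module proxy serves a version's metadata, go.mod file, and source archive. It follows the module proxy protocol as we understand it, including the rule that uppercase letters are written as "!" plus the lowercase letter. The helper names are hypothetical, the module and version are only an example, and the sketch skips the path validation that the real tooling performs. Nothing is fetched over the network.

```go
package main

import (
	"fmt"
	"strings"
)

// escape applies the proxy protocol's case encoding: each uppercase ASCII
// letter becomes '!' followed by its lowercase form, so that paths stay
// distinct on case-insensitive file systems.
func escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + 'a' - 'A')
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// mirrorURL returns the URL for one artifact of a module version.
// kind is "info" (metadata), "mod" (go.mod file), or "zip" (source archive).
func mirrorURL(base, modulePath, version, kind string) string {
	return fmt.Sprintf("%s/%s/@v/%s.%s",
		strings.TrimSuffix(base, "/"), escape(modulePath), escape(version), kind)
}

func main() {
	// Example values only; any public module path and version would do.
	const base = "https://proxy.golang.org"
	for _, kind := range []string{"info", "mod", "zip"} {
		fmt.Println(mirrorURL(base, "github.com/BurntSushi/toml", "v1.3.2", kind))
	}
}
```

Because the URL depends only on the module path and version, the mirror can keep answering the same request from its cache after the upstream repository is gone.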

Go Checksum Database and package integrity

In December 2022, users who installed the package pyTorch-nightly via pip downloaded something they didn’t expect: a package that included all the functionality of the original version but also ran a malicious binary that could gain access to environment variables, host names, and login information.

This compromise was possible because pyTorch-nightly had a dependency named torchtriton that shipped from the pyTorch-nightly package index instead of PyPI. An attacker claimed the unused torchtriton namespace on PyPI and uploaded a malicious package. Since pip checks PyPI first when performing an install, the attacker got their package out in front of the real package—a dependency confusion attack.  

Go protects against these kinds of attacks in two ways. First, it is harder to hijack a namespace on the module mirror because publicly available projects are added to it automatically—there are no unclaimed namespaces of currently available projects. Second, package authenticity is automatically verified by Go's checksum database.  

The checksum database is a global list of the SHA-256 hashes of source code for all publicly available Go modules. When fetching a module, the go command verifies the hashes against the checksum database, ensuring that all users in the ecosystem see the same source code for a given module version. In the case of pyTorch-nightly, a checksum database would have detected that the torchtriton version on PyPI did not match the one served earlier from pyTorch-nightly.
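The check itself can be sketched in a few lines. The code below computes a hash in the style of the "h1:" values recorded in go.sum files: a SHA-256 over a sorted listing of per-file SHA-256 digests, base64-encoded. It is modeled on our understanding of that scheme and works on an in-memory file set rather than a real module zip; the function names and the example.com/dep module are hypothetical, and this is not a substitute for the go command's own verification.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hashModule summarizes a module's files as a single "h1:" string.
// Each file contributes one line, "<hex sha256>  <name>\n", in sorted
// name order, and the summary is the SHA-256 of those lines.
func hashModule(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	summary := sha256.New()
	for _, name := range names {
		sum := sha256.Sum256(files[name])
		fmt.Fprintf(summary, "%x  %s\n", sum, name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(summary.Sum(nil))
}

// verify compares the hash of the downloaded files with the recorded one.
func verify(recorded string, files map[string][]byte) error {
	if got := hashModule(files); got != recorded {
		return fmt.Errorf("checksum mismatch: recorded %s, computed %s", recorded, got)
	}
	return nil
}

func main() {
	// Hypothetical module contents, keyed by "module@version/path".
	files := map[string][]byte{
		"example.com/dep@v1.0.0/go.mod": []byte("module example.com/dep\n"),
		"example.com/dep@v1.0.0/dep.go": []byte("package dep\n"),
	}
	recorded := hashModule(files) // what the database would have stored
	fmt.Println("first download:", verify(recorded, files))

	// A substituted file changes the hash, so verification fails.
	files["example.com/dep@v1.0.0/dep.go"] = []byte("package dep // tampered\n")
	fmt.Println("later download:", verify(recorded, files))
}
```

Since every user compares against the same recorded value, a substituted file is caught no matter which server delivered it.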

Open source, transparent logs for verification

How do we know that the values in the Go checksum database are trustworthy? The Go checksum database is built on a transparent log of hashes of every Go module. The transparent log is backed by Trillian, a production-quality, open-source implementation also used for Certificate Transparency. Transparent logs are append-only and tamper-evident by design, meaning that Go module hashes in the log cannot be deleted or modified without the change being detected.
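For readers curious why tampering is detectable, the sketch below shows the underlying idea with a small Merkle tree that uses the hashing rules of RFC 6962 (Certificate Transparency). Every record feeds into a single root hash, and a short inclusion proof lets a client confirm that a record is in the log behind a given root. This is an illustration under our own assumptions, not code from Trillian or the go command, and the record contents are made up.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

type hash = [sha256.Size]byte

// leafHash and nodeHash use distinct prefixes so that a leaf can never be
// mistaken for an interior node.
func leafHash(record []byte) hash {
	return sha256.Sum256(append([]byte{0x00}, record...))
}

func nodeHash(left, right hash) hash {
	buf := append([]byte{0x01}, left[:]...)
	return sha256.Sum256(append(buf, right[:]...))
}

// split returns the largest power of two strictly less than n (n >= 2).
func split(n int) int {
	k := 1
	for k*2 < n {
		k *= 2
	}
	return k
}

// treeRoot computes the root hash over all records.
func treeRoot(records [][]byte) hash {
	switch len(records) {
	case 0:
		return sha256.Sum256(nil)
	case 1:
		return leafHash(records[0])
	}
	k := split(len(records))
	return nodeHash(treeRoot(records[:k]), treeRoot(records[k:]))
}

// inclusionProof lists the sibling hashes on the path from leaf to root.
func inclusionProof(index int, records [][]byte) []hash {
	if len(records) <= 1 {
		return nil
	}
	k := split(len(records))
	if index < k {
		return append(inclusionProof(index, records[:k]), treeRoot(records[k:]))
	}
	return append(inclusionProof(index-k, records[k:]), treeRoot(records[:k]))
}

// rootFromProof recomputes the root from one leaf hash and its proof.
func rootFromProof(index, size int, leaf hash, proof []hash) (hash, error) {
	if index < 0 || index >= size {
		return hash{}, errors.New("index out of range")
	}
	if size == 1 {
		if len(proof) != 0 {
			return hash{}, errors.New("proof too long")
		}
		return leaf, nil
	}
	if len(proof) == 0 {
		return hash{}, errors.New("proof too short")
	}
	k := split(size)
	sibling, rest := proof[len(proof)-1], proof[:len(proof)-1]
	if index < k {
		left, err := rootFromProof(index, k, leaf, rest)
		return nodeHash(left, sibling), err
	}
	right, err := rootFromProof(index-k, size-k, leaf, rest)
	return nodeHash(sibling, right), err
}

func main() {
	// Made-up log records standing in for module hash entries.
	log := [][]byte{[]byte("mod-a v1.0.0"), []byte("mod-b v0.2.1"), []byte("mod-c v2.3.0")}
	root := treeRoot(log)
	proof := inclusionProof(1, log)

	got, err := rootFromProof(1, len(log), leafHash(log[1]), proof)
	fmt.Println("genuine record verifies:", err == nil && got == root)

	got, err = rootFromProof(1, len(log), leafHash([]byte("mod-b v0.2.1 altered")), proof)
	fmt.Println("altered record verifies:", err == nil && got == root)
}
```

A client that remembers a root hash only needs the handful of sibling hashes in the proof, not the whole log, to check a record against it.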

Secure by default

The Go team supports the checksum database and module mirror as services so that Go developers don't need to worry about disappearing or hijacked packages. The future of supply chain security is ecosystem integration, and with these services built directly into Go, you can develop with confidence, knowing your dependencies will be available and uncorrupted. 

The final part of this series will discuss the Go tools that take a “shift left” approach to security—moving security earlier in the development life cycle. For a sneak peek, check out our recent supply chain security talk from Google I/O!