Tag Archives: Open source

Magika: AI powered fast and efficient file type identification

Today we are open-sourcing Magika, Google’s AI-powered file-type identification system, to help others accurately detect binary and textual file types. Under the hood, Magika employs a custom, highly optimized deep-learning model, enabling precise file identification within milliseconds, even when running on a CPU.

Magika command line tool used to identify the type of a diverse set of files

You can try the Magika web demo today, or install it as a Python library and standalone command line tool (output is showcased above) by running pip install magika.

Why identifying file type is difficult

Since the early days of computing, accurately detecting file types has been crucial in determining how to process files. Linux comes equipped with libmagic and the file utility, which have served as the de facto standard for file type identification for over 50 years. Today web browsers, code editors, and countless other software rely on file-type detection to decide how to properly render a file. For example, modern code editors use file-type detection to choose which syntax coloring scheme to use as the developer starts typing in a new file.

Accurate file-type detection is a notoriously difficult problem because each file format has a different structure, or no structure at all. This is particularly challenging for textual formats and programming languages as they have very similar constructs. So far, libmagic and most other file-type-identification software have been relying on a handcrafted collection of heuristics and custom rules to detect each file format.

This manual approach is both time consuming and error prone as it is hard for humans to create generalized rules by hand. In particular for security applications, creating dependable detection is especially challenging as attackers are constantly attempting to confuse detection with adversarially-crafted payloads.
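
To make the rule-based approach concrete, here is a minimal Python sketch of signature ("magic number") matching. It is not libmagic's implementation: the SIGNATURES table and the guess_type helper are hypothetical names chosen for illustration, and only a handful of well-known file signatures are included.

```python
# Hypothetical, hand-written signature table: (leading bytes, label).
# Real rule sets contain thousands of entries; this is only an illustration.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\x7fELF", "elf"),
]


def guess_type(data: bytes) -> str:
    """Return a coarse label for data based on its leading bytes."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    # No binary signature matched: fall back to a crude "is it text?" check.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "text"


if __name__ == "__main__":
    print(guess_type(b"%PDF-1.7\n"))                 # binary format with a signature
    print(guess_type(b"def main():\n    pass\n"))    # Python source
    print(guess_type(b"function main() {}\n"))       # JavaScript source
```

In this sketch the Python and JavaScript snippets both come back as plain "text", because neither has a fixed signature. Telling them apart would take further hand-written rules for each language, which is the maintenance burden described above.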

To address this issue and provide fast and accurate file-type detection, we researched and developed Magika, a new AI-powered file-type detector. Under the hood, Magika uses a custom, highly optimized deep-learning model, designed and trained using Keras, that weighs only about 1MB. At inference time, Magika uses ONNX as an inference engine to ensure files are identified in a matter of milliseconds, almost as fast as a non-AI tool, even on a CPU.

Magika Performance

Magika detection quality compared to other tools on our 1M files benchmark

Thanks to its AI model and large training dataset, Magika outperforms other existing tools by about 20% when evaluated on a 1M-file benchmark that encompasses over 100 file types. Broken down by file type, as reported in the table below, the gains are even greater on textual files, including code files and configuration files that other tools can struggle with.

Performance of various file-type identification tools for a selection of the file types included in our benchmark - n/a indicates the tool doesn’t detect the given file type.

Magika at Google

Internally, Magika is used at scale to help improve Google users’ safety by routing Gmail, Drive, and Safe Browsing files to the proper security and content policy scanners. Over a weekly average of hundreds of billions of files, Magika improves file-type identification accuracy by 50% compared to our previous system, which relied on handcrafted rules. In particular, this increase in accuracy allows us to scan 11% more files with our specialized AI scanners for malicious documents and to reduce the number of unidentified files to 3%.

The upcoming integration of Magika with VirusTotal will complement the platform's existing Code Insight functionality, which employs Google's generative AI to analyze and detect malicious code. Magika will act as a pre-filter before files are analyzed by Code Insight, improving the platform’s efficiency and accuracy. This integration, due to VirusTotal’s collaborative nature, directly contributes to the global cybersecurity ecosystem, fostering a safer digital environment.

Open Sourcing Magika

By open-sourcing Magika, we aim to help other software improve their file identification accuracy and offer researchers a reliable method for identifying file types at scale.

Magika's code and model are freely available starting today on GitHub under the Apache 2.0 License. Magika can also be quickly installed as a standalone utility and Python library from PyPI by running pip install magika, with no GPU required. We also have an experimental npm package if you would like to use the TFJS version.

To learn more about how to use it, please refer to the Magika documentation site.


Acknowledgements

Magika would not have been possible without the help of many people, including: Ange Albertini, Loua Farah, Francois Galilee, Giancarlo Metitieri, Luca Invernizzi, Young Maeng, Alex Petit-Bianco, David Tao, Kurt Thomas, Amanda Walker.

By Elie Bursztein – Cybersecurity AI Technical and Research Lead and Yanick Fratantonio – Cybersecurity Research Scientist

YouTube releases scripts to help partners and creators to optimize their work

At YouTube Technology Services, we believe that open source software is essential for driving innovation and collaboration in the YouTube ecosystem. We want to make automation on YouTube more accessible by providing publicly available scripts to automate common use cases, aiming to decrease the cost for partners and creators to handle the most common scenarios when managing their content on YouTube.

To do so, we are announcing a new GitHub Organization, YouTubeLabs, where you will find open source code examples in the code-samples repository. We are providing open source scripts for a variety of use cases.

Most code samples rely on public YouTube APIs or Google APIs and are well-documented and well-commented, in order to be easily modified by partners and creators.

We are delivering code that aims to be as accessible as possible to our partners and creators, with minimal configurations and minimal installation required. That's why we rely on Colaboratory Notebooks (Colab) and AppsScript as the main pillars of our open source offering. Colab is a free, cloud-based Jupyter notebook environment that makes it easy to run Python code in the browser, and it is integrated with Google Drive. AppsScript is a serverless platform that allows you to write scripts that run on Google's servers.

We believe that open source software is key to the future of the YouTube ecosystem. By making our code available to the public, we are helping to empower partners and creators to do more with YouTube.

Want to get started? Check out the code examples already available in YouTubeLabs’ code-sharing repository.

We look forward to continuing to build out our open source examples in the coming months, so don’t forget to “like and subscribe” to our repository to stay tuned for more!

By Federico Villa and Haley Schafer – Partner Technology Managers on behalf of YouTube Technology Services

Kubernetes 1.29 is available in the Regular channel of GKE

Kubernetes 1.29 has been available in the GKE Regular channel since January 26th, and in the Rapid channel since January 11th, less than 30 days after the OSS release! For more information about the content of Kubernetes 1.29, read the Kubernetes 1.29 Release Notes.

New Features

Using CEL for Validating Admission Policy

Validating admission policies offer a declarative, in-process alternative to validating admission webhooks.

Validating admission policies use the Common Expression Language (CEL) to declare the validation rules of a policy. Validating admission policies are highly configurable, enabling policy authors to define policies that can be parameterized and scoped to resources as needed by cluster administrators. [source]

Validating Admission Policy graduates to beta in 1.29. We are especially excited about the work that Googlers Cici Huang, Joe Betz, and Jiahui Feng have led in this release to get to the beta milestone. As we move toward v1, we are actively working to ensure scalability and would appreciate any end-user feedback. [public doc here for those interested]

You can opt into the beta ValidatingAdmissionPolicy feature by enabling the beta APIs.
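
As an illustration of what such a declarative policy can look like, the sketch below builds a ValidatingAdmissionPolicy and its binding as Python dictionaries and prints them as JSON, which kubectl accepts as well as YAML. The object names (demo-replica-limit, demo-replica-limit-binding) and the replica rule are hypothetical examples, and the v1beta1 API version is assumed because the feature is beta in 1.29.

```python
import json

# Hypothetical policy: reject Deployments that ask for more than 5 replicas.
policy = {
    "apiVersion": "admissionregistration.k8s.io/v1beta1",  # beta API in 1.29
    "kind": "ValidatingAdmissionPolicy",
    "metadata": {"name": "demo-replica-limit"},  # illustrative name
    "spec": {
        "failurePolicy": "Fail",
        "matchConstraints": {
            "resourceRules": [
                {
                    "apiGroups": ["apps"],
                    "apiVersions": ["v1"],
                    "operations": ["CREATE", "UPDATE"],
                    "resources": ["deployments"],
                }
            ]
        },
        "validations": [
            {
                # The rule itself is a CEL expression, evaluated in-process
                # by the API server rather than by an external webhook.
                "expression": "object.spec.replicas <= 5",
                "message": "Deployments may not request more than 5 replicas.",
            }
        ],
    },
}

# A binding puts the policy into effect and chooses how violations are handled.
binding = {
    "apiVersion": "admissionregistration.k8s.io/v1beta1",
    "kind": "ValidatingAdmissionPolicyBinding",
    "metadata": {"name": "demo-replica-limit-binding"},  # illustrative name
    "spec": {
        "policyName": policy["metadata"]["name"],
        "validationActions": ["Deny"],
    },
}

if __name__ == "__main__":
    for manifest in (policy, binding):
        print(json.dumps(manifest, indent=2))
```

Saving each printed document to a file and applying it with kubectl would be one way to try the policy on a test cluster where the beta APIs are enabled.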

InitContainers as a Sidecar

InitContainers can now be configured as sidecar containers and kept running alongside normal containers in a Pod. This is only supported by nodes running version 1.29 or later, so ensure all nodes in a cluster are at version 1.29 or later before using this feature in Pods. The feature was long awaited. This is evident from the fact that Istio has already tested it widely and that the Istio community is working hard to make sure it can be enabled early with minimal disruption for clusters with older nodes. You can participate in the discussion here.

A big driver for delivering the feature is the growing number of AI/ML workloads, which are often represented by Pods running to completion. Those Pods need infrastructure sidecars - Istio and GCSFuse are examples - and Google recognizes this trend.
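
A minimal sketch of such a Pod is shown below, again as a Python dictionary printed as JSON. An init container becomes a sidecar when its restartPolicy is set to Always. The Pod name, container names, and image references are placeholders, and the sidecar_names helper is a hypothetical convenience function for this example only.

```python
import json

# Hypothetical run-to-completion Pod with one sidecar.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "batch-task-with-sidecar"},  # placeholder name
    "spec": {
        "restartPolicy": "Never",  # the main workload runs to completion
        "initContainers": [
            {
                "name": "log-shipper",  # placeholder sidecar
                "image": "example.com/log-shipper:latest",  # placeholder image
                # restartPolicy: Always on an init container marks it as a
                # sidecar: it starts before the main containers and keeps
                # running alongside them.
                "restartPolicy": "Always",
            }
        ],
        "containers": [
            {
                "name": "main",
                "image": "example.com/batch-task:latest",  # placeholder image
            }
        ],
    },
}


def sidecar_names(pod_manifest: dict) -> list:
    """List init containers that are configured as sidecars."""
    init_containers = pod_manifest["spec"].get("initContainers", [])
    return [c["name"] for c in init_containers if c.get("restartPolicy") == "Always"]


if __name__ == "__main__":
    print(sidecar_names(pod))
    print(json.dumps(pod, indent=2))
```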

The implementation of sidecar containers has been and continues to be a community effort. We are proud to highlight that Googler Sergey Kanzhelev is driving it via the Sidecar working group, and it took a great effort from many other Googlers to make sure this KEP landed so fast. John Howard made sure the early versions of the implementation were tested with Istio, Wojciech Tyczyński ensured a safe rollout via the production readiness review, Tim Hockin spent many hours in API review of the feature, and Clayton Coleman gave advice and helped with code reviews.

New APIs

API Priority and Fairness/Flow Control

We are super excited to share that API Priority and Fairness (APF) graduated to Stable V1 / GA in 1.29! Controlling the behavior of the Kubernetes API server in an overload situation is a key task for cluster administrators, and this is what APF addresses. This ambitious project was initiated by Googler and founding API Machinery SIG lead Daniel Smith, and expanded to become a community-wide effort. Special thanks to Googler Wojciech Tyczyński and API Machinery members Mike Spreitzer from IBM and Abu Kashem from Red Hat for landing this critical feature in Kubernetes 1.29 (more details in the Kubernetes publication). In GKE we tested and used it early. In fact, every version above 1.26.4 sets higher kubelet QPS values, trusting the API server to handle the load gracefully.

Deprecations and Removals

  • The previously deprecated v1beta2 Priority and Fairness APIs are no longer served in 1.29, so update usage to v1beta3 before upgrading to 1.29.
  • With the API Priority and Fairness graduation to v1, the v1beta3 Priority and Fairness APIs are newly deprecated in 1.29, and will no longer be served in 1.32.
  • In the Node API, take a look at the changes to the status.kubeProxyVersion field, which will not be populated starting in v1.33. The field is currently populated with the kubelet version, not the kube-proxy version, and might not accurately reflect the kube-proxy version in use. For more information, see KEP-4004.
  • 1.29 removed support for the insecure SHA1 algorithm. To prevent impact on your clusters, you must replace incompatible certificates of webhook servers and extension API servers before upgrading your clusters to version 1.29.
    • GKE will not auto-upgrade clusters with webhook backends using incompatible certificates to 1.29 until you replace the certificates or until version 1.28 reaches end of life. For more information refer to the official GKE documentation.
  • The Ceph CephFS (kubernetes.io/cephfs) and RBD (kubernetes.io/rbd) volume plugins have been deprecated since 1.28 and will be removed in a future release.
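
The Priority and Fairness timeline above lends itself to a quick pre-upgrade check. The following sketch classifies the apiVersion string of a manifest. The check_api_version helper and the wording of its messages are hypothetical, and the code only encodes the versions named in the list above.

```python
# API group used by the Priority and Fairness resources
# (FlowSchema, PriorityLevelConfiguration).
FLOWCONTROL_GROUP = "flowcontrol.apiserver.k8s.io"

# Status of each version as of 1.29, taken from the list above.
VERSION_STATUS = {
    "v1beta2": "no longer served in 1.29: migrate before upgrading",
    "v1beta3": "deprecated in 1.29: no longer served in 1.32",
    "v1": "stable",
}


def check_api_version(api_version: str) -> str:
    """Classify a manifest's apiVersion, e.g. 'flowcontrol.apiserver.k8s.io/v1beta2'."""
    group, _, version = api_version.partition("/")
    if group != FLOWCONTROL_GROUP:
        return "not a Priority and Fairness API"
    return VERSION_STATUS.get(version, "unknown version")


if __name__ == "__main__":
    for candidate in (
        "flowcontrol.apiserver.k8s.io/v1beta2",
        "flowcontrol.apiserver.k8s.io/v1beta3",
        "flowcontrol.apiserver.k8s.io/v1",
        "apps/v1",
    ):
        print(candidate, "->", check_api_version(candidate))
```

Running a check like this over the manifests in a repository is one way to find objects that need updating before an upgrade.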

Shoutout to the Production Readiness Review (PRR) team

A dedicated subgroup of SIG Architecture, composed of very senior contributors in the Kubernetes community, conducts Production Readiness Reviews for each new Kubernetes release, going through each feature.

  • OSS Production Readiness Reviews (PRR) reduce toil for all the different Cloud Providers, by shifting the effort onto OSS developers.
  • OSS Production Readiness Reviews surface production safety, observability, and scalability issues with OSS features at design time, when it is still possible to affect the outcomes.
  • By ensuring feature gates, solid enable → disable → enable testing, and attention to upgrade and rollout considerations, OSS Production Readiness Reviews enable rapid mitigation of failures in new features.

We want to thank Googlers John Belamaric and Wojciech Tyczyński, who are part of this group, for doing the remarkable heavy lifting on this unglamorous and often invisible work. Additionally, we’d like to congratulate Googler Joe Betz, who recently graduated as a new PRR reviewer after shadowing the process throughout 2023.

By Jordan Liggitt, Jago Macleod, Sergey Kanzhelev, and Federico Bongiovanni – Google Kubernetes Kernel team

Announcing Google Season of Docs 2024

Google Season of Docs provides direct grants to open source projects to improve their documentation and gives professional technical writers an opportunity to gain experience in open source. Together we raise awareness of open source, of docs, and of technical writing.

How does GSoD work?

Google Season of Docs allows open source organizations to apply for a grant based on their documentation needs. If selected, the open source organizations use their grant to directly hire a technical writer to complete their documentation project. Organizations have up to six months to complete their documentation project. At the end of the program, organizations complete their final case study which outlines the problem the documentation project was intended to solve, what metrics were used to judge the effectiveness of the documentation, and what the organization learned for the future. All project case studies are published on the Season of Docs site at the end of the program.

Organizations: apply to be part of GSoD

The applications for Google Season of Docs open February 22 for the 2024 cycle. We strongly suggest that organizations take the time to complete the steps in the exploration phase before the application process begins, including:

  • Creating a project page to gauge community and technical writer participant interest (see our project ideas page for examples).
  • Publicizing your interest in participating in GSoD through your project channels and adding your project to our list of interested projects on GitHub.
  • Lining up community members who are interested in mentoring or helping onboard technical writers to your project.
  • Brainstorming requirements for technical writers to work on your project. Will they need to be able to test code, work with video, or have prior experience with your project or related technologies? Will you allow the use of generative AI tools in creating documentation for your project?
  • Reading through the case studies from previous Season of Docs participants.

Organizations: create your project page

Every Google Season of Docs project begins with a project page, which is a publicly visible page that serves as an overview of your documentation project. A good project page includes:

  • A statement of the problem your project needs to solve: “users on Windows don’t have clear guidance of how to install our project”.
  • The documentation that might solve this problem: “We want to create a quickstart doc and installation guide for Windows users”.
  • How you’ll measure the success of your documentation: “With a good quickstart, we expect to see 50% fewer issues opened about Windows installation problems.”
  • What skills your technical writer would need (break down into “must have” and “nice to have” categories): “Must have: access Windows machine to test instructions”.
  • What volunteer help is needed from community members: “need help onboarding technical writers to our discussion groups”. Include a way for the community to discuss the proposal.
  • Most importantly, include a way for interested technical writers to reach you and ask questions!

Technical writers: reach out to organizations early

Technical writers do not submit a formal application through Google Season of Docs, but instead apply to accepted organizations directly. Technical writers can share their contact information now via the Google Season of Docs GitHub repository. They can also submit proposals directly to organizations using the contact information shared on the organization’s project page. Check out our technical writer guide for more information. We suggest that interested technical writers read through the case studies from the previous Google Season of Docs participants to get an idea of the kinds of projects that have been accepted and what organizations have learned from working with technical writers.

General Timeline

  • February 22 - April 2, 2024: Open source organizations apply to take part in Google Season of Docs.
  • April 10: Google publishes the list of accepted organizations, along with their project proposals, and doc development can begin.
  • May 22: Technical writer hiring deadline.
  • June 5: Organization administrators begin to submit monthly evaluations to report on the status of their project.
  • November 22 - December 10: Organization administrators submit their case study and final project evaluation.
  • December 13: Google publishes the 2024 case studies and aggregate project data.
  • May 1, 2025: Organizations begin to participate in post-program followup surveys.

See the full program timeline for more details.

Join us

Explore the Google Season of Docs website at g.co/seasonofdocs to learn more about participating in the program. Use our logo and other promotional resources to spread the word. Check out the timeline and FAQ, and get ready to apply!

By Erin McKean – Google Open Source Programs Office

Google Summer of Code 2024 Mentor Organization Applications Now Open

We are excited to announce that open source projects and organizations can now apply to participate as mentor organizations in the 2024 Google Summer of Code (GSoC) program. Applications for organizations will close on February 6, 2024 at 18:00 UTC.

We are celebrating a big milestone as we head into our 20th year of Google Summer of Code this year! In 2024 we are adding a third project size option which you can read more about in our announcement blog post.

Does your open source project want to learn more about becoming a mentor organization? Visit the program site and read the mentor guide to learn what it means to be a mentor organization and how to prepare your community (hint: have plenty of excited, dedicated mentors and well thought out project ideas!).

We welcome all types of organizations and are very eager to involve first-time mentor orgs in GSoC. We encourage new organizations to get a referral from experienced organizations that think they would be a good fit to participate in GSoC.

The open source projects that participate in GSoC as mentor organizations span many fields including those doing interesting work in AI/ML, security, cloud, development tools, science, medicine, data, media, and more! Projects can range from being relatively new (about 2 years old) to well established projects that started over 20 years ago. We welcome open source projects big, small, and everything in between.

This year we are looking to bring more open source projects in the AI/ML field into GSoC 2024. If your project is in the artificial intelligence or machine learning fields please chat with your community and see if you would be interested in applying to GSoC 2024.

One thing to remember is that open source projects wishing to apply need to have a solid community; the goal of GSoC is to bring new contributors into established and welcoming communities. While you don’t have to have 50+ community members, the project also can’t have as few as three people.

You can apply to be a mentor organization for GSoC starting today on the program site. The deadline to apply is February 6, 2024 at 18:00 UTC. We will publicly announce the organizations chosen for GSoC 2024 on February 21st.

Please visit the program site for more information on how to apply and review the detailed timeline for important deadlines. We also encourage you to check out the Mentor Guide, our ‘Intro to Google Summer of Code’ video, and our short video on why open source projects are excited to be a part of the GSoC program.

Good luck to all open source mentor organization applicants!

By Stephanie Taylor, Program Manager – Google Open Source Programs Office

Open sourcing tools for Google Cloud performance and resource optimization

Over the years, we at Google have identified common requests from customers to optimize their Kubernetes clusters on Google Cloud. Today, we are releasing a set of open source tools to help customers with these tasks, including bin packing, load testing, and performance benchmarking. These tools are designed to help customers optimize their clusters for cost, performance, and scalability.

These common requests center on the following use cases:

  1. Does Google Cloud have a bin packing recommendation feature or tool to optimize node usage in GKE Standard?
  2. How can I easily run Aerospike, Cassandra, PgBench, or other popular benchmarking tools on Google Cloud?
  3. How can I load test my application running on Google Cloud? How many requests per second can my app handle given the current size of my existing Google Cloud infrastructure?

The underlying motivation is that customers want evidence-based tooling to help them optimize their Google Cloud resources and costs, run benchmarks, identify performance bottlenecks, or even start a performance discussion.

For these use cases, we are open sourcing a set of tools that anyone can install on a self-service basis. Each application comprises UI and backend components deployable to your own Google Cloud project. We call this collection of tools sa-tools.


BinPacker

BinPacker recommender for GKE node size
There is currently no bin packing recommendation feature in the Google Cloud console. We are open sourcing a tool that visually scans your GKE cluster and recommends the optimal node size for bin packing. Users can opt to group selected services together on the same node. The installation guide can be found here.
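
For readers unfamiliar with bin packing, the sketch below shows the general idea using the classic first-fit-decreasing heuristic. This is not BinPacker's algorithm, which is not described here. The workload names, resource figures, and the Node and first_fit_decreasing names are all hypothetical, and the sketch ignores system-reserved resources and every scheduling constraint other than CPU and memory.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with remaining capacity in millicores and MiB."""
    cpu_free: int
    mem_free: int
    pods: list = field(default_factory=list)


def first_fit_decreasing(pods, node_cpu, node_mem):
    """Pack (name, cpu, mem) requests onto as few identical nodes as the heuristic finds."""
    nodes = []
    # Place the largest requests first; small ones then fill the gaps.
    for name, cpu, mem in sorted(pods, key=lambda p: (p[1], p[2]), reverse=True):
        if cpu > node_cpu or mem > node_mem:
            raise ValueError(f"{name} does not fit on an empty node")
        for node in nodes:
            if node.cpu_free >= cpu and node.mem_free >= mem:
                break  # first node with enough room wins
        else:
            node = Node(node_cpu, node_mem)  # open a new node
            nodes.append(node)
        node.cpu_free -= cpu
        node.mem_free -= mem
        node.pods.append(name)
    return nodes


if __name__ == "__main__":
    # Hypothetical requests: (name, CPU in millicores, memory in MiB).
    requests = [
        ("api", 1500, 2048),
        ("worker", 1000, 1024),
        ("cache", 500, 4096),
        ("cron", 250, 256),
    ]
    for i, node in enumerate(first_fit_decreasing(requests, node_cpu=2000, node_mem=8192)):
        print(f"node {i}: {node.pods} (free: {node.cpu_free}m CPU, {node.mem_free} MiB)")
```

Trying several candidate node sizes and comparing the resulting node counts is one simple way to reason about which size packs a given set of workloads most tightly.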

Perfkit Benchmarker with UI


What if you could install an easy-to-use version of Perfkit Benchmarker (PKB) with a click-and-select UI?

With this version, you could simply select the benchmark tool you want to use from a dropdown menu and provide a YAML configuration file. PKB would then automatically spin up a GKE Autopilot cluster with the configuration you have provided and run the benchmark. You could then view the performance metrics results in the UI.

This easy-to-use version of PKB would make it easier to run benchmarks and compare the performance of different systems, even if you don't have much technical experience. The installation guide can be found here.


Web Performance Testing

gTools Performance Testing

We built an open source UI wrapper on top of Locust, running inside your GCP Project. Unlike the generic Locust setup, where everyone can access the Locust instance, you can run a Locust farm instance for a specific group of users. The installation guide can be found here.

For more info you may reach us via the contributor list in the repository.

By Yudy Hendry, Anant Damle, Kozzy Hasebe, Jun Sheng, and Chuan Chen – Cloud Solutions Architects Team

Google Open Source Peer Bonus program announces second group of 2023 winners



We are excited to announce the second group of winners for the 2023 Google Open Source Peer Bonus Program! This program recognizes external open source contributors who have been nominated by Googlers for their exceptional contributions to open source projects.

The Google Open Source Peer Bonus Program is a key part of Google's ongoing commitment to open source software. By supporting the development and growth of open source projects, Google is fostering a more collaborative and innovative software ecosystem that benefits everyone.

This cycle's Open Source Peer Bonus Program received 163 nominations, and the winners come from 35 different countries, reflecting the program's global reach and the immense impact of open source software. Community collaboration is a key driver of innovation and progress, and we are honored to be able to support and celebrate the contributions of these talented individuals from around the world through this program.

We would like to extend our congratulations to the winners! Included below are those who have agreed to be named publicly.

Winner | Open Source Project
Tim Dettmers | 8-bit CUDA functions for PyTorch
Odin Asbjørnsen | Accompanist
Lazarus Akelo | Android FHIR
Khyati Vyas | Android FHIR
Fikri Milano | Android FHIR
Veyndan Stuart | AndroidX
Alex Van Boxel | Apache Beam
Dezső Biczó | Apigee Edge Drupal module
Felix Yan | Arch Linux
Gerlof Langeveld | atop
Fabian Meumertzheim | Bazel
Keith Smiley | Bazel
Andre Brisco | Bazel Build Rules for Rust
Cecil Curry | beartype
Paul Marcombes | bigfunctions
Lucas Yuji Yoshimine | Camposer
Anita Ihuman | CHAOSS
Jesper van den Ende | Chrome DevTools
Aboobacker MK | CircuitVerse.org
Aaron Ballman | Clang
Alejandra González | Clippy
Catherine Flores | Clippy
Rajasekhar Kategaru | Compose Actors
Olivier Charrez | comprehensive-rust
John O'Reilly | Confetti
James DeFelice | container-storage-interface
Akihiro Suda | containerd, runc, OCI specs, Docker, Kubernetes
Neil Bowers | CPAN
Aleksandr Mikhalitsyn | CRIU
Daniel Stenberg | curl
Ryosuke TOKUAMI | Dataform
Salvatore Bonaccorso | Debian
Moritz Muehlenhoff | Debian
Sylvestre Ledru | DebianLLVM
Andreas Deininger | Docsy
Róbert Fekete | Docsy
David Sherret | dprint
Justin Grant | ECMAScript Time Zone Canonicalization Proposal
Chris White | EditorConfig
Charles Schlosser | Eigen
Daniel Roe | Elk - Mastodon Client
Christopher Quadflieg | FakerJS
Ostap Taran | Firebase Apple SDK
Frederik Seiffert | Firebase C++ SDK
Juraj Čarnogurský | firebase-tools
Callum Moffat | Flutter
Anton Borries | Flutter
Tomasz Gucio | Flutter
Chinmoy Chakraborty | Flutter
Daniil Lipatkin | Flutter
Tobias Löfstrand | Flutter go_router package
Ole André Vadla Ravnås | Frida
Jaeyoon Choi | Fuchsia
Jeuk Kim | Fuchsia
Dongjin Kim | Fuchsia
Seokhwan Kim | Fuchsia
Marcel Böhme | FuzzBench
Md Awsafur Rahman | GCViT-tf, TransUNet-tf, Kaggle
Qiusheng Wu | GEEMap
Karsten Ohme | GlobalPlatform
Sacha Chua | GNU Emacs
Austen Novis | Goblet
Tiago Temporin | Golang
Josh van Leeuwen | Google Certificate Authority Service Issuer for cert-manager
Dustin Walker | google-cloud-go
Parth Patel | GUAC
Kevin Conner | GUAC
Dejan Bosanac | GUAC
Jendrik Johannes | Guava
Chao Sun | Hive, Spark
Sean Eddy | hmmer
Paulus Schoutsen | Home Assistant
Timo Lassmann | Kalign
Stephen Augustus | Kubernetes
Vyom Yadav | Kubernetes
Meha Bhalodiya | Kubernetes
Madhav Jivrajani | Kubernetes
Priyanka Saggu | Kubernetes
DANIEL FINNERAN | kubeVIP
Junfeng Li | LanguageClient-neovim
Andrea Fioraldi | LibAFL
Dongjia Zhang | LibAFL
Addison Crump | LibAFL
Yuan Tong | libavif
Gustavo A. R. Silva | Linux kernel
Mathieu Desnoyers | Linux kernel
Nathan Chancellor | Linux Kernel, LLVM
Gábor Horváth | LLVM / Clang
Martin Donath | Material for MkDocs
Jussi Pakkanen | Meson Build System
Amos Wenger | Mevi
Anders F Björklund | minikube
Maksim Levental | MLIR
Andrzej Warzynski | MLIR, IREE
Arnaud Ferraris | Mobian
Rui Ueyama | mold
Ryan Lahfa | nixpkgs
Simon Marquis | Now in Android
William Cheng | OpenAPI Generator
Kim O'Sullivan | OpenFIPS201
Yigakpoa Laura Ikpae | Oppia
Aanuoluwapo Adeoti | Oppia
Philippe Antoine | oss-fuzz
Tornike Kurdadze | Pinput
Andrey Sitnik | Postcss (and others: Autoprefixer, postcss, browserslist, logux)
Marc Gravell | protobuf-net
Jean Abou Samra | Pygments
Qiming Sun | PySCF
Trey Hunner | Python
Will Constable | PyTorch/XLA
Jay Berkenbilt | qpdf
Ahmed El-Helw | Quran App for Android
Jan Gorecki | Reproducible benchmark of database-like ops
Ralf Jung | Rust
Frank Steffahn | Rust, ICU4X
Bhaarat Krishnan | Serverless Web APIs Workshop
Maximilian Keppeler | Sheets-Compose-Dialogs
Cory LaViska | Shoelace
Carlos Panato | Sigstore
Keith Zantow | spdx/tools-golang
Hayley Patton | Steel Bank Common Lisp
Qamar Safadi | Sunflower
Victor Julien | Suricata
Eyoel Defare | textfield_tags
Giedrius Statkevičius | Thanos
Michael Park | The Good Docs Project
Douglas Theobald | Theseus
David Blevins | Tomee
Anthony Fu | Vitest
Ryuta Mizuno | Volcago
Nicolò Ribaudo | WHATWG HTML Living Standard; ECMAScript Language Specification
Antoine Martin | xpra
Toru Komatsu | youki

We are incredibly proud of all of the nominees for their outstanding contributions to open source, and we look forward to seeing even more amazing contributions in the years to come. An additional thanks to Maria Tabak, who has helped lay the groundwork for and manage this program for the past 5 years!

By Mike Bufano, Google Open Source Peer Bonus Program Lead

A look back at BazelCon ’23 and the launch of Bazel 7

In October ‘23, the Google Bazel team hosted the 7th annual BazelCon, a gathering for the Bazel community and broader Build ecosystem. We welcomed enterprise users and program partners, companies building businesses on top of Bazel, as well as enthusiasts curious to learn more about this space. This year, BazelCon made its debut outside North America and was hosted in the Google Munich office.


BazelCon recap

The Bazel ecosystem is growing. This year, we had over 200 in-person external attendees, over 3K livestream views, and a record number of 120 proposals submitted by the community.

We started the conference with a keynote address by Mícheál Ó Foghlú (Engineering Director at Google), followed by a state-of-the-union address by John Field and Tobias Werth (Engineering Managers at Google).

The Bazel community showcased a series of technical and lightning main-stage talks. To highlight a few:

    • BMW shared insights into how they released several “Bazel cars”
    • JetBrains* announced the preview release of their new Bazel plugin for their IDEs
    • Booking.com walked through their journey of adopting Bazel, thereby reducing CI time from 22 minutes to under 2 minutes and container image size by 80%

Take a look at the published recordings of all of these talks at your leisure.

In addition to hearing from presenters, conference attendees also had the opportunity to engage with each other in smaller, more interactive forums. Through live Q&A with the Bazel team and several Birds of a Feather sessions on topics ranging from authoring rulesets, to collecting usage data responsibly, to IDE integrations, the Bazel community was able to provide direct feedback to the team and spark productive discussions. Make sure to check out published notes from these sessions.

At BazelCon, we also proudly announced the initial release candidate for Bazel 7, which has since launched.


What’s new in Bazel 7?

Bazel 7 is the latest major release on the long-term support (LTS) track. Many multi-year efforts have landed in this release. For example:

Bzlmod: Bzlmod, Bazel's new modular external dependency management system, is now enabled by default (i.e. --enable_bzlmod defaults to true). If your project doesn't have a MODULE.bazel file, Bazel will create an empty one for you. The old WORKSPACE mechanism will continue to work alongside the new Bzlmod-managed system. Learn more about what’s changed since Bazel 6 and what’s coming up in Bazel 8 and 9.

Build without the Bytes (BwoB): Build without the Bytes for builds using remote execution is now enabled by default (i.e. --remote_download_outputs defaults to toplevel). Bazel will no longer try to download any intermediate outputs from the remote server, but only the outputs of requested top-level targets instead. This significantly improves remote build performance. Learn more about BwoB.
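As a minimal sketch of the two defaults described above, a project could pin them explicitly in its .bazelrc so the behavior is visible and easy to flip. The flag names and values come from the text above; placing them in a workspace-level .bazelrc, and the commented-out alternative value all, are illustrative assumptions rather than a recommended setup.

```
# .bazelrc (illustrative sketch, not an official template)

# Bzlmod is on by default in Bazel 7; stating it explicitly documents the intent.
common --enable_bzlmod

# Build without the Bytes: download only the outputs of requested top-level targets.
build --remote_download_outputs=toplevel

# Assumption: a project that still needs every intermediate output locally
# could opt out by downloading all outputs instead.
# build --remote_download_outputs=all
```

Because both settings match the Bazel 7 defaults, the uncommented lines change nothing on Bazel 7 itself; they mainly serve as documentation and as a single place to change the behavior.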

Merged analysis and execution (Skymeld): Project Skymeld aims to improve multi-target build performance by removing the boundary between the analysis and execution phases and allowing targets to be independently executed as soon as their analysis finishes.

Platform-based toolchain resolution for Android and C++: This change helps streamline the toolchain resolution API across all rulesets, obviating the need for language-specific flags. It also removes technical debt by having Android and C++ rules use the same toolchain resolution logic as other rulesets. Full details for Android developers are available in the Android Platforms announcement.

Read the full release notes for Bazel 7.
 

Stay up-to-date with Bazel

We are thankful to everyone who played a role in making BazelCon ’23 a big success: speakers, contributors, attendees, the planning committee, and more. We look forward to seeing you again next year!

In the meantime, follow along as we work together towards Bazel 8:

If you have any questions or feedback, or would like to share something you’ve built, reach out to [email protected]. We would love to hear from you!

By the Google Bazel team

*Copyright © 2023 JetBrains s.r.o. JetBrains and IntelliJ are registered trademarks of JetBrains s.r.o

Google Season of Docs announces results of 2023 program

Google Season of Docs is happy to announce the 2023 program results, including the project case studies.

Google Season of Docs is a grant-based program where open source organizations apply for US$5,000-15,000 to hire technical writers to complete documentation projects. At the end of the six-month documentation development phase, organizations submit a case study to outline the problems their documentation project was intended to solve, how they are measuring the success of their documentation project, and what they learned during the project. The case studies are publicly available and are intended to help other open source organizations learn best practices in open source documentation.

The 2023 Google Season of Docs documentation development phase began on March 31 and ended on November 21, 2023 for all projects. Participants in the 2023 program will also answer three follow-up surveys in 2024, in order to better track the impact of these documentation projects over time.

Feedback from the 2023 participating projects was extremely positive:

“I would strongly recommend engaging with a technical writer who is genuinely passionate about open-source initiatives. A writer who asks probing questions, encourages leaders to think innovatively, and is eager to learn in unfamiliar domains can be incredibly beneficial.”
      – Digital Biomarker Discovery Project
“Having a dedicated resource under the banner of GSoD helped as it allowed the team to focus on core activities while leaving out the worries related to the stacking documentation challenges behind, to be taken care of by the writer.”
      – Flux
“We made significant improvements to nearly half of the p5.js reference and laid the groundwork for a team of writers currently working on documentation. Along the way, we engaged a broad cross-section of the community and strengthened bonds among core contributors.”
      – p5.js

Take a look at the participant list to see the initial project plans and case studies!


What’s next?

Stay tuned for information about Google Season of Docs 2024—watch for posts on this blog and sign up for the announcements email list. We’ll also be publishing the 2023 case study summary report in early 2024.

If you were excited about participating in the 2023 Google Season of Docs program, please do write social media posts. See the promotion and press page for images and other promotional materials you can include, and be sure to use the tag #SeasonOfDocs when promoting your project on social media. To include the tech writing and open source communities, add #WriteTheDocs, #techcomm, #TechnicalWriting, and #OpenSource to your posts.

By Erin McKean, Google Open Source Programs Office

Google Summer of Code 2023 Final Results!

On November 17th, we wrapped up our 2023 Google Summer of Code program, in which 903 contributors completed open source projects for 168 OSS organizations. This year 70% (628) of the successful 2023 contributors opted for a 12-week project, while the remaining 30% (275) completed their GSoC work over the past few weeks. That being said, many contributors choose to continue their involvement in the OSS community after finishing their GSoC projects. GSoC is typically just one small chapter in a contributor's lifetime open source journey.

This certainly was one of our most enthusiastic groups of mentors and GSoC contributors yet. We were able to host multiple virtual check-ins where contributors had the chance to ask GSoC Administrators questions and get live reminders and advice regarding the program and its milestones. The response to these sessions was overwhelming, with one session attended by over 60% of 2023 GSoC contributors. Our final virtual event in this series was a multi-day ‘Contributor Talks’ series where 43 participants had the chance to give three-minute lightning talks about their GSoC projects.

Quick GSoC 2023 by the numbers:

    • 99% of 2023 orgs say that GSoC has value for their organization
    • 83% of 2023 contributors said they would consider being a mentor
    • 30% of 2023 contributors said that GSoC has already helped them with a job offer
    • 99.8% of 2023 contributors plan to continue working on open source
    • 88% of 2023 contributors were first-time GSoC contributors, meaning 12% had participated in GSoC before
    • 96.25% of contributors said they would apply to GSoC again

In 2023, GSoC contributors rated their GSoC experience as 3.79/4. Mentors rated their GSoC contributors' overall performance 4.41/5.

Our mentors and GSoC contributors spent a lot of time giving us invaluable feedback on the program, so we wanted to share a few top insights below. Their comments help us to keep the program relevant and to continue to meet the needs of open source communities and new open source contributors.


Advice for future contributors

As we head into our 20th year of GSoC, we wanted to highlight some of the advice that the 2023 GSoC contributors offered to future contributors. Much of the advice falls into the themes of:

    • Communicate early and often with mentors.
    • Take the time in February, as soon as orgs are announced, to find the right org and choose a project you are excited about; it will make the program much more enjoyable.
    • Set realistic goals and break tasks into milestones.
    • Be open to learning! Open source can seem intimidating, but you have amazing mentors and a community there to encourage and support you.

We welcomed 18 new mentoring organizations this year, many of which were able to attend our Mentor Summit on Google’s campus a few weeks ago.

In 2023, 10.15% of GSoC contributors were non-students. This was the second year since we opened up the program to non-students. We hope to continue to have more potential GSoC contributors who are changing careers or not currently enrolled in academic programs join the program.

“My advice is to just go for it. I'm a thirty-something career-changer who doesn't have a technical background; at times, I doubt myself and my ability to transition into a more technical field. During GSoC, I was paired with knowledgeable, friendly, engaging mentors who trusted me to get the work done. It was empowering, and I did work that I'm extremely proud of. To anyone in my shoes who may be afraid to take the plunge, I highly encourage them to do so. Seriously - if I can do it, anyone can.”
      – brittneyjuliet, GSoC’23 Contributor

Favorite part of GSoC

GSoC contributors have shared their favorite parts of GSoC with some very common themes:

    • Working on real-world projects that thousands/millions of people actually use and rely on
    • Interacting with experienced developers and truly being part of an enthusiastic, welcoming community
    • Making a difference
    • Gaining overall skills and confidence to boost their careers that can’t be obtained from classrooms alone

How GSoC improved their programming skills

95.5% of contributors believe GSoC improved their programming skills. The most common responses to how GSoC improved their skills were:

    • Practical experience. Applying programming concepts and techniques to real projects.
    • Learning to write cleaner, more maintainable code.
    • Enhanced problem solving skills.
    • Project management - learned how to break large, complex problems into smaller, organized tasks.
    • Learning to understand complex codebases.
    • Learning new concepts and technologies.
    • Engaging in code reviews with mentors regularly helped to grasp industry best practices.

We want to thank all of our mentors, organization administrators, and GSoC contributors for a rewarding and smooth GSoC 2023. The excitement from our GSoC contributors throughout the program and our mentors at the recent Mentor Summit was palpable. Thank you all for the time and energy you put in to make open source communities stronger and more sustainable.


GSoC 2024 will be open for organization applications from January 22–February 6, 2024. We will announce the 2024 accepted GSoC organizations February 21 on the program site: g.co/gsoc. GSoC contributor applications will be open March 18–April 2, 2024.

By Stephanie Taylor, Program Manager, and Perry Burnham, Associate Program Manager for the Google Open Source Programs Office