Author Archives: Open Source Programs Office

HarbourBridge: From PostgreSQL to Cloud Spanner

Would you like to try out Cloud Spanner with data from an existing PostgreSQL database? Maybe you’ve wanted to ‘kick the tires’ on Spanner, but have been discouraged by the effort involved?

Today, we’re announcing a tool that makes trying out Cloud Spanner using PostgreSQL data simple and easy.

HarbourBridge is a tool that loads Spanner with the contents of an existing PostgreSQL database. It requires zero configuration—no manifests or data maps to write. Instead, it ingests pg_dump output, automatically builds a Spanner schema, and creates a new Spanner database populated with data from pg_dump.

HarbourBridge is part of the Cloud Spanner Ecosystem, a collection of public, open source repositories contributed to, owned, and maintained by the Cloud Spanner user community. None of these repositories are officially supported by Google as part of Cloud Spanner.

Get up and running fast

HarbourBridge is designed to simplify Spanner evaluation, and in particular to bootstrap the process by getting moderate-size PostgreSQL datasets (up to a few GB) into Spanner. Many PostgreSQL features, especially those that don't map directly to Spanner features, are ignored, e.g. (non-primary) indexes, functions, and sequences.

View HarbourBridge as a way to get up and running fast, so you can focus on critical things like tuning performance and getting the most out of Spanner. Expect that you'll need to tweak and enhance what HarbourBridge produces; more on this later.

Quick-start guide

The HarbourBridge README contains a step-by-step quick-start guide. We’ll quickly review the main steps. Before you begin, you'll need a Cloud Spanner instance, Cloud Spanner API enabled for your Google Cloud project, authentication credentials configured to use the Cloud API, and Go installed on your development machine.

To download HarbourBridge and install it, run:

  go get -u github.com/cloudspannerecosystem/harbourbridge

The tool should now be installed as $GOPATH/bin/harbourbridge. To use HarbourBridge on a PostgreSQL database called mydb, run:

  pg_dump mydb | $GOPATH/bin/harbourbridge

The tool uses the cloud project specified by the GCLOUD_PROJECT environment variable, automatically determines the Cloud Spanner instance associated with that project, converts the PostgreSQL schema for mydb to a Spanner schema, creates a new Cloud Spanner database with this schema, and finally populates the new database with the data from mydb. HarbourBridge also generates several files when it runs: a schema file, a report file (with details of the conversion), and a bad data file (if any data is dropped). See Files Generated by HarbourBridge.
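
For example, to point HarbourBridge at a specific project (a sketch; my-project-id is a placeholder):

  # Select the Google Cloud project HarbourBridge should use.
  export GCLOUD_PROJECT=my-project-id
  pg_dump mydb | $GOPATH/bin/harbourbridge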

Take care with ACLs

Note that PostgreSQL table-level and row-level ACLs are dropped during conversion since they are not supported by Spanner (Spanner manages access control at the database level). All data written to Spanner will be visible to anyone who can access the database created by HarbourBridge (which inherits default permissions from your Cloud Spanner instance).

Next steps

The tables created by HarbourBridge provide a starting point for evaluation of Spanner. While they preserve much of the core structure of your PostgreSQL schema and data, many important PostgreSQL features have been dropped.

In particular, HarbourBridge preserves primary keys but drops all other indexes. This means that the out-of-the-box performance you get from the tables created by HarbourBridge can be significantly slower than PostgreSQL performance. If HarbourBridge has dropped indexes that are important to the performance of your SQL queries, consider adding Secondary Indexes to the tables created by HarbourBridge. Use the existing PostgreSQL indexes as a guide. In addition, Spanner's Interleaved Tables can provide a significant performance boost.
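
For instance, using the canonical Singers/Albums example from the Spanner documentation (illustrative names, not HarbourBridge output), a secondary index and an interleaved child table look like this:

  -- Recreate an important dropped index as a Spanner secondary index.
  CREATE INDEX SingersByLastName ON Singers(LastName);

  -- Interleave child rows with their parent rows to co-locate related data.
  CREATE TABLE Albums (
    SingerId INT64 NOT NULL,
    AlbumId  INT64 NOT NULL,
    Title    STRING(MAX)
  ) PRIMARY KEY (SingerId, AlbumId),
    INTERLEAVE IN PARENT Singers ON DELETE CASCADE;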

Other dropped features include functions, sequences, procedures, triggers, and views. In addition, types have been mapped based on the types supported by Spanner. Types such as integers, floats, char/text, bools, timestamps and (some) array types map fairly directly to Spanner, but many other types do not and instead are mapped to Spanner's STRING(MAX). See Schema Conversion for details of the type conversions and their tradeoffs.
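
As a sketch of what this can look like (the exact DDL HarbourBridge emits may differ; the jsonb mapping here is an assumption based on the STRING(MAX) fallback described above):

  -- PostgreSQL source
  CREATE TABLE orders (
    id        bigint PRIMARY KEY,
    placed_at timestamptz,
    details   jsonb
  );

  -- Possible Spanner schema: bigint and timestamptz map directly,
  -- while jsonb has no direct counterpart and falls back to STRING(MAX).
  CREATE TABLE orders (
    id        INT64 NOT NULL,
    placed_at TIMESTAMP,
    details   STRING(MAX)
  ) PRIMARY KEY (id);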

Recap

HarbourBridge automates much of the manual work of trying out Cloud Spanner using PostgreSQL data. The goal is to bootstrap your evaluation and help get you to the meaty issues as quickly as possible. The tables generated by HarbourBridge provide a starting point, but they will likely need to be tweaked and enhanced to support a full evaluation.

We encourage you to try out the tool, send feedback, file issues, fork and modify the codebase, and send PRs for fixes and new functionality. Our plans and aspirations for developing HarbourBridge further are outlined in the HarbourBridge Whitepaper. HarbourBridge is part of the Cloud Spanner Ecosystem, owned and maintained by the Cloud Spanner user community. It is not officially supported by Google as part of Cloud Spanner.

By Nevin Heintze, Cloud Spanner

Importing SA360 WebQuery reports to BigQuery

Context

Search Ads 360 (SA360) is an enterprise-class search campaign management platform used by marketers to manage global ad campaigns across multiple engines. It offers powerful reporting capabilities through WebQuery reports, an API, and BigQuery and Data Studio connectors.

Effective ad campaign management requires multi-dimensional analysis of campaign data along with customers’ first-party data, building custom reports that combine dimensions from paid-search reports and business data.

Customers’ business data resides in a data warehouse, which is designed for analysis, insights, and reporting. To integrate ads data into the data warehouse, the usual approach is to load the campaign data into it. SA360 offers various options to retrieve paid-search data, each with unique capabilities:

Comparison Area | WebQuery | BQ Connector | Data Studio Connector | API
Technical complexity | Low | Medium | Medium | High
Ease of report customization | High | Medium | Low | High
Reporting details | Complete | Limited (1) | Limited (1) | Limited (1)
Possible data warehouse | Any (2) | BigQuery ONLY | None | Any

(1) Reports not supported by the API, e.g. location targets, remarketing targets, and audience reports, are not available.
(2) The report is generic and needs to be loaded into the data warehouse using the DW's custom loading methods.

Comparing these approaches in terms of the technical knowledge required, as well as the data warehousing solutions supported, the easiest is the WebQuery report, which a marketer can build by choosing the dimensions and metrics they want in the SA360 user interface.

The BigQuery data-transfer service is limited to importing data into BigQuery, and the Data Studio connector does not allow retrieving data.

WebQuery offers a simpler and more customizable method than the alternatives, and supports more kinds of data (unlike the BQ transfer service, for example, it can bring business data from SA360 to BigQuery). WebQuery was originally designed to give Microsoft Excel an updatable view of a report. In the era of cloud computing, a tool was needed to consume the report and make it available on an analytical platform or in a cloud data warehouse like BigQuery.

Solution Approach

This tool showcases how to bridge this gap in a generic fashion: the report is fetched from SA360 in XML format and converted into a CSV file using SAX parsers. The CSV file is then transferred to staging storage and finally ETLed into the data warehouse.
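
As a minimal sketch of that XML-to-CSV step (in Python, for illustration only; the <row> element name is an assumption about the report layout, and the published tool's implementation may differ), a streaming SAX handler avoids loading a large report into memory:

  import csv
  import sys
  import xml.sax

  class ReportHandler(xml.sax.ContentHandler):
      """Streams <row> elements from the report out as CSV rows."""

      def __init__(self, writer):
          super().__init__()
          self.writer = writer
          self.current_tag = None
          self.row = {}

      def startElement(self, name, attrs):
          self.current_tag = name
          if name == "row":
              self.row = {}

      def characters(self, content):
          # SAX may deliver cell text in chunks, so accumulate it.
          if self.current_tag and self.current_tag != "row" and content.strip():
              self.row[self.current_tag] = self.row.get(self.current_tag, "") + content.strip()

      def endElement(self, name):
          if name == "row" and self.row:
              self.writer.writerow(self.row.values())
          self.current_tag = None

  if __name__ == "__main__":
      # Usage: python report_to_csv.py report.xml > report.csv
      xml.sax.parse(sys.argv[1], ReportHandler(csv.writer(sys.stdout)))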

As a concrete example, we chose to showcase a solution with BigQuery as the destination (cloud) data warehouse, though the solution architecture is flexible for any other system.

Conclusion

The tool helps marketers bring advertising data closer to their analytical systems, helping them derive better insights. If you use BigQuery as your data warehouse, you can use this tool as-is. You can also adapt it by adding components for the analytical/data-warehousing systems you use, and improve it for the larger community.

To get started, follow our step-by-step guide.
Notable features of the tool:
  • Modular authorization module
  • Handles arbitrarily large WebQuery reports
  • Batch mode to process multiple reports in a single call
  • Can be used as part of an ETL workflow (Airflow compatible)

By Anant Damle, Solutions Architect and Meera Youn, Technical Partnership Lead

Announcing our Google Code-in 2019 Winners!

Google Code-in (GCI) 2019 was epic in every regard. Not only did we celebrate 10 years of the Google Code-in program, but we also broke all of our previous records for the program. It was a very, very busy seven weeks for everyone—we had 3,566 students from 76 countries complete 20,840 tasks with a record 29 open source organizations!

We want to congratulate all of the students who took part in this year’s 10th anniversary of Google Code-in. Great job!

Today we are excited to announce the Grand Prize Winners, Runners Up, and Finalists from each organization.

The 58 Grand Prize Winners completed an impressive 2,158 tasks while also helping other students.

Each of the Grand Prize Winners will be awarded a four-day trip to Google’s campus in Northern California to meet with Google engineers and one of the mentors they worked with during the contest, and to enjoy some fun in California with the other winners. We look forward to seeing these winners in a few months!

Grand Prize Winners

The Grand Prize Winners hail from 21 countries and are listed alphabetically by full name below:
Name | Organization | Country
Aayushman Choudhary | JBoss Community | India
Abdur-Raheem Idowu | Haiku | Norway
Abhinav Kaushlya | The Julia Programming Language | India
Aditya Vardhan Singh | The ns-3 Network Simulator project | India
Anany Sachan | OpenWISP | India
Andrea Gonzales | Sugar Labs | Malaysia
Anmol Jhamb | Fedora Project | India
Aria Vikram | Open Roberta | India
Artur Grochal | Drupal | Poland
Bartłomiej Pacia | Systers, An AnitaB.org Community | Poland
Ben Houghton | Wikimedia | United Kingdom
Benjamin Amos | The Terasology Foundation | United Kingdom
Chamindu Amarasinghe | SCoRe Lab | Sri Lanka
Danny Lin | CCExtractor Development | United States
Diogo Fernandes | Apertium | Luxembourg
Divyansh Agarwal | AOSSIE | India
Duc Minh Nguyen | Metabrainz Foundation | Vietnam
Dylan Iskandar | Liquid Galaxy | United States
Emilie Ma | Liquid Galaxy | Canada
Himanshu Sekhar Nayak | BRL-CAD | India
Jayaike Ndu | CloudCV | Nigeria
Jeffrey Liu | BRL-CAD | United States
Joseph Semrai | SCoRe Lab | United States
Josh Heng | Circuitverse.org | United Kingdom
Kartik Agarwala | The ns-3 Network Simulator project | India
Kartik Singhal | AOSSIE | India
Kaustubh Maske Patil | CloudCV | India
Kim Fung | The Julia Programming Language | United Kingdom
Kumudtiha Karunarathna | FOSSASIA | Sri Lanka
M.Anantha Vijay | Circuitverse.org | India
Maathavan Nithiyananthan | Apertium | Sri Lanka
Manuel Alcaraz Zambrano | Wikimedia | Spain
Naman Modani | Copyleft Games | India
Navya Garg | OSGeo | India
Neel Gopaul | Drupal | Mauritius
Nils André | CCExtractor Development | United Kingdom
Paraxor | Fedora Project | United Arab Emirates
Paweł Sadowski | OpenWISP | Poland
Pola Łabędzka | Systers, An AnitaB.org Community | Poland
Pranav Karthik | FOSSASIA | Canada
Pranay Joshi | OSGeo | India
Prathamesh Mutkure | OpenMRS | India
Pratish Rai | R Project for Statistical Computing | India
Pun Waiwitlikhit | The Mifos Initiative | Thailand
Rachit Gupta | The Mifos Initiative | India
Rafał Bernacki | Haiku | Poland
Ray Ma | OpenMRS | New Zealand
Rick Wierenga | TensorFlow | Netherlands
Sayam Sawai | JBoss Community | India
Sidaarth “Sid” Sabhnani | Copyleft Games | United States
Srevin Saju | Sugar Labs | Bahrain
Susan He | Open Roberta | Australia
Swapneel Singh | The Terasology Foundation | India
Sylvia Li | Metabrainz Foundation | New Zealand
Umang Majumder | R Project for Statistical Computing | India
Uzay Girit | Public Lab | France
Vladimir Mikulic | Public Lab | Bosnia and Herzegovina
William Zhang | TensorFlow | United States

Runners Up

And a big kudos to our 58 Runners Up from 20 countries. They will each receive a GCI backpack, a jacket, and a GCI t-shirt. The Runners Up are listed alphabetically by first name below:
Name | Organization
Adev Saputra | Drupal
Adrian Serapio | R Project for Statistical Computing
Alberto Navalón Lillo | Apertium
Alvii_07 | Liquid Galaxy
Amar Fadil | OpenWISP
Ananya Gangavarapu | TensorFlow
Andrey Shcherbakov | Wikimedia
Antara Bhattacharya | Metabrainz Foundation
Anthony Zhou | Public Lab
Bartosz Dokurno | Circuitverse.org
Ching Lam Choi | The Julia Programming Language
Chirag Bhansali | AOSSIE
Chiranjiv Singh Malhi | BRL-CAD
Daksha Aeer | Systers, An AnitaB.org Community
Devansh Khetan | OpenMRS
Dhanus SL | OSGeo
Dhhyey Desai | AOSSIE
Eric Xue | Copyleft Games
Eryk Mikołajek | BRL-CAD
Hannah Guo | The Terasology Foundation
Harsh Khandeparkar | Public Lab
Hirochika Matsumoto | CloudCV
Ilya Maier | Systers, An AnitaB.org Community
Irvan Ayush Chengadu | Drupal
Jakub Niklas | The Terasology Foundation
Jun Rong Lam | Circuitverse.org
Karol Ołtarzewski | OpenWISP
Kripa Kini | Liquid Galaxy
Krzysztof Krysiński | CCExtractor Development
Kunal Bhatia | SCoRe Lab
Laxya Pahuja | The Mifos Initiative
Łukasz Zbrzeski | SCoRe Lab
Madhav Mehndiratta | Fedora Project
Marcus Chong | Sugar Labs
Mateusz Samkiewicz | JBoss Community
Maya Farber Brodsky | CCExtractor Development
Michał Piechowiak | Fedora Project
Moodhunt | Metabrainz Foundation
Muhammad Wasif | FOSSASIA
name not shown | Haiku
Nathan Taylor | Sugar Labs
Nishanth Thumma | Open Roberta
Panagiotis Vasilopoulos | Haiku
Rachin Kalakheti | TensorFlow
Regan Iwadha | JBoss Community
Ribhav Sharma | OpenMRS
Richard Botez | Open Roberta
Rishabh Verma | The Mifos Initiative
Rishank Kanaparti | Copyleft Games
Rishi R | R Project for Statistical Computing
Sai Putravu | The ns-3 Network Simulator project
Samuel Sloniker | Apertium
Shivam Rai | OSGeo
Siddharth Sinha | FOSSASIA
Soumitra Shewale | The Julia Programming Language
Stanisław Howard | The ns-3 Network Simulator project
Suryansh Pathak | CloudCV
Taavi Väänänen | Wikimedia

Finalists

And a hearty congratulations to our 58 Finalists from 20 countries. The Finalists will each receive a special GCI jacket and a GCI t-shirt. They are listed alphabetically by first name below:
Name | Organization
Abinav Chari | CloudCV
Andre Christoga Pramaditya | CloudCV
Anish Agnihotri | OSGeo
Aryan Gulati | FOSSASIA
Ayush Sharma | Fedora Project
Ayush Sharma | SCoRe Lab
Daniel Oluojomu | JBoss Community
Dhruv Baronia | TensorFlow
Diana Hernandez | Systers, An AnitaB.org Community
Gambali Seshasai Chaitanya | Apertium
Hao Liu | R Project for Statistical Computing
Hardik Jhalani | Systers, An AnitaB.org Community
Hrishikesh Patil | OpenMRS
Jackson Lewis | The ns-3 Network Simulator project
Jan Rosa | Wikimedia
Janiru Hettiarachchi | Liquid Galaxy
Janiru Wijekoon | Metabrainz Foundation
Joshua Yang | Apertium
Kevin Liu | Open Roberta
Krishna Rama Rao | AOSSIE
Li Chen | Fedora Project
Madhav Shekhar Sharma | The Julia Programming Language
Mbah Javis | TensorFlow
Merul Dhiman | Liquid Galaxy
Michelle (Wai Man) Lo | OpenMRS
Mihir Bhave | OpenWISP
Mohit S A | Circuitverse.org
Mokshit Jain | Drupal
Mudit Somani | The Julia Programming Language
Musab Kılıç | CCExtractor Development
Nail Anıl Örcün | The Terasology Foundation
Natalie Shapiro | Circuitverse.org
Nate Clark | The Terasology Foundation
Nicholas Gregory | Wikimedia
Nikita Ermishin | OpenWISP
Nishith P | FOSSASIA
Oliver Fogelin | R Project for Statistical Computing
Oussama Hassini | The Mifos Initiative
Param Nayar | Copyleft Games
Peter Terpstra | The ns-3 Network Simulator project
Piyush Sharma | The Mifos Initiative
Robert Chen | Public Lab
Rohan Cherivirala | Open Roberta
Ruixuan Tu | Haiku
Saptashwa Mandal | Drupal
Sashreek Magan | Sugar Labs
Sauhard Jain | AOSSIE
Sharman Maheshwari | SCoRe Lab
Sumagna Das | BRL-CAD
Tanvir Singh | OSGeo
Techno-Disaster | CCExtractor Development
Thusal Ranawaka | BRL-CAD
Vivek Mishra | Copyleft Games
Yu Fai Wong | JBoss Community
Yuqi Qiu | Metabrainz Foundation
Zakhar Vozmilov | Public Lab
Zakiyah Hasanah | Sugar Labs
Zoltán Szatmáry | Haiku

Our 794 mentors, the heart and soul of GCI, are the reason the contest thrives. Mentors volunteer their time to help these bright students become open source contributors. They spend hundreds of hours during their holiday breaks answering questions, reviewing submitted tasks, and welcoming the students to their communities. GCI would not be possible without their dedication, patience and tireless efforts.

We will post more numbers from GCI 2019 here on the Google Open Source Blog over the next few weeks, so please stay tuned.

Congratulations to our Grand Prize Winners, Runners Up, Finalists, and all of the students who spent the last couple of months learning about, and contributing to, open source. We hope they will continue their journey in open source!

By Stephanie Taylor, Google Open Source

Announcing the 2019 second cycle Google Open Source Peer Bonus winners

We are happy to announce the 2019 second cycle winners of the Google Open Source Peer Bonus! This cohort represents the largest number of winners to date, with 115 awardees from 26 countries: Australia, Austria, Belgium, Canada, China, Colombia, Denmark, Finland, France, Germany, India, Ireland, Israel, Italy, Japan, Republic of Korea, Mexico, Netherlands, Poland, Portugal, Russia, Spain, Sweden, Switzerland, the United Kingdom, and the United States.

The Google Open Source Peer Bonus is an award for open source contributors who are not employed by Google but are nominated by Googlers for their exceptional contributions to open source. The program began as a way to reward developers; it has since evolved to support all open source contributors, from technical writers and designers to operations.

Below is the list of winners who gave us permission to thank them publicly:
Winner | Open Source Project
Miina Sikk | AMP Plugin for WordPress, AMP Stories
Ryan Kienstra | AMP Plugin for WordPress, AMP Stories
Joost Koehoorn | Angular
Ash Berlin-Taylor | Apache Airflow
Jarek Potiuk | Apache Airflow
Kamil Bregula | Apache Airflow
Ismael Mejia | Apache Beam, Avro
Jose Fonseca | APITrace
Lars Zawallich | Appleseed
Maximilian Michels | Beam, Flink
Roman Lebedev | benchmark
Ben Manes | Caffeine
Yang Luo | casbin; npcap; nmap
Sedat Dilek | ClangBuiltLinux
Nathan Chancellor | ClangBuiltLinux
Pablo Galindo Salgado | CPython
Karthikeyan Singaravelan | CPython
Tobe Osakwe | Dart build system
Drew Banin | DBT
Michael Johnson | Discourse - Google+ Import Script
Philip Rebohle | dxvk
Mike Blandford | er9x/ersky9x radio firmware
Simon Edwards | Extraterm
Ethan Lee | FNA, FAudio, SDL2
Vasco Asturiano | force-graph
Alexandre Alapetite | FreshRSS
Jenny Bryan | gargle: an R package for calling Google APIs from R, including auth
Patrick Mulhall | Gerrit Code Review
Gert van Dijk | Gerrit Code Review
Rafael Ascensão | Git
Arnold Robbins | GNU awk
Alberto Donizetti | Go
Alessandro Arzilli | Go
Tobias Klauser | Go
Emmanuel Odeke | Go
Brian Kessler | Go
Giovanni Bajo | Go compiler
Glenn Lewis | go-github
Cedric Staub | go-jose
Paul Jolly | go-tools
Daniel Martí | go-tools
Dominik Honnef | go-tools
Muir Manders | go-tools
Billie Cleek | go-tools
Ramya Rao | go-tools
John Paton | Google Cloud Python client libraries and Pandas GBQ
Krystian Kuźniarek | googletest
Gernot Vormayr | GoPacket
Johan Brandhorst | grpc-gateway
Mike Jumper | Guacamole
Willy Tarreau | HAProxy
Mike McQuaid | Homebrew
Joachim Viide | HTM
Serguei Bezverkhi | nftables
Kalle Persson | Inbox Theme for Gmail
Artem Gusev | ios-webkit-debug-proxy
Morven Cao | Istio Operator
Karol Lassak | Jenkins GCE plugin
Sebastien Goasguen | Knative
Joan Edwards | Knative
Markus Thömmes | Knative
Ashleigh Brennan | Knative
Cornelius Weig | Krew
Josh Bottum | Kubeflow
Kam Kasravi | kubeflow/kubeflow, kubeflow/manifests
Rune Mehlsen | lit-analyzer
Roman Lebedev | LLVM
Jonas Bernoulli | Magit
Jaeyoung Tae | Material Components Web/Material Components Web React
Maximilian Hils | mitmproxy
Brijesh Bittu | monaco-vim
Rich Felker | musl
Tim Neutkens | Next.js
Gordon Lyon | nmap
Ryan Gordon | Numerous open source games and engines
Carlos Alberto Cortez | OpenTelemetry
Roch Devost | OpenTelemetry
Ted Young | OpenTelemetry
Joshua MacDonald | OpenTelemetry/opentelemetry-go
Daniel Khan | OpenTelemetry
Brandon Gonzalez | OpenTelemetry
Valentin Marchaud | OpenTelemetry and OpenCensus
Olivier Albertini | OpenTelemetry and OpenCensus
Armin Ruech | OpenTelemetry-Java
Tyler Benson | OpenTelemetry-Java
Paulo Janotti | OpenTelemetry-Service
Akshay Anand | Oppia
James Marca | OR-Tools
Max Dymond | OSS-Fuzz
Ignazio Palmisano | OWL API
Marcos Caceres | Payment Request API
Jovi De Croock | Preact
Leah Ullmann | Preact
Hervé Bredin | pyannote
Tomohiko Kinebuchi | Python official document Japanese translation project
Gabriela de Queiroz | R
Baldur Karlsson | RenderDoc
Fabian Henneke | Secure Shell
Sam Aaron | Sonic Pi
Greg Roth | Spiregg (SPIR-V Backend in DirectXShaderCompiler)
Erica Sadun | Swift Evolution
Sean Morgan | tensorflow/addons
Yong Tang | tensorflow/io
Shree Kumar | Tesseract
Seth Larson | urllib3
Michael Tüxen | usrsctp
Felix Weinrank | usrsctp
Qiuyi Zhang | V8
Sébastien Helleu | WeeChat
Wesley Shields | YARA

Congratulations to the winners! Open source is a shared effort that is only possible with everyone’s commitment to build better solutions for the world. Thank you for partnering with us in this mission. We look forward to more collaborations in the months to come!

By María Cruz, Google Open Source

BazelCon 2019

Cross-posted from the original BazelCon 2019 recap.

Last month the Google Bazel team hosted its largest ever Bazel user conference: BazelCon 2019, an annual gathering of the community surrounding the Bazel build system. This is the main Bazel event of the year, serving as an opportunity for Bazel contributors, maintainers, and users to meet, learn from each other, present Bazel migration stories, educate new users, and collaborate on the future of Bazel.

BazelCon 2019 by the Numbers

  • 400+ attendees (2x increase over BazelCon 2018)
  • 125 organizations represented including Microsoft, Spotify, Uber, Apple, Cruise, EA, Lyft, Tesla, SpaceX, SAP, Bloomberg, Wix, Etsy, BMW and others
  • 26 full-length talks and 15 lightning talks by members of the external community and Googlers
  • 16 hours of Q&A during Office Hours with Bazel team members
  • 45 Bazel Bootcamp attendees
  • 5 Birds of a Feather sessions on iOS, Python, Java, C++ and Front-end Bazel rules
  • 182 users in the #bazelcon2019 Slack channel

BazelCon 2019 Full Length Talks

The full playlist also includes lightning talks.
  • Keynote: The Role of Catastrophic Failure in Software Design – Jeff Atwood (Stack Overflow/Discourse)
  • Bazel State of the Union – John Field and Dmitry Lomov (Google)
  • Building Self Driving Cars with Bazel – Axel Uhlig and Patrick Ziegler (BMW Group)
  • Moving to a Bazel-based CI system: 6 Learnings – Or Shachar (Wix)
  • Bazel Federation – Florian Weikert (Google)
  • Lessons from our First 100,000 Bazel Builds – Kevin Gessner (Etsy)
  • Migrating Lyft-iOS to Bazel – Keith Smiley and Dave Lee (Lyft)
  • Test Selection – Benjamin Peterson (Dropbox)
  • Porting iOS Apps to Bazel – Oscar Bonilla (LinkedIn)
  • Boosting Dev Box Performance with Remote Execution for Non-Hermetic Build Engines – Erik Mavrinac (Microsoft)
  • Building on Key - Keeping your Actions and Remote Executions in Tune – George Gensure (UberATG)
  • Bazel remote execution API vs Goma – Mostyn Bramley-Moore (Vewd Software)
  • Integrating with ease: leveraging BuildStream interaction with Bazel build for consistent results – Daniel Silverstone (Codethink)
  • Building Self-Driving Cars with Bazel – Michael Broll and Nico Valigi (Cruise)
  • Make local development (with Bazel) great again! – Ittai Zeidman (Wix)
  • Gradle to Bazel – Chip Dickson and Charles Walker (SUM Global Technology)
  • Bazel Bootcamp – Kyle Cordes (Oasis Digital)
  • Bazel migration patterns: how to prove business value with a small investment – Alex Eagle and Greg Magolan (Google)
  • Dynamic scheduling: Fastest clean and incremental builds – Julio Merino (Google)
  • Building a great CI with Bazel – Philipp Wollermann (Google)

By Misha Narinsky, Bazel Team

Google Summer of Code 2020 is now open for mentor organization applications!

We are looking for open source projects and organizations to participate in the 16th annual Google Summer of Code (GSoC)! GSoC is a global program that draws university student developers from around the world to contribute to open source projects. Each student will spend three months, from mid-May to mid-August, working on a coding project with the support of volunteer mentors from participating open source organizations.

Last year, 1,276 students worked with 206 open source organizations and over 2,000 mentors. Organizations include small- and medium-sized open source projects, as well as a number of umbrella organizations with many sub-projects under them (Apache Software Foundation, Python Software Foundation, etc.).

Our 2020 goal is to accept more organizations into their first GSoC than ever before! We ask that veteran organizations refer other organizations they think would be a good fit to participate in GSoC.

You can apply to be a mentoring organization for GSoC starting today. The deadline to apply is February 5 at 19:00 UTC. Organizations chosen for GSoC 2020 will be publicly announced on February 20.

Please visit the program site for more information on how to apply and review the detailed timeline of important deadlines. We also encourage you to check out the Mentor Guide and our short video on why open source projects apply to be a part of the program.

Best of luck to all of the open source mentoring organization applicants!

By Stephanie Taylor, Google Open Source

Securing open source: How Google supports the new Kubernetes bug bounty

At Google, we care deeply about the security of open-source projects, as they’re such a critical part of our infrastructure—and indeed everyone’s. Today, the Cloud-Native Computing Foundation (CNCF) announced a new bug bounty program for Kubernetes that we helped create and get up and running. Here’s a brief overview of the program, other ways we help secure open-source projects, and information on how you can get involved.

Launching the Kubernetes bug bounty program

Kubernetes is a CNCF project. As part of its graduation criteria, the CNCF recently funded the project’s first security audit, to review its core areas and identify potential issues. The audit identified and addressed several previously unknown security issues. Thankfully, Kubernetes already had a Product Security Committee, including engineers from the Google Kubernetes Engine (GKE) security team, who respond to and patch any newly discovered bugs. But the job of securing an open-source project is never done. To increase awareness of Kubernetes’ security model, attract new security researchers, and reward ongoing efforts in the community, the Kubernetes Product Security Committee began discussions in 2018 about launching an official bug bounty program.

Find Kubernetes bugs, get paid

What kind of bugs does the bounty program recognize? Most of the content you’d think of as ‘core’ Kubernetes, included at https://github.com/kubernetes, is in scope. We’re interested in common kinds of security issues like remote code execution, privilege escalation, and bugs in authentication or authorization. Because Kubernetes is a community project, we’re also interested in the Kubernetes supply chain, including build and release processes that might allow a malicious individual to gain unauthorized access to commits, or otherwise affect build artifacts. This is a bit different from your standard bug bounty as there isn’t a ‘live’ environment for you to test—Kubernetes can be configured in many different ways, and we’re looking for bugs that affect any of those (except when existing configuration options could mitigate the bug). Thanks to the CNCF’s ongoing support and funding of this new program, depending on the bug, you can be rewarded with a bounty anywhere from $100 to $10,000.

The bug bounty program has been in a private release for several months, with invited researchers submitting bugs to help us test the triage process. And today, the new Kubernetes bug bounty program is live! We’re excited to see what kind of bugs you discover, and we are ready to respond to new reports. You can learn more about the program and how to get involved here.

Dedicated to Kubernetes security

Google has been involved in this new Kubernetes bug bounty from the get-go: proposing the program, completing vendor evaluations, defining the initial scope, testing the process, and onboarding HackerOne to implement the bug bounty solution. Though this is a big effort, it’s part of our ongoing commitment to securing Kubernetes. Google continues to be involved in every part of Kubernetes security, including responding to vulnerabilities as part of the Kubernetes Product Security Committee, chairing the sig-auth Kubernetes special interest group, and leading the aforementioned Kubernetes security audit. We realize that security is a critical part of any user’s decision to use an open-source tool, so we dedicate resources to help ensure we’re providing the best possible security for Kubernetes and GKE.

Although the Kubernetes bug bounty program is new, it isn’t a novel strategy for Google. We have enjoyed a close relationship with the security research community for many years and, in 2010, Google established our own Vulnerability Rewards Program (VRP). The VRP provides rewards for vulnerabilities reported in GKE and virtually all other Google Cloud services. (If you find a bug in GKE that isn’t specific to Kubernetes core, you should still report it to the Google VRP!) Nor is Kubernetes the only open-source project with a bug bounty program. In fact, we recently expanded our Patch Rewards program to provide financial rewards both upfront and after-the-fact for security improvements to open-source projects.

Help keep the world’s infrastructure safe. Report a bug to the Kubernetes bug bounty, or a GKE bug to the Google VRP.

By Maya Kaczorowski, Product Manager, Container Security; and Aaron Small, Product Manager, GKE On-Prem security

Wombat Dressing Room, an npm publication proxy on GCP

We're excited to announce that we're open sourcing Wombat Dressing Room, the service we use on the Google Cloud Client Libraries team for handling npm publications. Wombat Dressing Room provides features that help npm work better with automation, while maintaining good security practices.

A tradeoff is often made for automation

npm has top-notch security features: CIDR-range-restricted tokens, publication notifications, and two-factor authentication, to name a few. Of these, the feature most critical to protecting publications is two-factor authentication (2FA).

2FA requires that you provide two pieces of information when accessing a protected resource: "something you know" (for instance, a password) and "something you have" (for instance, a code from an authenticator app). With 2FA, if your password is exposed, an attacker still can't publish a malicious package (unless they also steal the "something you have").

On my team, a small number of developers manage over 75 Node.js libraries. We see automation as key to making this possible: we've written tools that automate releases, validate license headers, and ensure contributors have signed CLAs. We adhere to the philosophy: automate all the things!

It's difficult to automate the step of entering a code off a cellphone. As a result, folks often opt to turn off 2FA in their automation.

What if you could have both automation and the added security of 2FA? This is why we built the Wombat Dressing Room.

A different approach to authentication

With Wombat Dressing Room, rather than each individual configuring two-factor authentication in an authenticator app, 2FA is managed by a shared proxy server. Publications are directed at the Wombat Dressing Room proxy, which provides the following security features:

Per-package publication tokens

Wombat Dressing Room can generate authentication tokens that are each tied to a single GitHub repository, for which the user generating the token must have push permissions.

If a per-package publication token is leaked, an attacker can only hijack the single package that the token is associated with.
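
A generated token is used like any npm auth token. An illustrative .npmrc might look like the following (the registry URL and environment variable are placeholders, not the service's documented configuration):

  # Route publications through the Wombat Dressing Room proxy.
  registry=https://your-wombat-instance.appspot.com/
  //your-wombat-instance.appspot.com/:_authToken=${WOMBAT_PACKAGE_TOKEN}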

Limited lifetime tokens

Wombat Dressing Room can also generate access tokens that have a 24-hour lifespan. In this model, a leaked token is only a vulnerability until its 24-hour lifespan ends.

GitHub Releases as 2FA

In this authentication model, a package can only be published to npm if a GitHub release with a corresponding tag is found on GitHub.

This introduces a true "second factor", as users must prove they have access to both Wombat Dressing Room and the repository on GitHub.
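
A release flow under this model might look like the following sketch (the version, package, and registry URL are placeholders):

  # Tag and push the release that authorizes the publication.
  git tag v1.2.3
  git push origin v1.2.3
  # Create a GitHub release for v1.2.3, then publish through the proxy,
  # which checks that the corresponding release exists.
  npm publish --registry="https://your-wombat-instance.appspot.com"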

Getting started with Wombat Dressing Room

We've been using Wombat Dressing Room to manage Google Cloud client libraries for over a year now in our fully automated library release process. As of today, the source is available for everyone on GitHub under an Apache 2.0 license.

Wombat Dressing Room runs on Google App Engine, and instructions on getting it up and running can be found in its README.md.

It's my hope that this will help other folks in the community simplify and automate their release processes, while minimizing the attack surface of their libraries.

By Benjamin Coe, who works on Node.js client libraries for the Google Cloud Platform and was the third engineer at npm, Inc.

Season of Docs Announces Results of 2019 Program

Season of Docs has announced the 2019 program results for standard-length projects. You can view a list of successfully completed technical writing projects on the website along with their final project reports.

During the program, technical writers spent a few months working closely with an open source community. They brought their technical writing expertise to improve the project's documentation while the open source projects provided mentors to introduce the technical writers to open source tools, workflows, and the project's technology.

The technical writers and their mentors did a fantastic job in the inaugural year of Season of Docs! Participants represented countries across all continents except Antarctica! 36 of the 41 technical writers successfully completed their standard-length technical writing projects, and eight long-running projects are in progress and expected to finish in February.

  • 91.7% of the mentors had a positive experience and want to mentor again in future Season of Docs cycles
  • 88% of the technical writers had a positive experience
  • 96% plan to continue contributing to open source projects
  • 100% of the technical writers said that Season of Docs helped improve their knowledge of code and/or open source

Technical writing projects ranged from beginners' guides and tutorials to API and reference documentation; all of which benefited a diverse set of open source projects that included programming languages, software, compiler infrastructure, operating systems, software libraries, hardware, science, healthcare, and more. Take a look at the list of successful projects to see the wide range of subjects covered!

What is next?

The long-running projects are still in progress and will finish in February 2020. Technical writers participating in these projects submit their project reports by February 25, and the writer and mentor evaluations are due by February 28. Successfully completed long-running technical writing projects will be published on the results page on March 6, 2020.

If you were excited about participating, please share your experience in social media posts. See the promotion and press page for images and other promotional materials you can include, and be sure to use the tag #SeasonOfDocs when promoting your ideas on social media. To include the tech writing and open source communities, add #WriteTheDocs, #techcomm, #TechnicalWriting, and #OpenSource to your posts.

Stay tuned for information about Season of Docs 2020—watch for posts in this blog and sign up for the announcements email list.

By Andrew Chen, Google Open Source and Sarah Maddox, Cloud Docs

W3C Trace Context Specification: What it Means for You

Since the first days of Google Cloud Platform (GCP), Google has been at the forefront of making your applications more observable. Beyond Stackdriver, our most visible impact in this space is OpenTelemetry, which we initiated in 2017 (as OpenCensus) and which has grown into a huge community that includes the majority of APM / monitoring vendors and cloud platforms.

While OpenTelemetry allows developers to easily capture distributed traces and metrics from their own services, there’s also a need to trace requests as they propagate through components that developers don’t directly control, like managed services, load balancers, network hardware, etc. To solve this we co-defined a prototype HTTP header that these components can rely on, gathered partners, and moved the work into the W3C.

This work is now complete, and the W3C Trace Context format is now an official standard. Once implemented in GCP, this will make our services even easier to manage, both with Stackdriver and with third-party distributed tracing tools. We explain more in the official post on the W3C blog, which I’ve copied below:

The W3C Distributed Tracing working group has moved the Trace Context specification to the next maturity level. The specification is already being adopted and implemented by many platforms and SDKs. This article describes the Trace Context specification and how it improves troubleshooting and monitoring of modern distributed apps.

W3C Trace Context specification defines the format for propagating distributed tracing context between services. Distributed tracing makes it easy for developers to find the causes of issues in highly-distributed microservices applications by tracking how a single interaction was processed across multiple services. Each step of a trace is correlated through an ID that is passed between services, and W3C Trace Context now defines a standard for these context propagation headers.
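
For example, the traceparent header defined by the specification carries four fields: a version, a trace ID shared by all steps of a trace, the ID of the calling span, and trace flags such as the sampling decision. The example value below is taken from the specification:

  traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01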

Until now, different tracing systems have defined their own headers. Examples include Zipkin’s B3 format and X-Google-Cloud-Trace. Adopting a common context propagation format has been long desired by developers, APM vendors, and cloud platform hosts, as compatibility provides numerous benefits:
  • Web and RPC frameworks that use this standard to provide context propagation out of the box will also offer cross-service log correlation, even for developers who haven’t set up distributed tracing.
  • API producers can record the trace IDs of requests from API consumers and provide additional spans or metadata to their customers for a given traced request. Producers can also correlate customer trace IDs to internal traces when debugging technical issues raised by consumers.
  • Networking infrastructure (proxies, load balancers, routers, etc.) can both ensure that context propagation headers are not removed from requests passing through them, and can record spans or logs for a given trace, without having to support multiple vendor-specific formats. Potential examples of these include router appliances, cloud load balancers, and sidecar proxies like Envoy.
  • Instrumentation can be further decoupled from a developer’s choice of APM vendor. For example, using both OpenTelemetry and a given vendor’s agents, a developer can instrument different services in an application, and traces will flow through the system and be processed correctly by the vendor’s backend.
  • Web browsers and other clients can use these identifiers to correlate their telemetry with traces collected from backend services. This functionality is currently being defined.
To address this need, a group of cloud providers, open source contributors, and APM vendors started defining a standard HTTP context propagation header that would replace their homegrown formats. This specification has been discussed and iterated on over the past two years, and the group working on it has grown significantly over that time. Sponsors include Google, Microsoft, Dynatrace, and New Relic (W3C members), and the group was officially moved into the W3C in 2018 for the work to proceed under the guidance of an official standards body and to spur even greater adoption.

TraceContext has since been adopted by OpenTelemetry (which enables it by default and also serves as the reference implementation), Azure services, Dynatrace, Elastic, Google Cloud Platform, Lightstep, and New Relic. We are tracking adoption in this list.

This first phase of work has focused on HTTP, as it is commonly used and has no built-in affordances for trace context propagation (gRPC and some newer RPC systems do). The same group of committee members are also working to define trace context propagation in other formats, starting with AMQP and MQTT for IoT; other upcoming topics include context propagation from clients and web browsers.

By Morgan McLean, OpenTelemetry + Stackdriver