Author Archives: Open Source Programs Office

HarbourBridge: From PostgreSQL to Cloud Spanner

Would you like to try out Cloud Spanner with data from an existing PostgreSQL database? Maybe you’ve wanted to ‘kick the tires’ on Spanner, but have been discouraged by the effort involved?

Today, we’re announcing a tool that makes trying out Cloud Spanner using PostgreSQL data simple and easy.

HarbourBridge is a tool that loads Spanner with the contents of an existing PostgreSQL database. It requires zero configuration—no manifests or data maps to write. Instead, it ingests pg_dump output, automatically builds a Spanner schema, and creates a new Spanner database populated with data from pg_dump.

HarbourBridge is part of the Cloud Spanner Ecosystem, a collection of public, open source repositories contributed to, owned, and maintained by the Cloud Spanner user community. None of these repositories are officially supported by Google as part of Cloud Spanner.

Get up and running fast

HarbourBridge is designed to simplify Spanner evaluation, and in particular to bootstrap the process by getting moderate-size PostgreSQL datasets (up to a few GB) into Spanner. Many PostgreSQL features, especially those that don't map directly to Spanner features, are ignored, e.g. (non-primary) indexes, functions, and sequences.

View HarbourBridge as a way to get up and running fast, so you can focus on critical things like tuning performance and getting the most out of Spanner. Expect that you'll need to tweak and enhance what HarbourBridge produces; more on this later.

Quick-start guide

The HarbourBridge README contains a step-by-step quick-start guide. We’ll quickly review the main steps. Before you begin, you'll need a Cloud Spanner instance, Cloud Spanner API enabled for your Google Cloud project, authentication credentials configured to use the Cloud API, and Go installed on your development machine.

To download HarbourBridge and install it, run:

  go get -u github.com/cloudspannerecosystem/harbourbridge

The tool should now be installed as $GOPATH/bin/harbourbridge. To use HarbourBridge on a PostgreSQL database called mydb, run:

  pg_dump mydb | $GOPATH/bin/harbourbridge

The tool uses the cloud project specified by the GCLOUD_PROJECT environment variable, automatically determines the Cloud Spanner instance associated with that project, converts the PostgreSQL schema for mydb to a Spanner schema, creates a new Cloud Spanner database with this schema, and finally populates the new database with the data from mydb. HarbourBridge also generates several files when it runs: a schema file, a report file (with details of the conversion), and a bad data file (if any data is dropped). See Files Generated by HarbourBridge.
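
For example, to point HarbourBridge at a specific project (a sketch; my-project-id is a placeholder):

  # Select the Google Cloud project HarbourBridge should use.
  export GCLOUD_PROJECT=my-project-id
  pg_dump mydb | $GOPATH/bin/harbourbridge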

Take care with ACLs

Note that PostgreSQL table-level and row-level ACLs are dropped during conversion since they are not supported by Spanner (Spanner manages access control at the database level). All data written to Spanner will be visible to anyone who can access the database created by HarbourBridge (which inherits default permissions from your Cloud Spanner instance).

Next steps

The tables created by HarbourBridge provide a starting point for evaluation of Spanner. While they preserve much of the core structure of your PostgreSQL schema and data, many important PostgreSQL features have been dropped.

In particular, HarbourBridge preserves primary keys but drops all other indexes. This means that the out-of-the-box performance you get from the tables created by HarbourBridge can be significantly slower than PostgreSQL performance. If HarbourBridge has dropped indexes that are important to the performance of your SQL queries, consider adding Secondary Indexes to the tables created by HarbourBridge. Use the existing PostgreSQL indexes as a guide. In addition, Spanner's Interleaved Tables can provide a significant performance boost.
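
For instance, using the canonical Singers/Albums example from the Spanner documentation (illustrative names, not HarbourBridge output), a secondary index and an interleaved child table look like this:

  -- Recreate an important dropped index as a Spanner secondary index.
  CREATE INDEX SingersByLastName ON Singers(LastName);

  -- Interleave child rows with their parent rows to co-locate related data.
  CREATE TABLE Albums (
    SingerId INT64 NOT NULL,
    AlbumId  INT64 NOT NULL,
    Title    STRING(MAX)
  ) PRIMARY KEY (SingerId, AlbumId),
    INTERLEAVE IN PARENT Singers ON DELETE CASCADE;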

Other dropped features include functions, sequences, procedures, triggers, and views. In addition, types have been mapped based on the types supported by Spanner. Types such as integers, floats, char/text, bools, timestamps and (some) array types map fairly directly to Spanner, but many other types do not and instead are mapped to Spanner's STRING(MAX). See Schema Conversion for details of the type conversions and their tradeoffs.
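
As a sketch of what this can look like (the exact DDL HarbourBridge emits may differ; the jsonb mapping here is an assumption based on the STRING(MAX) fallback described above):

  -- PostgreSQL source
  CREATE TABLE orders (
    id        bigint PRIMARY KEY,
    placed_at timestamptz,
    details   jsonb
  );

  -- Possible Spanner schema: bigint and timestamptz map directly,
  -- while jsonb has no direct counterpart and falls back to STRING(MAX).
  CREATE TABLE orders (
    id        INT64 NOT NULL,
    placed_at TIMESTAMP,
    details   STRING(MAX)
  ) PRIMARY KEY (id);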

Recap

HarbourBridge automates much of the manual work of trying out Cloud Spanner using PostgreSQL data. The goal is to bootstrap your evaluation and help get you to the meaty issues as quickly as possible. The tables generated by HarbourBridge provide a starting point, but they will likely need to be tweaked and enhanced to support a full evaluation.

We encourage you to try out the tool, send feedback, file issues, fork and modify the codebase, and send PRs for fixes and new functionality. Our plans and aspirations for developing HarbourBridge further are outlined in the HarbourBridge Whitepaper. HarbourBridge is part of the Cloud Spanner Ecosystem, owned and maintained by the Cloud Spanner user community. It is not officially supported by Google as part of Cloud Spanner.

By Nevin Heintze, Cloud Spanner

Importing SA360 WebQuery reports to BigQuery

Context

Search Ads 360 (SA360) is an enterprise-class search campaign management platform used by marketers to manage global ad campaigns across multiple engines. It offers powerful reporting capabilities through WebQuery reports, an API, and BigQuery and Data Studio connectors.

Effective ad campaign management requires multi-dimensional analysis of campaign data along with customers’ first-party data, building custom reports that combine dimensions from paid-search reports and business data.

Customers’ business data resides in a data warehouse, which is designed for analysis, insights, and reporting. To integrate ads data into the data warehouse, the usual approach is to load the campaign data into it. SA360 offers various options to retrieve paid-search data, each with unique capabilities:

Comparison Area | WebQuery | BQ Connector | Data Studio Connector | API
Technical complexity | Low | Medium | Medium | High
Ease of report customization | High | Medium | Low | High
Reporting details | Complete | Limited (1) | Limited (1) | Limited (1)
Possible data warehouse | Any (2) | BigQuery ONLY | None | Any

(1) Reports not supported by the API, e.g. location targets, remarketing targets, and audience reports, are not available.
(2) The report is generic and needs to be loaded into the data warehouse using the DW's custom loading methods.

Comparing these approaches in terms of the technical knowledge required, as well as the data warehousing solutions supported, the easiest is the WebQuery report, which a marketer can build by choosing the dimensions and metrics they want in the SA360 user interface.

The BigQuery data-transfer service is limited to importing data into BigQuery, and the Data Studio connector does not allow retrieving data.

WebQuery offers a simpler and more customizable method than the alternatives, and supports more kinds of data (unlike the BQ transfer service, for example, it can bring business data from SA360 to BigQuery). WebQuery was originally designed to give Microsoft Excel an updatable view of a report. In the era of cloud computing, a tool was needed to consume the report and make it available on an analytical platform or in a cloud data warehouse like BigQuery.

Solution Approach

This tool showcases how to bridge this gap in a generic fashion: the report is fetched from SA360 in XML format and converted into a CSV file using SAX parsers. The CSV file is then transferred to staging storage and finally ETLed into the data warehouse.
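
As a minimal sketch of that XML-to-CSV step (in Python, for illustration only; the <row> element name is an assumption about the report layout, and the published tool's implementation may differ), a streaming SAX handler avoids loading a large report into memory:

  import csv
  import sys
  import xml.sax

  class ReportHandler(xml.sax.ContentHandler):
      """Streams <row> elements from the report out as CSV rows."""

      def __init__(self, writer):
          super().__init__()
          self.writer = writer
          self.current_tag = None
          self.row = {}

      def startElement(self, name, attrs):
          self.current_tag = name
          if name == "row":
              self.row = {}

      def characters(self, content):
          # SAX may deliver cell text in chunks, so accumulate it.
          if self.current_tag and self.current_tag != "row" and content.strip():
              self.row[self.current_tag] = self.row.get(self.current_tag, "") + content.strip()

      def endElement(self, name):
          if name == "row" and self.row:
              self.writer.writerow(self.row.values())
          self.current_tag = None

  if __name__ == "__main__":
      # Usage: python report_to_csv.py report.xml > report.csv
      xml.sax.parse(sys.argv[1], ReportHandler(csv.writer(sys.stdout)))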

As a concrete example, we chose to showcase a solution with BigQuery as the destination (cloud) data warehouse, though the solution architecture is flexible for any other system.

Conclusion

The tool helps marketers bring advertising data closer to their analytical systems, helping them derive better insights. If you use BigQuery as your data warehouse, you can use this tool as-is. You can also adapt it by adding components for the analytical/data-warehousing systems you use, and improve it for the larger community.

To get started, follow our step-by-step guide.
Notable features of the tool:
  • Modular authorization module
  • Handles arbitrarily large WebQuery reports
  • Batch mode to process multiple reports in a single call
  • Can be used as part of an ETL workflow (Airflow compatible)

By Anant Damle, Solutions Architect and Meera Youn, Technical Partnership Lead

Announcing our Google Code-in 2019 Winners!

Google Code-in (GCI) 2019 was epic in every regard. Not only did we celebrate 10 years of the Google Code-in program, but we also broke all of our previous records for the program. It was a very, very busy seven weeks for everyone—we had 3,566 students from 76 countries complete 20,840 tasks with a record 29 open source organizations!

We want to congratulate all of the students who took part in this year’s 10th anniversary of Google Code-in. Great job!

Today we are excited to announce the Grand Prize Winners, Runners Up, and Finalists from each organization.

The 58 Grand Prize Winners completed an impressive 2,158 tasks while also helping other students.

Each of the Grand Prize Winners will be awarded a four-day trip to Google’s campus in Northern California to meet with Google engineers and one of the mentors they worked with during the contest, and to enjoy some fun in California with the other winners. We look forward to seeing these winners in a few months!

Grand Prize Winners

The Grand Prize Winners hail from 21 countries and are listed alphabetically by full name below:
Name | Organization | Country
Aayushman Choudhary | JBoss Community | India
Abdur-Raheem Idowu | Haiku | Norway
Abhinav Kaushlya | The Julia Programming Language | India
Aditya Vardhan Singh | The ns-3 Network Simulator project | India
Anany Sachan | OpenWISP | India
Andrea Gonzales | Sugar Labs | Malaysia
Anmol Jhamb | Fedora Project | India
Aria Vikram | Open Roberta | India
Artur Grochal | Drupal | Poland
Bartłomiej Pacia | Systers, An AnitaB.org Community | Poland
Ben Houghton | Wikimedia | United Kingdom
Benjamin Amos | The Terasology Foundation | United Kingdom
Chamindu Amarasinghe | SCoRe Lab | Sri Lanka
Danny Lin | CCExtractor Development | United States
Diogo Fernandes | Apertium | Luxembourg
Divyansh Agarwal | AOSSIE | India
Duc Minh Nguyen | Metabrainz Foundation | Vietnam
Dylan Iskandar | Liquid Galaxy | United States
Emilie Ma | Liquid Galaxy | Canada
Himanshu Sekhar Nayak | BRL-CAD | India
Jayaike Ndu | CloudCV | Nigeria
Jeffrey Liu | BRL-CAD | United States
Joseph Semrai | SCoRe Lab | United States
Josh Heng | Circuitverse.org | United Kingdom
Kartik Agarwala | The ns-3 Network Simulator project | India
Kartik Singhal | AOSSIE | India
Kaustubh Maske Patil | CloudCV | India
Kim Fung | The Julia Programming Language | United Kingdom
Kumudtiha Karunarathna | FOSSASIA | Sri Lanka
M.Anantha Vijay | Circuitverse.org | India
Maathavan Nithiyananthan | Apertium | Sri Lanka
Manuel Alcaraz Zambrano | Wikimedia | Spain
Naman Modani | Copyleft Games | India
Navya Garg | OSGeo | India
Neel Gopaul | Drupal | Mauritius
Nils André | CCExtractor Development | United Kingdom
Paraxor | Fedora Project | United Arab Emirates
Paweł Sadowski | OpenWISP | Poland
Pola Łabędzka | Systers, An AnitaB.org Community | Poland
Pranav Karthik | FOSSASIA | Canada
Pranay Joshi | OSGeo | India
Prathamesh Mutkure | OpenMRS | India
Pratish Rai | R Project for Statistical Computing | India
Pun Waiwitlikhit | The Mifos Initiative | Thailand
Rachit Gupta | The Mifos Initiative | India
Rafał Bernacki | Haiku | Poland
Ray Ma | OpenMRS | New Zealand
Rick Wierenga | TensorFlow | Netherlands
Sayam Sawai | JBoss Community | India
Sidaarth “Sid” Sabhnani | Copyleft Games | United States
Srevin Saju | Sugar Labs | Bahrain
Susan He | Open Roberta | Australia
Swapneel Singh | The Terasology Foundation | India
Sylvia Li | Metabrainz Foundation | New Zealand
Umang Majumder | R Project for Statistical Computing | India
Uzay Girit | Public Lab | France
Vladimir Mikulic | Public Lab | Bosnia and Herzegovina
William Zhang | TensorFlow | United States

Runners Up

And a big kudos to our 58 Runners Up from 20 countries. They will each receive a GCI backpack, a jacket, and a GCI t-shirt. The Runners Up are listed alphabetically by first name below:
Name | Organization
Adev Saputra | Drupal
Adrian Serapio | R Project for Statistical Computing
Alberto Navalón Lillo | Apertium
Alvii_07 | Liquid Galaxy
Amar Fadil | OpenWISP
Ananya Gangavarapu | TensorFlow
Andrey Shcherbakov | Wikimedia
Antara Bhattacharya | Metabrainz Foundation
Anthony Zhou | Public Lab
Bartosz Dokurno | Circuitverse.org
Ching Lam Choi | The Julia Programming Language
Chirag Bhansali | AOSSIE
Chiranjiv Singh Malhi | BRL-CAD
Daksha Aeer | Systers, An AnitaB.org Community
Devansh Khetan | OpenMRS
Dhanus SL | OSGeo
Dhhyey Desai | AOSSIE
Eric Xue | Copyleft Games
Eryk Mikołajek | BRL-CAD
Hannah Guo | The Terasology Foundation
Harsh Khandeparkar | Public Lab
Hirochika Matsumoto | CloudCV
Ilya Maier | Systers, An AnitaB.org Community
Irvan Ayush Chengadu | Drupal
Jakub Niklas | The Terasology Foundation
Jun Rong Lam | Circuitverse.org
Karol Ołtarzewski | OpenWISP
Kripa Kini | Liquid Galaxy
Krzysztof Krysiński | CCExtractor Development
Kunal Bhatia | SCoRe Lab
Laxya Pahuja | The Mifos Initiative
Łukasz Zbrzeski | SCoRe Lab
Madhav Mehndiratta | Fedora Project
Marcus Chong | Sugar Labs
Mateusz Samkiewicz | JBoss Community
Maya Farber Brodsky | CCExtractor Development
Michał Piechowiak | Fedora Project
Moodhunt | Metabrainz Foundation
Muhammad Wasif | FOSSASIA
name not shown | Haiku
Nathan Taylor | Sugar Labs
Nishanth Thumma | Open Roberta
Panagiotis Vasilopoulos | Haiku
Rachin Kalakheti | TensorFlow
Regan Iwadha | JBoss Community
Ribhav Sharma | OpenMRS
Richard Botez | Open Roberta
Rishabh Verma | The Mifos Initiative
Rishank Kanaparti | Copyleft Games
Rishi R | R Project for Statistical Computing
Sai Putravu | The ns-3 Network Simulator project
Samuel Sloniker | Apertium
Shivam Rai | OSGeo
Siddharth Sinha | FOSSASIA
Soumitra Shewale | The Julia Programming Language
Stanisław Howard | The ns-3 Network Simulator project
Suryansh Pathak | CloudCV
Taavi Väänänen | Wikimedia

Finalists

And a hearty congratulations to our 58 Finalists from 20 countries. The Finalists will each receive a special GCI jacket and a GCI t-shirt. They are listed alphabetically by first name below:
Name | Organization
Abinav Chari | CloudCV
Andre Christoga Pramaditya | CloudCV
Anish Agnihotri | OSGeo
Aryan Gulati | FOSSASIA
Ayush Sharma | Fedora Project
Ayush Sharma | SCoRe Lab
Daniel Oluojomu | JBoss Community
Dhruv Baronia | TensorFlow
Diana Hernandez | Systers, An AnitaB.org Community
Gambali Seshasai Chaitanya | Apertium
Hao Liu | R Project for Statistical Computing
Hardik Jhalani | Systers, An AnitaB.org Community
Hrishikesh Patil | OpenMRS
Jackson Lewis | The ns-3 Network Simulator project
Jan Rosa | Wikimedia
Janiru Hettiarachchi | Liquid Galaxy
Janiru Wijekoon | Metabrainz Foundation
Joshua Yang | Apertium
Kevin Liu | Open Roberta
Krishna Rama Rao | AOSSIE
Li Chen | Fedora Project
Madhav Shekhar Sharma | The Julia Programming Language
Mbah Javis | TensorFlow
Merul Dhiman | Liquid Galaxy
Michelle (Wai Man) Lo | OpenMRS
Mihir Bhave | OpenWISP
Mohit S A | Circuitverse.org
Mokshit Jain | Drupal
Mudit Somani | The Julia Programming Language
Musab Kılıç | CCExtractor Development
Nail Anıl Örcün | The Terasology Foundation
Natalie Shapiro | Circuitverse.org
Nate Clark | The Terasology Foundation
Nicholas Gregory | Wikimedia
Nikita Ermishin | OpenWISP
Nishith P | FOSSASIA
Oliver Fogelin | R Project for Statistical Computing
Oussama Hassini | The Mifos Initiative
Param Nayar | Copyleft Games
Peter Terpstra | The ns-3 Network Simulator project
Piyush Sharma | The Mifos Initiative
Robert Chen | Public Lab
Rohan Cherivirala | Open Roberta
Ruixuan Tu | Haiku
Saptashwa Mandal | Drupal
Sashreek Magan | Sugar Labs
Sauhard Jain | AOSSIE
Sharman Maheshwari | SCoRe Lab
Sumagna Das | BRL-CAD
Tanvir Singh | OSGeo
Techno-Disaster | CCExtractor Development
Thusal Ranawaka | BRL-CAD
Vivek Mishra | Copyleft Games
Yu Fai Wong | JBoss Community
Yuqi Qiu | Metabrainz Foundation
Zakhar Vozmilov | Public Lab
Zakiyah Hasanah | Sugar Labs
Zoltán Szatmáry | Haiku

Our 794 mentors, the heart and soul of GCI, are the reason the contest thrives. Mentors volunteer their time to help these bright students become open source contributors. They spend hundreds of hours during their holiday breaks answering questions, reviewing submitted tasks, and welcoming the students to their communities. GCI would not be possible without their dedication, patience and tireless efforts.

We will post more numbers from GCI 2019 here on the Google Open Source Blog over the next few weeks, so please stay tuned.

Congratulations to our Grand Prize Winners, Runners Up, Finalists, and all of the students who spent the last couple of months learning about, and contributing to, open source. We hope they will continue their journey in open source!

By Stephanie Taylor, Google Open Source

Announcing the 2019 second cycle Google Open Source Peer Bonus winners

We are happy to announce the 2019 second cycle winners of the Google Open Source Peer Bonus! This cohort represents the largest number of winners to date, with 115 awardees from 26 countries: Australia, Austria, Belgium, Canada, China, Colombia, Denmark, Finland, France, Germany, India, Ireland, Israel, Italy, Japan, Republic of Korea, Mexico, Netherlands, Poland, Portugal, Russia, Spain, Sweden, Switzerland, the United Kingdom, and the United States.

The Google Open Source Peer Bonus is an award for open source contributors who are not employed by Google but are nominated by Googlers for their exceptional contributions to open source. The program began as a way to reward developers; it has since evolved to support all open source contributors, from technical writers and designers to operations.

Below is the list of winners who gave us permission to thank them publicly:
Winner | Open Source Project
Miina Sikk | AMP Plugin for WordPress, AMP Stories
Ryan Kienstra | AMP Plugin for WordPress, AMP Stories
Joost Koehoorn | Angular
Ash Berlin-Taylor | Apache Airflow
Jarek Potiuk | Apache Airflow
Kamil Bregula | Apache Airflow
Ismael Mejia | Apache Beam, Avro
Jose Fonseca | APITrace
Lars Zawallich | Appleseed
Maximilian Michels | Beam, Flink
Roman Lebedev | benchmark
Ben Manes | Caffeine
Yang Luo | casbin; npcap; nmap
Sedat Dilek | ClangBuiltLinux
Nathan Chancellor | ClangBuiltLinux
Pablo Galindo Salgado | CPython
Karthikeyan Singaravelan | CPython
Tobe Osakwe | Dart build system
Drew Banin | DBT
Michael Johnson | Discourse - Google+ Import Script
Philip Rebohle | dxvk
Mike Blandford | er9x/ersky9x radio firmware
Simon Edwards | Extraterm
Ethan Lee | FNA, FAudio, SDL2
Vasco Asturiano | force-graph
Alexandre Alapetite | FreshRSS
Jenny Bryan | gargle: an R package for calling Google APIs from R, including auth
Patrick Mulhall | Gerrit Code Review
Gert van Dijk | Gerrit Code Review
Rafael Ascensão | Git
Arnold Robbins | GNU awk
Alberto Donizetti | Go
Alessandro Arzilli | Go
Tobias Klauser | Go
Emmanuel Odeke | Go
Brian Kessler | Go
Giovanni Bajo | Go compiler
Glenn Lewis | go-github
Cedric Staub | go-jose
Paul Jolly | go-tools
Daniel Martí | go-tools
Dominik Honnef | go-tools
Muir Manders | go-tools
Billie Cleek | go-tools
Ramya Rao | go-tools
John Paton | Google Cloud Python client libraries and Pandas GBQ
Krystian Kuźniarek | googletest
Gernot Vormayr | GoPacket
Johan Brandhorst | grpc-gateway
Mike Jumper | Guacamole
Willy Tarreau | HAProxy
Mike McQuaid | Homebrew
Joachim Viide | HTM
Serguei Bezverkhi | nftables
Kalle Persson | Inbox Theme for Gmail
Artem Gusev | ios-webkit-debug-proxy
Morven Cao | Istio Operator
Karol Lassak | Jenkins GCE plugin
Sebastien Goasguen | Knative
Joan Edwards | Knative
Markus Thömmes | Knative
Ashleigh Brennan | Knative
Cornelius Weig | Krew
Josh Bottum | Kubeflow
Kam Kasravi | kubeflow/kubeflow, kubeflow/manifests
Rune Mehlsen | lit-analyzer
Roman Lebedev | LLVM
Jonas Bernoulli | Magit
Jaeyoung Tae | Material Components Web/Material Components Web React
Maximilian Hils | mitmproxy
Brijesh Bittu | monaco-vim
Rich Felker | musl
Tim Neutkens | Next.js
Gordon Lyon | nmap
Ryan Gordon | Numerous open source games and engines
Carlos Alberto Cortez | OpenTelemetry
Roch Devost | OpenTelemetry
Ted Young | OpenTelemetry
Joshua MacDonald | OpenTelemetry/opentelemetry-go
Daniel Khan | OpenTelemetry
Brandon Gonzalez | OpenTelemetry
Valentin Marchaud | OpenTelemetry and OpenCensus
Olivier Albertini | OpenTelemetry and OpenCensus
Armin Ruech | OpenTelemetry-Java
Tyler Benson | OpenTelemetry-Java
Paulo Janotti | OpenTelemetry-Service
Akshay Anand | Oppia
James Marca | OR-Tools
Max Dymond | OSS-Fuzz
Ignazio Palmisano | OWL API
Marcos Caceres | Payment Request API
Jovi De Croock | Preact
Leah Ullmann | Preact
Hervé Bredin | pyannote
Tomohiko Kinebuchi | Python official document Japanese translation project
Gabriela de Queiroz | R
Baldur Karlsson | RenderDoc
Fabian Henneke | Secure Shell
Sam Aaron | Sonic Pi
Greg Roth | Spiregg (SPIR-V Backend in DirectXShaderCompiler)
Erica Sadun | Swift Evolution
Sean Morgan | tensorflow/addons
Yong Tang | tensorflow/io
Shree Kumar | Tesseract
Seth Larson | urllib3
Michael Tüxen | usrsctp
Felix Weinrank | usrsctp
Qiuyi Zhang | V8
Sébastien Helleu | WeeChat
Wesley Shields | YARA

Congratulations to the winners! Open source is a shared effort that is only possible with everyone’s commitment to build better solutions for the world. Thank you for partnering with us in this mission. We look forward to more collaborations in the months to come!

By María Cruz, Google Open Source

BazelCon 2019

Cross-posted from the original BazelCon 2019 recap.

Last month the Google Bazel team hosted its largest ever Bazel user conference: BazelCon 2019, an annual gathering of the community surrounding the Bazel build system. This is the main Bazel event of the year, serving as an opportunity for Bazel contributors, maintainers, and users to meet, learn from each other, present Bazel migration stories, educate new users, and collaborate on the future of Bazel.

BazelCon 2019 by the Numbers

  • 400+ attendees (2x increase over BazelCon 2018)
  • 125 organizations represented including Microsoft, Spotify, Uber, Apple, Cruise, EA, Lyft, Tesla, SpaceX, SAP, Bloomberg, Wix, Etsy, BMW and others
  • 26 full-length talks and 15 lightning talks by members of the external community and Googlers
  • 16 hours of Q&A during Office Hours with Bazel team members
  • 45 Bazel Bootcamp attendees
  • 5 Birds of a Feather sessions on iOS, Python, Java, C++ and Front-end Bazel rules
  • 182 users in the #bazelcon2019 Slack channel

BazelCon 2019 Full Length Talks

The full playlist also includes lightning talks.
  • Keynote: The Role of Catastrophic Failure in Software Design – Jeff Atwood (Stack Overflow/Discourse)
  • Bazel State of the Union – John Field and Dmitry Lomov (Google)
  • Building Self Driving Cars with Bazel – Axel Uhlig and Patrick Ziegler (BMW Group)
  • Moving to a Bazel-based CI system: 6 Learnings – Or Shachar (Wix)
  • Bazel Federation – Florian Weikert (Google)
  • Lessons from our First 100,000 Bazel Builds – Kevin Gessner (Etsy)
  • Migrating Lyft-iOS to Bazel – Keith Smiley and Dave Lee (Lyft)
  • Test Selection – Benjamin Peterson (Dropbox)
  • Porting iOS Apps to Bazel – Oscar Bonilla (LinkedIn)
  • Boosting Dev Box Performance with Remote Execution for Non-Hermetic Build Engines – Erik Mavrinac (Microsoft)
  • Building on Key - Keeping your Actions and Remote Executions in Tune – George Gensure (UberATG)
  • Bazel remote execution API vs Goma – Mostyn Bramley-Moore (Vewd Software)
  • Integrating with ease: leveraging BuildStream interaction with Bazel build for consistent results – Daniel Silverstone (Codethink)
  • Building Self-Driving Cars with Bazel – Michael Broll and Nico Valigi (Cruise)
  • Make local development (with Bazel) great again! – Ittai Zeidman (Wix)
  • Gradle to Bazel – Chip Dickson and Charles Walker (SUM Global Technology)
  • Bazel Bootcamp – Kyle Cordes (Oasis Digital)
  • Bazel migration patterns: how to prove business value with a small investment – Alex Eagle and Greg Magolan (Google)
  • Dynamic scheduling: Fastest clean and incremental builds – Julio Merino (Google)
  • Building a great CI with Bazel – Philipp Wollermann (Google)

By Misha Narinsky, Bazel Team

Google Summer of Code 2020 is now open for mentor organization applications!

We are looking for open source projects and organizations to participate in the 16th annual Google Summer of Code (GSoC)! GSoC is a global program that draws university student developers from around the world to contribute to open source projects. Each student will spend three months, from mid-May to mid-August, working on a coding project with the support of volunteer mentors from participating open source organizations.

Last year, 1,276 students worked with 206 open source organizations and over 2,000 mentors. Organizations include small- and medium-sized open source projects, as well as a number of umbrella organizations with many sub-projects under them (Apache Software Foundation, Python Software Foundation, etc.).

Our 2020 goal is to accept more organizations into their first GSoC than ever before! We ask that veteran organizations refer other organizations they think would be a good fit to participate in GSoC.

You can apply to be a mentoring organization for GSoC starting today. The deadline to apply is February 5 at 19:00 UTC. Organizations chosen for GSoC 2020 will be publicly announced on February 20.

Please visit the program site for more information on how to apply and review the detailed timeline of important deadlines. We also encourage you to check out the Mentor Guide and our short video on why open source projects apply to be a part of the program.

Best of luck to all of the open source mentoring organization applicants!

By Stephanie Taylor, Google Open Source

Securing open source: How Google supports the new Kubernetes bug bounty

At Google, we care deeply about the security of open-source projects, as they’re such a critical part of our infrastructure—and indeed everyone’s. Today, the Cloud-Native Computing Foundation (CNCF) announced a new bug bounty program for Kubernetes that we helped create and get up and running. Here’s a brief overview of the program, other ways we help secure open-source projects, and information on how you can get involved.

Launching the Kubernetes bug bounty program

Kubernetes is a CNCF project. As part of its graduation criteria, the CNCF recently funded the project’s first security audit, to review its core areas and identify potential issues. The audit identified and addressed several previously unknown security issues. Thankfully, Kubernetes already had a Product Security Committee, including engineers from the Google Kubernetes Engine (GKE) security team, who respond to and patch any newly discovered bugs. But the job of securing an open-source project is never done. To increase awareness of Kubernetes’ security model, attract new security researchers, and reward ongoing efforts in the community, the Kubernetes Product Security Committee began discussions in 2018 about launching an official bug bounty program.

Find Kubernetes bugs, get paid

What kind of bugs does the bounty program recognize? Most of the content you’d think of as ‘core’ Kubernetes, included at https://github.com/kubernetes, is in scope. We’re interested in common kinds of security issues like remote code execution, privilege escalation, and bugs in authentication or authorization. Because Kubernetes is a community project, we’re also interested in the Kubernetes supply chain, including build and release processes that might allow a malicious individual to gain unauthorized access to commits, or otherwise affect build artifacts. This is a bit different from your standard bug bounty as there isn’t a ‘live’ environment for you to test—Kubernetes can be configured in many different ways, and we’re looking for bugs that affect any of those (except when existing configuration options could mitigate the bug). Thanks to the CNCF’s ongoing support and funding of this new program, depending on the bug, you can be rewarded with a bounty anywhere from $100 to $10,000.

The bug bounty program has been in a private release for several months, with invited researchers submitting bugs to help us test the triage process. And today, the new Kubernetes bug bounty program is live! We’re excited to see what kind of bugs you discover, and we are ready to respond to new reports. You can learn more about the program and how to get involved here.

Dedicated to Kubernetes security

Google has been involved in this new Kubernetes bug bounty from the get-go: proposing the program, completing vendor evaluations, defining the initial scope, testing the process, and onboarding HackerOne to implement the bug bounty solution. Though this is a big effort, it’s part of our ongoing commitment to securing Kubernetes. Google continues to be involved in every part of Kubernetes security, including responding to vulnerabilities as part of the Kubernetes Product Security Committee, chairing the sig-auth Kubernetes special interest group, and leading the aforementioned Kubernetes security audit. We realize that security is a critical part of any user’s decision to use an open-source tool, so we dedicate resources to help ensure we’re providing the best possible security for Kubernetes and GKE.

Although the Kubernetes bug bounty program is new, it isn’t a novel strategy for Google. We have enjoyed a close relationship with the security research community for many years and, in 2010, Google established our own Vulnerability Rewards Program (VRP). The VRP provides rewards for vulnerabilities reported in GKE and virtually all other Google Cloud services. (If you find a bug in GKE that isn’t specific to Kubernetes core, you should still report it to the Google VRP!) Nor is Kubernetes the only open-source project with a bug bounty program. In fact, we recently expanded our Patch Rewards program to provide financial rewards both upfront and after-the-fact for security improvements to open-source projects.

Help keep the world’s infrastructure safe. Report a bug to the Kubernetes bug bounty, or a GKE bug to the Google VRP.

By Maya Kaczorowski, Product Manager, Container Security; and Aaron Small, Product Manager, GKE On-Prem security

Wombat Dressing Room, an npm publication proxy on GCP

We're excited to announce that we're open sourcing Wombat Dressing Room, the service we use on the Google Cloud Client Libraries team for handling npm publications. Wombat Dressing Room provides features that help npm work better with automation, while maintaining good security practices.

A tradeoff is often made for automation

npm has top-notch security features: CIDR-range-restricted tokens, publication notifications, and two-factor authentication, to name a few. Of these, the feature most critical to protecting publications is two-factor authentication (2FA).

2FA requires that you provide two pieces of information when accessing a protected resource: "something you know" (for instance, a password) and "something you have" (for instance, a code from an authenticator app). With 2FA, if your password is exposed, an attacker still can't publish a malicious package (unless they also steal the "something you have").

On my team, a small number of developers manage over 75 Node.js libraries. We see automation as key to making this possible: we've written tools that automate releases, validate license headers, and ensure contributors have signed CLAs. We adhere to the philosophy: automate all the things!

It's difficult to automate the step of entering a code off a cellphone. As a result, folks often opt to turn off 2FA in their automation.

What if you could have both automation and the added security of 2FA? This is why we built the Wombat Dressing Room.

A different approach to authentication

With Wombat Dressing Room, rather than each individual configuring two-factor authentication in an authenticator app, 2FA is managed by a shared proxy server. Publications are directed at the Wombat Dressing Room proxy, which provides the following security features:

Per-package publication tokens

Wombat Dressing Room can generate authentication tokens that are each tied to a single GitHub repository, for which the user generating the token must have push permissions.

If a per-package publication token is leaked, an attacker can only hijack the single package that the token is associated with.
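
A generated token is used like any npm auth token. An illustrative .npmrc might look like the following (the registry URL and environment variable are placeholders, not the service's documented configuration):

  # Route publications through the Wombat Dressing Room proxy.
  registry=https://your-wombat-instance.appspot.com/
  //your-wombat-instance.appspot.com/:_authToken=${WOMBAT_PACKAGE_TOKEN}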

Limited lifetime tokens

Wombat Dressing Room can also generate access tokens that have a 24-hour lifespan. In this model, a leaked token is only a vulnerability until its 24-hour lifespan ends.

GitHub Releases as 2FA

In this authentication model, a package can only be published to npm if a GitHub release with a corresponding tag is found on GitHub.

This introduces a true "second factor", as users must prove they have access to both Wombat Dressing Room and the repository on GitHub.
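
A release flow under this model might look like the following sketch (the version, package, and registry URL are placeholders):

  # Tag and push the release that authorizes the publication.
  git tag v1.2.3
  git push origin v1.2.3
  # Create a GitHub release for v1.2.3, then publish through the proxy,
  # which checks that the corresponding release exists.
  npm publish --registry="https://your-wombat-instance.appspot.com"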

Getting started with Wombat Dressing Room

We've been using Wombat Dressing Room to manage Google Cloud client libraries for over a year now in our fully automated library release process. As of today, the source is available for everyone on GitHub under an Apache 2.0 license.

Wombat Dressing Room runs on Google App Engine, and instructions on getting it up and running can be found in its README.md.

It's my hope that this will help other folks in the community simplify and automate their release processes, while minimizing the attack surface of their libraries.

By Benjamin Coe, who works on Node.js client libraries for the Google Cloud Platform and was the third engineer at npm, Inc.

Season of Docs Announces Results of 2019 Program

Season of Docs has announced the 2019 program results for standard-length projects. You can view a list of successfully completed technical writing projects on the website along with their final project reports.

During the program, technical writers spent a few months working closely with an open source community. They brought their technical writing expertise to improve the project's documentation while the open source projects provided mentors to introduce the technical writers to open source tools, workflows, and the project's technology.

The technical writers and their mentors did a fantastic job in the inaugural year of Season of Docs! Participants represented countries across all continents except Antarctica! 36 of the 41 technical writers successfully completed their standard-length technical writing projects, and eight long-running projects are in progress and expected to finish in February.

  • 91.7% of the mentors had a positive experience and want to mentor again in future Season of Docs cycles
  • 88% of the technical writers had a positive experience
  • 96% plan to continue contributing to open source projects
  • 100% of the technical writers said that Season of Docs helped improve their knowledge of code and/or open source

Technical writing projects ranged from beginners' guides and tutorials to API and reference documentation; all of which benefited a diverse set of open source projects that included programming languages, software, compiler infrastructure, operating systems, software libraries, hardware, science, healthcare, and more. Take a look at the list of successful projects to see the wide range of subjects covered!

What is next?

The long-running projects are still in progress and will finish in February 2020. Technical writers participating in these projects submit their project reports by February 25, and the writer and mentor evaluations are due by February 28. Successfully completed long-running technical writing projects will be published on the results page on March 6, 2020.

If you were excited about participating, please share your experience in social media posts. See the promotion and press page for images and other promotional materials you can include, and be sure to use the tag #SeasonOfDocs when promoting your ideas on social media. To include the tech writing and open source communities, add #WriteTheDocs, #techcomm, #TechnicalWriting, and #OpenSource to your posts.

Stay tuned for information about Season of Docs 2020—watch for posts in this blog and sign up for the announcements email list.

By Andrew Chen, Google Open Source and Sarah Maddox, Cloud Docs

W3C Trace Context Specification: What it Means for You

Since the first days of Google Cloud Platform (GCP), Google has been at the forefront of making your applications more observable. Beyond Stackdriver, our most visible impact in this space is OpenTelemetry, which we initiated in 2017 (as OpenCensus) and which has grown into a huge community that includes the majority of APM / monitoring vendors and cloud platforms.

While OpenTelemetry allows developers to easily capture distributed traces and metrics from their own services, there’s also a need to trace requests as they propagate through components that developers don’t directly control, like managed services, load balancers, network hardware, etc. To solve this we co-defined a prototype HTTP header that these components can rely on, gathered partners, and moved the work into the W3C.

This work is now complete, and the W3C Trace Context format is now an official standard. Once implemented in GCP, this will make our services even easier to manage, both with Stackdriver and with third-party distributed tracing tools. We explain more in the official post on the W3C blog, which I’ve copied below:

The W3C Distributed Tracing working group has moved the Trace Context specification to the next maturity level. The specification is already being adopted and implemented by many platforms and SDKs. This article describes the Trace Context specification and how it improves troubleshooting and monitoring of modern distributed apps.

W3C Trace Context specification defines the format for propagating distributed tracing context between services. Distributed tracing makes it easy for developers to find the causes of issues in highly-distributed microservices applications by tracking how a single interaction was processed across multiple services. Each step of a trace is correlated through an ID that is passed between services, and W3C Trace Context now defines a standard for these context propagation headers.
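
For example, the traceparent header defined by the specification carries four fields: a version, a trace ID shared by all steps of a trace, the ID of the calling span, and trace flags such as the sampling decision. The example value below is taken from the specification:

  traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01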

Until now, different tracing systems have defined their own headers. Examples include Zipkin’s B3 format and X-Google-Cloud-Trace. Adopting a common context propagation format has been long desired by developers, APM vendors, and cloud platform hosts, as compatibility provides numerous benefits:
  • Web and RPC frameworks that use this standard to provide context propagation out of the box will also offer cross-service log correlation, even for developers who haven’t set up distributed tracing.
  • API producers can record the trace IDs of requests from API consumers and provide additional spans or metadata to their customers for a given traced request. Producers can also correlate customer trace IDs to internal traces when debugging technical issues raised by consumers.
  • Networking infrastructure (proxies, load balancers, routers, etc.) can both ensure that context propagation headers are not removed from requests passing through them, and can record spans or logs for a given trace, without having to support multiple vendor-specific formats. Potential examples of these include router appliances, cloud load balancers, and sidecar proxies like Envoy.
  • Instrumentation can be further decoupled from a developer’s choice of APM vendor. For example, using both OpenTelemetry and a given vendor’s agents, a developer can instrument different services in an application, and traces will flow through the system and be processed correctly by the vendor’s backend.
  • Web browsers and other clients can use these identifiers to correlate their telemetry with traces collected from backend services. This functionality is currently being defined.
To address this need, a group of cloud providers, open source contributors, and APM vendors started defining a standard HTTP context propagation header that would replace their homegrown formats. This specification has been discussed and iterated on over the past two years, and the group working on it has grown significantly over that time. Sponsors include Google, Microsoft, Dynatrace, and New Relic (W3C members), and the group was officially moved into the W3C in 2018 for the work to proceed under the guidance of an official standards body and to spur even greater adoption.

TraceContext has since been adopted by OpenTelemetry (which enables it by default and also serves as the reference implementation), Azure services, Dynatrace, Elastic, Google Cloud Platform, Lightstep, and New Relic. We are tracking adoption in this list.

This first phase of work has focused on HTTP, as it is commonly used and has no built-in affordances for trace context propagation (gRPC and some newer RPC systems do). The same group of committee members are also working to define trace context propagation in other formats, starting with AMQP and MQTT for IoT; other upcoming topics include context propagation from clients and web browsers.

By Morgan McLean, OpenTelemetry + Stackdriver