Tag Archives: open source release

ETC2Comp: fast texture compression for games and VR

For mobile game and VR developers, the ETC2 texture format has become an increasingly valuable tool for texture compression. It produces small on-GPU sizes (textures stay compressed in memory) and higher quality results than its ETC1 predecessor.

These benefits come with a notable downside, however: ETC2 textures take significantly longer to compress than their ETC1 counterparts. As adoption of the ETC2 format increases in a project, so do build times. As a result, developers have had to make the classic choice between quality and time.

We wanted to eliminate the need for developers to make that choice, so we’ve released ETC2Comp, a fast and high quality ETC2 encoder for games and VR developers.

ETC2 takes a long time to compress textures because the format defines a large number of possible ways to encode each block in the texture. Finding the highest quality compressed image means brute-forcing that enormous number of combinations, which is clearly not a time-efficient option.

We designed ETC2Comp to get the same visual results at much faster speeds by deploying a few optimization techniques:

Directed Block Search. Rather than a brute-force search, ETC2Comp uses a much more limited, targeted search for the best encoding for a given block. ETC2Comp comes with a precomputed set of archetype blocks, where each archetype is associated with a sorted list of the ETC2 block format types that provide its best encodings. During the actual compression of a texture, each block is initially assigned an archetype, and multiple passes are done to test the block against its block format list to find the best encoding. As a result, the best option can be found much quicker than with a brute-force method.

Full effort setting. During each pass of the encoding process, all the blocks of the image are sorted by their visual quality (worst-looking to best-looking). ETC2Comp takes an effort parameter whose value specifies what percentage of the blocks to update during each pass of encoding. An effort value of 25, for instance, means that on each pass, only the 25% worst-looking blocks are tested against the next format in their archetypes' format-chains. The effort value thus trades time spent against polishing blocks that already look acceptable; the sketch after the next item makes these passes concrete.

Highly multi-threaded code. Since blocks can be evaluated independently during each pass, it's straightforward to apply multithreading to the work. During encoding, ETC2Comp can take advantage of available parallel threads, and it even accepts a jobs parameter where you can define exactly the number of threads you'd like it to use... in case you have a 256-core machine.
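
To make the directed search and effort passes concrete, here's a minimal, illustrative sketch in Python (this is not the actual C++ implementation; the Block type, the format names and encode_with_format are stand-ins):

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
import random

@dataclass
class Block:
    error: float          # current visual error for this block
    formats: list         # archetype's sorted list of ETC2 format candidates
    next_format: int = 0  # index of the next candidate format to try

def encode_with_format(block, fmt):
    # Placeholder for the real per-format encoder; returns a new error value.
    return block.error * random.uniform(0.7, 1.1)

def try_next_format(block):
    # Test the block against the next format in its archetype's list and
    # keep the result only if it improves on the current best encoding.
    if block.next_format < len(block.formats):
        err = encode_with_format(block, block.formats[block.next_format])
        block.error = min(block.error, err)
        block.next_format += 1

def encode_pass(blocks, effort=25.0, jobs=4):
    # Sort from worst-looking to best-looking, then spend this pass only
    # on the worst `effort` percent of blocks.
    blocks.sort(key=lambda b: b.error, reverse=True)
    worst = blocks[:max(1, int(len(blocks) * effort / 100.0))]
    # Blocks are independent, so the work spreads across `jobs` threads.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        list(pool.map(try_next_format, worst))

# Example: four passes at effort=25 over some dummy blocks.
blocks = [Block(random.random(), ["individual", "differential", "T", "H", "planar"])
          for _ in range(1024)]
for _ in range(4):
    encode_pass(blocks, effort=25, jobs=8)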

Check out the code on GitHub to get started with ETC2Comp and let us know what you think. You can use the tool from the command line or embed the C++ library in your project. If you want to know more about what’s going on under the hood, check out this blog post.

By Colt McAnlis, Developer Advocate

Open source visualization of GPS displacements for earthquake cycle physics

The Earth’s surface is moving, ever so slightly, all the time. This slow, small, but persistent movement of the Earth's crust is responsible for the formation of mountain ranges, sudden earthquakes, and even the positions of the continents. Scientists around the world measure these almost imperceptible movements using arrays of Global Navigation Satellite System (GNSS) receivers to better understand all phases of an earthquake cycle—both how the surface responds after an earthquake, and the storage of strain energy between earthquakes.

To help researchers explore this data and better understand the earthquake cycle, we are releasing a new, interactive data visualization which draws geodetic velocity lines on top of a relief map by amplifying position estimates relative to their true positions. Unlike existing approaches, which focus on small time slices or individual stations, our visualization can show all the data for a whole array of stations at once. Open sourced under the Apache 2.0 license and available on GitHub, this visualization technique is a collaboration between Harvard's Department of Earth and Planetary Sciences and Google's Machine Perception and Big Picture teams.

Our approach helps scientists quickly assess deformations across all phases of the earthquake cycle—both during earthquakes (coseismic) and the time between (interseismic). For example, we can see azimuth (direction) reversals of stations as they relate to topographic structures and active faults. Digging into these movements will help scientists vet their models and their data, both of which are crucial for developing accurate computer representations that may help predict future earthquakes.

Classical approaches to visualizing these data have fallen into two general categories: 1) a map view of velocity/displacement vectors over a fixed time interval and 2) time versus position plots of each GNSS component (longitude, latitude and altitude).

Examples of classical approaches. On the left is a map view showing average velocity vectors over the period from 1997 to 2001[1]. On the right you can see a time versus eastward (longitudinal) position plot for a single station.

Each of these approaches has proved to be an informative way to understand the spatial distribution of crustal movements and the time evolution of solid earth deformation. However, because geodetic shifts happen over almost imperceptible distances (millimeters) and long timescales, both approaches can only show a small subset of the data at any time: a condensed average velocity per station, or a detailed view of a single station, respectively. Our visualization enables a scientist to see all the data at once, then interactively drill down to a specific subset of interest.

Our visualization approach is straightforward; by magnifying the daily longitude and latitude position changes, we show tracks of the evolution of the position of each station. These magnified position tracks are shown as trails on top of a shaded relief topography to provide a sense of position evolution in geographic context.

To see how it works in practice, let's step through an example. Consider this tiny set of longitude/latitude pairs for a single GNSS station; note how only the last few digits change from day to day:


Day Index    Longitude       Latitude
0            139.06990407    34.949757897
1            139.06990400    34.949757882
2            139.06990413    34.949757941
3            139.06990409    34.949757921
4            139.06990413    34.949757904

If we were to draw line segments between these points directly on a map, they'd be much too small to see at any reasonable scale. So we take these minute differences and multiply them by a user-controlled scaling factor. By default this factor is 10^5.5 (about 316,000x).
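
As a concrete illustration (a minimal sketch using the sample coordinates above, and assuming the first day's position as the track's origin), the magnified track can be computed like this:

SCALE = 10 ** 5.5  # the default user-controlled scaling factor (about 316,000x)

# Daily (longitude, latitude) estimates for one station, from the table above.
days = [
    (139.06990407, 34.949757897),
    (139.06990400, 34.949757882),
    (139.06990413, 34.949757941),
    (139.06990409, 34.949757921),
    (139.06990413, 34.949757904),
]

lon0, lat0 = days[0]  # anchor the track at the first day's position
track = [(lon0 + (lon - lon0) * SCALE, lat0 + (lat - lat0) * SCALE)
         for lon, lat in days]

for lon, lat in track:
    print("%.6f, %.6f" % (lon, lat))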


To help the user identify which end is the start of the line, we give the start and end points different colors and interpolate between them. Blue and red are the default colors, but they’re user-configurable. Although day-to-day movement of stations may seem erratic, by using this method, one can make out a general trend in the relative motion of a station.
Close-up of a single station's movement during the three-year period from 2003 to 2006.
However, static renderings of this sort suffer from the same problem that velocity vector images do; in regions with a high density of GNSS stations, tracks overlap significantly with one another, obscuring details. To solve this problem, our visualization lets the user interactively control the time range of interest, the amount of amplification and other settings. In addition, by animating the lines from start to finish, the user gets a real sense of motion that’s difficult to achieve in a static image.

We’ve applied our new visualization to the ~20 years of data from the GEONET array in Japan. Through it, we can see small but coherent changes in direction before and after the great 2011 Tohoku earthquake.
GPS data sets (in .json format) for both the GEONET data in Japan and the Plate Boundary Observatory (PBO) data in the western US are available at earthquake.rc.fas.harvard.edu.
This short animation shows many of the visualization’s interactive features. In order:
  1. Modifying the multiplier adjusts how significantly the movements are magnified.
  2. We can adjust the time slider nubs to select a particular time range of interest.
  3. Using the map controls provided by the Google Maps JavaScript API, we can zoom into a tiny region of the map.
  4. By enabling map markers, we can see information about individual GNSS stations.
By focusing on a station of interest, we can even see curvature changes in the time periods before and after the event.
Station 960601 of Japan's GEONET array during the period from 2006 to 2012. Movement magnified 10^5.1 times (about 126,000x).
To achieve fast rendering of the line segments, we created a custom overlay using THREE.js to render the lines in WebGL. Data for the GNSS stations is passed to the GPU in a data texture, which allows our vertex shader to position each point on-screen dynamically based on user settings and animation.
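
As a rough sketch of that data path (the exact texture layout here is an assumption, not taken from the project's source), per-station daily positions can be packed into a single float32 array that is uploaded as a texture, so the vertex shader can look up any (station, day) position and apply the magnification and time window on the GPU:

import numpy as np

n_stations, n_days = 1200, 7000  # illustrative dimensions
# One (longitude, latitude) pair per station per day, e.g. loaded from the
# published .json data sets; random values stand in here.
positions = np.random.rand(n_stations, n_days, 2).astype(np.float32)

# Treat the array as a two-channel float texture: one row per station,
# one column per day. A vertex shader can then fetch positions[station, day]
# by texture coordinate instead of re-uploading geometry every frame.
print(positions.shape, positions.dtype)  # (1200, 7000, 2) float32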

We’re excited to continue this productive collaboration between Harvard and Google as we explore opportunities for groundbreaking, new earthquake visualizations. If you’d like to try out the visualization yourself, follow the instructions at earthquake.rc.fas.harvard.edu. It will walk you through the setup steps, including how to download the available data sets. If you’d like to report issues, great! Please submit them through the GitHub project page.

Acknowledgments

We wish to thank Bill Freeman, a researcher on Machine Perception, who hatched the idea and developed the initial prototypes, and Fernanda Viégas and Martin Wattenberg of the Big Picture team for their visualization design guidance.

References

[1] Loveless, J. P., and Meade, B. J. (2010). Geodetic imaging of plate motions, slip rates, and partitioning of deformation in Japan, Journal of Geophysical Research.

By Jimbo Wilson, Software Engineer, Big Picture Team and Brendan Meade, Professor, Harvard Department of Earth and Planetary Sciences

Celebrating TensorFlow’s First Year

Originally posted on Google Research blog

It has been an eventful year since the Google Brain Team open-sourced TensorFlow to accelerate machine learning research and make technology work better for everyone. There has been an amazing amount of activity around the project: more than 480 people have contributed directly to TensorFlow, including Googlers, external researchers, independent programmers, students, and senior developers at other large companies. TensorFlow is now the most popular machine learning project on GitHub.


With more than 10,000 commits in just twelve months, we’ve made numerous performance improvements, added support for distributed training, brought TensorFlow to iOS and Raspberry Pi, and integrated TensorFlow with widely-used big data infrastructure. We’ve also made TensorFlow accessible from Go, Rust, and Haskell, released state-of-the-art image classification models – and answered thousands of questions on GitHub, StackOverflow, and the TensorFlow mailing list along the way.

At Google, TensorFlow supports everything from large-scale product features to exploratory research. We recently launched major improvements to Google Translate using TensorFlow (and Tensor Processing Units, which are special hardware accelerators for TensorFlow). Project Magenta is working on new reinforcement learning-based models that can produce melodies, and a visiting PhD student recently worked with the Google Brain team to build a TensorFlow model that can automatically interpolate between artistic styles. DeepMind has also decided to use TensorFlow to power all of their research – for example, they recently produced fascinating generative models of speech and music based on raw audio.

We’re especially excited to see how people all over the world are using TensorFlow. For example:

  • Australian marine biologists are using TensorFlow to find sea cows in tens of thousands of hi-res photos to better understand their populations, which are under threat of extinction. 
  • An enterprising Japanese cucumber farmer trained a model with TensorFlow to sort cucumbers by size, shape, and other characteristics.
  • Radiologists have adapted TensorFlow to identify signs of Parkinson’s disease in medical scans.
  • Data scientists in the Bay Area have rigged up TensorFlow and the Raspberry Pi to keep track of the Caltrain.

We’re committed to making sure TensorFlow scales all the way from research to production and from the tiniest Raspberry Pi all the way up to server farms filled with GPUs or TPUs. But TensorFlow is more than a single open-source project – we’re doing our best to foster an open-source ecosystem of related software and machine learning models around it:

  • The TensorFlow Serving project simplifies the process of serving TensorFlow models in production.
  • TensorFlow “Wide and Deep” models combine the strengths of traditional linear models and modern deep neural networks. 
  • For those who are interested in working with TensorFlow in the cloud, Google Cloud Platform recently launched Cloud Machine Learning, which offers TensorFlow as a managed service.

Furthermore, TensorFlow's repository of models continues to grow with contributions from the community, and more than 3,000 TensorFlow-related repositories are listed on GitHub alone! To participate in the TensorFlow community, you can follow our new Twitter account (@tensorflow), find us on GitHub, ask and answer questions on StackOverflow, and join the community discussion list.

Thanks very much to all of you who have already adopted TensorFlow in your cutting-edge products, your ambitious research, your fast-growing startups, and your school projects; special thanks to everyone who has contributed directly to the codebase. In collaboration with the global machine learning community, we look forward to making TensorFlow even better in the years to come!

By Zak Stone, Product Manager for TensorFlow

Podcast to YouTube: an open source story

Almost a year ago Mark Mandel and I started the Google Cloud Platform Podcast, a weekly podcast that covers topics related to Google Cloud Platform, among other things. It's been a pretty successful podcast, but that’s not what I want to write about today.

After a while we started receiving emails from listeners who wanted to access our podcast on YouTube. Even though this might seem strange to those who love podcasts and have their favorite app on their phones, we decided that the customer is always right: we should post every episode to YouTube.

Specifications

Ok, so … how? Well, to create a video I need to merge the mp3 audio from an episode with a static image. Let's include the title of the episode and the Google Cloud Platform Podcast logo.


But once we post the video to YouTube we're going to need more than that! We need a description, some tags, and probably a link to the episode (SEO FTW!).

Where can we get that information from? Let's think about this for a minute. Where are others getting this information from? The RSS feed! Would it be possible to create a tool to which I could say "post the video for episode 46" and a couple minutes later the video appeared on YouTube? That'd be awesome! Let's do that!

Architecture

The application I wrote parses an RSS feed and, given the episodes to publish, downloads the metadata and audio for each episode, generates the corresponding videos, and pushes them to YouTube.
Diagram of the flow of data in podcast-to-youtube
The hardest parts here are the creation of the image and the video. The rest is sending HTTP requests right and left.
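
The real tool is written in Go, but the flow is easy to sketch. Here's an illustrative Python outline (the feed URL, the episode-lookup heuristic and the image are placeholders, and the YouTube upload step is omitted):

import subprocess
import urllib.request

import feedparser  # third-party: pip install feedparser

FEED_URL = "https://example.com/podcast/feed.rss"  # placeholder feed URL

def episode_entry(feed_url, episode_number):
    # The RSS feed already carries everything needed later for the YouTube
    # description: title, summary, link and the MP3 enclosure URL.
    feed = feedparser.parse(feed_url)
    for entry in feed.entries:
        if "#%d" % episode_number in entry.title:  # naive episode lookup
            return entry
    raise LookupError("episode %d not found" % episode_number)

def make_video(entry, image_path="image.png", out_path="video.mp4"):
    # Download the episode audio, then merge it with the static image.
    urllib.request.urlretrieve(entry.enclosures[0].href, "audio.mp3")
    subprocess.run(["ffmpeg", "-i", image_path, "-i", "audio.mp3", out_path],
                   check=True)
    # Pushing out_path to YouTube would use the YouTube Data API (not shown).

make_video(episode_entry(FEED_URL, 46))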

Image Maker: rendering images in pure Go

After trying a couple of different tools I decided that the easiest was to create the image from scratch in Go using the image package from the standard library and a freetype library available on GitHub.

Probably the most fun part was choosing a font size that makes the title fit the image correctly regardless of its length in characters. I ended up creating a loop that:
  • chooses a font size and measures the width of the resulting text
  • if it's too wide, decreases the font size by one and repeats.
Surprisingly, for me, this is actually a pretty common practice!
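
Here's roughly what that loop looks like, sketched in Python with Pillow (the original is Go code using a freetype library; the font path and sizes below are placeholders):

from PIL import ImageFont  # third-party: pip install Pillow

def fit_font(text, max_width, font_path="Roboto-Regular.ttf", start_size=120):
    # Shrink the font size until the rendered title fits within max_width pixels.
    size = start_size
    font = ImageFont.truetype(font_path, size)
    while size > 1 and font.getlength(text) > max_width:
        size -= 1  # too wide: try one point smaller
        font = ImageFont.truetype(font_path, size)
    return font

font = fit_font("Episode 46: a fairly long episode title", max_width=1200)
print(font.size)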

It is also worth mentioning how I test the package: I compare a reference image to the one generated by the package, then produce a "diff" image in which all the pixels that differ are highlighted in red.
Diff image generated when using a wrong DPI.
The code for this package is available here.
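
A minimal version of that diff test, sketched in Python with Pillow and NumPy (the real tests are written in Go; both images are assumed to have the same dimensions):

import numpy as np
from PIL import Image

def diff_image(golden_path, generated_path, out_path="diff.png"):
    # Highlight in red every pixel that differs from the golden image.
    golden = np.asarray(Image.open(golden_path).convert("RGB"))
    got = np.asarray(Image.open(generated_path).convert("RGB")).copy()
    mismatch = np.any(golden != got, axis=-1)  # per-pixel mismatch mask
    got[mismatch] = [255, 0, 0]                # paint differing pixels red
    Image.fromarray(got).save(out_path)
    return int(mismatch.sum())                 # number of differing pixels

# Example: compare the checked-in golden image with a freshly generated one.
# print(diff_image("golden.png", "generated.png"))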

Video maker: ffmpeg is awesome

From the beginning I knew I would end up using ffmpeg to create my video. Why? Well, because it is as simple as running this command:

$ ffmpeg -i image.png -i audio.mp3 video.mp4

Easy, right? Well, that's once ffmpeg has been installed and correctly configured, which is actually not that simple and would make this tool hard to install on an arbitrary machine.

That's why the whole tool runs on Docker. Docker is a pretty widespread technology, and thanks to a Makefile I'm able to provide a tool that can be run like this:

$ make run

Conclusion

It took me a couple of days to write the tool and get it to a point where I could open source it, but it was totally worth it. I know that others will be able to easily reuse it, or even extend it. Who knows, maybe this should be exposed as a web application so anyone can use it, no Docker or Makefile needed!

I am currently using this tool weekly to upload the Google Cloud Platform Podcast episodes to this playlist, and you can find the whole code on this GitHub repository.

Any questions? I'm @francesc on Twitter.

By Francesc Campoy, Developer Advocate

Using TensorFlow and JupyterHub in Classrooms

We’ve published a new solution and a companion GitHub repository that guides you through setting up a Google Container Engine cluster to run JupyterHub, which automatically provisions secure Jupyter containers for each user in a classroom or team. Don’t let the title of this article mislead you: not only does it use TensorFlow and JupyterHub, it’s a whole smorgasbord of open source and cloud technologies built on the Jupyter and Kubernetes platforms.



Jupyter is a powerful open source technology that gives you a platform to write and execute code to analyze, visualize and share the discoveries you find in your big data set. You can download a number of different Docker images preconfigured with many different notebook extensions and software packages to help you on any kind of data-science quest.

If you’re exploring on your own, and really want to get started quickly, you can get this all running on your local computer, but what if you want to take your expertise and lead a classroom of people along the same path? You have to either configure everything for them or walk them through configuring their own machines with all the required software.

This is where JupyterHub comes in. It sits as a management layer in front of Jupyter instances, letting you manage users with custom authentication and giving you a Python interface for spawning a new Jupyter instance for each user. Even with JupyterHub, though, you still need a way to provision physical and virtual hardware for the students.

Enter Kubernetes, an open source system for automating the deployment, scaling and management of containerized applications. Google Container Engine is a fully managed service based on Kubernetes that lets you create clusters easily on Google Cloud Platform.

This solution comes with a JupyterHub Spawner class that allows it to create Kubernetes Pods, which are Docker images running Jupyter, for each user. It also comes with all the automation scripts required to create a Container Engine cluster and let you easily customize your setup.

When your students log into JupyterHub using Google OAuth2, they can choose from a list of several pre-built Jupyter images, including a newly updated “datalab-jupyter” image. That image comes with the Google Datalab open source notebook extension (enabling integration with BigQuery, Google Cloud ML and Stackdriver) and has TensorFlow and the Apache Beam Python SDK for Google Cloud Dataflow installed. Users can also choose to run any of the pre-configured Jupyter docker-stacks images, or you can build your own Docker images to run any special libraries or Jupyter configurations you want.
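
The exact class names in the companion repository may differ, but a JupyterHub configuration along these lines ties the pieces together (a hedged sketch: Google OAuth for sign-in plus a Kubernetes-backed spawner, with placeholder values throughout):

# jupyterhub_config.py (illustrative; adapt the names and values to the solution)
from oauthenticator.google import GoogleOAuthenticator

c = get_config()  # provided by JupyterHub at startup

# Students sign in with their Google accounts via OAuth2.
c.JupyterHub.authenticator_class = GoogleOAuthenticator
c.GoogleOAuthenticator.oauth_callback_url = "https://hub.example.edu/hub/oauth_callback"
c.GoogleOAuthenticator.client_id = "YOUR_CLIENT_ID"
c.GoogleOAuthenticator.client_secret = "YOUR_CLIENT_SECRET"

# Each user gets their own Jupyter pod on the Container Engine cluster.
# The solution ships its own spawner class; kubespawner is a stand-in here.
c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"
c.KubeSpawner.image = "gcr.io/your-project/datalab-jupyter:latest"  # placeholder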

We hope that this solution allows you to get your classroom or team environment running quickly so you can focus on learning rather than configuring machines.

By Brad Svee, Cloud Solutions Architect

Dart in 2017 and beyond

We’re here at the Dart Developer Summit in Munich, Germany. Over 250 developers from more than 50 companies from all over the world just finished watching the keynote.

This is a summary of the topics we covered:

Dart is the fastest growing programming language at Google, with a 3.5x increase in lines of code since last year. We like to think that this is because of our focus on developer productivity: teams report 25% to 100% increase in speed of development. Google has bet its biggest business on Dart — the web apps built on Dart bring over $70B per year.

Google AdSense recently launched a ground-up redesign of their web app, built with Dart. Earlier this year, we announced that the next generation of AdWords is built with Dart. There are more exciting Dart products at Google that we’re looking forward to revealing. Outside Google, companies such as Wrike, Workiva, Soundtrap, Blossom, DG Logic and Sonar Design have all been using and enjoying Dart for years.

Our five-year investment in this language is bearing fruit. But we’re not finished.

We learned that people who use Dart love its terse and readable syntax. So we’re keeping that.

We have also learned that Dart developers really enjoy the language’s powerful static analysis. So we’re making it better. With strong mode, Dart’s type system becomes sound, meaning that it rejects all type-incorrect programs. We’re also introducing support for generic methods.

We have validated that the programming language itself is just a part of the puzzle. Dart comes with ‘batteries included.’ Developers really like Dart’s core libraries — we will keep them tight, efficient and comprehensive. We will also continue to invest in tooling such as pub (our integrated packaging system), dartfmt (our automatic formatter) and, of course, the analyzer.

On the web, we have arrived at a framework that is an excellent fit for Dart: AngularDart. All the Google web apps mentioned above use it. It has been in production at Google since February. AngularDart is designed for Dart, and it’s getting better every week. In the past 4 months, AngularDart’s output has gotten 40% smaller, and our AngularDart web apps got 15% faster.

Today, we’re launching AngularDart 2.0 final. Tune in to the next session.

With that, we’re also releasing — as a developer preview — the AngularDart components that Google uses for its major web apps. These Material Design widgets are being developed by hundreds of Google engineers and are thoroughly tested. They are written purely in Dart.

We’re also making Dart easier to use with existing JavaScript libraries. For example, you will be able to use our tool to convert TypeScript .d.ts declarations into Dart libraries.

We’re making the development cycle much faster. Thanks to the Dart Dev Compiler, compilation to JavaScript will take less than a second, and the output runs in all modern browsers.

We believe all this makes Dart an even better choice for web development than before. Dart has been here for a long time and it’s not going anywhere. It’s cohesive and dependable, which is what a lot of web developers want.

We’re also very excited about Flutter — a project to help developers build high-performance, high-fidelity, mobile apps for iOS and Android from a single codebase in Dart. More on that tomorrow.

We hope you’ll enjoy the coming two days. Tune in on the live stream or follow #dartsummit on Twitter.

By Filip Hracek, Developer Relations Program Manager

Budou: Automatic Japanese line breaking tool

Today we are pleased to introduce Budou, an automatic line breaking tool for Japanese. What is a line breaking tool and why is it necessary? English uses spacing and hyphenation as cues to allow for beautiful (that is, more legible) line breaks. Japanese, which has neither, is notoriously more difficult: without intervention, breaks occur at arbitrary points, often in the middle of a word.

This is a long-standing issue in Japanese typography on the web, and it results in degraded readability. We can specify the places where line breaks may occur with CSS, but doing so is a non-trivial manual process that requires Japanese vocabulary and knowledge of grammar.


Budou automatically translates Japanese sentences into organized HTML code with meaningful chunks wrapped in non-breaking markup so as to semantically control line breaks. Budou uses Cloud Natural Language API to analyze the input sentence, and it concatenates proper words in order to produce meaningful chunks utilizing PoS (part-of-speech) tagging and syntactic information. Budou outputs HTML code by wrapping the chunks in a SPAN tag. By specifying their display property as inline-block in CSS, semantic units will no longer be split at the end of a line.
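
For example, a sentence chunked into meaningful units ends up as markup roughly like this (a hand-written illustration of the described output rather than Budou's exact API; the chunking shown is an assumption):

# Chunks as they might come back from the Cloud Natural Language analysis
# (an assumed segmentation, for illustration only).
chunks = ["今日も", "良い天気", "ですね"]

# Wrap each chunk in a SPAN so CSS can keep it on one line, e.g.:
#   .ww { display: inline-block; }
html = "".join('<span class="ww">%s</span>' % c for c in chunks)
print(html)
# <span class="ww">今日も</span><span class="ww">良い天気</span><span class="ww">ですね</span>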

Budou is a simple Python script that runs each sentence through the Cloud Natural Language API. It can easily be extended as a custom filter for template engines, or as a task for runners such as Grunt and Gulp. The latest version also caches the response so no duplicate requests are sent. If you are using Budou for a static website, you can process your HTML code before deployment.

Budou is aimed at relatively short sentences such as titles and headings. Screen readers may read a sentence in separate chunks when it is split into SPAN tags or by WBR tags, so using Budou for body paragraphs is discouraged.

As of October 2016, the Cloud Natural Language API supports English, Spanish, and Japanese, and Budou currently only supports Japanese. Support for other Asian languages with line break issues, such as Chinese and Thai, will be added as the API adds support.

Any comments and suggestions are welcome. You can find us on GitHub.

By Shuhei Iitsuka, UX Engineer

Introducing Nomulus: an open source top-level domain name registry

Today, Google is proud to announce the release of Nomulus, a new open source cloud-based registry platform that powers Google’s top level domains (TLDs). We’re excited to make this piece of Internet infrastructure available to everyone.

TLDs are the top level of the Internet Domain Name System (DNS), and they collectively host every domain name on the Internet.  To manage a TLD, you need a domain name registry -- a behind-the-scenes system that stores registration details and DNS information for all domain names under that TLD. It handles WHOIS queries and requests to buy, check, transfer, and renew domain names. When you purchase a domain name on a TLD using a domain name registrar, such as Google Domains, the registrar is actually conducting business with that TLD’s registry on your behalf. That’s why you can transfer a domain from one registrar to another and have it remain active and 100% yours the entire time.

The project that became Nomulus began in 2011 when the Internet Corporation for Assigned Names and Numbers (ICANN) announced the biggest ever expansion of Internet namespace, aimed at improving choice and spurring innovation for Internet users. Google applied to operate a number of new generic TLDs, and built Nomulus to help run them.

We designed Nomulus to be a brand-new registry platform that takes advantage of the scalability and easy operation of Google Cloud Platform. Nomulus runs on Google App Engine and is backed by Google Cloud Datastore, a highly scalable NoSQL database. Nomulus can manage any number of TLDs in a single shared instance and supports the full range of TLD functionality required by ICANN, including the Extensible Provisioning Protocol (EPP), WHOIS, reporting, and trademark protection. It is written in Java and is released under the Apache 2.0 license.

We hope that by providing access to our implementation of core registry functions and up-and-coming services like the Registration Data Access Protocol (RDAP), we can demonstrate advanced features of Google Cloud Platform and encourage interoperability and open standards in the domain name industry. Donuts, a registry operator with approximately 200 TLDs, has made early contributions to the Nomulus code base and has spun up an instance that they'll be sharing soon.

For more information, view Nomulus on GitHub.

By Ben McIlwain, Software Engineer

Google Open Source Report Card

Open source software enables Google to build things quickly and efficiently without reinventing the wheel, allowing us to focus on solving new problems. We stand on the shoulders of giants and we know it. This is why we support open source and make it easy for Googlers to release the projects they’re working on internally as open source.

Today we’re sharing our first Open Source Report Card, highlighting our most popular projects, sharing a few statistics and detailing some of the projects we’ve released in 2016.

We’ve open sourced over 20 million lines of code to date and you can find a listing of some of our best known project releases on our website. Here are some of our most popular projects:
  • Android - a software stack for mobile devices that includes an operating system, middleware and key applications.
  • Chromium - a project encompassing Chromium, the software behind Google Chrome, and Chromium OS, the software behind Google Chrome OS devices.
  • Angular - a web application framework for JavaScript and Dart focused on developer productivity, speed and testability.
  • TensorFlow - a library for numerical computation using data flow graphs with support for scalable machine learning across platforms from data centers to embedded devices.
  • Go - a statically typed and compiled programming language that is expressive, concise, clean and efficient.
  • Kubernetes - a system for automating deployment, operations and scaling of containerized applications.
  • Polymer - a lightweight library built on top of Web Components APIs for building encapsulated re-usable elements in web applications.
  • Protobuf - an extensible, language-neutral and platform-neutral mechanism for serializing structured data.
  • Guava - a set of Java core libraries that includes new collection types (such as multimap and multiset), immutable collections, a graph library, functional types, an in-memory cache, and APIs/utilities for concurrency, I/O, hashing, primitives, reflection, string processing and much more.
  • Yeoman - a robust and opinionated set of scaffolding tools including libraries and a workflow that can help developers quickly build beautiful and compelling web applications.
While it’s difficult to measure the full scope of open source at Google, we can use the subset of projects that are on GitHub to gather some interesting data. Today our GitHub footprint includes over 84 organizations and 3,499 repositories, 773 of which were created this year.

Googlers use countless languages from Assembly to XSLT, but what are their favorites? GitHub flags the most heavily used language in a repository and we can use that to find out. A survey of GitHub repositories shows us these are some of the languages Googlers use most often:
  • JavaScript
  • Java
  • C/C++
  • Go
  • Python
  • TypeScript
  • Dart
  • PHP
  • Objective-C
  • C#
Many things can be gleaned using the open source GitHub dataset on BigQuery, like usage of tabs versus spaces and the most popular Go packages. What about how many times Googlers have committed to open source projects on GitHub? We can search for Google.com email addresses to get a baseline number of Googler commits. Here’s our query:


SELECT count(*) as n
FROM [bigquery-public-data:github_repos.commits]
WHERE committer.date > '2016-01-01 00:00'
AND REGEXP_EXTRACT(author.email, r'.*@(.*)') = 'google.com'


With this, we learn that Googlers have made 142,527 commits to open source projects on GitHub since the start of the year. This dataset goes back to 2011 and we can tweak this query to find out that Googlers have made 719,012 commits since then. Again, this is just a baseline number as it doesn’t count commits made with other email addresses.

Looking back at the projects we’ve open sourced in 2016, there’s a lot to be excited about. We have released open source software, hardware and datasets. Let’s take a look at some of this year’s releases.

Seesaw
Seesaw is a Linux Virtual Server (LVS) based load balancing platform developed in Go by our Site Reliability Engineers. Seesaw, like many projects, was built to scratch our own itch.

From our blog post announcing its release: “We needed the ability to handle traffic for unicast and anycast VIPs, perform load balancing with NAT and DSR (also known as DR), and perform adequate health checks against the backends. Above all we wanted a platform that allowed for ease of management, including automated deployment of configuration changes.”

Vendor Security Assessment Questionnaire (VSAQ)
We assess the security of hundreds of vendors every year and have developed a process to automate much of the initial information gathering with VSAQ. Many vendors found our questionnaires intuitive and flexible, so we decided to share them. The VSAQ Framework includes four extensible questionnaire templates covering web applications, privacy programs, infrastructure, and physical and data center security. You can learn more about it in our announcement blog post.

OpenThread
OpenThread, released by Nest, is a complete implementation of the Thread protocol for connected devices in the home. This is especially important because of the fragmentation we’re seeing in this space. Development of OpenThread is supported by ARM, Microsoft, Qualcomm, Texas Instruments and other major vendors.

Magenta
Can we use machine learning to create compelling art and music? That’s the question that animates Magenta, a project from the Google Brain team based on TensorFlow. The aim is to advance the state of the art in machine intelligence for music and art generation and build a collaborative community of artists, coders and machine learning researchers. Read the release announcement for more information.

Omnitone
Virtual reality (VR) isn’t nearly as immersive without spatial audio and much of VR development is taking place on proprietary platforms. Omnitone is an open library built by members of the Chrome Team that brings spatial audio to the browser. Omnitone builds on standard Web Audio APIs to deliver an immersive experience and can be used alongside projects like WebVR. Find out more in our blog post announcing the project’s release.

Science Journal
Today’s smartphones are packed with sensors that can tell us interesting things about the world around us. We launched Science Journal to help educators, students and citizen scientists tap into those sensors. You can learn more about the project in our announcement blog post.

Cartographer
Cartographer is a library for real-time simultaneous localization and mapping (SLAM) in 2D and 3D with Robot Operating System (ROS) support. Combining data from a variety of sensors, this library computes positioning and maps surroundings. This is a key element of self-driving cars, UAVs and robotics as well as efforts to map the insides of famous buildings. More information on Cartographer can be found in our blog post announcing its release.

This is just a small sampling of what we’ve released this year. Follow the Google Open Source Blog to stay apprised of Google’s open source software, hardware and data releases.

By Josh Simmons, Open Source Programs Office

An open source font system for everyone

Originally posted on the Google Developers Blog

A big challenge in sharing digital information around the world is “tofu”—the blank boxes that appear when a computer or website isn’t able to display text: ⯐. Tofu can create confusion, a breakdown in communication, and a poor user experience.

Five years ago we set out to address this problem via the Noto—aka “No more tofu”—font project. Today, Google’s open source Noto font family provides a beautiful and consistent digital type for every symbol in the Unicode standard, covering more than 800 languages and 110,000 characters.

A few samples of the 110,000+ characters covered by Noto fonts.
The Noto project started as a necessity for Google’s Android and ChromeOS operating systems. When we began, we did not realize the enormity of the challenge. It required design and technical testing in hundreds of languages, and expertise from specialists in specific scripts. In Arabic, for example, each character has four glyphs (i.e., shapes a character can take) that change depending on the text that comes after it. In Indic languages, glyphs may be reordered or even split into two depending on the surrounding text.

The key to achieving this milestone has been partnering with experts in the field of type and font design, including Monotype, Adobe, and an amazing network of volunteer reviewers. Beyond “no more tofu” in the common languages used every day, Noto will be used to preserve the history and culture of rare languages through digitization. As new characters are introduced into the Unicode standard, Google will add these into the Noto font family.

Google has a deep commitment to openness and the accessibility and innovation that come with it. The full Noto font family, design source files, and the font building pipeline are available for free at the links below. In the spirit of sharing and communication across borders and cultures, please use and enjoy! 
By Xiangye Xiao and Bob Jung, Internationalization