Tag Archives: Web

MediaPipe KNIFT: Template-based Feature Matching

Posted by Zhicheng Wang and Genzhi Ye, MediaPipe team

Image Feature Correspondence with KNIFT

In many computer vision applications, a crucial building block is to establish reliable correspondences between different views of an object or scene, forming the foundation for approaches like template matching, image retrieval and structure from motion. Correspondences are usually computed by extracting distinctive view-invariant features such as SIFT or ORB from images. The ability to reliably establish such correspondences enables applications like image stitching to create panoramas or template matching for object recognition in videos (see Figure 1).

Today, we are announcing KNIFT (Keypoint Neural Invariant Feature Transform), a general-purpose local feature descriptor similar to SIFT or ORB. Like them, KNIFT is a compact vector representation of local image patches that is invariant to uniform scaling, orientation, and illumination changes. However, unlike SIFT or ORB, which were engineered with heuristics, KNIFT is an embedding learned directly from a large number of corresponding local patches extracted from nearby video frames. This data-driven approach implicitly encodes complex, real-world spatial transformations and lighting changes in the embedding. As a result, the KNIFT feature descriptor appears to be more robust, not only to affine distortions, but to some degree of perspective distortions as well. We are releasing an implementation of KNIFT in MediaPipe and a KNIFT-based template matching demo in the next section to get you started.

Figure 1: Matching a real Stop Sign with a Stop Sign template using KNIFT.

Training Method

In machine learning, loosely speaking, training an embedding means finding a mapping that can translate a high-dimensional vector, such as an image patch, to a relatively lower-dimensional vector, such as a feature descriptor. Ideally, this mapping should have the following property: image patches around a real-world point should have the same or very similar descriptors across different views or illumination changes. We have found real-world videos to be a good source of such corresponding image patches as training data (see Figures 3 and 4), and we use the well-established Triplet Loss (see Figure 2) to train such an embedding. Each triplet consists of an anchor (denoted by a), a positive (p), and a negative (n) feature vector extracted from the corresponding image patches, and d() denotes the Euclidean distance in the feature space.
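The loss illustrated in Figure 2 is the well-established margin-based triplet loss; one common formulation, using the d() notation above and the margin that also appears in the hard-negative mining section below, is:

loss(a, p, n) = max(d(a, p) - d(a, n) + margin, 0)

Minimizing this pulls the anchor and positive patches together in descriptor space while pushing the negative at least a margin further away.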

Figure 2: Triplet Loss Function.

Training Data

The training triplets are extracted from all ~1500 video clips in the publicly available YouTube UGC Dataset. We first use an existing heuristically-engineered local feature detector to detect keypoints and compute the affine transform between two frames with high accuracy (see Figure 4). Then we use this correspondence to find keypoint pairs and extract the patches around these keypoints. Note that the newly identified keypoints may include those that were detected but rejected by geometric verification in the first step. For each pair of matched patches, we randomly apply some form of data augmentation (e.g. random rotation or brightness adjustment) to construct the anchor-positive pair. Finally, we randomly pick an arbitrary patch from another video as the negative to finish the construction of this triplet (see Figure 5).

Figure 3: An example video clip from which we extract training triplets.

Figure 4: Finding frame correspondence using existing local features.

Figure 5: (Top to bottom) Anchor, positive and negative patches.

Hard-negative Triplet Mining

To improve model quality, we use the same hard-negative triplet mining method used by FaceNet training. We first train a base model with randomly selected triplets. Then we implement a pipeline that uses the base model to find semi-hard-negative samples (d(a,p) < d(a,n) < d(a,p)+margin) for each anchor-positive pair (Figure 6). After mixing the randomly selected triplets and hard-negative triplets, we re-train the model with this improved data.
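Expressed in code, the semi-hard condition above is just a pair of inequalities; a minimal sketch, assuming the two distances have already been computed with the base model:

// A negative is "semi-hard" for an anchor-positive pair when it is farther from
// the anchor than the positive, but still within the margin.
function isSemiHardNegative(dAnchorPositive, dAnchorNegative, margin) {
  return dAnchorPositive < dAnchorNegative &&
         dAnchorNegative < dAnchorPositive + margin;
}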

Figure 6: (Top to bottom) Anchor, positive and semi-hard negative patches.

Model Architecture

From model architecture exploration, we have found that a relatively small architecture is sufficient to achieve decent quality, so we use a lightweight version of the Inception architecture as the KNIFT model backbone. The resulting KNIFT descriptor is a 40-dimensional float vector. For more model details, please refer to the KNIFT model card.

Benchmark

We benchmark the KNIFT model inference speed on various devices (computing 200 features) and list the results in Table 1.

Table 1: KNIFT performance benchmark.

Quality-wise, we compare the average number of keypoints matched by KNIFT and by ORB (OpenCV implementation) respectively on an in-house benchmark (Table 2). There are many publicly available image matching benchmarks, e.g. the 2020 Image Matching Benchmark, but most of them focus on matching landmarks across large perspective changes in relatively high resolution images, and the tasks often require computing thousands of keypoints. In contrast, since we designed KNIFT for matching objects in large-scale (i.e. billions of images) online image retrieval tasks, we devised our benchmark to focus on low-cost, high-precision use cases, i.e. 100-200 keypoints computed per image and only ~10 matching keypoints needed to reliably determine a match. In addition, to illustrate the fine-grained performance characteristics of a feature descriptor, we divide and categorize the benchmark set by object types (e.g. 2D planar surface) and image pair relations (e.g. large size difference). In Table 2, we compare the average number of keypoints matched by KNIFT and by ORB respectively in each category, based on the same 200 keypoint locations detected in each image by the oFast detector that comes with the ORB implementation in OpenCV.

Table 2: KNIFT vs ORB average number of matched keypoints.

From Table 2, we can see that KNIFT consistently matches more keypoints than ORB by a large margin in every category. Here we acknowledge that KNIFT (40-d float) is considerably larger than ORB (32-d char), and this can have an effect on matching quality. Nevertheless, most local feature benchmarks do not take descriptor size into account, so we will follow that convention here.
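To make the "~10 matching keypoints" criterion from the benchmark description concrete, here is a minimal, illustrative sketch of a brute-force matcher and match decision. The function and variable names are our own, the distance threshold is a tuning assumption, and the real pipeline additionally runs RANSAC-based geometric verification, which is omitted here:

// Euclidean distance between two float descriptors (e.g. 40-d KNIFT vectors).
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Count frame keypoints whose nearest template descriptor is close enough.
function countMatches(frameDescriptors, templateDescriptors, maxDistance) {
  let matches = 0;
  for (const f of frameDescriptors) {
    let best = Infinity;
    for (const t of templateDescriptors) {
      best = Math.min(best, euclidean(f, t));
    }
    if (best < maxDistance) {
      matches++;
    }
  }
  return matches;
}

// Example decision: roughly 10 good correspondences are enough to call it a match.
// const isMatch = countMatches(frameDescriptors, templateDescriptors, 0.8) >= 10;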

To make it easy for developers to try KNIFT in MediaPipe, we have built a local-feature-based template matching solution (see implementation details using MediaPipe in the next section). As a side benefit, we can demonstrate the matching quality between KNIFT and ORB visually in side-by-side comparisons like Figures 7 and 9.

Figure 7: Example of “matching 2D planar surface”. (Left) KNIFT 183/240, (Right) ORB 133/240.

In Figure 7, we choose a typical U.S. Stop Sign image from Google Image Search as the template and attempt to match it with the Stop Sign in this video. This example falls into the “matching 2D planar surface” category in Table 2. Using the same 200 keypoint locations detected by oFast and the same RANSAC setting, we show that KNIFT is successful at matching the Stop Sign in 183 frames out of a total of 240 frames. In comparison, ORB matches 133 frames.

Figure 8: Example of “matching 3D untextured object”. Two template images from different views.

Figure 9: Example of “matching 3D untextured object”. (Left) KNIFT 89/150, (Right) ORB 37/150.

Figure 9 shows another matching performance comparison on an example from the “matching 3D untextured object” category in Table 2. Since this example involves large perspective changes of untextured surfaces, which is known to be challenging for local feature descriptors, we use template images from two different views (shown in Figure 8) to improve the matching performance. Again, using the same keypoint locations and RANSAC setting, we show that KNIFT is successful at matching 89 frames out of a total of 150 frames while ORB matches 37 frames.

KNIFT-based Template Matching in MediaPipe

We are releasing the aforementioned template matching solution based on KNIFT in MediaPipe, which is capable of identifying pre-defined image templates and precisely localizing recognized templates on the camera image. There are 3 major components in the template-matching MediaPipe graph shown below:

  • FeatureDetectorCalculator: a calculator that consumes image frames, runs the OpenCV oFast detector on the input image, and outputs keypoint locations. Moreover, this calculator is also responsible for cropping patches around each keypoint with rotation and scale info and stacking them into a vector for the downstream calculator to process.
  • TfLiteInferenceCalculator with KNIFT model: a calculator that loads the KNIFT tflite model and performs model inference. The input tensor shape is (200, 32, 32, 1), indicating 200 32x32 local patches. The output tensor shape is (200, 40), indicating 200 40-dimensional feature descriptors. By default, the calculator runs the TFLite XNNPACK delegate, but users have the option to select the regular CPU delegate to run at a reduced speed.
  • BoxDetectorCalculator: a calculator that takes pre-computed keypoint locations and KNIFT descriptors and performs feature matching between the current frame and multiple template images. The output of this calculator is a list of TimedBoxProto, which contains the unique id and location of each box as a quadrilateral on the image. Aside from the classic homography RANSAC algorithm, we also apply a perspective transform verification step to ensure that the output quadrilateral does not result in too much skew or a weird shape.

Figure 10: MediaPipe graph of the demo

Demo

In this demo, we chose three different denominations ($1, $5, $20) of U.S. dollar bills as templates and attempted to match them to various real world dollar bills in videos. We resized each input frame to 640x480 pixels, ran the oFast detector to detect 200 keypoints, and used KNIFT to extract feature descriptors from each 32x32 local image patch surrounding these keypoints. We then performed template matching between these video frames and the KNIFT features extracted from the dollar bill templates. This demo runs at 20 FPS on a Pixel 2 Phone CPU with XNNPACK.

Figure 11: Matching different U.S. dollar bills using KNIFT.

Build Your Own Templates

We have provided a set of built-in planar templates in our demo. To make it easy for users to try their own templates, we also provide a tool to build such an index with user generated templates. index_building.pbtxt is a MediaPipe graph that accepts as its input a directory path containing a set of template images. Users can use this graph to compute KNIFT descriptors for all template images (which will be stored in a single file) by 1) replacing the index_proto_filename field in the main graph and the BUILD file and 2) rebuilding the APK file. For step-by-step instructions on how we created the dollar bill demo shown above, please refer to this documentation.

Acknowledgements

We would like to thank Jiuqiang Tang, Chuo-Ling Chang, Dan Gnanapragasam‎, Howard Zhou, Jianing Wei and Ming Guang Yong for contributing to this blog post.

MediaPipe on the Web

Posted by Michael Hays and Tyler Mullen from the MediaPipe team

MediaPipe is a framework for building cross-platform multimodal applied ML pipelines. We have previously demonstrated building and running ML pipelines as MediaPipe graphs on mobile (Android, iOS) and on edge devices like Google Coral. In this article, we are excited to present MediaPipe graphs running live in the web browser, enabled by WebAssembly and accelerated by XNNPack ML Inference Library. By integrating this preview functionality into our web-based Visualizer tool, we provide a playground for quickly iterating over a graph design. Since everything runs directly in the browser, video never leaves the user’s computer and each iteration can be immediately tested on a live webcam stream (and soon, arbitrary video).

Figure 1: Running the MediaPipe face detection example in the Visualizer.

MediaPipe Visualizer

MediaPipe Visualizer (see Figure 2) is hosted at viz.mediapipe.dev. MediaPipe graphs can be inspected by pasting graph code into the Editor tab or by uploading that graph file into the Visualizer. A user can pan and zoom into the graphical representation of the graph using the mouse and scroll wheel. The graph will also react to changes made within the editor in real time.

Figure 2: MediaPipe Visualizer hosted at https://viz.mediapipe.dev

Demos on MediaPipe Visualizer

We have created several sample Visualizer demos from existing MediaPipe graph examples. These can be seen within the Visualizer by visiting the following addresses in your Chrome browser:

  • Edge Detection
  • Face Detection
  • Hair Segmentation
  • Hand Tracking

Each of these demos can be executed within the browser by clicking on the little running man icon at the top of the editor (it will be greyed out if a non-demo workspace is loaded).

This will open a new tab which will run the current graph (this requires a web-cam).

Implementation Details

In order to maximize portability, we use Emscripten to directly compile all of the necessary C++ code into WebAssembly, which is a special form of low-level assembly code designed specifically for web browsers. At runtime, the web browser creates a virtual machine in which it can execute these instructions very quickly, much faster than traditional JavaScript code.

We also created a simple API for all necessary communications back and forth between JavaScript and C++, to allow us to change and interact with the MediaPipe graph directly from JavaScript. For readers familiar with Android development, you can think of this as a similar process to authoring a C++/Java bridge using the Android NDK.
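As a rough illustration of what such a bridge looks like in practice (the exported function and argument names here are hypothetical, not MediaPipe's actual API), Emscripten's cwrap helper lets JavaScript wrap an exported C++ function and call it like a normal JavaScript function:

// "Module" is the object Emscripten generates when the WebAssembly module loads.
// The C++ function must have been exported at compile time,
// e.g. with -s EXPORTED_FUNCTIONS='["_process_frame"]'.
const processFrame = Module.cwrap(
    'process_frame',         // C/C++ function name (illustrative)
    'number',                // return type
    ['number', 'number']);   // argument types, e.g. a pointer and a length

// Call from JavaScript straight into WebAssembly.
const status = processFrame(frameDataPointer, frameDataLength);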

Finally, we packaged up all the requisite demo assets (ML models and auxiliary text/data files) as individual binary data packages, to be loaded at runtime. And for graphics and rendering, we allow MediaPipe to automatically tap directly into WebGL so that most OpenGL-based calculators can “just work” on the web.

Performance

While executing WebAssembly is generally much faster than pure JavaScript, it is also usually much slower than native C++, so we made several optimizations in order to provide a better user experience. We utilize the GPU for image operations when possible, and opt for using the lightest-weight possible versions of all our ML models (giving up some quality for speed). However, since compute shaders are not widely available on the web, we cannot easily make use of TensorFlow Lite GPU machine learning inference, and the resulting CPU inference often ends up being a significant performance bottleneck. To help alleviate this, we automatically augment our “TfLiteInferenceCalculator” by having it use the XNNPack ML Inference Library, which gives us a 2-3x speedup in most of our applications.

Currently, support for web-based MediaPipe has some important limitations:

  • Only calculators in the demo graphs above may be used
  • The user must edit one of the template graphs; they cannot provide their own from scratch
  • The user cannot add or alter assets
  • The executor for the graph must be single-threaded (i.e. ApplicationThreadExecutor)
  • TensorFlow Lite inference on GPU is not supported

We plan to continue to build upon this new platform to provide developers with much more control, removing many if not all of these limitations (e.g. by allowing for dynamic management of assets). Please follow the MediaPipe tag on the Google Developers blog and the Google Developers Twitter account (@googledevs).

Acknowledgements

We would like to thank Marat Dukhan, Chuo-Ling Chang, Jianing Wei, Ming Guang Yong, and Matthias Grundmann for contributing to this blog post.

Security Crawl Maze: An Open Source Tool to Test Web Security Crawlers

Scanning modern web applications for security vulnerabilities can be a difficult task, especially if they are built with JavaScript frameworks, which is why crawlers have to use a multi-stage crawling approach to discover all the resources on modern websites.

Living in the times of dynamically changing specifications and the constant appearance of new frameworks, we often have to adjust our crawlers so that they are able to discover new ways in which developers can link resources from their applications. The issue we face in such situations is measuring whether changes to the crawling logic improve effectiveness. While working on replacing a crawler for a web security scanner that has been in use for a number of years, we found we needed a universal test bed, both to test our current capabilities and to discover cases that are currently missed. Inspired by Firing Range, today we’re announcing the open-source release of Security Crawl Maze – a universal test bed for web security crawlers.

Security Crawl Maze is a simple Python application built with the Flask framework that contains a wide variety of cases for ways in which a web-based application can link other resources on the Web. We also provide a Dockerfile which allows you to build a Docker image and deploy it to an environment of your choice. While the initial release covers the most important cases for HTTP crawling, it’s a subset of what we want to achieve in the near future. You’ll soon be able to test whether your crawler is able to discover known files (robots.txt, sitemap.xml, etc…) or crawl modern single-page applications written with the most popular JS frameworks (Angular, Polymer, etc.).
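For a flavor of the kinds of cases involved, here is an illustrative sketch (the paths are made up, not actual Crawl Maze test cases) of resource references that only come into existence at runtime, which a purely static HTML crawler would miss:

// An anchor element injected into the DOM by script rather than present in the HTML.
const link = document.createElement('a');
link.href = '/javascript/dynamically-created-anchor.html';
document.body.appendChild(link);

// A resource referenced only through the Fetch API.
fetch('/javascript/fetch-only-endpoint.json');

// A navigation that only happens after a delay.
setTimeout(() => {
  window.location.assign('/javascript/delayed-redirect.html');
}, 5000);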

Security crawlers are mostly interested in code coverage, not in content coverage, which means the deduplication logic has to be different. This is why we plan to add cases which allow for testing if your crawler deduplicates URLs correctly (e.g. blog posts, e-commerce). If you believe there is something else, feel free to add a test case for it – it’s super simple! Code is available on GitHub and through a public deployed version.

We hope that others will find it helpful in evaluating the capabilities of their crawlers, and we certainly welcome any contributions and feedback from the broader security research community.

By Maciej Trzos, Information Security Engineer

Advance your career with the Google Africa Certifications Scholarships

Posted by William Florance, Global Head, Developer Training Programs

Building upon our pledge to provide mobile developer training to 100,000 Africans to develop world-class apps, today we are pleased to announce the next round of Google Africa Certification Scholarships aimed at helping developers become certified on Google’s Android, Web, and Cloud technologies.

This year, we are offering 30,000 additional scholarship opportunities and 1,000 grants for the Google Associate Android Developer, Mobile Web Specialist, and Associate Cloud Engineer certifications. The scholarship program will be delivered by our partners, Pluralsight and Andela, through an intensive learning curriculum designed to prepare motivated learners for entry-level and intermediate roles as software developers. Interested students in Africa can learn more about the Google Africa Certifications Scholarships and apply here.

According to the World Bank, Africa is on track to have the largest working-age population (1.1 billion) by 2034. Today’s announcement marks a transition from inspiring new developers to preparing them for the jobs of tomorrow. Google’s developer certifications are performance-based. They are developed around a job-task analysis which tests learners for the skills that employers expect developers to have.

As announced during Google CEO Sundar Pichai’s visit to Nigeria in 2017, our continued initiatives focused on digital skills training, education and economic opportunity, and support for African developers and startups demonstrate our commitment to helping advance a healthy and vibrant ecosystem. By providing support for training and certifications, we will help bridge the unemployment gap on the continent by increasing the number of employable software developers.

Although Google’s developer certifications are relatively new, we have already seen evidence that becoming certified can make a meaningful difference to developers and employers. Adaobi Frank, a graduate of the Associate Android Developer certification, got a better job that paid ten times her previous salary after completing her certification. Her interview was expedited as her employer was convinced that she was a great fit for the role after she mentioned that she was certified. Now, she's got a job that helps provide for her family - see her video here. Through our efforts this year, we want to help many more developers like Ada and support the growth of startups and technology companies throughout Africa.

Follow this link to learn more about the scholarships and apply.

Web Notifications API Support Now Available in FCM Send v1 API

Posted by Mertcan Mermerkaya, Software Engineer

We have great news for web developers that use Firebase Cloud Messaging to send notifications to clients! The FCM v1 REST API has integrated fully with the Web Notifications API. This integration allows you to set icons, images, actions and more for your Web notifications from your server! Better yet, as the Web Notifications API continues to grow and change, these options will be immediately available to you. You won't have to wait for an update to FCM to support them!

Below is a sample payload you can send to your web clients on Push API supported browsers. This notification would be useful for a web app that supports image posting. It can encourage users to engage with the app.

{
  "message": {
    "webpush": {
      "notification": {
        "title": "Fish Photos ?",
        "body": "Thanks for signing up for Fish Photos! You now will receive fun daily photos of fish!",
        "icon": "firebase-logo.png",
        "image": "guppies.jpg",
        "data": {
          "notificationType": "fishPhoto",
          "photoId": "123456"
        },
        "click_action": "https://example.com/fish_photos",
        "actions": [
          {
            "title": "Like",
            "action": "like",
            "icon": "icons/heart.png"
          },
          {
            "title": "Unsubscribe",
            "action": "unsubscribe",
            "icon": "icons/cross.png"
          }
        ]
      }
    },
    "token": "<APP_INSTANCE_REGISTRATION_TOKEN>"
  }
}
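For context, a payload like the one above is sent by POSTing it to the FCM HTTP v1 endpoint for your project. A minimal server-side sketch (the project ID is a placeholder, and getAccessToken() stands in for however your server obtains an OAuth 2.0 access token, for example via a service account):

// Send the message above through the FCM HTTP v1 REST API.
async function sendNotification(payload) {
  const projectId = 'your-project-id';  // placeholder
  const url = `https://fcm.googleapis.com/v1/projects/${projectId}/messages:send`;
  const response = await fetch(url, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${await getAccessToken()}`,  // OAuth 2.0 token (hypothetical helper)
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(payload),  // the JSON message shown above
  });
  return response.json();
}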

Notice that you are able to set new parameters, such as actions, which give the user different ways to interact with the notification. In the example above, users have the option to like the photo or to unsubscribe.

To handle action clicks in your app, you need to add an event listener in the default firebase-messaging-sw.js file (or your custom service worker). If an action button was clicked, event.action will contain the string that identifies the clicked action. Here's how to handle the "like" and "unsubscribe" events on the client:

// Retrieve an instance of Firebase Messaging so that it can handle background messages.
const messaging = firebase.messaging();

// Add an event listener to handle notification clicks.
self.addEventListener('notificationclick', function(event) {
  if (event.action === 'like') {
    // Like button was clicked.
    const photoId = event.notification.data.photoId;
    like(photoId);
  } else if (event.action === 'unsubscribe') {
    // Unsubscribe button was clicked.
    const notificationType = event.notification.data.notificationType;
    unsubscribe(notificationType);
  }

  event.notification.close();
});

The SDK will still handle regular notification clicks and redirect the user to your click_action link if provided. To see more on how to handle click actions on the client, check out the guide.

Since different browsers support different parameters on different platforms, it's important to check out the browser compatibility documentation to ensure your notifications work as intended. Want to learn more about what the Send API can do? Check out the FCM Send API documentation and the Web Notifications API documentation. If you're using the FCM Send API and you incorporate the Web Notifications API in a cool way, then let us know! Find Firebase on Twitter at @Firebase, and on Facebook and Google+ by searching "Firebase".

Funding 15,000 web and Android scholarships in Africa to provide employable developer skills

Posted by William Florance, Head, Economic Impact Programs

Africa's digital journey is rapidly gaining speed. According to recent data, over 73 million people in Africa came online for the first time in 2017. That's more than the population of the UK! This means there are now about 435 million people on the continent using the Web to engage, connect and access information online. That's a good thing! But with this growth comes an increased need to scale efforts to make the Web more relevant and useful to African users. This will require more skilled hands working with individuals and local businesses to develop content and platforms that will support Africa's digital growth.

In July 2017, Google's CEO, Sundar Pichai, announced a pledge to provide digital skills training to ten million people in Africa, and also to provide mobile developer training to 100,000 Africans. Today, in line with that commitment, we're excited to announce the launch of our new Africa Web and Android Scholarship program aimed at providing 15,000 scholarships to developers resident in African countries.

Working in partnership with Udacity and Andela, we will be offering 15,000 2-month 'single course' scholarships and 500 6-month nanodegree scholarships to aspiring and professional developers across Africa. The training will be available online via the Udacity training website, and the Andela Learning Community will support the students (in Nigeria and Kenya) through mentorship, in-person meet-ups, and online communities.

In order to access the full nanodegree scholarships, learners will have to complete the lessons and quizzes in the courses being offered under the Udacity single-course scholarships (also known as challenge courses), in addition to actively participating in and supporting classmates in the student community. We will be offering 10,000 scholarships to beginners (with little or no programming experience) and 5,000 to professional developers (with 1+ years of experience), spread across Android and mobile web development tracks. The 10,000 beginner scholarships will include Android beginner courses and a basic introduction to HTML & CSS, while the 5,000 intermediate scholarships include Android fundamentals for intermediate developers and building offline web applications courses, respectively. Both courses are taught in English through an online program on Udacity open to African residents. The top 500 students at the end of the challenge will earn a full Nanodegree scholarship to one of four Nanodegree programs in Android or web development.

The application period closes on April 24th. Interested or want to learn more? Visit https://www.udacity.com/google-africa-scholarships?utm_source=devblog

AMP stories: Bringing visual storytelling to the open web

Posted by Rudy Galfi, Product Manager for AMP at Google

The AMP story format is a recently launched addition to the AMP Project that provides content publishers with a mobile-focused format for delivering news and information as visually rich, tap-through stories.

A visual-driven format for evolving news consumption on mobile

Some stories are best told with text while others are best expressed through images and videos. On mobile devices, users browse lots of articles but engage in depth with few. Images, videos and graphics help publishers get their readers' attention as quickly as possible and keep them engaged through immersive and easily consumable visual information.

Recently, as with many new or experimental features within AMP, contributors from multiple companies — in this case, Google and a group of publishers — came together to work toward building a story-focused format in AMP. The collective desire was that this format offer new, creative and visually rich ways of storytelling specifically designed for mobile.

Minimize technical challenges and let creators focus on the storytelling

The mobile web is great for distributing and sharing content, but mastering performance can be tricky. Creating visual stories on the web with the fast and smooth performance that users have grown accustomed to in native apps can be challenging. Getting these key details right often poses prohibitively high startup costs, particularly for small publishers.

AMP stories are built on the technical infrastructure of AMP to provide a fast, beautiful experience on the mobile web. Just like any web page, a publisher hosts an AMP story HTML page on their site and can link to it from any other part of their site to drive discovery. And, as with all content in the AMP ecosystem, discovery platforms can employ techniques like pre-renderable pages, optimized video loading and caching to optimize delivery to the end user.

AMP stories aim to make the production of stories as easy as possible from a technical perspective. The format comes with preset but flexible layout templates, standardized UI controls, and components for sharing and adding follow-on content.

Yet, the design gives great editorial freedom to content creators to tell stories true to their brand. Publishers involved in the early development of the AMP stories format — CNN, Conde Nast, Hearst, Mashable, Meredith, Mic, Vox Media, and The Washington Post — have brought together their reporters, illustrators, designers, producers, and video editors to creatively use this format and experiment with novel ways to tell immersive stories for a diverse set of content categories.

Developer preview for AMP stories is starting today

Today AMP stories are available for everyone to try on their websites. As part of the AMP Project, the AMP story format is free and open for anyone to use. To get started, check out the tutorial and documentation. We are looking forward to feedback from content creators and technical contributors alike.

Also, starting today, you can see AMP stories on Google Search. To try it out, search for publisher names (like the ones mentioned above) within g.co/ampstories using your mobile browser. At a later point, Google plans to bring AMP stories to more products across Google, and expand the ways they appear in Google Search.

Why I contribute to Chromium

This is a guest post by Yoav Weiss who was recently recognized through the Google Open Source Peer Bonus Program for his work on the Chromium project. We invited Yoav to share about his work on our blog.

I was recently recognized by Google for my contributions to Chromium and wanted to write a few words on why I contribute to the project, other rendering engines and the web platform in general. I also wanted to share how it helped me evolve as a developer and why more people should contribute to the web platform for their own benefit.

The web platform

I’ve written before about why I think the web platform is an extremely important asset for humanity and why we should make sure it'll thrive for years to come. It enables the distribution of knowledge to the corners of the earth and has fundamentally changed our world. Yet, compared to the amount of users (billions!) and web developers (millions), there are only a few hundred engineers working on maintaining and improving the platform itself.

That means that there are many aspects of the platform that are not as well maintained as they should be. We're at a real risk of a "tragedy of the commons" scenario, where despite usage and utility, the platform will collapse under its own weight because maintaining it is nobody's exclusive problem.

How I got started

Personally, I had been working on web performance for well over a decade before I decided to get more involved and lend my hand in building the platform. For a large part of my professional life, browsers were black boxes. They were given to us by the browser gods and that's what we had to work with for the next few years. Their undocumented bugs and quirks became gospel, passed from senior engineers to their juniors.

Then at some point, that situation changed. Slowly but surely, open source browsers started picking up market share. No longer black boxes, we can actually see what happens on the inside!

I first got involved by joining the responsive images discussions and the Responsive Images Community Group. Then, I saw a tweet from RICG's chair calling to develop a prototype of the current proposal to prove its feasibility and value. And I jumped in.

I created a prototype using Chromium and WebKit, demoed it to anyone that was interested, worked on the proposals and argued the viability of the proposals' approach on the various mailing lists. Eventually, we were able to get some browser folks on board, improve the proposals and their fit to the rest of the platform, and I started working on an implementation.

The amount of work this required was larger than I expected. Eventually I managed to ship the feature in Blink and Chromium, and complete large parts of the implementation in WebKit as well. WOOT!

Success! Now what?

After that project was done, I started looking into what I should do next. I was determined to continue working on browsers and find a gig that would let me do that. So I searched for an employer with a vested interest in the web and in making it faster, who would be happy to let me work on the platform's client - the web browser.

I found such an employer in Akamai, where I have been working as a Principal Architect ever since. As part of my job I'm working on our performance optimization features as well as performance-related browser features, making sure they make it into browsers in a timely fashion.

Why you should contribute, too

Now, chances are that if you're reading this, you're also relying on the web platform for your job in one way or another. Which means that there's a chance that it also makes sense for your organization to contribute to the web platform. Let’s explore the reasons:

1. Make sure work is done on features you care about

If you're like me, you love the web platform and the reach it provides you, but you're not necessarily happy with all of it. The web is great, but not perfect. Since browsers and web standards are no longer black boxes, you can help change that.

You can work on standards and browsers to change them to include your use-case. That's immense power at your fingertips: put in the work and the platform evolves for all the billions of users out there.

And you don’t have to wait years before new features can be used in production like with yesteryear's browser changes. With today’s browser update rates and progressive enhancement, you’ll probably be able to use changes in production within a few months.

2. Gain expertise that can help you do your job better

Knowing browser internals better can also give you superpowers in other parts of your job. Whenever questions about browser behavior arise, you can take a peek into the source code and have concrete answers rather than speculation.

Keeping track of standards discussions gives you visibility into new browser APIs that are coming along, so that you can opt to use those rather than settle for sub-optimal alternatives that are currently available.

3. Grow as an engineer

Working on browsers teaches you a lot about how things work under the surface and enables you to understand the internals of modern browsers, which are extremely complex machines. Further, this work allows you to get code reviews from the world's leading experts on these subjects. What better way to grow than to interact with the experts?

4. It's a fun and welcoming community

Contributing to the web platform has been a great experience for me. Working with the Chromium project, in particular, is always great fun. The project is Google-backed, but there are many external contributors, and the majority of the work and decisions happen in the open. The people I've worked with are super friendly and happy to help. All in all, it's really fun!

Join us

The web needs more people working on it, and working on the web platform can be extremely beneficial to you, your career and your business.

If you're interested in getting started with web standards, the Discourse instance of the Web Platform Incubator Community Group (or WICG for short) is where it's at (disclaimer: I'm co-chairing that group). For getting started with Chromium development, this is the post for you.

And most important, don't be afraid to ask the community. People on blink-dev and IRC are super friendly and will be happy to point you in the right direction.

So come on over and join the good cause. We'll be happy to have you!

By Yoav Weiss, Chromium contributor

Get ready for JavaScript “Promises” with Google and Udacity

Sarah Clark, Program Manager, Google Developer Training

Front-end web developers face challenges when using common “asynchronous” requests. These requests, such as fetching a URL or reading a file, often lead to complicated code, especially when performing multiple actions in a row. How can we make this easier for developers?

JavaScript Promises are a new tool that simplifies asynchronous code, converting a tangle of callbacks and event handlers into simple, straightforward code such as: fetch(url).then(decodeJSON).then(addToPage)...
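As a slightly expanded sketch of that chain (the URL is made up, and addToPage and showError stand in for an app's own helpers):

const url = 'https://example.com/planets.json';  // hypothetical endpoint

fetch(url)
  .then(response => response.json())   // decode the JSON body
  .then(data => addToPage(data))       // hand the data to the app's renderer
  .catch(error => showError(error));   // one place to handle any failure above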

Promises are used by many new web standards, including Service Worker, the Fetch API, Quota Management, Font Load Events, Web MIDI, and Streams.


We’ve just opened up an online course on Promises, built in collaboration with Udacity. This brief course, which you can finish in about a day, walks you through building an “Exoplanet Explorer” app that reads and displays live data using Promises. You’ll also learn to use the Fetch API and finally kiss XMLHttpRequest goodbye!

This short course is a prerequisite for most of the Senior Web Developer Nanodegree. Whether you are in the paid Nanodegree program or taking the course for free, won’t you come learn to make your code simpler and more reliable today?