Tag Archives: Google Cloud Platform

NoSQL for the serverless age: Announcing Cloud Firestore general availability and updates

Posted by Amit Ganesh, VP Engineering & Dan McGrath, Product Manager

As modern application development moves away from managing infrastructure and toward a serverless future, we're pleased to announce the general availability of Cloud Firestore, our serverless, NoSQL document database. We're also making it available in 10 new locations to complement the existing three, announcing a significant price reduction for regional instances, and enabling integration with Stackdriver for monitoring.

Cloud Firestore is a fully managed, cloud-native database that makes it simple to store, sync, and query data for web, mobile, and IoT applications. It focuses on providing a great developer experience and simplifying app development with live synchronization, offline support, and ACID transactions across hundreds of documents and collections. Cloud Firestore is integrated with both Google Cloud Platform (GCP) and Firebase, Google's mobile development platform. You can learn more about how Cloud Firestore works with Firebase here. With Cloud Firestore, you can build applications that move swiftly into production, thanks to flexible database security rules, real-time capabilities, and a completely hands-off auto-scaling infrastructure.

Cloud Firestore does more than just core database tasks. It's designed to be a complete data backend that handles security and authorization, infrastructure, edge data storage, and synchronization. Identity and Access Management (IAM) and Firebase Auth are built in to help make sure your application and its data remain secure. Tight integration with Cloud Functions, Cloud Storage, and Firebase's SDK accelerates and simplifies building end-to-end serverless applications. You can also easily export data into BigQuery for powerful analysis, post-processing of data, and machine learning.

Building with Cloud Firestore means your app can seamlessly transition from online to offline and back at the edge of connectivity. This helps lead to simpler code and fewer errors. You can serve rich user experiences and push data updates to more than a million concurrent clients, all without having to set up and maintain infrastructure. Cloud Firestore's strong consistency guarantee helps to minimize application code complexity and reduces bugs. A client-side application can even talk directly to the database, because enterprise-grade security is built right in. Unlike most other NoSQL databases, Cloud Firestore supports modifying up to 500 collections and documents in a single transaction while still automatically scaling to exactly match your workload.
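To make the 500-document figure concrete, here is a minimal sketch of how an application might group a large set of pending writes so that no single transaction exceeds that limit. The `chunkWrites` helper and the sample document paths are hypothetical and are not part of any Cloud Firestore SDK. Only the limit of 500 comes from the text above.

```javascript
// Limit taken from the paragraph above: up to 500 documents per transaction.
var MAX_WRITES_PER_TRANSACTION = 500;

// Hypothetical helper: split pending writes into groups no larger than the limit.
function chunkWrites(writes, limit) {
  var size = limit || MAX_WRITES_PER_TRANSACTION;
  var chunks = [];
  for (var i = 0; i < writes.length; i += size) {
    chunks.push(writes.slice(i, i + size));
  }
  return chunks;
}

// Made-up workload: 1,200 pending document updates.
var pending = [];
for (var n = 0; n < 1200; n++) {
  pending.push({ path: 'scores/player' + n, data: { points: n } });
}

var groups = chunkWrites(pending);
// Group sizes are 500, 500 and 200.
console.log(groups.map(function (g) { return g.length; }));
```

Each group would then be committed in its own transaction. Note that atomicity holds only within a group, so writes that must succeed or fail together need to stay in the same one.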

What's new with Cloud Firestore

  • New regional instance pricing. This new pricing takes effect on March 3, 2019 for most regional instances, and is as low as 50% of multi-region instance prices.
    • Data in regional instances is replicated across multiple zones within a region. This is optimized for lower cost and lower write latency. We recommend multi-region instances when you want to maximize the availability and durability of your database.
  • SLA now available. You can now take advantage of Cloud Firestore's SLA: 99.999% availability for multi-region instances and 99.99% availability for regional instances.
  • New locations available. There are 10 new locations for Cloud Firestore:
    • Multi-region
      • Europe (eur3)
    • North America (Regional)
      • Los Angeles (us-west2)
      • Montréal (northamerica-northeast1)
      • Northern Virginia (us-east4)
    • South America (Regional)
      • São Paulo (southamerica-east1)
    • Europe (Regional)
      • London (europe-west2)
    • Asia (Regional)
      • Mumbai (asia-south1)
      • Hong Kong (asia-east2)
      • Tokyo (asia-northeast1)
    • Australia (Regional)
      • Sydney (australia-southeast1)

Cloud Firestore is now available in 13 regions.

  • Stackdriver integration (in beta). You can now monitor Cloud Firestore read, write and delete operations in near-real time with Stackdriver.
  • More features coming soon. We're working on adding some of the most requested features to Cloud Firestore from our developer community, such as querying for documents across collections and incrementing database values without needing a transaction.
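To put the SLA figures in the list above in perspective, the following sketch converts an availability percentage into the downtime it would permit. The 30-day period is an assumption made for the arithmetic. The SLA itself defines how availability is actually measured, so treat the results as rough orders of magnitude.

```javascript
// Convert an availability percentage into the maximum downtime it allows
// over a period, expressed in minutes.
function allowedDowntimeMinutes(availabilityPercent, periodDays) {
  var totalMinutes = periodDays * 24 * 60;
  return totalMinutes * (1 - availabilityPercent / 100);
}

// Assumed 30-day period (43,200 minutes).
var multiRegion = allowedDowntimeMinutes(99.999, 30);
var regional = allowedDowntimeMinutes(99.99, 30);

console.log(multiRegion.toFixed(2)); // "0.43" (roughly 26 seconds)
console.log(regional.toFixed(2));    // "4.32"
```

Under that assumption, the extra "nine" for multi-region instances shrinks the permitted downtime by a factor of ten.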

As the next generation of Cloud Datastore, Cloud Firestore is compatible with all Cloud Datastore APIs and client libraries. Existing Cloud Datastore users will be live-upgraded to Cloud Firestore automatically later in 2019. You can learn more about this upgrade here.

Adding flexibility and scalability across industries

Cloud Firestore is already changing the way companies build apps in media, IoT, mobility, digital agencies, real estate, and many others. The unifying themes among these workloads include: the need for mobility even when connectivity lapses, scalability for many users, and the ability to move quickly from prototype to production. Here are a few of the stories we've heard from Cloud Firestore users.

When opportunity strikes...

In the highly competitive world of shared, on-demand personal mobility via cars, bikes, and scooters, the ability to deliver a differentiated user experience, iterate rapidly, and scale are critical. The prize is huge. Skip provides a scooter-sharing system where shipping fast can have a big impact. Mike Wadhera, CTO and Co-founder, says, "Cloud Firestore has enabled our engineering and product teams to ship at the clock-speed of a startup while leveraging Google-scale infrastructure. We're delighted to see continued investment in Firebase and the broader GCP platform."

Another Cloud Firestore user, digital consultancy The Nerdery, has to deliver high-quality results in a short period of time, often needing to integrate with existing third-party data sources. They can't build up and tear down complicated, expensive infrastructure for every client app they create. "Cloud Firestore was a great fit for the web and mobile applications we built because it required a solution to keep 40,000-plus users apprised of real-time data updates," says Jansen Price, Principal Software Architect. "The reliability and speed of Cloud Firestore coupled with its real-time capabilities allowed us to deliver a great product for the Google Cloud Next conferences."

Reliable information delivery

Incident response company Now IMS uses real-time data to keep citizens safe in crowded places, where cell service can get spotty when demand is high. "As an incident management company, real-time and offline capabilities are paramount to our customers," says John Rodkey, Co-founder. "Cloud Firestore, along with the Firebase Javascript SDK, provides us with these capabilities out of the box. This new 100% serverless architecture on Google Cloud enables us to focus on rapid application development to meet our customers' needs instead of worrying about infrastructure or server management like with our previous cloud."

Regardless of the app, users want the latest information right away, without having to click refresh. The QuintoAndar mobile application connects tenants and landlords in Brazil for easier apartment rentals. "Being able to deliver constantly changing information to our customers allows us to provide a truly engaging experience. Cloud Firestore enables us to do this without additional infrastructure and allows us to focus on the core challenges of our business," says Guilherme Salerno, Engineering Manager at QuintoAndar.

Real-time, responsive apps, happy users

Famed broadsheet and media company The Telegraph uses Cloud Firestore so registered users can easily discover and engage with relevant content. The Telegraph wanted to make the user experience better without having to become infrastructure experts in serving and managing data to millions of concurrent connections. "Cloud Firestore allowed us to build a real-time personalized news feed, keeping readers informed with synchronized content state across all of their devices," says Alex Mansfield-Scaddan, Solution Architect. "It allowed The Telegraph engineering teams to focus on improving engagement with our customers, rather than becoming real-time database and infrastructure experts."

On the other side of the Atlantic, The New York Times used Cloud Firestore to build a feature in The Times' mobile app to send push notifications updated in real time for the 2018 Winter Olympics. In previous approaches to this feature, scaling had been a challenge. The team needed to track each reader's history of interactions in order to provide tailored content for particular events or sports. Cloud Firestore allowed them to query data dynamically, then send the real-time updates to readers. The team was able to send more targeted content faster.

Delivering powerful edge storage for IoT devices

Athlete testing technology company Hawkin Dynamics was an early, pre-beta adopter of Cloud Firestore. Their pressure pads are used by many professional sports teams to measure and track athlete performance. In the fast-paced, high-stakes world of professional sports, athletes can't wait around for devices to connect or results to calculate. They demand instant answers even if the WiFi is temporarily down. Hawkin Dynamics uses Cloud Firestore to bring real-time data to athletes through their app dashboard, shown below.

"Our core mission at Hawkin Dynamics is to help coaches make informed decisions regarding their athletes through the use of actionable data. With real-time updates, our users can get the data they need to adjust an athlete's training on a moment-by-moment basis," says Chris Wales, CTO. "By utilizing the powerful querying ability of Cloud Firestore, we can provide them the insights they need to evaluate the overall efficacy of their programs. The close integrations with Cloud Functions and the other Firebase products have allowed us to constantly improve on our product and stay responsive to our customers' needs. In an industry that is rapidly changing, the flexibility afforded to us by Cloud Firestore in extending our applications has allowed us to stay ahead of the game."

Getting started with Cloud Firestore

We've heard from many of you that Cloud Firestore is helping solve some of your most timely development challenges by simplifying real-time data and data synchronization, eliminating server-side code, and providing flexible yet secure database authentication rules. This reflects the state of the cloud app market, where developers are exploring lots of options to help them build better and faster while also providing modern user experiences. This glance at Stack Overflow questions gives a good picture of some of these trends, where Cloud Firestore is a hot topic among cloud databases.

Source: StackExchange

We've seen close to a million Cloud Firestore databases created since its beta launch. The platform is designed to serve databases ranging in size from kilobytes to multiple petabytes of data. Even a single application running on Cloud Firestore is delivering more than 1 million real-time updates per second to users. These apps are just the beginning. To learn more about serverless application development, take a look through the archive of the recent application development digital conference.

We'd love to hear from you, and we can't wait to see what you build next. Try Cloud Firestore today for your apps.

Google opens new innovation space in San Francisco for the developer community

Posted by Jeremy Neuner, Head of Launchpad San Francisco

Google's Developer Relations team is opening a new innovation space at 543 Howard St. in San Francisco. By working with more than a million developers and startups, we've found that something unique happens when we interact with our communities face-to-face. Talks, meetups, workshops, sprints, bootcamps, and social events not only provide opportunities for Googlers to authentically connect with users but also build trust and credibility as we form connections on a more personal level.

The space will be the US home of Launchpad, Google's startup acceleration engine. Founded in 2016, the Launchpad Accelerator has seen 13 cohorts graduate across 5 continents, reaching 241 startups. In 2019, the program will bring together top Google talent with startups from around the world who are working on AI-enabled solutions to problems in financial technology, healthcare, and social good.

In addition to its focus on startups, the Google innovation space will offer programming designed specifically for developers and designers throughout the year. For example, in tandem with the rapid growth of Google Cloud Platform, we will host hands-on sessions on Kubernetes, big data and AI architectures with Google engineers and industry experts.

Finally, we want the space to serve as a hub for industry-wide Developer Relations' diversity and inclusion efforts. And we will partner with groups such as Manos Accelerator and dev/Mission to bring the latest technologies to underserved groups.

We designed the space with a single credo in mind, "We must continually be jumping off cliffs and developing our wings on the way down." The flexible design of the space ensures our community has a place to learn, experiment, and grow.

For more information about our new innovation space, click here.


Kotlin Momentum for Android and Beyond

Posted by James Lau (@jmslau), Product Manager

Today marks the beginning of KotlinConf 2018, the largest annual in-person gathering of the Kotlin community. 2018 has been a big year for Kotlin, as the language continues to gain adoption and earn the love of developers. In fact, 27% of the top 1000 Android apps on Google Play already use Kotlin. More importantly, Android developers are loving the language, with over 97% satisfaction in our most recent survey. It's no surprise that Kotlin was voted the #2 most-loved language in the 2018 Stack Overflow survey.

Google supports Kotlin as a first-class programming language for Android development. In the past 12 months, we have delivered a number of important improvements to the Kotlin developer experience. This includes the Kotlin-friendly SDK, Android KTX, new Lint checks and various Kotlin support improvements in Android Studio. We have also launched Kotlin support in our official documentation, new flagship samples in Kotlin, a new Kotlin Bootcamp Udacity course, #31DaysOfKotlin and other deep dive content. We are committed to continuing to improve the Kotlin developer experience.

As the language continues to advance, more developers are discovering the benefits of Kotlin across the globe. Recently, we traveled to India and worked with local developers like Zomato to better understand how adopting Kotlin has benefited their Android development. Zomato is a leading restaurant search & discovery service that operates in 24 countries, with over 150 million monthly users. Kotlin helped Zomato reduce the number of lines of code in their app significantly, and it has also helped them find important defects in their app at compile time. You can watch their Kotlin adoption story in the video below.

Android Developer Story: Zomato uses Kotlin to write safer, more concise code.

Going beyond Android, we are happy to announce that the Google Cloud Platform team is launching a dedicated Kotlin portal today. This will help developers more easily find resources related to Kotlin on Google Cloud. We want to make it as easy as possible for you to use Kotlin, whether it's on mobile or in the Cloud.

Google Cloud Platform's Kotlin Homepage

Adopting a new language is a major decision for most companies, and you need to be confident that the language you choose will have a bright future. That's why Google has joined forces with JetBrains and established the Kotlin Foundation. The Foundation will ensure that Kotlin continues to advance rapidly, remain free and stay open. You can learn more about the Kotlin Foundation here.

It's an exciting time to be a Kotlin developer. If you haven't tried Kotlin yet, we encourage you to join this growing global community. You can get started by visiting kotlinlang.org or the Android Developer Kotlin page.

Code that final mile: from big data analysis to slide presentation

Posted by Wesley Chun (@wescpy), Developer Advocate, Google Cloud

Google Cloud Platform (GCP) provides infrastructure, serverless products, and APIs that help you build, innovate, and scale. G Suite provides a collection of productivity tools, developer APIs, extensibility frameworks and low-code platforms that let you integrate with G Suite applications, data, and users. While each solution is compelling on its own, users can get more power and flexibility by leveraging both together.

In the latest episode of the G Suite Dev Show, I'll show you one example of how you can take advantage of powerful GCP tools right from G Suite applications. BigQuery, for example, can help you surface valuable insight from massive amounts of data. However, regardless of "the tech" you use, you still have to justify and present your findings to management, right? You've already completed the big data analysis part, so why not go that final mile and tap into G Suite for its strengths? In the sample app covered in the video, we show you how to go from big data analysis all the way to an "exec-ready" presentation.

The sample application is meant to give you an idea of what's possible. While the video walks through the code a bit more, let's give all of you a high-level overview here. Google Apps Script is a G Suite serverless development platform that provides straightforward access to G Suite APIs as well as some GCP tools such as BigQuery. The first part of our app, the runQuery() function, issues a query to BigQuery from Apps Script then connects to Google Sheets to store the results into a new Sheet (note we left out CONSTANT variable definitions for brevity):

function runQuery() {
  // make BigQuery request
  var request = {query: BQ_QUERY};
  var queryResults = BigQuery.Jobs.query(request, PROJECT_ID);
  var jobId = queryResults.jobReference.jobId;
  queryResults = BigQuery.Jobs.getQueryResults(PROJECT_ID, jobId);
  var rows = queryResults.rows;

  // put results into a 2D array
  var data = new Array(rows.length);
  for (var i = 0; i < rows.length; i++) {
    var cols = rows[i].f;
    data[i] = new Array(cols.length);
    for (var j = 0; j < cols.length; j++) {
      data[i][j] = cols[j].v;
    }
  }

  // put array data into new Sheet
  var spreadsheet = SpreadsheetApp.create(QUERY_NAME);
  var sheet = spreadsheet.getActiveSheet();
  // schema.fields holds field objects; the header row needs their names
  var headers = queryResults.schema.fields.map(function(field) {
    return field.name;
  });
  sheet.appendRow(headers); // header row
  sheet.getRange(START_ROW, START_COL,
      rows.length, headers.length).setValues(data);

  // return Spreadsheet object for later use
  return spreadsheet;
}
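For readers who want to see the shape of the data that runQuery() walks through, here is a small standalone sketch that can run outside Apps Script. The `mockResults` object and the `toTable` helper are hypothetical, and the values are made up. The structure mirrors what the function above reads: column definitions under `schema.fields`, and each row's cells in an `f` array of `{v: value}` objects.

```javascript
// Hypothetical query response with placeholder values.
var mockResults = {
  schema: { fields: [{ name: 'word' }, { name: 'count' }] },
  rows: [
    { f: [{ v: 'alpha' }, { v: '3' }] },
    { f: [{ v: 'beta' }, { v: '7' }] }
  ]
};

// Flatten the response into a 2D array: one header row, then the data rows.
function toTable(queryResults) {
  var headers = queryResults.schema.fields.map(function (field) {
    return field.name;
  });
  var data = (queryResults.rows || []).map(function (row) {
    return row.f.map(function (cell) { return cell.v; });
  });
  return [headers].concat(data);
}

var table = toTable(mockResults);
console.log(table);
```

A rectangular 2D array like this is the form that a Sheets range's setValues() call expects, which is why the nested loops in runQuery() build one before writing to the Sheet.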

It returns a handle to the new Google Sheet which we can then pass on to the next component: using Google Sheets to generate a Chart from the BigQuery data. Again leaving out the CONSTANTs, we have the 2nd part of our app, the createColumnChart() function:

function createColumnChart(spreadsheet) {
  // create & put chart on 1st Sheet
  var sheet = spreadsheet.getSheets()[0];
  var chart = sheet.newChart()
      .setChartType(Charts.ChartType.COLUMN)
      .addRange(sheet.getRange(START_CELL + ':' + END_CELL))
      .setPosition(START_ROW, START_COL, OFFSET, OFFSET)
      .build();
  sheet.insertChart(chart);

  // return Chart object for later use
  return chart;
}

The chart is returned by createColumnChart() so we can use that plus the Sheets object to build the desired slide presentation from Apps Script with Google Slides in the 3rd part of our app, the createSlidePresentation() function:

function createSlidePresentation(spreadsheet, chart) {
  // create new deck & add title+subtitle
  var deck = SlidesApp.create(QUERY_NAME);
  var [title, subtitle] = deck.getSlides()[0].getPageElements();
  title.asShape().getText().setText(QUERY_NAME);
  subtitle.asShape().getText().setText('via GCP and G Suite APIs:\n' +
      'Google Apps Script, BigQuery, Sheets, Slides');

  // add new slide and insert empty table
  var tableSlide = deck.appendSlide(SlidesApp.PredefinedLayout.BLANK);
  var sheetValues = spreadsheet.getSheets()[0].getRange(
      START_CELL + ':' + END_CELL).getValues();
  var table = tableSlide.insertTable(sheetValues.length, sheetValues[0].length);

  // populate table with data in Sheets
  for (var i = 0; i < sheetValues.length; i++) {
    for (var j = 0; j < sheetValues[0].length; j++) {
      table.getCell(i, j).getText().setText(String(sheetValues[i][j]));
    }
  }

  // add new slide and add Sheets chart to it
  var chartSlide = deck.appendSlide(SlidesApp.PredefinedLayout.BLANK);
  chartSlide.insertSheetsChart(chart);

  // return Presentation object for later use
  return deck;
}

Finally, we need a driver application that calls all three one after another, the createBigQueryPresentation() function:

function createBigQueryPresentation() {
  var spreadsheet = runQuery();
  var chart = createColumnChart(spreadsheet);
  var deck = createSlidePresentation(spreadsheet, chart);
}

We left out some detail in the code above but hope this pseudocode helps kickstart your own project. Seeking a guided tutorial to building this app one step at a time? Do our codelab at g.co/codelabs/bigquery-sheets-slides. Alternatively, go see all the code by hitting our GitHub repo at github.com/googlecodelabs/bigquery-sheets-slides. After executing the app successfully, you'll see the fruits of your big data analysis captured in a presentable way in a Google Slides deck.

This isn't the end of the story as this is just one example of how you can leverage both platforms from Google Cloud. In fact, this was one of two sample apps featured in our Cloud NEXT '18 session this summer exploring interoperability between GCP & G Suite which you can watch here:

Stay tuned as more examples are coming. We hope these videos plus the codelab inspire you to build on your own ideas.

How we brought the latest version of Python to App Engine and Cloud Functions

At Cloud Next 2018, we added Python 3.7 support to Cloud Functions, and now we’ve announced Python 3.7 support for the App Engine standard environment. These new runtimes allow you to write Python functions and apps using the latest version of Python and the rich ecosystem of packages available on the Python Package Index (PyPI).

This new runtime marks a significant update to App Engine and was enabled by new open source software that we recently released: gVisor and FTL.

Python, straight from the source

Running Python 3.7 on App Engine and Cloud Functions required us to fundamentally rethink our infrastructure. Traditionally, meeting Google Cloud’s security requirements meant that we had to run a modified version of the Python interpreter. However, using a modified interpreter constrained some language features and only allowed us to support a limited set of whitelisted Python libraries.

Thanks to gVisor, a container sandbox that provides improved security and process isolation, we can now run the unmodified Python 3.7.0 interpreter. We’ve done extensive testing to make sure Python 3.7 is compatible with gVisor. As part of our compatibility testing, we run Python’s full suite of language tests, and tests for Python packages that are popular on PyPI. We’re committed to ensuring that everything you’ve come to know and love about Python is supported on our platform.

Seamless deployments

Most importantly, this change in our infrastructure makes it easier to take advantage of Python’s vast ecosystem. As a developer, you just add project dependencies to a requirements.txt file and deploy.

During deployment, FTL, a tool for building containers, fetches dependencies listed in your requirements.txt file and installs them alongside your app or function. FTL also includes a short-lived dependency cache, which speeds up repeated deployments if no changes are detected in your requirements.txt file. This is particularly useful if you just need to redeploy because you found a typo.
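The idea of reusing installed dependencies when requirements.txt has not changed can be illustrated with a short conceptual sketch. This is not FTL's implementation, and the post does not describe how FTL detects changes. The `requirementsKey` and `installDependencies` names, the in-memory cache, and the pinned package versions are all assumptions made for illustration. The sketch is written in JavaScript to match the other code on this page.

```javascript
// Build a cache key from requirements.txt contents: drop comments and blank
// lines, trim whitespace, and sort so that line order does not matter.
function requirementsKey(text) {
  return text.split('\n')
    .map(function (line) { return line.replace(/(^|\s)#.*$/, '').trim(); })
    .filter(function (line) { return line.length > 0; })
    .sort()
    .join('\n');
}

// In-memory stand-in for a dependency cache (hypothetical).
var cache = {};

function installDependencies(requirementsText) {
  var key = requirementsKey(requirementsText);
  if (Object.prototype.hasOwnProperty.call(cache, key)) {
    return { cached: true, packages: cache[key] };
  }
  // A real builder would download and install packages here.
  var packages = key === '' ? [] : key.split('\n');
  cache[key] = packages;
  return { cached: false, packages: packages };
}

// Placeholder requirements; the second call differs only in order,
// blank lines and a comment, so it reuses the cached result.
var first = installDependencies('flask==1.0.2\nrequests==2.20.0\n');
var second = installDependencies('requests==2.20.0\n\nflask==1.0.2  # web\n');
console.log(first.cached, second.cached); // false true
```

Keying the cache on normalized file contents means that cosmetic edits do not trigger a reinstall, while any change to a pinned version does.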

Keeping up with the Pythonistas

In making these changes, we also decided to expand the list of system packages that are included with each runtime’s Ubuntu 18.04 distribution. We think that will make life just a little bit easier for developers working with the latest release of Python.

Looking forward, we’re excited about how these changes will allow us to keep up with the Python community’s progress as they release new versions and libraries. Please let us know what you think and if you run into any challenges.

You can learn more about how to get started with it on App Engine and Cloud Functions in our documentation. We can’t wait to see what you build with Python 3.7.

By Stewart Reichling, Product Manager

We’ve moved! Come see our new home!


Ten years, three months and 30 days ago, we wrote our first post on this blog, and now, we’re writing our last at this particular web address. Today, it’s with great excitement that we present to you the Google Cloud blog, your home for all the latest GCP product news, how-to’s, perspectives and customer stories that you’re used to, all living happily on a shiny, new mobile-friendly platform.

We’re really excited about this change. Not only does the new blog look really nice, but it includes all the content from across the entire Google Cloud family—GCP, G Suite, Google Maps Platform and Chrome Enterprise—so you can see how they all fit together. And because data analysis and artificial intelligence are so central to everything people are building today, we’ve also folded our Big Data and Machine Learning blog into this new platform.

Besides collecting all Google Cloud blog content in one place, we think you’ll really benefit from the blog’s rich tagging capabilities. Now, you can view blog posts by platform, and also drill down to specific technology areas like Application Development, Networking or Open Source, so you can quickly find related content. There are also dedicated pages for partners, customers, trainings and certifications, and solutions and how-to’s, to name a few. And because we can also tag posts to multiple products and topics, you’ll be sure to find what you’re looking for.

Those are just the high-level changes. There are a whole lot of new features to use and explore, and we encourage you to browse the site and get familiar with it. What’s not new is our mission: to provide you with honest, technical content to show you how to build your business on GCP.

To date, we’ve migrated over two years’ worth of GCP blog posts to this new home, with more to come. Let us know if you find any broken links, typos, or just flat-out missing content. And of course, we’d love your feedback on our content, the design, or any features you’d like to see. Thanks for reading!

Last month today: July on GCP

The month of July saw our Google Cloud Next ’18 conference come and go, and there was plenty of exciting news, updates and demos to share from the show. Here’s a look at some of the most-read blog posts from July.

What caught your attention this month: Creating the open cloud
  • One of the most-read posts this month covered the launch of our Cloud Services Platform, which allows you to build a true hybrid cloud infrastructure. Some of the key components of Cloud Services Platform include the managed Istio service mesh, Google Kubernetes Engine (GKE) On-Prem and GKE Policy Management, Cloud Build for fully managed CI/CD, and several serverless offerings (more on that below). Combined, these technologies can help you gain consistency, security, speed and flexibility of the cloud in your local data center, along with the freedom of workload portability to the environment of your choice.
  • Another popular read was a rundown of Google Cloud’s new serverless offerings. These include core serverless compute announcements such as new App Engine runtimes, Cloud Functions general availability and more. It also included serverless containers, so you can run serverless workloads in a fully managed container environment; GKE Serverless add-on to easily run serverless workloads on Kubernetes Engine; and Knative, the open-source project on which that add-on is built. There are even more features included in this post, too, like Cloud Build, Stackdriver monitoring and Cloud Firestore integration with GCP. 
Bringing detailed metrics and Kubernetes apps to the forefront
  • Another must-read post this month for many of you was Transparent SLIs: See Google Cloud the way your application experiences it, announcing the availability of detailed data insights on GCP services that your workloads use—helping you see like a Google site reliability engineer (SRE). These new service-level indicators (SLIs) go way beyond basic uptime and downtime to delve into response codes, latency and more. You can then separate out metrics by GCP service to see things like API version, location and protocol. The result is that you can filter and sort to get extremely fine-grained information on your software and the GCP services you use, which helps cut resolution times and improve the support experience. Transparent SLIs are available now through the Stackdriver monitoring console. Learn more here about the basics of using SLIs and other SRE tools to measure and manage availability.
  • It’s also now faster and easier to find production-ready commercial Kubernetes apps in the GCP Marketplace. These apps are prepackaged and configured to get up and running easily, whether on Kubernetes Engine or other Kubernetes clusters, and run the gamut from security, data analytics and developer tools to storage, machine learning and monitoring.
There was obviously a lot to talk about at the show, and you can get even more detail on what happened at Next ’18 here.

Building the cloud back-end
  • For all of you developing cloud apps with Java, the availability of Jib was an exciting announcement last month. This open-source container image builder, available as Gradle and Maven plugins, cuts out several steps from the Docker build flow. Jib does all the work required to package your app into a container image—you don’t need to write a Dockerfile or even have Docker installed. You end up with faster builds and reproducible container images.
  • And on that topic, this best practices for building containers post was a hit, too, giving you tips that will set you up to run your environment more smoothly. The tips in this blog post cover graceful application shutdowns, how to simplify containers and how to choose and tag the container images you’ll use. 
It’s been a busy month at GCP, and we’re glad to share lots of new tools with you. Till next time, build away!

Repairing network hardware at scale with SRE principles



To support our Google Cloud Platform (GCP) customers, we run a complex global network that depends on multiple providers and a lot of hardware. Google network engineering uses a diverse set of vendor equipment to route user traffic from an internet service provider to one of our serving front ends inside a GCP data center. This equipment is proprietary and made by external networking vendors such as Arista, Cisco and Juniper. Each vendor has distinct operational methods, configurations and operational consoles.

With hundreds of distinct components utilized across our global network, we routinely deal with hardware failures—for example, a failed power supply, line card or control plane card. The complexity of today’s cloud networks means that there are a huge number of places where failure can occur. When we first began building and operating our own data centers, Google had a team of engineers, network engineers and site reliability engineers (SREs) who performed fault detection, mitigation and repair work on these devices, using manual processes guided by a ticket system. Google’s SRE principles are prescriptive, and aim to guide developers and operations teams toward better systems reliability. As with DevOps, avoiding toil—the manual tasks that can eat up too much time—is an essential goal.

After becoming familiar with common hardware problems, we realized that any ticket type that we encountered repeatedly and that followed a predetermined sequence of steps could easily be automated. Over time, our team created a list of playbooks that detailed how to deal with each hardware failure scenario, taking into account relevant software and hardware bugs and typical steps to resolution. Each playbook is used when an alert is received. Given that we already knew in advance how to deal with each issue as it arose, it made sense to automate the work. Here’s how we did it.

Building the automation interface

“In the old way of doing things, we treat our servers like pets, for example, Bob the mail server. If Bob goes down, it’s all hands on deck. The CEO can’t get his email and it’s the end of the world. In the new way, servers are numbered, like cattle in a herd. For example, www001 to www100. When one server goes down, it’s taken out back, shot, and replaced on the line.”
- Randy Bias

The quote above captures a classic analogy often applied within SRE, "pets vs. cattle," which treats data center hardware as either individual components or a herd of them. The two categories of equipment can be described as follows:

Pet:
  • An individual device you work on. You're familiar with all of its particular failure modes. 
  • When it gets sick, you come to the rescue.

Cattle:
  • A fleet of devices with a common interface.
  • You manage the "herd" of devices as a group.
  • The common interface lets you perform the same basic operations on any device, regardless of its manufacturer.
Before we moved to automating network hardware failure resolution, we were stuck handling our networking equipment like pets, with an eye toward what made it unique, rather than as cattle, with an eye toward what made it a commodity. We needed a way to avoid custom-managing all these networking devices. Our initial automation design aimed to turn our fleet into cattle by providing a common interface for interacting with networking equipment. Specifically, we used the underlying primitives to implement a higher-level interface for performing common operations. In this case, those were the basic operations of a line card in a network device, regardless of vendor: "Bring it online," "Take it offline" and "Check the status." We defined the following interface for a line card, using the Go programming language.


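// Linecard is the vendor-neutral set of operations on a line card.
type Linecard interface {
  Online() error  // bring the line card online
  Offline() error // take the line card offline
  Status() error  // check the status of the line card
}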
The error return type in Go means that each function returns an error value if it fails, and nil otherwise. The underlying code implementing this interface for a Juniper line card varies significantly from the implementation for a Cisco line card, but the caller of the function is insulated from the implementation. The upper-level code imports the library, and when it operates on a line card, it can only perform one of the three actions we specified above.
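As a rough illustration of how the caller stays insulated from the vendor, here is a minimal sketch. The two implementations (vendorALinecard, vendorBLinecard) and the Cycle helper are hypothetical names invented for this example; they are not taken from Google's code, and real implementations would talk to the vendor's console rather than flip an in-memory field.

```go
package main

import "fmt"

// Linecard mirrors the interface defined in the post.
type Linecard interface {
	Online() error
	Offline() error
	Status() error
}

// vendorALinecard is a hypothetical implementation that tracks state as a bool.
type vendorALinecard struct {
	slot int
	up   bool
}

func (l *vendorALinecard) Online() error  { l.up = true; return nil }
func (l *vendorALinecard) Offline() error { l.up = false; return nil }
func (l *vendorALinecard) Status() error {
	if !l.up {
		return fmt.Errorf("vendor A: slot %d is offline", l.slot)
	}
	return nil
}

// vendorBLinecard is a second hypothetical implementation that tracks state
// as a string, standing in for a vendor with a different console.
type vendorBLinecard struct {
	name  string
	state string
}

func (l *vendorBLinecard) Online() error  { l.state = "up"; return nil }
func (l *vendorBLinecard) Offline() error { l.state = "down"; return nil }
func (l *vendorBLinecard) Status() error {
	if l.state != "up" {
		return fmt.Errorf("vendor B: %s is %q", l.name, l.state)
	}
	return nil
}

// Cycle takes a line card offline, brings it back online, and checks its
// status. It only sees the Linecard interface, never the vendor type.
func Cycle(lc Linecard) error {
	if err := lc.Offline(); err != nil {
		return fmt.Errorf("offline: %w", err)
	}
	if err := lc.Online(); err != nil {
		return fmt.Errorf("online: %w", err)
	}
	return lc.Status()
}

func main() {
	cards := []Linecard{
		&vendorALinecard{slot: 1},
		&vendorBLinecard{name: "lc0", state: "down"},
	}
	for _, lc := range cards {
		fmt.Println(Cycle(lc))
	}
}
```

Cycle compiles against the interface alone, so supporting another vendor means adding a type, not changing the caller.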

We then realized that we could apply the same interface to many hardware components—for example, a fan. For certain vendors, the Online() and Offline() functions did nothing, because those vendors didn't support turning a fan off, so we just used the interface to check the status.
type Fan interface {
  Online() error
  Offline() error 
  Status() error
}
Building upon this line of thought, we realized that we could generalize this interface to define a common interface for all hardware components within a device.
type Component interface {
  Online() error
  Offline() error 
  Status() error
}
By structuring the code this way, anyone can add a device from a new vendor. Moreover, anyone can add any type of new component as a library. Once the library implements this common interface, it can be registered as a handler for that specific vendor and component.
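The post does not show how handlers are registered, so the following is only a sketch of one way it could work. The registry map, the handlerKey type, the Factory signature, and the Register and Lookup functions are all assumptions made for illustration, as is the exampleFan type, which models a vendor whose fans cannot be switched on or off.

```go
package main

import (
	"errors"
	"fmt"
)

// Component is the common interface from the post.
type Component interface {
	Online() error
	Offline() error
	Status() error
}

// handlerKey identifies a handler by vendor and component type.
// Both fields are hypothetical labels chosen for this sketch.
type handlerKey struct {
	vendor string
	kind   string
}

// Factory builds a Component for a slot on a device (hypothetical signature).
type Factory func(device string, slot int) Component

// registry maps a vendor/component pair to the factory that handles it.
var registry = map[handlerKey]Factory{}

// Register associates a factory with a vendor/component pair and rejects
// duplicate registrations.
func Register(vendor, kind string, f Factory) error {
	k := handlerKey{vendor: vendor, kind: kind}
	if _, exists := registry[k]; exists {
		return fmt.Errorf("handler already registered for %s/%s", vendor, kind)
	}
	registry[k] = f
	return nil
}

// Lookup builds the Component registered for the given vendor and kind.
func Lookup(vendor, kind, device string, slot int) (Component, error) {
	f, ok := registry[handlerKey{vendor: vendor, kind: kind}]
	if !ok {
		return nil, fmt.Errorf("no handler for %s/%s", vendor, kind)
	}
	return f(device, slot), nil
}

// exampleFan is a made-up fan whose Online and Offline are no-ops, so only
// Status does real work.
type exampleFan struct{ healthy bool }

func (f *exampleFan) Online() error  { return nil }
func (f *exampleFan) Offline() error { return nil }
func (f *exampleFan) Status() error {
	if !f.healthy {
		return errors.New("fan not healthy")
	}
	return nil
}

func main() {
	err := Register("examplevendor", "fan", func(device string, slot int) Component {
		return &exampleFan{healthy: true}
	})
	if err != nil {
		fmt.Println("register failed:", err)
		return
	}
	c, err := Lookup("examplevendor", "fan", "router1", 3)
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("status error:", c.Status())
}
```

Keying the registry on both vendor and component type keeps the calling code unchanged when a new library is added; only a new Register call is needed.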

Deciding what to automate

The system needed to interact with humans at various stages of the automation. To decide what to automate, we drew a flow chart of the normal human-based repair sequence and drew boxes around stages we believed we could replace with automation. We used the task of replacing a vendor control plane board as an example. Many of the steps have self-explanatory names, but these are definitions of some of the more complex ones:
  • Determine control plane: Find faulty control plane unit.
  • Determine state: Is it the master or the backup? 
  • Copy image to control plane: Copy the appropriate software image to the master control plane.
  • Offline control plane: Send the backup control plane offline.
  • Toggle mastership: Make the replaced control plane the new master.
Figure 1: Manual workflow for replacing a vendor control plane board
When we needed to carry out this workflow, a Google network engineer performed each step in Figure 1, with the exception of pulling out and replacing the failed control plane, which was performed by someone on-site at a data center location.

Once we had defined this task, we created an automated workflow. The goal of the new system was to provide a UI for our hardware engineers in a data center that allowed them to perform one of those operations at a specific time under specific conditions and with various automated safety checks, followed by an entire device audit at the end of the operation. Previously, a human had performed all of these steps. Now a human only needs to perform one step in Figure 2: “hardware gets replaced.”
Figure 2: Automated workflow for replacing a vendor control plane board
Automation, before and after
Figure 3: High-level system view.
Figure 3 shows what the system looked like after automation. Before we automated this workflow, a repair involved a lot of manual work. When an alert initially came in, an engineer would stop traffic to the device and take the bad component offline by hand. Our network operations center (NOC) team would then work with the vendor (for example, Juniper or Cisco) to get a replacement part on-site. Next, we would file a change request in our change management system, noting the date of the operation.

On the day of the operation:
  • The data center technician clicks “start” in the change management system to begin the repair.
  • Our system picks up this change and is ready to begin the repair.
  • The technician clicks “start” on our UI.
  • An “offline” state machine starts proceeding through the various steps to take the component offline safely.
  • The UI notifies the user at each step of the way.
  • Once the state machine has completed, it notifies the technician, who can safely replace the component.
  • Once the component is replaced and re-cabled, the technician returns to the UI and begins the “online” state machine, which safely returns the component into production.
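The post does not describe how the “offline” and “online” state machines are built. As a minimal sketch, one could model each as an ordered list of steps that halts at the first failure and reports progress as it goes. The Step and Notifier types, the RunSteps function, the ErrSafetyCheck value, and the step names below are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSafetyCheck is a hypothetical error returned when a safety check fails.
var ErrSafetyCheck = errors.New("safety check failed")

// Step is one stage of a repair state machine (hypothetical type).
type Step struct {
	Name string
	Run  func() error
}

// Notifier receives progress messages; it stands in for the UI.
type Notifier func(msg string)

// RunSteps executes the steps in order, notifying before and after each one.
// It stops at the first failure and returns how many steps completed.
func RunSteps(steps []Step, notify Notifier) (int, error) {
	for i, s := range steps {
		notify("starting: " + s.Name)
		if err := s.Run(); err != nil {
			notify("failed: " + s.Name)
			return i, fmt.Errorf("step %q: %w", s.Name, err)
		}
		notify("done: " + s.Name)
	}
	return len(steps), nil
}

func main() {
	// In this run the safety check fails, so the component is never
	// taken offline.
	offline := []Step{
		{"stop traffic to device", func() error { return nil }},
		{"run safety checks", func() error { return ErrSafetyCheck }},
		{"take component offline", func() error { return nil }},
	}
	n, err := RunSteps(offline, func(msg string) { fmt.Println(msg) })
	fmt.Printf("completed %d of %d steps, err: %v\n", n, len(offline), err)
}
```

Stopping at the first failed step means the technician is only told it is safe to replace the component when every earlier step has succeeded.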
When we reviewed our original automation design, we noticed there would be a lot of work involved in building the various systems needed to implement the automated workflow. To facilitate collaboration, we created ticket items for each component of the system, so multiple engineers could work on the project in parallel.

Automation lessons learned

We used an iterative approach in our planning and execution. We first focused on replacing the line card for one vendor, then moved on to multiple vendors and multiple components. Due to the modular design of the code base and the interacting systems, adding more modules and scaling the code horizontally was easy. 

For example, adding a new library that handled fan replacements simply meant writing the code for it, ensuring it implemented the interface above, and registering it in the main function.

We had the option to extend or repurpose existing automation systems owned by our software management teams to meet our needs. We had to carefully consider whether to use those systems or build our own, potentially duplicating work if we chose the latter. Ultimately, we built our own automation because the other systems were understaffed. Trying to extend their tools would have disrupted other teams' project work and delayed our own project.

What worked well
Leveraging multiple engineers to automate our internal part of the workflow allowed us to take the project from design to implementation within a short period—about one year.

What didn’t
We haven't yet fully automated our hardware replacement workflow. Doing so involves troubleshooting hardware issues with vendors and persuading them that each individual failure merits a device or component replacement. We work around this gap in our automation by keeping spares on site for use with our repair automation, and handling the vendor workflow portion of the process separately and mostly manually through our NOC. We are currently working toward a fully automated vendor interaction with our vendor partners.

Measuring automation success
We can measure the hours our automation saves engineers using Google's production change logging service, which all internal tools use to record changes made to the production environment. The service logs changes made by tools manually invoked by engineers as well as tools that provide end-to-end automation without manual input. We can therefore combine two data sets: how long each network repair action used to take when performed manually, and the number of repair actions now undertaken by the fully automated system. Together they allow us to calculate the total time savings from automation. As shown in Figure 4, network hardware repair automation saves us hundreds of hours every month.
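The calculation described above can be sketched as multiplying each action's former manual duration by the number of times automation now performs it. The SavedHours function and the figures in main are invented for illustration; they are not Google's data, and the sketch ignores any human time the automated flow still needs, such as the physical hardware swap.

```go
package main

import "fmt"

// SavedHours estimates time saved by multiplying, per repair action, the
// hours it used to take manually by the number of automated runs.
// Actions with no manual baseline are skipped.
func SavedHours(manualHours map[string]float64, automatedRuns map[string]int) float64 {
	total := 0.0
	for action, runs := range automatedRuns {
		hours, ok := manualHours[action]
		if !ok {
			continue // no manual baseline to compare against
		}
		total += hours * float64(runs)
	}
	return total
}

func main() {
	// Hypothetical numbers for one month.
	manualHours := map[string]float64{"linecard": 2, "fan": 0.5}
	automatedRuns := map[string]int{"linecard": 40, "fan": 20}
	fmt.Println("estimated hours saved:", SavedHours(manualHours, automatedRuns))
}
```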

Tips for reducing toil through automation

While strategies for eliminating toil must be tailored to your individual environment and use cases, some approaches are universal. Based upon our own experience eliminating toil by automating network repair tasks, we recommend the following: 
  • Measure your toil.  
  • Tackle the biggest sources of toil first, and don't try to solve all problems at once.  
  • Carefully consider whether to enhance existing tools or build new ones. Even if you can partially repurpose another team's work, would creating a tool from scratch actually make more sense cost- or resource-wise? 
  • Take a design-driven approach. Start small and iterate on the design quickly. Don't try to design the perfect approach from the start.
  • Measure your time savings to determine your return on investment.
Automation has proved useful for our team of network site reliability engineers at GCP. Learn more about the practice of SRE and how you might apply its principles to your own network projects.

Istio reaches 1.0: ready for prod



Today, Google Cloud is proud to announce, together with our collaborators, that the Istio open-source project has reached the 1.0 milestone. This is a key step toward delivering the Cloud Services Platform that we discussed last week, helping you manage your services in a hybrid world where some of your infrastructure runs on VMs and some in Kubernetes, some services run in the cloud and some on-premises.

Istio: a service mesh

Istio is at its heart a service mesh—software that layers transparently onto an existing distributed application. It collects logs, traces and telemetry, and adds security and policy without embedding client libraries. Moreover, Istio is also a platform, complete with APIs that let you integrate with systems for logging, telemetry and policy.

Istio delivers a service-based view of the service interactions across the mesh. Whereas traditional monitoring gives you low-level metrics such as nodes’ CPU consumption, Istio measures the actual traffic between services: requests per second, error rates and latency. It also generates a dependency graph so you can see how services affect one another.

With Istio, your DevOps team gets the tools it needs to run distributed apps smoothly. Istio does canary rollouts, letting you smoke-test a new build to make sure it’s performing well before ramping up. It also offers fault-injection, retry logic and circuit breaking so DevOps teams can do more testing and change network behavior at runtime to keep applications up and running.

And finally, Istio adds security. It can be used to layer mTLS on every call, adding encryption-in-flight and giving you the ability to authorize every single call on your cluster and in your mesh.

Istio in action

Istio provides foundational capabilities for your infrastructure, freeing developers to work on code that is critical to your business. But there’s only one way to prove that Istio is ready for the enterprise: by running real workloads on it in production. Already, there are at least a dozen companies running Istio in production, including several on GCP. We worked with them through early hurdles, incorporated their feedback, and they’re reaping the benefits of Istio already. A great example is Auto Trader UK, which used Istio to help accelerate their move to containers and the public cloud.

Auto Trader UK is not only migrating from private cloud to public cloud, but also moving from virtual machines to Kubernetes. The level of control and visibility that Istio provides has enabled us to significantly de-risk this ambitious work, and in several cases has actually helped surface issues we were previously unaware of. We've been able to accelerate the delivery of capabilities such as mutual TLS, that previously would have taken significant engineering effort, allowing us to focus on our market differentiators.
- Karl Stoney, Delivery Infrastructure Lead, Auto Trader UK

A true joint effort

We first released Istio as open source last year, and what a year it’s been. Since that first 0.1 release, Istio has improved and matured significantly, with eight versions, 200+ contributors, and 4,000+ check-ins adding an ever growing set of functionality.

Getting to version 1.0 was truly a community-driven effort. IBM was a key collaborator and co-founder, and Lyft’s Envoy proxy is a key component of the project. Since the first release, the number of companies involved in Istio has skyrocketed. Cisco, Red Hat, and VMware are among those consolidating industry support, with the goal of accelerating adoption and meeting the service mesh needs of their customers.

“The growth of Istio since its launch last year has been tremendous, and it’s quickly taking its place as the standard way to manage microservices in the cloud. Our mission since Istio’s launch has been to enable everyone to succeed with microservices, especially in the enterprise. This is why we’ve focused the community around improving security and scale, and heavily leaned our contributions on what we’ve learned from building agile cloud architectures for companies of all sizes.”
- Jason McGee, IBM Fellow and VP, IBM Cloud 
"We see Istio's potential to be able to solve some of the most complex aspects of application development and deployment. It brings a control plane for service mesh, cluster orchestration, and network control that will support and enable developers to focus on the more important aspects of their application development. We are looking forward to leveraging Istio in Red Hat OpenShift to enable developers to deploy their applications in a more secure and efficient manner." 
- Brian 'Redbeard' Harrington, product manager, Istio, Red Hat
“VMware has been an integral part of the community developing Istio service mesh. We see great potential in Istio’s service-based approach to connectivity, security, and observability. We believe it will become an infrastructure cornerstone, spanning across vSphere and Kubernetes platforms and multiple private and public clouds, and helping our enterprise customers improve development efficiencies and deliver on their SLAs / SLOs in a secure manner. Istio’s application layer complements the network virtualization layer, and together allow enterprises to achieve defense in depth, improve performance and scalability, and speed time to application value.” 
- Pere Monclus, CTO Network and Security, VMware

We’re also thrilled with the number of companies writing adapters for Istio—from observability software from SolarWinds and Datadog, to deployment tools from Weaveworks and CodeFresh, to policy and security offerings from Aspenmesh and Octarine. While Istio is transparent to application developers, it provides a standard integration interface for anyone writing observability tools or policy engines.

Working and integrating with other open source projects in the community drives our success, as well. Integrations with SPIFFE, the Open Policy Agent and OpenTracing all improve the state of open source and the lives of developers.

Istio on GCP

While the open-source Istio project is a major undertaking, we’re also intent on making it especially easy to use on Google Cloud Platform. Last week at Google Cloud Next we announced the alpha release of Managed Istio: open-source Istio that’s automatically installed and upgraded on your Kubernetes Engine clusters as a part of the Cloud Services Platform. Managed Istio will help provide the visibility, security and control you need over services running in hybrid environments, and it integrates with other Google products like Stackdriver and Apigee.

Achieving 1.0 is just a first step, both for the project and for us at Google Cloud. We have ambitious plans for adding features and improving Istio’s usability with the ultimate goal of delivering a complete set of tools to manage all of your services, so that you can focus on writing software and running a business.

To find out more about Istio and how to get started using it on GCP, please visit cloud.google.com/istio.