Tag Archives: Management Tools

Greetings from North Pole Operations! All systems go!



Hi there! I’m Merry, Santa’s CIE (Chief Information Elf), responsible for making sure computers help us deliver joy to the world each Christmas. My elf colleagues are really busy getting ready for the big day (or should I say night?), but this year, my team has things under control, thanks to our fully cloud-native architecture running on Google Cloud Platform (GCP)! What’s that? You didn’t know that the North Pole was running in the cloud? How else did you think that we could scale to meet the demands of bringing all those gifts to all those children around the world?
You see, North Pole Operations has evolved quite a lot since my parents were young elves. The world population grew from around 1.6 billion in the early 20th century to 7.5 billion today. The elf population couldn’t keep up with that growth, or with the increased production of all these new toys using our old methods, so we needed to improve efficiency.

Of course, our toy list has changed a lot too. It used to be relatively simple: rocking horses, stuffed animals, dolls and toy trucks, mostly. The most complicated things we made when I was a young elf were Teddy Ruxpins (remember those?). Now toy cars and even trading card games come with their own apps and use machine learning.

This is where I come in. We build lots of computer programs to help us. My team is responsible for running hundreds of microservices. I explain a microservice to Santa as a computer program that performs a single service. We have a microservice for processing incoming letters from kids, another for calculating kids’ niceness scores, even one for tracking reindeer games rankings.
Here's an example: the Letter Processing Microservice takes handwritten letters in all languages (often including spelling and grammatical errors) and turns each one into text.
Each microservice runs on one or more computers (also called virtual machines or VMs). We tried to run it all from some computers we built here at the North Pole, but we had trouble getting enough electricity for all those VMs (solar isn’t really an option here in December). So we decided to go with GCP. Santa had some reservations about “the Cloud” since he thought it meant our data would be damaged every time it rained (Santa really hates rain). But we managed to get him a tour of a data center (not even Santa can get into a Google data center without proper clearances), and he realized that cloud computing is really just a bunch of computers that Google manages for us.

Google lets us use projects, folders and orgs to group different VMs together. Multiple microservices can make up an application and everything together makes up our system. Our most important and most complicated application is our Christmas Planner application. Let’s talk about a few services in this application and how we make sure we have a successful Christmas Eve.
Our Christmas Planner application includes microservices for a variety of tasks: some generate lists of kids who are naughty or nice, as well as a final list of which child receives which gift based on preferences and inventory. Others plan the route, taking inclement weather into consideration, and finally generate a plan for how to pack the sleigh.

Small elves, big data


Our work starts months in advance, tracking naughty and nice kids by relying on parent reports, teacher reports, police reports and our mobile elves. Keeping track of almost 2 billion kids each year is no easy feat. Things really heat up around the beginning of December, when our army of Elves-on-the-Shelves is mobilized, reporting in nightly.
We send all this data to a system called BigQuery where we can easily analyze the billions of reports to determine who's naughty and who's nice in just seconds.

Deck the halls with SLO dashboards


Our most important service level indicator, or SLI, is “child delight”. We target “five nines”, or a 99.999% delight level, meaning that 99,999 out of every 100,000 nice children are delighted. This target is our service level objective, or SLO, and one of the few things everyone here in the North Pole takes very seriously. Each individual service has SLOs we track as well.
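In plain terms, five nines leaves a tiny error budget. Here's a toy Ruby sketch (the numbers are illustrative) of checking an SLI against that SLO, using exact integer math so no float drift sneaks into the nines:

```ruby
# "Five nines" of child delight, expressed as 99,999 per 100,000.
SLO_NUM = 99_999
SLO_DEN = 100_000

# Is the measured SLI at or above the SLO target?
def slo_met?(delighted, total)
  delighted * SLO_DEN >= total * SLO_NUM
end

# Error budget: how many nice children may go un-delighted?
def error_budget(total)
  total - (total * SLO_NUM) / SLO_DEN  # integer math, no float drift
end

puts slo_met?(99_999, 100_000)     # exactly on target => true
puts error_budget(2_000_000_000)   # ~2 billion kids => 20000
```

Twenty thousand sad children out of two billion may sound like a lot, until you try delivering all those gifts in one night.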

We use Stackdriver for dashboards, which we show in our control center. We set up alerting policies that notify us when a service level indicator falls below its target. Santa was a little grumpy at first because he wanted red and green to be represented equally, and we explained that a red warning means there are alerts and incidents on a service. We put candy canes on all our monitors, and he was much happier.

Merry monitoring for all


We have a team of elite SREs (Site Reliability Elves, though they might be called Site Reliability Engineers by all you folks south of the North Pole) to make sure each and every microservice is working correctly, particularly around this most wonderful time of the year. One of the most important things to get right is the monitoring.

For example, we built our own “internet of things” or IoT where each toy production station has sensors and computers so we know the number of toys made, what their quota was and how many of them passed inspection. Last Tuesday, there was an alert that the number of failed toys had shot up. Our SREs sprang into action. They quickly pulled up the dashboards for the inspection stations and saw that the spike in failures was caused almost entirely by our baby doll line. They checked the logs and found that on Monday, a creative elf had come up with the idea of taping on arms and legs rather than sewing them to save time. They rolled back this change immediately. Crisis averted. Without proper monitoring and logging, it would have been very difficult to find and fix the issue, which is why our SREs consider them the base of their gift reliability pyramid.
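A toy version of that inspection alert boils down to flagging any production line whose failure rate crosses a threshold. Here's a plain Ruby sketch (station names, counts and the threshold are all made up):

```ruby
# Flag production lines whose failure rate exceeds the threshold.
FAILURE_THRESHOLD = 0.05

inspections = {
  "baby_dolls"  => { made: 1_000, failed: 300 },  # taped-on limbs...
  "toy_trucks"  => { made: 2_000, failed: 20 },
  "teddy_bears" => { made: 1_500, failed: 15 },
}

alerts = inspections.select do |_line, stats|
  stats[:failed].to_f / stats[:made] > FAILURE_THRESHOLD
end

alerts.each_key { |line| puts "ALERT: #{line} failure rate too high" }
```

Stackdriver's alerting policies do the same comparison continuously against live metric streams, rather than a one-off batch like this.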

All I want for Christmas is machine learning


Running things in Google Cloud has another benefit: we can use technology developed at Google. One of our most important services is our gift matching service, which takes 50 factors as input, including the child’s wish list, niceness score, local regulations and existing toys, and comes up with the final list of which gifts should be delivered to each child. Last year, we added machine learning (ML): we gave Cloud ML Engine the last 10 years of inputs, gifts, and child and parent delight levels, and it automatically learned a new model to use in gift matching based on this data.

Using this new ML model, we reduced live animal gifts by 90%, ball pits by 50% and saw a 5% increase in child delight and a 250% increase in parent delight.

Tis the season for sharing


Know someone who loves technology and might enjoy this article? Or someone who reminds you of Santa: a person with many amazing skills whose eyes get that “reindeer-in-the-headlights” look when you talk about cloud computing? Share this article with them, and hopefully you’ll soon be chatting about all the cool things you can do with cloud computing over Christmas cookies and eggnog... And be sure to tell them to sign up for a free trial, Google Cloud’s gift to them!

A developer’s toolkit for building great applications on GCP



Whether you're looking to build your next big application, learn some really cool technology concepts or gain some hands-on development experience, Google Cloud Platform (GCP) is a great development platform. But where do you start?

Lots of customers talk to us about their varying application development needs. For example: what are the best tools for web and mobile app development? How do I scale my application back end? How do I add data processing and intelligence to my application? In this blog, we’ll share some resources to help you identify which products are best suited to your development goals.

To help you get started, here are a few resources such as quick start guides, videos and codelabs for services across web and mobile app development, developer tools, data processing and analytics and monitoring:

Web and mobile app development:


Google App Engine offers a fully managed serverless platform that allows you to build highly scalable web and mobile applications. If you're looking for a zero-config application hosting service that will let you auto-scale from zero to infinite size without having to manage any infrastructure, look no further than App Engine.
Cloud Functions is another great event-driven serverless compute platform you can use to build microservices at the functions level, scale to infinite size and pay only for what you use. If you're looking for a lightweight compute platform to run your code in response to user actions, analytics, authentication events or for telemetry data collection, real-time processing and analysis, Cloud Functions has everything you need to be agile and productive.

Build, deploy and debug:

Developer tools provide plugins to build, deploy and debug code using your favorite IDEs and build systems, such as IntelliJ, Eclipse, Gradle and Maven. You can use either the Cloud SDK or a browser-based command line to build your apps. Cloud Source Repositories, which come as part of developer tools, let you host private Git repos and organize the associated code. Below is a sampling of resources; check out the developer tools section for more.

Data Processing and Analytics:

BigQuery offers a fast, highly scalable, low cost and fully managed data warehouse on which you can perform analytics. Cloud Pub/Sub allows you to ingest event streams from anywhere, at any scale, for simple, reliable, real-time stream analytics. BigQuery and Pub/Sub work seamlessly with services like Cloud Machine Learning Engine to help you add an intelligence layer to your application.

Monitoring, logging and diagnostics:

Google Stackdriver provides powerful monitoring, logging and diagnostics. It equips you with insight into the health, performance and availability of cloud-powered applications, enabling you to find and fix issues faster. It's natively integrated with GCP, other cloud providers and popular open source packages.
I hope these resources keep you engaged and provide a great introduction to the different application development services on GCP. Looking for tips and tricks as you build your applications? Check out this link for details. Sign up for your free trial today and get a $300 GCP credit!

Extracting value from your logs with Stackdriver logs-based metrics



Most developers know that logs are important for debugging, but they struggle to extract value from the massive volume of logs generated across their systems. This is where the right tools for extracting insights from logs can really make a difference. Stackdriver Logging’s improved analytics tools became available in beta earlier this year, are now generally available, and are being used by our customers.

Logs-based metrics are the cornerstone of Stackdriver Logging’s analytics platform and allow you to identify trends and extract numeric values out of the logs. Here are a few examples of how we’ve seen customers use logs-based metrics.

Filter on labels 

As a simple example, we have a sample App Engine restaurant application that includes the food ordered as a parameter in the URL. We want to count the number of times each menu item is ordered. Logs-based metric labels allow you to use a regular expression to extract the desired field.
You can then view the logs-based metric, filter on the labels or create alerts in Stackdriver Monitoring. In the example below, we use Metric Explorer to filter down to just “burgers” and can see the rate of orders placed over the last day.
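Under the hood, a label extractor is just a regular expression applied to each log entry. Here's a minimal Ruby sketch of the idea (the URL shape and log lines are hypothetical, not the sample app's actual format):

```ruby
# Extract the menu item from each request URL and count orders per item,
# mimicking what a logs-based metric label extractor does.
EXTRACTOR = %r{/order\?item=(\w+)}

log_entries = [
  "GET /order?item=burgers 200",
  "GET /order?item=fries 200",
  "GET /order?item=burgers 200",
]

counts = Hash.new(0)
log_entries.each do |entry|
  if (m = EXTRACTOR.match(entry))
    counts[m[1]] += 1  # capture group 1 is the menu item
  end
end

puts counts
```

Stackdriver applies the extractor as entries arrive, so the per-label counts become a time series you can chart and alert on.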
Google Cloud’s Apigee team needed to take their terabytes of rich log data and understand errors and latency for each service they run across each region. They used logs-based metric labels to extract the service and region directly from the log entry. As Madhurranjan Mohaan, software engineer at Apigee, explains, “Parameterizing log searches based on labels have really helped us visualize errors and latency across multiple charts. It’s great because it’s parameterized across all regions.”

Extract values from log entries with distribution metrics


Waze uses distribution metrics to extract fields such as latency from their log entries. They can then view the distributions coming from logs as a heat map in Stackdriver Monitoring.

As a simple example, we have an application running on Compute Engine VMs, emitting logs with a latency value in the payload. We first filter down to the relevant log entries that include the “healthz” text in the payload and then use a regular expression to extract the value for the distribution logs-based metric.
In Stackdriver Monitoring, we can visualize the data. In the example below, we add the chart as a heatmap to a custom dashboard. We could also instead choose to visualize it as a line chart of the 99th percentile, for example, by changing the aggregation and visualization.
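The same idea in miniature: filter to the "healthz" entries, pull the latency out with a regular expression, and aggregate. A Ruby sketch with a made-up log format:

```ruby
# Extract latency values from matching log lines (distribution-metric
# style), then compute a percentile over them.
LATENCY_RE = /healthz latency=(\d+)ms/

logs = [
  "GET /healthz latency=12ms",
  "GET /healthz latency=48ms",
  "GET /index latency=5ms",    # no "healthz": filtered out
  "GET /healthz latency=230ms",
  "GET /healthz latency=31ms",
]

latencies = logs.filter_map { |l| LATENCY_RE.match(l)&.[](1)&.to_i }

# Nearest-rank percentile over the extracted values.
def percentile(values, pct)
  sorted = values.sort
  sorted[((pct / 100.0) * (sorted.length - 1)).round]
end

puts percentile(latencies, 99)
```

A real distribution metric stores bucketed counts rather than raw values, which is what lets Stackdriver Monitoring render heatmaps and percentile lines efficiently at scale.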

Alert on matching log entries


Another common use case is getting notified whenever a matching log entry occurs. For example, to get an email whenever a new VM is created in your system, you can create a logs-based metric on log entries for new VMs and then create a metric-threshold alerting policy on it. To see how to do this, watch this short video.
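For the new-VM example, the logs-based metric would count entries matching an advanced log filter along these lines (the field values shown are typical for Compute Engine admin activity audit logs; verify them against your own log entries before relying on them):

```
resource.type="gce_instance"
protoPayload.methodName="v1.compute.instances.insert"
```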

You can also use logs-based metrics to gain insight into the performance of Google Cloud Platform (GCP) services. For example, our BigQuery team recently did an analysis of easily fixed errors that BigQuery users commonly encounter. We discovered that many users received rateLimitExceeded and quotaExceeded errors, and found that logs-based metrics can improve the user experience by giving BigQuery administrators visibility into them. You can read more about this analysis here.

We hope these examples help you leverage the power of Stackdriver Logging analytics. If you have questions, feedback or want to share how you’ve made logs-based metrics work for you, we’d love to hear from you.


Now, you can monitor, debug and log your Ruby apps with Stackdriver



The Google Cloud Ruby team continues to expand our support for Ruby apps running on Google Cloud Platform (GCP). Case in point, we’ve released beta gems for Stackdriver, our monitoring, logging and diagnostics suite. Now you can use Stackdriver in your Ruby projects not only on GCP but also on AWS and in your own data center. You can read more about the libraries on GitHub.

As with all our Ruby libraries, we’re focused on ensuring the Stackdriver libraries make sense to Rubyists and help them do their jobs more easily. Installation is easy. With Rails, simply add the "stackdriver" gem to your Gemfile and the entire suite is automatically loaded for you. With other Rack-based web frameworks like Sinatra, you require the gem and use the provided middleware.
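For a Rails app, that installation is a single Gemfile line:

```ruby
# Gemfile
gem "stackdriver"  # loads the whole Stackdriver suite for Rails apps
```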

Stackdriver Debugger is my favorite Stackdriver product. It provides live, production debugging without needing to redeploy. Once you’ve included the gem in your application, go to Cloud Console, upload your code (or point the debugger at a repository) and you’ll get snapshots of your running application including variable values and stacktraces. You can even add arbitrary log lines to your running application without having to redeploy it. Better yet, Debugger captures all this information in just one request to minimize the impact on your running application.
Stackdriver Error Reporting is Google Cloud's exception detection and reporting tool. It catches crashes in your application, groups them logically, alerts you to them (with appropriate alerting back-off), and displays them for you neatly in the UI. The UI shows you stacktraces of the errors and links to logs and distributed traces for each crash, and lets you acknowledge errors and link a group of errors to a bug in your bug database so you can keep better track of what is going on. In addition to automatically detecting errors, Stackdriver Error Reporting lets you send errors from your code in just a single line. 
Stackdriver Trace is Google's application performance monitoring and distributed tracing tool. In Rails it automatically shows you the amount of time a request spends hitting the database, rendering the views, and in application logic. It can also show you how a request moves through a microservices architecture and give you detailed reports on latency trends over time. This way, you can answer once and for all "Did the application get slower after the most recent release?"
Stackdriver Logging’s Ruby library was already generally available, and is currently being used by many Container Engine customers in conjunction with the fluentd logging agent. You can use the logging library even if you don't use Container Engine, since it’s a drop-in replacement for the Ruby and Rails Logger. And when the Stackdriver gem is included in a Rails application, information like request latency is automatically pushed to Stackdriver Logging as well.
You can find instructions for getting started with the Stackdriver gems on GitHub. The Stackdriver gems are currently in beta, and we’re eager for folks to try them out and give us feedback either in the Ruby channel on the GCP Slack or on GitHub, so we can make the libraries as useful and helpful as possible to the Ruby community.

Customizing Stackdriver Logs for Container Engine with Fluentd


Many Google Cloud Platform (GCP) users are now migrating production workloads to Container Engine, our managed Kubernetes environment.  Container Engine supports Stackdriver logging on GCP by default, which uses Fluentd under the hood to send your logs to Stackdriver. 


You may also want to fully customize your Container Engine cluster’s Stackdriver logs with additional logging filters. If that describes you, check out this tutorial where you’ll learn how you can configure Fluentd in Container Engine to apply additional logging filters prior to sending your logs to Stackdriver.

Using Stackdriver Logging for dedicated game server instances: new tutorial


Capturing logs from dedicated game server instances in a central location can be useful for troubleshooting, keeping track of instance runtimes and machine load, and capturing historical data that occurs during the lifetime of a game.

But collecting and making sense of these logs can be tricky, especially if you are launching the same game in multiple regions, or have limited resources on which to collect the logs themselves.

One possible solution to these problems is to collect your logs in the cloud. Doing this enables you to mine your data with tools that deliver speed and power not possible with an on-premises logging server. Storage and data management are simple in the cloud and not bound by physical hardware. Additionally, you can access cloud logging resources globally. Studios and BI departments across the globe can access the same logging database regardless of physical location, making collaboration for distributed teams significantly easier.


We recently put together a tutorial that shows you how to integrate Stackdriver Logging, our hosted log management and analysis service for data running on Google Cloud Platform (GCP) and AWS, into your own dedicated game server environment. It also offers some key storage strategies, including how to migrate this data to BigQuery and other Google Cloud tools. Check it out, and let us know what other Google Cloud tools you’d like to learn how to use in your game operations. You can reach me on Twitter at @gcpjoe.

Partnering on open source: Managing Google Cloud Platform with Chef


Managing cloud resources is a critical part of the application lifecycle. That’s why today, we released and open sourced a set of comprehensive cookbooks for Chef users to manage Google Cloud Platform (GCP) resources.

Chef is a continuous automation platform powered by an awesome community. Together, Chef and GCP enable you to drive continuous automation across infrastructure, compliance and applications.

The new cookbooks allow you to define an entire GCP infrastructure using Chef recipes. The Chef server then creates the infrastructure, enforces it, and ensures it stays in compliance. The cookbooks are idempotent, meaning you can reapply them when changes are required and still achieve the same result.
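Idempotence here just means "re-applying the same desired state is a no-op." A toy illustration in plain Ruby (this is not Chef itself, just the concept):

```ruby
# Apply a desired state to a resource; converge only when it differs.
def apply(state, resource, desired)
  state[resource] = desired unless state[resource] == desired
  state
end

infra = {}
apply(infra, "vm-1", { cpus: 2 })
once = infra.dup
apply(infra, "vm-1", { cpus: 2 })  # re-apply: nothing changes

puts infra == once   # same result either way => true
```

Chef recipes written with these cookbooks behave the same way: a second chef-client run against unchanged infrastructure makes no modifications.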

The new cookbooks support the following products:



We also released a unified authentication cookbook that provides a single authentication mechanism for all the cookbooks.

These new cookbooks are Chef certified, having passed the Chef engineering team’s rigorous quality and review bar, and are open-source under the Apache-2.0 license on GCP's Github repository.

We tested the cookbooks on CentOS, Debian, Ubuntu, Windows and other operating systems. Refer to the operating system support matrix for compatibility details. The cookbooks work with Chef Client, Chef Server, Chef Solo, Chef Zero, and Chef Automate.

To learn more about these Chef cookbooks, register for the webinar with me and Chef’s JJ Asghar on 15 October 2017.

Getting started with Chef on GCP

Using these new cookbooks is as easy as following these four steps:
  1. Install the cookbooks.
  2. Get a service account with privileges for the GCP resources that you want to manage, and enable the APIs for each of the GCP services you will use.
  3. Describe your GCP infrastructure in Chef:
    1. Define a gauth_credential resource
    2. Define your GCP infrastructure
  4. Run Chef to apply the recipe.
Now, let’s discuss these steps in more detail.

1. Install the cookbooks

You can find all the GCP cookbooks for Chef on Chef Supermarket. We also provide a “bundle” cookbook that installs every GCP cookbook at once. That way you can choose the granularity of the code you pull into your infrastructure.

Note: These Google cookbooks require neither administrator privileges nor special privileges/scopes on the machines that Chef runs on. You can install the cookbooks either as a regular user on the machine that will execute the recipe, or on your Chef server; the latter option distributes the cookbooks to all clients.

The authentication cookbook requires a few of our gems. You can install them using various methods, including using Chef itself:


chef_gem 'googleauth'
chef_gem 'google-api-client'


For more details on how to install the gems, please visit the authentication cookbook documentation.

Now, you can go ahead and install the Chef cookbooks. Here’s how to install them all with a single command:


knife cookbook site install google-cloud


Or, you can install only the cookbooks for select products:


knife cookbook site install google-gcompute    # Google Compute Engine
knife cookbook site install google-gcontainer  # Google Container Engine
knife cookbook site install google-gdns        # Google Cloud DNS
knife cookbook site install google-gsql        # Google Cloud SQL
knife cookbook site install google-gstorage    # Google Cloud Storage


2. Get your service account credentials and enable APIs

To ensure maximum flexibility and portability, you must authenticate and authorize GCP resources using service account credentials. Using service accounts allows you to restrict the privileges to the minimum necessary to perform the job.

Note: Because service accounts are portable, you don’t need to run Chef inside GCP. Our cookbooks run on any computer with internet access, including other cloud providers. You might, for example, execute deployments from within a CI/CD system pipeline such as Travis or Jenkins, or from your own development machine.

Click here to learn more about service accounts, and how to create and enable them.

Also make sure to enable the APIs for each of the GCP services you intend to use.

3a. Define your authentication mechanism

Once you have your service account, add the following resource block to your recipe to begin authenticating with it. The resource name, here 'mycred', is referenced by other resources via their credential parameter.


gauth_credential 'mycred' do
  action :serviceaccount
  path '/home/nelsonjr/my_account.json'
  scopes ['https://www.googleapis.com/auth/compute']
end


For further details on how to set up or customize authentication, visit the Google Authentication cookbook documentation.

3b. Define your resources

You can manage any resource for which we provide a type. The example below creates an SQL instance and database in Cloud SQL. For the full list of resources that you can manage, please refer to the respective cookbook documentation link or to this aggregate summary view.


gsql_instance 'my-app-sql-server' do
  action :create
  project 'google.com:graphite-playground'
  credential 'mycred'
end

gsql_database 'webstore' do
  action :create
  charset 'utf8'
  instance 'my-app-sql-server'
  project 'google.com:graphite-playground'
  credential 'mycred'
end


Note that the above code has to be described in a recipe within a cookbook. We recommend you have a “profile” wrapper cookbook that describes your infrastructure, and reference the Google cookbooks as a dependency.

4. Apply your recipe

Next, we direct Chef to enforce the recipe in the “profile” cookbook. For example:

$ chef-client -z --runlist 'recipe[mycloud::myapp]'

In this example, mycloud is the “profile” cookbook, and myapp is the recipe that contains the GCP resource declarations.

Please note that you can apply the recipe from anywhere that Chef can execute recipes (client, server, automation), once or multiple times, or periodically in the background using an agent.

Next steps

Now you're ready to start managing GCP resources with Chef and reaping the benefits of cross-cloud configuration management. Our plan is to continue improving the cookbooks and add support for more Google products. We're also preparing to release the technology used to create these cookbooks as open source. If you have questions about this effort, please visit the Chef on GCP discussion forum or reach out to us at chef-on-gcp@google.com.

Announcing Stackdriver Debugger for Node.js



We’ve all been there. The code looked fine on your machine, but now you’re in production and it’s suddenly not working.

Tools like Stackdriver Error Reporting can make it easier to know when something goes wrong — but how do you diagnose the root cause of the issue? That’s where Stackdriver Debugger comes in.
Stackdriver Debugger lets you inspect the state of an application at any code location without using logging statements and without stopping or slowing down your applications. This means users are not impacted during debugging. Using the production debugger, you can capture the local variables and call stack and link it back to a specific line location in your source code. You can use this to analyze your applications’ production state and understand your code’s behavior in production.

What’s more, we’re excited to announce that Stackdriver Debugger for Node.js is now officially in beta. The agent is open source, and available on npm.


Setting up Stackdriver Debugger for Node.js


To get started, first install the @google-cloud/debug-agent npm module in your application:

$ npm install --save @google-cloud/debug-agent

Then, require debugger in the entry point of your application:

require('@google-cloud/debug-agent')
.start({ allowExpressions: true });

Now deploy your application! You’ll need to associate your sources with the application running in production, and you can do this via Cloud Source Repositories, GitHub or by copying sources directly from your desktop.



Using Logpoints 

The passive debugger is just one of the ways you can diagnose issues with your app. You can also inject log statements into your production application in real time, without redeploying it. These are called Stackdriver Debugger Logpoints.
These are just a few of the ways you can use Stackdriver Debugger for Node.js in your application. To get started, check out the full setup guide.

We can’t wait to hear what you think. Feel free to reach out to us on Twitter @googlecloud, or request an invite to the Google Cloud Slack community and join the #nodejs channel.

Announcing new Stackdriver Logging features and expanded free logs limits



When we announced the general availability of Google Stackdriver, our integrated monitoring, logging and diagnostics suite for applications running in the cloud, we heard lots of enthusiasm from our user community, as well as some insightful feedback:
  • Analysis - Logs-based metrics are great, but you’d like to be able to extract labels and values from logs, too. 
  • Exports - You love being able to easily export logs, but it’s hard to manage exports across dozens or hundreds of projects. 
  • Controls - Aggregating all logs in a single location and exporting them to various places is fantastic, but you want control over which logs go into Stackdriver Logging. 
  • Pricing - You want room to grow with Stackdriver without worrying too much about the cost of logging all that data. 
We heard you, which is why today we’re announcing a variety of new updates to Stackdriver, as well as updated pricing to give you the flexibility to scale and grow.

Here’s a little more on what’s new.

Easier analysis with logs-based metrics 

Stackdriver was created with the belief that bringing together multiple signals from logs, metrics, traces and errors provides greater insight than any single signal alone. Logs-based metrics are a great example. The new and improved logs-based metrics are:
  • Faster - We’ve decreased the time from when a log entry arrives until it’s reflected in a logs-based metric from five minutes to under a minute. 
  • Easier to manage - Now you can extract user-defined labels from text in the logs. Instead of creating a new logs-based metric for each possible value, you can use a field in the log entry as a label. 
  • More powerful - Extract values from logs and turn them into distribution metrics. This lets you efficiently represent many data points at each point in time. Stackdriver Monitoring can then visualize these metrics as a heat map or by percentile. 
The example above shows a heat map produced from a distribution metric extracted from a text field in log entries.

Tony Li, Site Reliability Engineer at the New York Times, explains how user-defined labels applied to proxies help the team improve reliability and performance based on their logs:
“With LBMs [Logs based metrics], we can monitor errors that occur across multiple proxies and visualize the frequency based on when they occur to determine regressions or misconfigurations."
The faster pipeline applies to all logs-based metrics, including the already generally available count-based metrics. Distribution metrics and user labels are now available in beta.
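To picture what a distribution metric does, here's a self-contained sketch (the log lines, regex and bucket bounds are invented for illustration): extract a numeric value from log entry text, then summarize the values as bucket counts, which is the essence of what a heat map visualizes over time.

```javascript
// Sample log entries; in Stackdriver these would be real log lines,
// and the extraction would be configured on a logs-based metric.
const entries = [
  'GET /toys 200 latency=12ms',
  'GET /toys 200 latency=48ms',
  'GET /letters 500 latency=230ms',
  'GET /toys 200 latency=95ms',
];

// Extract the value, as a metric's regex extractor would.
const latencies = entries
  .map((line) => /latency=(\d+)ms/.exec(line))
  .filter(Boolean)
  .map((m) => Number(m[1]));

// Bucket the values into a simple distribution.
const bounds = [25, 50, 100, 250];              // bucket upper bounds (ms)
const counts = new Array(bounds.length + 1).fill(0);
for (const v of latencies) {
  const i = bounds.findIndex((b) => v < b);
  counts[i === -1 ? bounds.length : i] += 1;
}

console.log(latencies);  // [12, 48, 230, 95]
console.log(counts);     // [1, 1, 1, 1, 0]
```

A user-defined label works the same way, except that the extracted string (say, the request path) becomes a label on the metric rather than its value.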


Manage logs across your organization with aggregated exports 


Stackdriver Logging gives you the ability to export logs to Google Cloud Storage (GCS), Cloud Pub/Sub or BigQuery using log sinks. We heard your feedback that managing exports across hundreds or thousands of projects in an organization can be tedious and error-prone. For example, if a security administrator wanted to export all audit logs to a central project in BigQuery, she would have to set up a log sink at every project and validate that the sink was in place for each new project.

With aggregated exports, administrators of an organization or folder can set up sinks once to be inherited by all the child projects and subfolders. This makes it possible for the security administrator to export all audit logs in her organization to BigQuery with a single command:

gcloud beta logging sinks create my-bq-sink \
  bigquery.googleapis.com/projects/my-project/datasets/my_dataset \
  --log-filter='logName="logs/cloudaudit.googleapis.com%2Factivity"' \
  --organization=1234 --include-children

Aggregated exports help ensure that logs in future projects will be exported correctly. Since the sink is set at the organization or folder level, it also prevents an individual project owner from turning off a sink.

Control your Stackdriver Logging pipeline with exclusion filters 

All logs sent to the Logging API, whether by you or by Google Cloud services, have always gone into Stackdriver Logging, where they're searchable in the Logs Viewer. But we heard feedback that users wanted more control over which logs get ingested into Stackdriver Logging, and we listened. To address this, exclusion filters are now in beta. Exclusion filters let you reduce costs, improve the signal-to-noise ratio by dropping chatty logs, and manage compliance by blocking logs from a particular source, or logs matching a pattern, from being available in Stackdriver Logging. The new Resource Usage page provides visibility into which resources are sending logs and which are excluded from Stackdriver Logging.


This makes it easy to exclude some or all future logs from a specific resource. In the example above, we’re excluding 99% of successful load balancer logs. We know choice and flexibility are important, which is why all GCP logs remain available to export to BigQuery, Google Cloud Storage or any third-party tool via Cloud Pub/Sub, irrespective of any logging exclusion filters. Furthermore, Stackdriver doesn't charge for this export, although BigQuery, GCS and Pub/Sub charges still apply.
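Exclusions are configured in Stackdriver itself, not in your code, but the sampling behavior is easy to picture. Here's a hedged sketch (the matching rule, entries and the deterministic keep-1-in-100 stand-in for sampling are all invented) of what "exclude 99% of successful load balancer logs" means: matching entries are dropped except for a 1% sample, and everything else passes through untouched.

```javascript
// Invented matcher: a "successful load balancer" log entry.
const matchesFilter = (e) => e.resource === 'http_load_balancer' && e.status === 200;

// Deterministic stand-in for 99% sampling: keep 1 of every 100 matches.
let matched = 0;
function shouldExclude(entry) {
  if (!matchesFilter(entry)) return false;  // non-matching logs are always kept
  matched += 1;
  return matched % 100 !== 0;               // drop 99%, keep a 1% sample
}

const kept = [];
for (let i = 0; i < 200; i++) {
  const entry = { resource: 'http_load_balancer', status: 200, seq: i };
  if (!shouldExclude(entry)) kept.push(entry);
}
// Errors never match the filter, so they're always ingested.
const errorKept = !shouldExclude({ resource: 'http_load_balancer', status: 502 });

console.log(kept.length);  // 2 of 200 successes survive the 99% exclusion
console.log(errorKept);    // true
```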

Starting Dec 1, Stackdriver Logging offers 50GB of logs per project per month for free 


You told us you wanted room to grow with Stackdriver without worrying about the cost of logging all that data, which is why on December 1 we’re increasing the free logs allocation to an industry-leading 50GB per project per month. This increase aims to bring the power of Stackdriver Logging search, storage, analysis and alerting capabilities to all our customers.

Want to keep logs beyond the free 50GB/month allocation? You can sign up for the Stackdriver Premium Tier, or opt in to logs overage in the Basic Tier. After Dec 1, any additional logs will be charged at a flat rate of $0.50/GB.
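As a worked example of the arithmetic (using the numbers above: the first 50GB per project per month free, then a flat $0.50/GB; the function name is ours, for illustration):

```javascript
// Monthly Stackdriver Logging cost under the post-Dec-1 model:
// first 50GB per project per month free, overage at $0.50/GB.
function monthlyLogsCost(gbIngested) {
  const FREE_GB = 50;
  const RATE_PER_GB = 0.5;
  return Math.max(0, gbIngested - FREE_GB) * RATE_PER_GB;
}

console.log(monthlyLogsCost(30));   // 0  -- within the free allocation
console.log(monthlyLogsCost(130));  // 40 -- 80GB of overage at $0.50/GB
```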


Audit logs, still free and now available for 13 months 

We’re also exempting admin activity audit logs from the limits and overage: they’ll be available in Stackdriver in full, without any charges, and you’ll now be able to keep them for 13 months instead of 30 days.

Continuing the conversation 


We hope these changes bring the power of Stackdriver Logging search, storage, analysis and alerting to even more of our customers. We have many more exciting features planned, including a time range selector coming in September to make it easier to see the timespan of search results. We’re always looking for feedback and suggestions on how to improve Stackdriver Logging, so please keep sending us your requests.

Interested in more information on these new features?