
Viewing trace spans and request logs in multi-project deployments



Google Cloud Platform (GCP) provides developers and operators with fine-grained billing and resource access management for separate applications through projects. But while isolating application services across projects is important for security and cost allocation, it can make debugging cross-service issues more difficult.

Stackdriver Trace, our tool for analyzing latency data from your applications, can now visualize traces and logs for requests that cross multiple projects, all in a single waterfall chart. This lets you see how requests propagate through services in separate projects and helps to identify sources of poor performance across your entire stack.

To view spans and log entries for cross-project traces, follow the instructions in the Viewing traces across projects documentation. Your projects will need to be part of a single organization, as explained in Best Practices for Enterprise Organizations. To do so, create an organization and then migrate existing projects to it.

Once your projects are in an organization, you’re ready to view multi-project traces. First, select any one of the relevant projects in the GCP Console, and then navigate to the Trace List page and select a trace. You will see spans for all the projects in your organization for which you have “cloudtrace.traces.get” permission. The “Project” label in the span details panel on the right indicates which project the selected span is from.

You can also view log entries associated with the request from all projects that were part of the trace. This requires the “logging.logEntries.list” permission on the associated projects and it requires you to set the LogEntry “trace” field using the format “projects/[PROJECT-ID]/traces/[TRACE-ID]” when you write your logs to Stackdriver Logging. You may also set the LogEntry “span_id” field as the 16-character hexadecimal encoded span ID to associate logs with specific trace spans. See Viewing Trace Details > Log Entries for details.

If you use Google Kubernetes Engine or the Stackdriver Logging Agent via Fluentd, you can set the LogEntry “trace” and “span_id” fields by writing structured logs with the keys of “logging.googleapis.com/trace” and “logging.googleapis.com/span_id”. See Special fields in structured payloads for more information.
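As a concrete sketch (with made-up project, trace, and span IDs), here is how such a structured log line could be assembled in Python. The two special keys map onto the LogEntry "trace" and "span_id" fields when the line is picked up by the Logging Agent:

```python
import json

def make_structured_log(message, project_id, trace_id, span_id=None):
    """Build a structured log payload whose special keys let Stackdriver
    Logging associate the entry with a trace (and, optionally, a span).

    trace_id is the hexadecimal trace ID; span_id, if given, is the
    16-character hexadecimal span ID. All IDs here are example values."""
    entry = {
        "message": message,
        # Maps to the LogEntry "trace" field.
        "logging.googleapis.com/trace": f"projects/{project_id}/traces/{trace_id}",
    }
    if span_id is not None:
        # Maps to the LogEntry "span_id" field.
        entry["logging.googleapis.com/span_id"] = span_id
    return json.dumps(entry)

print(make_structured_log("request handled", "my-project",
                          "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))
```

Each line of output is one structured log entry; writing it to stdout (or to a file watched by Fluentd) is enough for the agent to forward it with the trace association intact.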

To view the associated log entries inline with trace spans, click “Show Logs.”




Automatic association of traces and logs

Here are the GCP languages and environments that support automatically associating traces and log entries:

Now, having applications in multiple projects is no longer a barrier to identifying the sources of poor performance in your stack. Click here to learn more about Stackdriver Trace.

New ways to manage and automate your Stackdriver alerting policies



If your organization uses Google Stackdriver, our hybrid monitoring, logging and diagnostics suite, you’re most likely familiar with Stackdriver alerting. DevOps teams use alerting to monitor and respond to incidents impacting their applications running in the cloud. We’ve received a lot of great feedback about the Stackdriver alerting functionality, most notably the need for a programmatic interface to manage alerting policies and a way to automate them across different cloud projects.

Today, we're pleased to announce the beta release of new endpoints in the Stackdriver Monitoring v3 API to manage alerting policies and notification channels. Now, it’s possible to create, read, write, and manage your Stackdriver alerting policies and notification channels. You can perform these operations using client libraries in one of the supported languages (Java or C#, with more to come later) or by directly invoking the API, which supports both gRPC and HTTP / JSON REST protocols. There's also command line support in the Google Cloud SDK via the gcloud alpha monitoring policies, gcloud alpha monitoring channel-descriptors, and gcloud alpha monitoring channels commands.

Providing programmatic access to alerting policies and notification channels can help automate common tasks such as:
  • Copying policies and notification channels between different projects, for example between test, dev and production 
  • Disabling and later re-enabling policies and notification channels in the event of alerting storms 
  • Utilizing user labels to organize and filter notification channels and policies 
  • Programmatically verifying SMS channels as new SMS numbers get added to the team
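For instance, the first task above, copying policies between projects, largely comes down to rewriting a few fields of each policy before re-creating it in the destination project. The sketch below operates on a policy represented as a plain dict; the field names follow the v3 AlertPolicy resource, and the project IDs are made up, so treat this as an illustration rather than a complete tool:

```python
def prepare_policy_for_copy(policy, channel_map):
    """Given an alerting policy exported from one project (as a dict),
    return a copy suitable for creation in another project.

    Drops output-only fields ("name", "creationRecord", "mutationRecord")
    and remaps notification channel resource names using channel_map
    (source channel name -> destination channel name)."""
    copied = {k: v for k, v in policy.items()
              if k not in ("name", "creationRecord", "mutationRecord")}
    copied["notificationChannels"] = [
        channel_map[ch] for ch in policy.get("notificationChannels", [])
        if ch in channel_map
    ]
    return copied

src = {
    "name": "projects/test-project/alertPolicies/12345",
    "displayName": "High latency",
    "notificationChannels": ["projects/test-project/notificationChannels/1"],
}
dest = prepare_policy_for_copy(
    src,
    {"projects/test-project/notificationChannels/1":
     "projects/prod-project/notificationChannels/9"})
print(dest["notificationChannels"])
```

The resulting dict can then be handed to the API (or serialized for a gcloud command) to create the policy in the destination project.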

Organizing policies


If you have multiple alerting policies configured by various teams within a single Google Cloud project, navigating and organizing these policies can be challenging. With the Stackdriver Alerting API, you can add "user labels" to annotate policies with metadata, which then makes it easier to find and navigate these policies. For example, here’s how to list all your policies:

gcloud alpha monitoring policies list

Here’s how to tag a given policy with your team name:

gcloud alpha monitoring policies update \
        "projects/my-project/alertPolicies/12345" \
        --update-user-labels=team=myteamname

You can then easily find policies that have your team name:

gcloud alpha monitoring policies list --filter='user_label.team="myteamname"'

Updating channels


When someone new joins your DevOps team, it can be a very tedious process to update all your policies so that they receive all the relevant notifications. Now, with the Alerting API, you can quickly add your new teammate to all of the alerting policies that your team owns.

First, find the channels that belong to the team member:

gcloud alpha monitoring channels list

If they don't already have a notification channel, you can create one:

gcloud alpha monitoring channels create \
      --display-name="Anastasia Alertmaestro" \
      --type="email" \
      --channel-labels=email_address=aamaestro@alertme.tld

Then, add a notification channel to a given policy:

gcloud alpha monitoring policies update \
     "projects/my-project/alertPolicies/12345" \
     --add-notification-channels="projects/my-project/notificationChannels/56789"

Combined with the policies list command, adding the notification channel to all of your team's policies is a matter of a simple Bash script rather than tons of tedious point-and-click configuration.

Disabling alerts to a given endpoint


If you're in the middle of a pagerstorm and getting endless alerts, it’s easy to disable notifications to a channel without removing that channel from all existing policies:

gcloud alpha monitoring channels update \
    "projects/my-project/notificationChannels/9817323" \
    --enabled=false

Conclusion


To summarize, the alerting policy and notification channel management features in the Monitoring v3 API will help you simplify and automate a number of tasks. We hope that this saves you time, and we look forward to your feedback!

Please send your feedback to google-stackdriver-discussion_AT_googlegroups.com.

How to export logs from Stackdriver Logging: new solution documentation



Stackdriver Logging is broadly integrated with Google Cloud Platform (GCP), offering rich logging information about GCP services and how you use them. The Stackdriver Logging Export functionality allows you to export your logs and use the information to suit your needs.

There are lots of reasons to export your logs: to retain them for long-term storage (months or years) to meet compliance requirements; to run data analytics against the metrics extracted from the logs; or simply to import them into another system. Stackdriver Logging can export to Cloud Storage, BigQuery and Cloud Pub/Sub.

How you set up Logging Export on GCP depends on the complexity of your GCP organization, the types of logs to export and how you want to use the logs.

We recently put together a three-part solution that explores best practices for three common logging export scenarios:
  1. Export to GCS for Compliance Requirements 
  2. Export to BigQuery for Security and Access Analytics
  3. Export to Pub/Sub for 3rd party (Splunk) integration
For each scenario, we provide examples of export requirements, detailed setup steps, best practices and tips on using the exported logs.

We’re always looking for more feedback and suggestions on how to improve Stackdriver Logging. Please keep sending us your requests and feedback.

Introducing Stackdriver APM and Stackdriver Profiler

Distributed tracing, debugging, and profiling for your performance-sensitive applications


Like all developers who care about their users, you’re probably obsessed with how your applications perform and how you can make them faster and more reliable. Monitoring and logging software like Stackdriver Monitoring and Logging provide a first line of defense, alerting you to potential infrastructure or security problems, but what if the performance problem lies deeper than that, in your code?

Here at Google, we’re developers too, and we know that tracking down performance problems in your code can be hard—particularly if the application is live. Today we’re announcing new products that offer the same Application Performance Management (APM) capabilities that we use internally to monitor and tune the performance of our own applications. These tools are powerful, can be used on applications running anywhere, and are priced so that virtually any developer can make use of them.

The foundation of our APM tooling is two existing products, Stackdriver Trace and Debugger, which give you the power to analyze and debug applications while they're running in production, without impacting user experience in any way.

On top of that, we’re introducing Stackdriver Profiler to our APM toolkit, which lets you profile and explore how your code actually executes in production, to optimize performance and reduce cost of computation.

We’re also announcing integrations between Stackdriver Debugger and GitHub Enterprise and GitLab, adding to our existing code mirroring functionality for GitHub, Bitbucket, Google Cloud Repositories, as well as locally-stored source code.

All of these tools work with code and applications that run on any cloud or even on-premises infrastructure, so no matter where you run your application, you now have a consistent, accessible APM toolkit to monitor and manage the performance of your applications.

Introducing Stackdriver Profiler


Production profiling is immensely powerful, and lets you gauge the impact of any function or line of code on your application’s overall performance. If you don’t analyze code execution in production, unexpectedly resource-intensive functions increase the latency and cost of web services every day, without anyone knowing or being able to do anything about it.

At Google, we continuously profile our applications to identify inefficiently written code, and these tools are used every day across the company. Outside of Google, however, these techniques haven’t been widely adopted by service developers, for a few reasons:
  1. While profiling client applications locally can yield useful results, inspecting service execution in development or test environments does not. 
  2. Profiling production service performance through traditional methods can be difficult and risks causing slowdowns for customers. 
  3. Existing production profiling tools can be expensive, and there’s always the option of simply scaling up a poorly performing service with more computing power (for a price).
Stackdriver Profiler addresses all of these concerns:
  1. It analyzes code execution across all environments. 
  2. It runs continually and uses statistical methods to minimize impact on targeted codebases.
  3. It makes it more cost-effective to identify and remediate your performance problems rather than scaling up and increasing your monthly bill. 
Stackdriver Profiler collects data via lightweight sampling-based instrumentation that runs across all of your application’s instances. It then displays this data on a flame chart, presenting the selected metric (CPU time, wall time, RAM used, contention, etc.) for each function on the horizontal axis, with the function call hierarchy on the vertical axis.

Early access customers have used Stackdriver Profiler to improve performance and reduce their costs.

"We used Stackdriver Profiler as part of an effort to improve the scalability of our services. It helped us to pinpoint areas we can optimize and reduce CPU time, which means a lot to us at our scale."
Evan Yin, Software Engineer, Snap Inc.

"Profiler helped us identify very slow parts of our code which were hidden in the middle of large and complex batch processes. We run hundreds of batches every day, each with different data sets and configurations, which makes it hard to track down performance issues related to client-specific configurations. Stackdriver Profiler was super helpful."
Nicolas Fonrose, CEO, Teevity

 Stackdriver Profiler is now in public beta, available for everyone. It supports:

Unearth tricky code problems with Stackdriver Debugger

Stackdriver Debugger provides a familiar breakpoint-style debugging process for production applications, with no negative customer impact.


Additionally, Stackdriver Debugger’s logpoints feature allows you to add log statements to production apps instantly, without having to redeploy them.

Debugger simplifies root-cause analysis for hard-to-find production code issues. Without Debugger, finding these kinds of problems usually requires manually adding new log statements to application code, redeploying any affected services, analyzing logs to determine what is actually going wrong, and finally, either discovering and fixing the issue or adding additional log statements and starting the cycle all over again. Debugger reduces this iteration cycle to zero.

Stackdriver Debugger is generally available and supports the following languages and platforms:

Reduce latency with Stackdriver Trace


Stackdriver Trace allows you to analyze how customer requests propagate through your application, and is immensely useful for reducing latency and performing root-cause analysis. Trace continuously samples requests, automatically captures their propagation and latency, presents the results for display, and finds any latency-related trends. You can also add custom metadata to your traces for deeper analysis.

Trace is based on Google’s own Dapper, which pioneered the concept of distributed tracing and which we still use every day to make our services faster and more reliable.

We’re also adding multi-project support to Trace in the coming weeks, a long-requested feature that will let you view complete traces across multiple GCP projects at the same time. Expect to hear more about this very soon.

Stackdriver Trace is generally available and offers the following platform and language support:

Get started today with Stackdriver APM


Whether your application is just getting off the ground, or live and in production, using APM tools to monitor and tune its performance can be a game changer. To get started with Stackdriver APM, simply link the appropriate instrumentation library for each tool to your app and start gathering telemetry for analysis. Stackdriver Debugger is currently free, as is the beta of Stackdriver Profiler. Stackdriver Trace includes a large monthly quota of free trace submissions.

To learn more, see the Stackdriver Profiler, Debugger and Trace documentation.

Understand your spending at a glance with Google Cloud Billing reports beta



Whether you’re a developer working on a new project, an engineering manager checking your budget or a billing administrator keeping tabs on your monthly spending, you're probably asking yourself questions about your GCP bill such as:
  • Which project cost the most last month? 
  • What’s the trend for my GCP costs? 
  • Which GCP product costs the most?
Today, we’re excited to launch Cloud Billing reports in beta to help you quickly answer these questions, and others like them. Billing reports lets you view your GCP usage costs at a glance as well as discover and analyze trends.

With billing reports you can see data for all the projects linked to a billing account. You can adjust your views to uncover specific trends, including:
  • Costs grouped by project, product or SKU 
  • Costs aggregated over different time periods, including daily and monthly views (you can even view hourly costs if you select a time range of one week or less) 
  • Costs with and without the application of service credits 

Billing reports will be available to all accounts in the coming weeks. Get started by navigating to your account’s billing page in the GCP console and opening the reports tab in the left-hand navigation bar.

You can learn more in the billing reports documentation. If you're interested in creating more visualizations of your billing data you can do so by exporting to BigQuery and visualizing your billing data with Data Studio.

Introducing the ability to connect to Cloud Shell from any terminal



If you develop or administer apps running on Google Cloud Platform (GCP), you’re probably familiar with Cloud Shell, an on-demand interactive shell environment that contains a wide variety of pre-installed developer tools. Up until now, you could only access Cloud Shell from your browser. Today, we're introducing the ability to connect to Cloud Shell directly from your terminal using the gcloud command-line tool.

Starting an SSH session is a single command:

erik@localhost:~$ ls
Desktop
erik@localhost:~$ gcloud alpha cloud-shell ssh
Welcome to Cloud Shell! Type "help" to get started.
erik@cloudshell:~$ ls
server.py  README-cloudshell.txt

You can also use gcloud to copy files between your Cloud Shell and your local machine:

erik@localhost:~$ gcloud alpha cloud-shell scp cloudshell:~/data.txt localhost:~
data.txt                                           100% 1897    28.6KB/s   00:00
erik@localhost:~$

If you're using Mac or Linux, you can even mount your Cloud Shell home directory onto your local file system after installing sshfs. This allows you to edit the files in your Cloud Shell home directory using whatever local tools you want! All the data in your remotely mounted file system is stored on a Persistent Disk, so it's fast, strongly consistent and retained across sessions and regions.

erik@localhost:~$ gcloud alpha cloud-shell get-mount-command ~/my-cloud-shell
sshfs ekuefler@35.197.73.198: /home/ekuefler/my-cloud-shell -p 6000 -oIdentityFile=/home/ekuefler/.ssh/google_compute_engine
erik@localhost:~$ sshfs ekuefler@35.197.73.198: /home/ekuefler/my-cloud-shell -p 6000 -oIdentityFile=/home/ekuefler/.ssh/google_compute_engine
erik@localhost:~$ cd my-cloud-shell
erik@localhost:~$ ls
server.py  README-cloudshell.txt
erik@localhost:~$ vscode server.py

We're sure you'll find plenty of uses for these features, but here are a few to get you started:
  • Use it as a playground — take advantage of the tools and language runtimes installed in Cloud Shell to do quick experiments without having to install software on your machine.
  • Use it as a sandbox — install or run untrusted programs in Cloud Shell without the risk of them damaging your local machine or reading your data, or to avoid polluting your machine with programs you rarely need to run.
  • Use it as a portable development environment — store your files in your Cloud Shell home directory and edit them using your favorite IDEs when you're at your desk, then keep working on the same files later from a Chromebook using the web terminal and editor.
The full documentation for the command-line interface is available here. The cloud-shell command group is currently in alpha, so we're still making changes to it and welcome your feedback and suggestions via the feedback link at the bottom of the documentation page.

Best practices for working with Google Cloud Audit Logging



As an auditor, you probably spend a lot of time reviewing logs. Google Cloud Audit Logging is an integral part of the Google Stackdriver suite of products, and understanding how it works and how to use it is a key skill you need to implement an auditing approach for systems deployed on Google Cloud Platform (GCP). In this post, we’ll discuss the key functionality of Cloud Audit Logging and call out some best practices.

The first thing to know about Cloud Audit Logging is that each project generates two audit log streams: admin activity and data access. GCP services produce these logs to help you answer the question of "who did what, where, and when?" within your GCP projects. Further, these logs are distinct from your application logs.

Admin activity logs contain log entries for API calls or other administrative actions that modify the configuration or metadata of resources. Admin activity logs are always enabled. There's no charge for admin activity audit logs, and they're retained for 13 months/400 days.

Data access logs, on the other hand, record API calls that create, modify or read user-provided data. Data access audit logs are disabled by default because they can grow to be quite large.

For your reference, here’s the full list of GCP services that produce audit logs.


Configure and view audit logs


Getting started with Cloud Audit Logging is simple. Some services are on by default, and others are just a few clicks away from being operational. Here’s how to set up, configure and use various Cloud Audit Logging capabilities.

Configuring audit log collection 

Admin activity logs are enabled by default; you don’t need to do anything to start collecting them. With the exception of BigQuery, however, data access audit logs are disabled by default. Follow the guidance detailed in Configuring Data Access Logs to enable them.
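For reference, data access logs are controlled through the auditConfigs section of a project's IAM policy. A fragment like the following enables read and write data-access logging for all services; this is a sketch, so check the Configuring Data Access Logs documentation for the current schema and for how to apply it safely:

```yaml
auditConfigs:
- service: allServices
  auditLogConfigs:
  - logType: DATA_READ
  - logType: DATA_WRITE
```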

One best practice for data access logs is to use a test project to validate the configuration for your data access audit collection before you propagate it to developer and production projects. If you configure your IAM controls incorrectly, your projects may become inaccessible.

Viewing audit logs 

You can view audit logs from two places in the GCP Console: via the activity feed, which provides summary entries, and via the Stackdriver Logs viewer page, which gives full entries.

Permissions

You should consider access to audit log data as sensitive and configure appropriate access controls. You can do this by using IAM roles to apply access controls to logs.

To view logs, you need to grant the IAM role logging.viewer (Logs Viewer) for the admin activity logs, and logging.privateLogViewer (Private Logs viewer) for the data access logs.

For configuring roles for Cloud Audit Logging, this how-to guide describes some typical scenarios and shows how to set up IAM policies that control access to audit logs. One best practice is to ensure that you’ve applied the appropriate IAM controls to restrict who can access the audit logs.

Viewing the activity feed

You can see a high-level overview of all your audit logs on the Cloud Console Activity page. Click on any entry to display a detailed view of that event, as shown below.

By default, this feed does not display data access logs. To enable them, go to the Filter configuration panel and select the “Data Access” field under Categories. (Note that you also need the Private Logs Viewer IAM role in order to see data access logs.)

Viewing audit logs via the Stackdriver Logs viewer 

You can view detailed log entries from the audit logs in the Stackdriver Logs Viewer. With Logs Viewer, you can filter or perform free text search on the logs, as well as select logs by resource type and log name (“activity” for the admin activity logs and “data_access” for the data access logs).

The example below displays some log entries in their JSON format, and highlights a few important fields.

Filtering Audit Logs 

Stackdriver provides both basic and advanced logs filters. Basic logs filters allow you to filter the results displayed in the feed by user, resource type and date/time.

An advanced logs filter is a Boolean expression that specifies a subset of all the log entries in your project. You can use it to choose log entries:
  • from specific logs or log services 
  • within a given time range
  • that satisfy conditions on metadata or user-defined fields 
  • that represent a sampling percentage of all log entries 
The following filter matches all calls made to the Cloud IAM API SetIamPolicy method:

resource.type="project"
logName="projects/a-project-id-here/logs/cloudaudit.googleapis.com%2Factivity"
protoPayload.methodName="SetIamPolicy"

Below is a snippet of the log entry that shows that the SetIamPolicy call was made to grant the BigQuery dataviewer IAM role to Alice.

resourceName: "projects/a-project-id-here"
response: {
  @type: "type.googleapis.com/google.iam.v1.Policy"
  bindings: [
    0: {
      members: [
        0: "user:alice@example.com"
      ]
      role: "roles/bigquery.dataViewer"
    }
  ]
}

Exporting logs

Log entries are held in Stackdriver Logging for a limited time known as the retention period. After that, the entries are deleted. To keep log entries longer, you need to export them outside of Stackdriver Logging by configuring log sinks.

A sink includes a destination and a filter that selects the log entries to export, and consists of the following properties:
  • Sink identifier: A name for the sink 
  • Parent resource: The resource in which you create the sink. This can be a project, folder, billing account, or an organization 
  • Logs filter: Selects which log entries to export through this sink, giving you the flexibility to export all logs or specific logs 
  • Destination: A single place to send the log entries matching your filter. Stackdriver Logging supports three destinations: Google Cloud Storage buckets, BigQuery datasets, and Cloud Pub/Sub topics. 
  • Writer identity: A service account that has been granted permissions to write to the destination.
You need to configure log sinks before you can receive any logs, and you can’t retroactively export logs that were written before the sink was created.

Another feature for working with logs is Aggregated Exports, which allows you to set up a sink at the Cloud IAM organization or folder level, and export logs from all the projects inside the organization or folder. For example, the following gcloud command sends all admin activity logs from your entire organization to a single BigQuery sink:

gcloud logging sinks create my-bq-sink \
    bigquery.googleapis.com/projects/my-project/datasets/my_dataset \
    --log-filter='logName: "logs/cloudaudit.googleapis.com%2Factivity"' \
    --organization=1234 --include-children

Be aware that an aggregated export sink sometimes exports very large numbers of log entries. When designing your aggregated export sink to export the data you need to store, here are some best practices to keep in mind:

  • Ensure that logs are exported for longer term retention 
  • Ensure that appropriate IAM controls are set against the export sink destination 
  • Design aggregated exports for your organization to filter and export the data that will be useful for future analysis 
  • Configure log sinks before you start receiving logs 
  • Follow the best practices for common logging export scenarios 

Managing exclusions



Stackdriver Logging provides exclusion filters to let you completely exclude certain log messages for a specific product or messages that match a certain query. You can also choose to sample certain messages so that only a percentage of the messages appear in Stackdriver Logs Viewer. Excluded log entries do not count against the Stackdriver Logging logs allotment provided to projects.

It’s also possible to export log entries before they're excluded. For more information, see Exporting Logs. Excluding this noise will not only make it easier to review the logs but will also allow you to minimize any charges for logs over your monthly allotment.

Best practices:

  • Ensure you're using exclusion filters to exclude logging data that will not be useful. For example, you shouldn’t need to log data access logs in development projects. Storing data access logs is a paid service (see our log allotment and coverage charges), so recording superfluous data incurs unnecessary overhead.
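As an illustration (with a made-up project ID), an exclusion filter matching the data access log of a development project might look like this, following the same log-name convention as the activity filter shown earlier:

```
logName="projects/my-dev-project/logs/cloudaudit.googleapis.com%2Fdata_access"
```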


Cloud Audit Logging best practices, recapped

Cloud Audit Logging is a powerful tool that can help you manage and troubleshoot your GCP environment, as well as demonstrate compliance. As you start to set up your logging environment, here are some best practices to keep in mind:

  • Use a test project to validate the configuration of your data-access audit collection before propagating to developer and production projects 
  • Be sure you’ve applied appropriate IAM controls to restrict who can access the audit logs 
  • Determine whether you need to export logs for longer-term retention 
  • Set appropriate IAM controls against the export sink destination 
  • Design aggregated exports on which your organization can filter and export the data for future analysis 
  • Configure log sinks before you start receiving logs 
  • Follow the best practices for common logging export scenarios 
  • Make sure to use exclusion filters to exclude logging data that isn’t useful.

We hope you find these best practices helpful when setting up your audit logging configuration. Please leave a comment if you have any best practice tips of your own.

Announcing new Stackdriver pricing — visibility for less



Today we're introducing simplified pricing for Stackdriver Monitoring and Logging, and bringing advanced functionality that was limited to a premium pricing tier to all Stackdriver users.

Starting June 30, 2018, you get the advanced alerting and notification options you need to monitor your cloud applications, as well as the flexibility to create monitoring dashboards and alerting policies—without having to opt-in to premium pricing.

Stackdriver Monitoring


Stackdriver Monitoring provides visibility into the performance, uptime and overall health of cloud-powered applications. A hybrid service, Stackdriver Monitoring integrates with GCP, AWS and a variety of common application components.

Highlights of the new Stackdriver Monitoring pricing model include:

  • Flexible pay-as-you-go pricing model that optimizes your spend—pay only for the monitoring data you send, not by the number of resources you have in your projects.
  • Permanent free allocation replaces free trials — all GCP metrics and the first 150 MB of non-GCP metrics per month are available at no cost. 
  • Automatic volume-based discounts — for non-GCP metrics, including agent metrics, AWS metrics, logs-based metrics and custom metrics, volume-based pricing from $.258 down to $.061 per MB ingested represents a discount of up to 80% over previously announced prices.


Stackdriver Logging


The key to a well-managed application is to retain meaningful quantities of logging data. Stackdriver Logging allows you to store, search, analyze, monitor and alert on log data and events from GCP, AWS, or ingest custom log data from any source. Beginning today, we’re increasing the retention of logs from seven days to 30 days for all users regardless of tier. In addition, we’re delaying enforcement of log pricing until June 30 from our previously announced date of March 31.

The pricing model for logs is:

  • 50 GB per month free allocation of logs ingested
  • Logs over the free allocation are billed based on volume ingested at $.50 per GB
  • Stackdriver Monitoring and Logging are priced independently
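Given those numbers, estimating a monthly logs bill is simple arithmetic; the following sketch encodes the model described above (first 50 GB free, $.50 per GB thereafter):

```python
def monthly_logs_cost(gb_ingested, free_gb=50.0, rate_per_gb=0.50):
    """Estimate the monthly Stackdriver Logging bill: the first free_gb
    ingested are free, and the remainder is billed at rate_per_gb."""
    billable = max(0.0, gb_ingested - free_gb)
    return billable * rate_per_gb

print(monthly_logs_cost(30))   # within the free allocation: 0.0
print(monthly_logs_cost(130))  # 80 GB over the allocation: 40.0
```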


In order to help you control costs, we also provide exclusion filters that enable you to pay only for the logs you want to keep—or even to turn off log ingestion to Stackdriver completely while still allowing logs to be exported to Cloud Storage, Pub/Sub or BigQuery.

Here at Google Cloud, we believe that monitoring, logging and performance management are the foundation of any well-managed application—in our cloud, on another cloud, or on-premises. We hope that this new pricing model will enable you to use the Stackdriver family of tools widely and freely. Thank you for your continued feedback—it helps us make our products better. To learn more about Stackdriver, check out our documentation or join in the conversation in our discussion group.

Queue-based scaling made easy with new Stackdriver per-group metrics



If you use managed instance groups in your Compute Engine environment, you know that scaling worker VM instances efficiently against a queue of jobs is not a trivial exercise. Sometimes the queue is empty and you want zero workers so that you're not wasting money and resources. At other times the queue fills up quickly, bursting at the seams, and you need all the workers you can get. Still other times, there's a steady flow of work that you want to process at a consistent pace.

To help with these challenges, we're announcing per-group metrics scaling for managed instance groups, which lets you create a simple queue-based scaling system to address all of these scenarios. The feature relies on allowing managed instance groups to scale on previously unsupported Stackdriver monitoring metrics, such as the amount of work in a Pub/Sub queue.

This is a big improvement over the prior state of affairs. Before per-group metrics scaling, your best options were either to have a statically sized worker pool waiting around for work, or to write custom code to monitor the jobs in a queue, then manually scale worker pools up and down based on the current amount of work.

Using per-group metrics scaling 


Let’s work through an example of how you can use per-group scaling in managed instance groups. Consider this simple setup. You receive data jobs that you want to process as they come in. Once started, a job can be processed in a couple of minutes, but the jobs arrive in unpredictable bursts. When a new data job appears, a Cloud Pub/Sub message is created and sent, and as these messages build up, the number of unprocessed messages in the Pub/Sub queue is exported as a Stackdriver monitoring metric. We’ll use this metric to drive the number of workers, which in turn pulls the Pub/Sub messages, processes the data and reduces the length of the Pub/Sub queue.

To do this, start by creating a managed instance group with autoscaling enabled. For this example, we assume that you’ve already configured the Pub/Sub queue and that you have an instance template with your worker image ready to go.

Set “Autoscale on” to “Stackdriver monitoring metric” and “Metric export scope” to “Single time series per group.” This setting configures the managed instance group to scale on a metric that's independent of individual instances. Unlike typical autoscaling metrics such as average CPU utilization, the length of a queue is independent of the instances in the managed instance group.

Set the metric identifier to the number of undelivered Pub/Sub messages, filtered by your specific subscription name. This allows the autoscaler to find the correct Pub/Sub queue to scale on.

Now the managed instance group is connected to the correct metric, and it's time to set up how quickly the group scales up and down. Set the scaling policy to “Single instance assignment” with a value of “2” to indicate that for every two unprocessed messages in the queue, the managed instance group should have one worker instance. Finally, set the maximum and minimum size of the group. We want the group to scale to zero when there's no work, so set the minimum to “0” and the maximum to whatever makes sense for your workload. For this example, we'll go with 20.
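
The single-instance-assignment rule maps queue length to a target group size roughly as sketched below. This shows only the arithmetic; the real autoscaler also smooths its decisions over time, so actual group size may lag the computed target.

```python
import math

# Sketch of the single-instance-assignment rule from the example above:
# one worker per 2 unprocessed messages, clamped to the group's min/max size.
ASSIGNMENT = 2   # messages assigned per instance
MIN_SIZE = 0     # scale to zero when the queue is empty
MAX_SIZE = 20    # group maximum chosen for this example

def target_size(queue_length: int) -> int:
    """Target number of instances for a given queue length."""
    desired = math.ceil(queue_length / ASSIGNMENT)
    return max(MIN_SIZE, min(MAX_SIZE, desired))

print(target_size(0))    # -> 0  (empty queue, scale to zero)
print(target_size(7))    # -> 4  (ceil(7 / 2))
print(target_size(100))  # -> 20 (capped at the group maximum)
```
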

You can also configure per-group metric scaling programmatically. Here's the same configuration expressed with the gcloud CLI tool from the Google Cloud SDK:

gcloud beta compute instance-groups managed set-autoscaling \
    my-queue-based-scaling-group --zone us-central1-b --project my-project \
    --min-num-replicas 0 \
    --max-num-replicas 20 \
    --update-stackdriver-metric \
    pubsub.googleapis.com/subscription/num_undelivered_messages \
    --stackdriver-metric-single-instance-assignment 2 \
    --stackdriver-metric-filter \
    'resource.type = pubsub_subscription AND resource.label.subscription_id = "MY_SUBSCRIPTION"'

That’s it! Now, when messages arrive in the Pub/Sub queue, the managed instance group scales up, and as messages get processed, scales back down. When all the messages have been processed and the queue is empty, the managed instance group shuts down all machines in the group so that you don’t pay for resources that you aren’t using.

The diagram below shows how the number of instances in the managed instance group changes over 10 time-steps in response to the length of the Pub/Sub queue.

As work starts accumulating in the queue, the managed instance group scales up at an average rate of one instance per two messages sitting in the queue. Once the amount of queued-up work starts to decrease at time-step 6, the managed instance group scales down to match. Finally, as the queue empties around time-step 9, the managed instance group scales down to zero in time-step 10. It will stay at size zero until more work shows up.

Queue unto others


It’s never been easier to set up automatic scaling for queued jobs using managed instance groups. With a single metric to measure the amount of work and a simple assignment of work per worker, you can set up a responsive scaling system in a couple of clicks.

Of course, this also works for queueing systems other than Pub/Sub. Anytime you can express the amount of work as a Stackdriver metric and assign work per node, you can use per-group metrics scaling for managed instance groups to optimize costs.

To get started, check out the documentation for more details.

Introducing Cloud Billing Catalog API: GCP pricing in real time



As your organization uses more cloud resources, effective pricing and cost management are critical. We are delighted to announce the general availability of the Cloud Billing Catalog API, which helps you gain programmatic, real-time access to authoritative Google Cloud Platform (GCP) list pricing.

The Cloud Billing Catalog API joins the Google Cloud Billing API to help you manage your billing experience. The Cloud Billing API allows programmatic management of billing accounts, and allows you to get, list and manage permissions on billing accounts. The Cloud Billing Catalog API builds on that functionality with programmatic access to the rich information you see for GCP SKUs on our website.

Now, with the Cloud Billing Catalog API, you can predict bills, estimate costs and reconcile rates when you're using GCP list pricing. The API provides a list of all GCP services, as well as a list of all SKUs within each service, including:
  • A human-readable description of the SKU 
  • List pricing for the SKU 
  • Regions where the SKU is available for purchase
  • Categorization data about the SKU 
You can use the Cloud Billing Catalog API with your existing cost management tools, as well as to reconcile list pricing rates when you export billing data to Google BigQuery.
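
As a sketch of what working with the API looks like: services are listed at GET https://cloudbilling.googleapis.com/v1/services and SKUs at GET .../v1/services/{SERVICE_ID}/skus (an API key is required). The field names below follow the v1 SKU representation (pricingInfo, pricingExpression, tieredRates, and a unitPrice split into units and nanos); the sample payload is illustrative, not real pricing data.

```python
# Sketch: summarizing one SKU object returned by the Cloud Billing Catalog API.
def sku_summary(sku: dict) -> dict:
    """Pull the description, regions and first-tier list price out of a SKU."""
    rate = sku["pricingInfo"][0]["pricingExpression"]["tieredRates"][0]
    price = rate["unitPrice"]
    # "units" is a string of whole currency units; "nanos" is the fraction.
    usd = int(price.get("units", 0)) + price.get("nanos", 0) / 1e9
    return {
        "description": sku["description"],
        "regions": sku["serviceRegions"],
        "usd_per_unit": usd,
    }

sample_sku = {  # hypothetical response fragment, not real pricing data
    "description": "Example N1 Predefined Instance Core",
    "serviceRegions": ["us-central1"],
    "pricingInfo": [{
        "pricingExpression": {
            "tieredRates": [{"unitPrice": {"units": "0", "nanos": 31611000}}]
        }
    }],
}

print(sku_summary(sample_sku)["usd_per_unit"])  # -> 0.031611
```
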

Both the Cloud Billing and Cloud Billing Catalog APIs are available via REST and RPC. To find out more and get started, visit the Cloud Billing Catalog API documentation.