
How to do serverless pixel tracking with GCP



Whether they’re opening a newsletter or visiting a shopping cart page, how users interact with web content is of great interest to publishers. One way to understand user behavior is with pixels: small 1x1 transparent images embedded into the web property. When loaded, the pixel calls a web server, which records the request parameters passed in the URL so they can be processed later.

Adding a pixel is easy, but hosting it and processing the request can be challenging for various reasons:
  • You need to set up, manage and monitor your ad servers
  • Users are usually global, which means that you need ad servers around the world
  • User visits are spiky, so pixel servers must scale up to sustain the load and scale down to limit the spend.
Google Cloud Platform (GCP) services such as Container Engine and managed autoscaled instance groups can help with those challenges. But at Google Cloud, we think companies should avoid managing infrastructure whenever possible.

For example, we recently worked with GCP partner and professional services firm DoiT International to build a pixel tracking platform that relieves the administrator from setting up or managing any servers. Instead, this serverless pixel tracking solution leverages managed GCP services, including:
  • Google Cloud Storage: A global or regional object store that offers several storage classes such as Standard, Nearline and Coldline, with different prices and SLAs depending on your needs. In our case, we used Standard, which offers low, millisecond-range latency
  • Google HTTP(S) Load Balancer: A global anycast IP load balancer service that can scale to millions of QPS, with integrated logging. It can also be paired with Cloud CDN to avoid unnecessary requests to Cloud Storage by caching pixels closer to users at Google's edge locations
  • BigQuery: Google's fully managed, petabyte-scale, low-cost enterprise data warehouse for analytics
  • Stackdriver Logging: A logging system that allows you to store, search, analyze, monitor and alert on log data and events from GCP and Amazon Web Services (AWS). It supports Google load balancers and can export data to Cloud Storage, BigQuery or Pub/Sub
Tracking pixels with these services works as follows:
  1. A client calls a pixel URL that's served directly by Cloud Storage.
  2. A Google Cloud Load Balancer in front of Cloud Storage records the request to Stackdriver Logging, whether there was a cache hit or not.
  3. Stackdriver Logging exports each request to BigQuery as it comes in; BigQuery then acts as the storage and querying engine for ad-hoc analytics that help business analysts better understand their users.
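Once the logs are flowing into BigQuery, analysts can slice the pixel requests however they like. As an illustration only (not part of the original solution), here's a minimal sketch using the BigQuery Go client; the project ID, dataset, table and field names are assumptions that depend entirely on how the Stackdriver Logging export is configured:

package main

import (
    "context"
    "fmt"
    "log"

    "cloud.google.com/go/bigquery"
    "google.golang.org/api/iterator"
)

func main() {
    ctx := context.Background()

    // Replace with your own project ID.
    client, err := bigquery.NewClient(ctx, "your-project-id")
    if err != nil {
        log.Fatal(err)
    }

    // Count pixel hits per requested URL from the exported load balancer logs.
    // The dataset and table names below are examples of what a Stackdriver
    // Logging export to BigQuery might produce.
    q := client.Query(`
        SELECT httpRequest.requestUrl AS url, COUNT(*) AS hits
        FROM pixel_logs.requests_20170101
        GROUP BY url
        ORDER BY hits DESC
        LIMIT 10`)

    it, err := q.Read(ctx)
    if err != nil {
        log.Fatal(err)
    }
    for {
        var row struct {
            URL  string
            Hits int64
        }
        err := it.Next(&row)
        if err == iterator.Done {
            break
        }
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%8d  %s\n", row.Hits, row.URL)
    }
}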


All those services are fully managed and do not require you to set up any instances or VMs. You can learn more about this solution by:
Going forward, we plan to build more serverless solutions on top of GCP managed offerings. Let us know in the comments if there’s a solution that you’d like us to build!

Distributed tracing for Go



The Go programming language has emerged as a popular choice for building distributed systems and microservices. But troubleshooting Go-based microservices can be tough if you don’t have the right tooling. Here at Google Cloud, we’re big fans of Go, and we recently added a native Go client library to Stackdriver Trace, our distributed tracing backend, to help you unearth (and resolve) difficult performance problems in any Go application, whether it runs on Google Cloud Platform (GCP) or another cloud.

The case for distributed tracing

Suppose you're trying to troubleshoot a latency problem on a specific page, and your system is made up of many independent services, with the data on the page generated by many downstream services. You have no idea which of those services is causing the slowdown, and no clear sense of whether it’s a bug, an integration issue, a bottleneck caused by a poor architectural choice, or poor network performance.

Solving this problem becomes even more difficult if your services are running as separate processes in a distributed system. We cannot depend on the traditional approaches that help us diagnose monolithic systems. We need to have finer-grained visibility into what’s going on inside each service and how they interact with one another over the lifetime of a user request.

In monolithic systems, it's relatively easy to collect diagnostic data from the building blocks of a program. All modules live within one process and share common resources to report logs, errors and other diagnostics information. Once your system grows beyond a single process and starts to become distributed, it becomes harder to follow a call starting from the front-end web server to all of its back-ends until a response is returned back to the user.
To address this problem, Google developed the distributed tracing system Dapper to instrument and analyze its production services. The Dapper paper has inspired many open source projects, such as Zipkin, and Dapper-style tracing has emerged as an industry-wide standard.

Distributed tracing enabled us to:
  • Instrument and profile application latency in a large system.
  • Track all RPCs within the lifecycle of a user request and see integration issues that are only visible in production.
  • Figure out performance improvements that can be applied to our systems. Many bottlenecks are not obvious before the collection of tracing data.

Tracing concepts

Tracing works on the basic principle of propagating tracing data between services. Each service annotates the trace with additional data and passes the tracing header to other services until the user request is served. Services are responsible for uploading their traces to a tracing backend. Then, the tracing backend puts related latency data together like the pieces of a puzzle. Tracing backends also provide UIs to analyze and visualize traces.

In Dapper-style tracing, each trace is a call tree, beginning with the entry point of a user request and ending with the server’s response, including all RPCs along the way. Each trace consists of small units called spans.
Above, you see a trace tree for a TaskQueue.Stats request. Each row is labelled with the span name. Before the system can serve TaskQueue.Stats, five other RPCs have been made to other services. First, TaskQueue.Auth checks if we're authorized for the request. Then, QueueService is queried for two reports. In the meantime, System.Stats is retrieved from another service. Once reports and system stats are retrieved, the Graphiz service renders a graph. In total, TaskQueue.Stats returns in 581 ms, and we have a good picture of what has happened internally to serve this call. By looking at this trace, maybe we'll learn that rendering is taking more time than we expect.

Each span name should be carefully chosen to represent the work it does. For example, TaskQueue.Stats is easily identified within the system and, as its name implies, reads stats from the TaskQueue service.

A span can start new spans when the work it represents depends on other work being completed. These are visualized as child spans of the span that started them in the trace tree.

Spans can also be annotated with labels to convey more fine-grained information about a specific request. Request ID, user IDs and RPC parameters are good examples of labels commonly attached to traces. Choose labels by determining what else you want to see in a particular trace tree and what you would like to query from the collected data.
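For example, using the Go client library introduced below, labels can be attached to a span roughly like this (a sketch only; the label keys and the userID/requestID variables are arbitrary examples, not a required schema):

span := traceClient.NewSpan("/users/list")
defer span.Finish()

// Attach request-scoped metadata so traces can be filtered on it later.
span.SetLabel("user_id", userID)
span.SetLabel("request_id", requestID)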

Working with Stackdriver Trace

One of the exciting things about GCP is that customers can use the same services and tools we use every day at Google scale. We launched Stackdriver Trace to provide a distributed tracing backend for our customers. Stackdriver Trace collects latency data from your applications, lists and visualizes it in Cloud Console, and allows you to analyze your application’s latency profile. Your code doesn’t have to run on GCP to use Stackdriver Trace: we can upload your trace data to our backends even if your production environment doesn’t run on our cloud.

To collect latency data, we recently released the cloud.google.com/go/trace package, which lets Go programmers instrument their code by marking spans and adding annotations. Please note that the trace package is still in alpha, and we look forward to improving it over time. At this stage, please feel free to file bugs and feature requests.

To run this sample, you’ll need Google Application Default Credentials. First, use the gcloud command line tool to get application default credentials if you haven’t already.
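If you haven't done so yet, the following command (assuming you have the Cloud SDK installed) stores application default credentials on your machine:

$ gcloud auth application-default login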

Then, import the trace package:
import "cloud.google.com/go/trace"

Create a new trace client with your project ID:
ctx := context.Background()
traceClient, err := trace.NewClient(ctx, "project-id")
if err != nil {
 log.Fatal(err)
}

We recommend you have a long-living trace.Client instance. You can create a client once and keep using it until your program terminates.

The sample program makes an outgoing HTTP request. In this example, we attach tracing information to the outgoing HTTP request so that the trace can be propagated to the destination server:
func fetchUsers() ([]*User, error) {
 span := traceClient.NewSpan("/users")
 defer span.Finish()

 // Create the outgoing request, a GET to the users endpoint.
 req, _ := http.NewRequest("GET", "https://userservice.corp/users", nil)

 // Create a new child span to identify the outgoing request,
 // and attach tracing information to the request.
 rspan := span.NewRemoteChild(req)
 defer rspan.Finish()

 res, err := http.DefaultClient.Do(req)
 if err != nil {
  return nil, err
 }

 // Read the body, unmarshal, and return a slice of users.
 // ...
}

The User service extracts the tracing information from the incoming request, and creates and annotates any additional child spans. In this way, the trace of a single request can be propagated between many different systems:

func usersHandler(w http.ResponseWriter, r *http.Request) {
 span := traceClient.SpanFromRequest(r)
 defer span.Finish()

 req, _ := http.NewRequest("GET", "https://meta.service/info", nil)
 child := span.NewRemoteChild(req)
 defer child.Finish()

 // Make the request…
}

Alternatively, you can use the HTTP utilities to add tracing context to outgoing requests with HTTPClient, and to extract spans from incoming requests with HTTPHandler.

var tc *trace.Client // initialized elsewhere, e.g., with trace.NewClient
req, _ := http.NewRequest("GET", "https://userservice.corp/users", nil)

res, err := tc.NewHTTPClient(nil).Do(req)
if err != nil {
 // TODO: Handle error.
}

And on the receiving side, you can use our handler wrapper to access the span via the incoming request’s context:

handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    span := trace.FromContext(r.Context())
    // TODO: Use the span.
})
http.Handle("/foo", tc.HTTPHandler(handler))

A similar utility to enable auto-tracing is also available for gRPC Go clients and servers.

Please note that not all services need to be written in Go: propagation works across services written in other languages, as long as they rely on the Stackdriver Trace header format to propagate the tracing context. See the Stackdriver Trace docs to learn about the header format.
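For reference, the propagated context travels in an HTTP header that looks roughly like the following, where the value carries a trace ID, a span ID and a sampling option (the IDs below are made up):

X-Cloud-Trace-Context: 105445aa7843bc8bf206b12000100000/1;o=1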


Future work

Even though we currently provide a solution for GCP, our goal is to contribute to the Go ecosystem beyond GCP. There are many groups working on tracing for Go, and there's a lot of work to do to ensure it's aligned. We look forward to working with these groups to make tracing accessible and easy for Go programmers.

One particular problem we want to solve is enabling third-party library authors to provide out-of-the-box tracing without depending on a particular tracing backend. Then, open-source library developers can instrument their code by marking spans and annotating them to be traced by the user's choice of tracing backend. We also want to work on reusable utilities to automatically enable tracing anywhere without requiring Go programmers to significantly modify their code.

We're currently working with a large group of industry experts and examining already-established solutions to understand their requirements and provide a solution that will foster our integrations with tracing backends. With these first-class building blocks and utilities, we believe distributed tracing can be a core and accessible tool to diagnose Go production systems.

Google Cloud Audit Logging now available across the GCP stack



Google Cloud Audit Logging helps you to determine who did what, where and when on Google Cloud Platform (GCP). This fall, Cloud Audit Logging became generally available for a number of products. Today, we’re significantly expanding the set of products integrated with Cloud Audit Logging:
The above integrations are all currently in beta.

We’re also pleased to announce that audit logging for Google Cloud Dataflow, Stackdriver Debugger and Stackdriver Logging is now generally available.

Cloud Audit Logging provides log streams for each integrated product. The primary log stream is the admin activity log that contains entries for actions that modify the service, individual resources or associated metadata. Some services also generate a data access log that contains entries for actions that read metadata as well as API calls that access or modify user-provided data managed by the service. Right now only Google BigQuery generates a data access log, but that will change soon.

Interacting with audit logs in Cloud Console

You can see a high-level overview of all your audit logs on the Cloud Console Activity page. Click on any entry to display a detailed view of that event, as shown below.

By default, data access logs are not displayed in this feed. To enable them from the Filter configuration panel, select the “Data Access” field under Categories. (Please note, you also need to have the Private Logs Viewer IAM permission in order to see data access logs). You can also filter the results displayed in the feed by user, resource type and date/time.

Interacting with audit logs in Stackdriver

You can also interact with the audit logs just like any other log in the Stackdriver Logs Viewer. With Logs Viewer, you can filter or perform free text search on the logs, as well as select logs by resource type and log name (“activity” for the admin activity logs and “data_access” for the data access logs).

Here are some log entries in their JSON format, with a few important fields highlighted.
In addition to viewing your logs, you can also export them to Cloud Storage for long-term archival, to BigQuery for analysis, and/or Google Cloud Pub/Sub for integration with other tools. Check out this tutorial on how to export your BigQuery audit logs back into BigQuery to analyze your BigQuery spending over a specified period of time.
"Google Cloud Audit Logs couldn't be simpler to use; exported to BigQuery it provides us with a powerful way to monitor all our applications from one place.Darren Cibis, Shine Solutions

Partner integrations

We understand that there are many tools for log analysis out there. For that reason, we’ve partnered with companies like Splunk, Netskope, and Tenable Network Security. If you don’t see your preferred provider on our partners page, let us know and we can try to make it happen.

Alerting using Stackdriver logs-based metrics

Stackdriver Logging provides the ability to create logs-based metrics that can be monitored and used to trigger Stackdriver alerting policies. Here’s an example of how to set up your metrics and policies to generate an alert every time an IAM policy is changed.

The first step is to go to the Logs Viewer and create a filter that describes the logs for which you want to be alerted. Be sure that the scope of the filter is set correctly to search the logs corresponding to the resource in which you are interested. In this case, let’s generate an alert whenever a call to SetIamPolicy is made.
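For example, an advanced filter along the following lines matches IAM policy changes made at the project level (a sketch; adjust the resource type if you're interested in IAM changes on other resources):

resource.type="project"
protoPayload.methodName="SetIamPolicy"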

Once you're satisfied that the filter captures the correct events, create a logs-based metric by clicking on the "Create Metric" option at the top of the screen.

Now, choose a name and description for the metric and click "Create Metric." You should then receive a confirmation that the metric was saved.
Next, select “Logs-based Metrics” from the side panel. You should see your new metric listed there under “User Defined Metrics.” Click on the dots to the right of your metric and choose "Create alert from metric."

Now, create a condition to trigger an alert if any log entries match the previously specified filter. To do that, set the threshold to "above 0" in order to catch this occurrence. Logs-based metrics count the number of matching entries seen per minute. With that in mind, set the duration to one minute; the duration specifies how long this per-minute rate needs to be sustained in order to trigger an alert. For example, if the duration were set to five minutes, there would have to be at least one matching log entry per minute for a five-minute period in order to trigger the alert.

Finally, choose “Save Condition” and specify the desired notification mechanisms (e.g., email, SMS, PagerDuty, etc.). You can test the alerting policy by giving yourself a new permission via the IAM console.

Responding to audit logs using Cloud Functions


Cloud Functions is a lightweight, event-based, asynchronous compute solution that allows you to execute small, single-purpose functions in response to events such as specific log entries. Cloud Functions are written in JavaScript and execute in a standard Node.js environment, and they can be triggered by events from Cloud Storage or Cloud Pub/Sub. In this case, we'll trigger a function when logs are exported to a Cloud Pub/Sub topic. Cloud Functions is currently in alpha; please sign up to request enablement for your project.

Let’s look at firewall rules as an example. Whenever a firewall rule is created, modified or deleted, a Compute Engine audit log entry is written. The firewall configuration information is captured in the request field of the audit log entry. The following function inspects the configuration of a new firewall rule and deletes it if that configuration is of concern (in this case, if it opens up any port besides port 22). This function could easily be extended to look at update operations as well.

/**
 * Copyright 2017 Google Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

'use strict';

exports.processFirewallAuditLogs = (event) => {
  // Decode the Pub/Sub message containing the exported audit log entry.
  const msg = JSON.parse(Buffer.from(event.data.data, 'base64').toString());
  const logEntry = msg.protoPayload;
  // Only act on firewall insert operations.
  if (logEntry &&
      logEntry.request &&
      logEntry.methodName === 'v1.compute.firewalls.insert') {
    let cancelFirewall = false;
    // The "allowed" blocks describe the protocols and ports the rule opens.
    const allowed = logEntry.request.alloweds;
    if (allowed) {
      for (let key in allowed) {
        const entry = allowed[key];
        for (let port in entry.ports) {
          // Flag the rule if it opens any port other than 22.
          if (parseInt(entry.ports[port], 10) !== 22) {
            cancelFirewall = true;
            break;
          }
        }
      }
    }
    if (cancelFirewall) {
      // Delete the offending firewall rule; its name is the last segment
      // of the resource name in the audit log entry.
      const resourceArray = logEntry.resourceName.split('/');
      const resourceName = resourceArray[resourceArray.length - 1];
      const compute = require('@google-cloud/compute')();
      return compute.firewall(resourceName).delete();
    }
  }
  return true;
};

As the function above uses the @google-cloud/compute Node.js module, be sure to include it as a dependency in the package.json file that accompanies the index.js file containing your source code:
{
  "name" : "audit-log-monitoring",
  "version" : "1.0.0",
  "description" : "monitor my audit logs",
  "main" : "index.js",
  "dependencies" : {
    "@google-cloud/compute" : "^0.4.1"
  }
}

In the image below, you can see what happened to a new firewall rule (“bad-idea-firewall”) that did not meet the acceptable criteria as determined by the cloud function. It's important to note that this cloud function is not applied retroactively, so existing firewall rules that allow traffic on ports 80 and 443 are preserved.

This is just one example of many showing how you can leverage the power of Cloud Functions to respond to changes on GCP.


Conclusion


Cloud Audit Logging offers enterprises a simple way to track activity in applications built on top of GCP, and integrate logs with monitoring and logs analysis tools. To learn more and get trained on audit logging as well as the latest in GCP security, sign up for a Google Cloud Next ‘17 technical bootcamp in San Francisco this March.

Google Cloud Platform for data center professionals: what you need to know



At Google Cloud, we love seeing customers migrate to our platform. Companies move to us for a variety of reasons, from low costs to our machine learning offerings. Some of our customers, like Spotify and Evernote, have described the various reasons that motivated them to migrate to Google Cloud.

However, we recognize that a migration of any size can be a challenging project, so today we're happy to announce the first part of a new resource to help our customers as they migrate. Google Cloud Platform for Data Center Professionals is a guide for customers who are looking to move to Google Cloud Platform (GCP) from non-cloud environments. It covers the basics of running IT: compute, networking, storage and management. We've tried to write it from the point of view of someone with minimal cloud experience, so we hope you find this guide a useful starting point.

This is the first part of an ongoing series. We'll add more content over time, to help describe the differences in various aspects of running your company's IT infrastructure.

We hope you find this useful in learning about GCP. Please tell us what you think and what else you'd like us to cover, and be sure to sign up for a free trial and follow along!

How to enable Google Stackdriver Logging, Monitoring and Error Reporting for .NET apps



A critical part of creating a great cloud application is making sure it runs today, tomorrow and every day thereafter. Google Stackdriver offers industrial-strength logging, monitoring and error reporting tools for Windows and .NET, so that your applications are consistently available. And companies of all sizes, such as Khan Academy and Wix, are already using Stackdriver to simplify ops.

With Stackdriver Logging and Stackdriver Monitoring, Google Cloud Platform (GCP) now has several excellent tools for .NET developers to stay on top of what's happening with their applications: a Logging agent and client library, a Monitoring agent and a Stackdriver Diagnostics library for error reporting. Let's take a look at these new options available for .NET developers deploying and running applications on GCP.

Logging agent


Google Compute Engine virtual machines (VMs) running .NET applications can now automatically collect request and application logs. This is similar to the logging information provided by VMs running in Google App Engine standard and flexible environments. To start logging to Stackdriver, install the Logging agent on your Compute Engine VMs, following these instructions. To confirm things are working, look for a test log entry that reads textPayload: "Successfully sent to Google Cloud Logging API" in the Stackdriver Logs Viewer.

Once the Logging agent is installed in a VM, it starts emitting logs, and you'll have a "log's-eye view" of what's happening via auto-generated logs that reflect the events collected by Windows Event Viewer. No matter how many VMs your application requires, the Logs Viewer provides a consolidated view of the Windows logs generated across your application.

Monitoring agent


Automated logging of warnings and errors from your apps is just the beginning. Monitoring also lets you track specific metrics about your Windows VMs and receive an alert when they cross a predefined threshold. For example, imagine you want to know when a Windows VM's memory usage exceeds 80%. That's a job for the Monitoring agent, an optional agent for your Windows VMs that collects CPU and memory utilization, pagefile and volume usage metrics for Monitoring. If the VM is running Microsoft IIS or SQL Server, the agent also collects metrics from those services. See the Metrics List page for the full list of metrics it can collect, including metrics from third-party apps, and follow these installation instructions to install it.

Once the Monitoring agent is up and running, it's time to explore the real power of Monitoring: alerting. You can create a policy to alert you when a specific threshold value is crossed. For example, here's how to create a policy that sends a notification when a VM's CPU utilization stays above 80% for more than 15 minutes:

Step 1. Add a metric threshold condition. From the Monitoring main menu select "Alerting > Create a policy." Click "Add Condition." Select a condition type and appropriate threshold.

Step 2. Complete the details of the alerting policy. Under "Notification" enter an optional email address to receive alerts via email. Add any other details to the optional "Documentation" field. Finally, name the policy and click "Save Policy."
After creating a monitoring policy, you'll see the policy details page along with the status of any incidents:
To monitor web servers, Monitoring has a built-in "Uptime check" alert that continuously pings your VM over HTTP, HTTPS or TCP at a custom interval, helping you ensure that your web server is responding and serving pages as expected.

Here's how to create an Uptime check that pings the webserver at the specified hostname every 5 minutes:
  1. From the Monitoring dashboard click "Create Check" under "Uptime checks."
  2. Enter the details for the new Uptime check including Name, Check Type, Resource Type, Hostname and Path and specify how often to run the Uptime check under the "Check every" field.
  3. Click "Save."
The new Uptime checks page lists the geographic locations from where the checks are being run along with a status indicator:

Logging custom events for .NET Applications


Not only can you monitor resources, but you can also log important events specific to your application. "Google.Cloud.Logging.V2" is a beta .NET client library for Logging that provides an easy way to generate custom event logs using Stackdriver integration with Log4Net.

Step 1: Add the Logging client's Nuget packages to your Visual Studio project.

Right click your solution in Visual Studio and choose "Manage Nuget packages for solution." In the Visual Studio NuGet user interface, check the "Include prerelease" box, search for the package named "Google.Cloud.Logging.V2" and install it. Then install the "Google.Cloud.Logging.Log4Net" package in the same way.

Step 2: Add a Log4Net XML configuration section to your web application's Web.config file containing the following code:

<configuration>
  <configSections>
    <section name="log4net" type="log4net.Config.Log4NetConfigurationSectionHandler, log4net" />
  </configSections>
  <log4net>
   <appender name="CloudLogger" type="Google.Cloud.Logging.Log4Net.GoogleStackdriverAppender,Google.Cloud.Logging.Log4Net">
     <layout type="log4net.Layout.PatternLayout">
       <conversionPattern value="%-4timestamp [%thread] %-5level %logger %ndc - %message" />
      </layout>
      <projectId value="YOUR-PROJECT-ID" />
      <logId value="mySampleLog" />
    </appender>
    <root>
     <level value="ALL" />
     <appender-ref ref="CloudLogger" />
    </root>
  </log4net>
</configuration>



Step 3: Configure Log4net to use Logging by adding the following line of code to your application’s Global.asax.cs file:
log4net.Config.XmlConfigurator.Configure();

The Application_Start() method in Global.asax.cs should then look like this:

protected void Application_Start()
{
    GlobalConfiguration.Configure(WebApiConfig.Register);

    // Configure log4net to use Stackdriver logging from the XML configuration file.
    log4net.Config.XmlConfigurator.Configure();
}

Step 4: Add this statement to your application code to include the client libraries:
using log4net;

Step 5: To write logs that will appear in the Stackdriver Logs Viewer, add the following code to your application:

// Retrieve a logger for this context.
ILog log = LogManager.GetLogger(typeof(WebApiConfig));

// Log some information to Google Stackdriver Logging.
log.Info("Hello World.");

Once you build and run this code, you'll get log entries that look like this:
See the "How-To" documentation for installing and using the Logging client Nuget package for .NET applications.

Error Reporting for .NET Applications


Even if your VMs are running perfectly, your application may encounter runtime exceptions due to things like unexpected usage patterns. Good news! We recently released the beta Stackdriver Diagnostics ASP.NET NuGet package for Compute Engine VMs running .NET. With it, all exception errors from your application are automatically logged to Error Reporting.

Step 1: Enable the Error Reporting API.

Step 2: Right-click your solution in Visual Studio, choose "Manage Nuget packages for solution."
Check the "Include prerelease" checkbox. Search for the package named "Google.Cloud.Diagnostics.AspNet" and then install the package.

Step 3: Add the library to your application code:
using Google.Cloud.Diagnostics.AspNet;

Step 4: Add the following code to the "Register" method of your .NET web app:
public static void Register(HttpConfiguration config)
{
    string projectId = "YOUR-PROJECT-ID";
    string serviceName = "NAME-OF-YOUR-SERVICE";
    string version = "VERSION-OF-YOUR-SERVICE";

    // Add a catch-all for uncaught exceptions.
    config.Services.Add(typeof(IExceptionLogger),
        ErrorReportingExceptionLogger.Create(projectId, serviceName, version));
}


Here's an example of the exceptions you'll see in Error Reporting:

Click on an exception to see its details:
See the "How-To" documentation for installing and using the Stackdriver Diagnostics ASP.NET NuGet package for .NET applications.

Try it out


Now that you know how easy it is to log, monitor and enable error reporting for .NET applications on Google Cloud, go ahead and deploy a .NET application to Google Cloud for yourself. Next install the Logging and Monitoring agents on your VM(s) and add the Stackdriver Diagnostics and Logging client packages to your application. You can rest easier knowing that you're logging exactly what's going on with your application and that you'll be notified whenever something goes bump in the night.

What is Google Cloud Deployment Manager and how to use it



Using Google Cloud Deployment Manager is a great way to manage and automate your cloud environment. By creating a set of declarative templates, Deployment Manager lets you consistently deploy, update and delete resources like Google Compute Engine, Google Container Engine, Google BigQuery, Google Cloud Storage and Google Cloud SQL. Deployment Manager is one of the lesser-known features of Google Cloud Platform (GCP), so let's talk about how to use it.

Deployment Manager uses three types of files:
  • Configuration files, written in YAML, that describe the resources you want to deploy
  • Templates, written in Jinja2 or Python, that can be reused across configurations
  • Schemas, which describe the properties that a template accepts
Using templates is the recommended way to work with Deployment Manager, and a configuration file is the minimum requirement. The configuration file defines the resources you wish to deploy and their configuration properties, such as zone and machine type.

Deployment Manager supports a wide array of GCP resources. Here's a complete list of supported resources and associated properties, which you can also retrieve with this gcloud command:

$ gcloud deployment-manager types list


Deployment Manager is often used alongside a version control system into which you can check in the definition of your infrastructure. This approach is commonly referred to as "infrastructure as code." It’s also possible to pass properties to Deployment Manager directly on the gcloud command line, but that's not a very scalable approach.
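For example, a single template can be deployed directly by supplying its properties on the command line, roughly like this (the template and property names are illustrative):

$ gcloud deployment-manager deployments create my-deployment --template instance.jinja --properties zone:us-central1-a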

Anatomy of a Deployment Manager configuration

To understand how things fit together, let’s look at the set of files that are used to create a simple network with two subnets and a single deployed instance.

The configuration consists of three files:
  • net-config.yaml - configuration file
  • network.jinja - template file
  • instance.jinja - template file
You can use template files as logical units that break down the configuration into smaller and reusable parts. Templates can then be composed into a larger deployment. In this example, network configuration and instance deployment have been broken out into their own templates.


Understanding templates

Templates provide the following benefits and functionality:

  • Composability, making it easier to manage, maintain and reuse the definitions of the cloud resources declared in the templates. In some cases you may not want to recreate the end-to-end configuration as defined in the configuration file. In that case, you can just reuse one or more templates to help ensure consistency in the way in which you create resources.
  • Templates written in your choice of Python or Jinja2. Jinja2 is a simpler but less powerful templating language than Python. It uses the same syntax as YAML but also allows the use of conditionals and loops. Python templates are more powerful and allow you to programmatically generate the contents of your templates.
  • Template variables – an easy way to reuse templates by allowing you to declare the value to be passed to the template in the configuration file. This means that you can change a specific value for each configuration without having to update the template. For example, you may wish to deploy your test instances in a different zone to your production instances. In that case, simply declare within the template a variable that inherits the zone value from the master configuration file.
  • Environment variables, which also help you reuse templates across different projects and deployments. Examples of an environment variable include things like the Project ID or deployment name, rather than resources you want to deploy.
Here’s how to understand the distinction between template and environment variables. Imagine you have two projects where you wish to deploy identical instances, but to different zones. In this case, you could name your instances based on the Project ID and deployment name taken from the environment variables, and set the zone through a template variable, as sketched below.
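As a rough illustration, a template could combine the two kinds of variables like this (the "zone" property name is an example, not a requirement):

resources:
- name: {{ env["deployment"] }}-{{ env["project"] }}-instance
  type: compute.v1.instance
  properties:
    # "zone" is a template variable, set in each configuration file.
    zone: {{ properties["zone"] }}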

A Sample Deployment Manager configuration

For this example, we’ve decided to keep things simple and use templates written in Jinja2.

The network file

This file creates a network and its subnets, whose names and ranges are passed through from the variable declarations in net-config.yaml, the calling configuration file.
The “for” subnet loop repeats until it has read all the values in the subnets property. The configuration file (shown below) declares two subnets with the following values:

Subnet name    IP range
web            10.177.0.0/17
data           10.178.128.0/17

The subnets will be deployed in the us-central1 region. You can easily change this by changing the value of the “region” property in the configuration file, without having to modify the network template itself; a sketch of such a template follows.
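Here's a minimal sketch of what such a network.jinja template could look like, based on the behavior described above (the resource names and the "subnets", "name", "range" and "region" property names are illustrative, not the exact file from this example):

resources:
- name: {{ env["deployment"] }}-network
  type: compute.v1.network
  properties:
    autoCreateSubnetworks: false
{% for subnet in properties["subnets"] %}
- name: {{ env["deployment"] }}-{{ subnet["name"] }}
  type: compute.v1.subnetwork
  properties:
    network: $(ref.{{ env["deployment"] }}-network.selfLink)
    region: {{ properties["region"] }}
    ipCidrRange: {{ subnet["range"] }}
{% endfor %}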

The instance file

The instance file, in this case "instance.jinja," defines the template for an instance whose machine type, zone and subnet are defined in the top level configuration file’s property values.

The configuration file

This file, called net-config.yaml, is the main configuration file that marshals the templates that we defined above to create a network with a single VM.
To include templates as part of your configuration, use the imports property in the configuration file that calls the template (going forward, the master configuration file). In our example the master configuration file is called net-config.yaml and imports two templates at lines 15 - 17:
The resource network is defined by the imported template network.jinja.
The resource web-instance is defined by the imported template instance.jinja.

Template variables are declared that are passed to each template. In our example, lines 19 - 27 define the network values that are passed through to the network.jinja template.
Lines 28 to 33 define the instance values.
To deploy a configuration, pass the configuration file to Deployment Manager via the gcloud command-line tool or the API. Using gcloud, type the following command:

$ gcloud deployment-manager deployments create net --config net-config.yaml

You'll see a message indicating that the deployment was successful.
You can see the deployment from Cloud Console.
Note that the instance is named after the deployment, as specified in instance.jinja.
The value for the variable “deployment” was passed in via the gcloud command “create net”, where “net” is the name of the deployment.

You can explore the configuration by looking at the network and Compute Engine menus:
You can delete a deployment from Cloud Console by clicking the delete button or with the following gcloud command:

$ gcloud deployment-manager deployments delete net

You'll be prompted for verification that you want to proceed.

Next steps

Once you understand the basics of Deployment Manager, there’s a lot more you can do. You can take the example code snippets that we walked through here and build more complicated scenarios, for example, implementing a VPN that connects back to your on-premises environment. There are also many Deployment Manager example configurations on GitHub.

Then, go ahead and start thinking about advanced Deployment Manager features such as template modules and schemas. And be sure to let us know how it goes.




Exploring your application’s latency profile using Stackdriver Trace



Google Cloud Platform customers can now analyze changes to their applications’ latency profiles through Google Cloud Console and on their Android devices, with iOS support coming soon.

Using the latency reports feature, developers can:

  • View the latency profiles of their application’s endpoints
  • Compare the latency profile of their application between different times or versions
  • Observe if a report is flagged as having a major or minor latency shift

This functionality, along with the full suite of Stackdriver Trace features on the web-based Cloud Console, is available for all projects hosted on Google App Engine and any projects on Google Compute Engine and Google Container Engine that use the Stackdriver Trace SDKs (currently available for Node.js and Java). The latency reports can be accessed through the Analysis Reports tab within the Stackdriver Trace section of Cloud Console, or from the Trace tab of the Cloud Console mobile app. Links to endpoint-specific reports are found in the analysis report column of the Trace List page and under the Reports heading on individual traces in the Cloud Console.

Here’s an example of what you’ll see in Cloud Console for a project that's capturing trace data:

You’ll observe the following in the mobile app:
Latency reports are automatically generated for the endpoints with the highest traffic in each project; each of these reports compares each endpoint’s current latency profile to the prior week’s.

In the web-based console, selecting New Report allows you to create custom reports to observe the latency profile of a particular endpoint, or to compare the performance of an endpoint between different times and versions (see here for more details).

Some reports are flagged as having major or minor changes, which indicates that the latency distribution across percentiles is substantially different between the two versions or times being compared. These are often worth investigating, as they can represent changes in a service’s underlying performance characteristics.

Each web-based report contains a graph of the endpoint’s latency distribution across percentiles. The auto analysis example above compares the latency profiles of a given endpoint over one week. As indicated by the graph and the “major change” text, the endpoint’s latency distribution has changed significantly over this time period.

The table at the bottom of the report shows that the application’s latency has increased in the 90th percentile and lower, while it has decreased in the higher percentile cases. This distinction is important: a simple comparison of the mean latencies between times A and B shows little change, but the report correctly identifies that the service is now considerably faster for the worst 10% of requests.

Here’s an example of a similar report in the mobile app, with a similar percentile comparison grid:
This feature will be available for the iOS Cloud Console app shortly.

For more information on how to create and understand latency reports, see this page. Let us know what you think about this feature, either by commenting here or through the send feedback button in Cloud Console.

Production debugging the easy way, with Stackdriver Debugger GA



When it comes to cloud-based applications, traditional debugging tools are slow and cumbersome for production systems. When an issue occurs in production, engineers inspect the logs and try to reproduce the problem in a non-production environment. Once they successfully reproduce the problem, they attach a traditional debugger, set breakpoints, step through the code and inspect application state in an attempt to understand the issue. This is often followed up by adding log statements, rebuilding and redeploying code to production and sifting through logs again until the issue's resolved.

Google's been a cloud company for a long time, and over the years, we've built developer tools optimized for cloud development. Today we're happy to announce that one such tool, Stackdriver Debugger, is generally available.

Stackdriver Debugger allows engineers to inspect an application's state, its variables and call stack at any line of code without stopping the application or impacting the customer. Being able to debug production code cuts short the many hours engineers invest in finding and reproducing a bug.

Since our beta launch, we've added a number of new features including support for multiple source repositories, logs integration and dynamic log point insertion.

Stackdriver’s Debug page uses source code from repositories such as Github and Bitbucket or local source to display and take debug snapshots. You can also use the debugger without any source files at all, simply by typing in the filename and line number.

The debug snapshot allows you to examine the call-stack and variables and view the raw logs associated with your Google App Engine projects — all on one page.
Out of the box, Stackdriver Debugger supports the following languages and platforms:

Google App Engine (Standard and Flexible): Java, Python, Node
Google Compute Engine and Google Container Engine: Java, Python, Node (experimental), Go

All of this functionality is backed by a publicly accessible Stackdriver Debugger API that applications use to interact with the Stackdriver Debugger backend. The API enables you to implement your own agent to capture debug data for your favorite programming language. It also allows you to implement a Stackdriver Debugger UI integrated into your favorite IDE to directly set and view debug snapshots and logpoints. Just for fun, we used the same API to integrate Stackdriver Debugger into the gcloud debug command line.

We're always looking for feedback and suggestions to improve Stackdriver Debugger. Please send us your requests and feedback. If you're interested in contributing to creating additional agents or extending our existing agents, please connect with the Debugger team.

Google Stackdriver is now generally available for hybrid cloud monitoring, logging and diagnostics



Google Stackdriver is now generally available.

From its inception, Stackdriver was designed to make ops easier by reducing the burden of keeping applications fast, error-free and available in the cloud.

We started with a single pane of glass to monitor and alert on metrics from Google Cloud Platform (GCP), Amazon Web Services1 and common application components such as Tomcat, Nginx, Cassandra and MySQL. We added Stackdriver Logging, Error Reporting, Trace and Debugger to help you get to the root cause of issues quickly. And we introduced a simple pricing model that bundles advanced monitoring and logging into a single low-cost package in Stackdriver Premium. Finally, we migrated the service to the same infrastructure that powers the rest of Google so that you can expect world-class reliability and scalability.

Companies of all sizes are already using Stackdriver to simplify ops. For example:
  • Uber uses Stackdriver Monitoring to monitor Google Compute Engine, Cloud VPN and other aspects of GCP. It uses Stackdriver alerts to notify on-call engineers when issues occur.
  • Khan Academy uses Stackdriver Monitoring dashboards to quickly identify issues within its online learning platform. It troubleshoots issues with our integrated Logging, Error Reporting and Tracing tools.
  • Wix uses Stackdriver Logging and Google BigQuery to analyze large volumes of logs from its auto-scaled Compute Engine deployments. The resulting intelligence about system health and error rates provides essential insight for running its operations.

If you’d like to learn more about Google Stackdriver, please check out our website or documentation. If you’re running on GCP or Amazon Web Services and want to join us on the journey to easier ops, sign up for a 30-day free trial of Stackdriver Premium today.

Happy Monitoring!



1 "Amazon Web Services" and "AWS" are trademarks of Amazon.com, Inc. or its affiliates in the United States