Announcing variable substitution in Stackdriver alerting notifications



When an outage occurs in your cloud application, having fast insight into what’s going on is crucial to resolving the issue quickly. If you use Google Stackdriver, you probably rely on alerting policies to detect these issues and notify you with relevant information. To improve the organization and readability of the information contained in these alerts, we’ve added some new features to make our alerting notifications more descriptive, useful and actionable. We’ll gradually roll out these updates over the next few weeks.

One of these new features is the ability to add variables to your alerting notifications. You can use this to include more metadata in your notifications, for example information on Kubernetes clusters and other resources. You can also use variable substitution to construct specific playbook information and links.

In addition, we’re transitioning to HTML-formatted emails that are easier to read and more clearly organized. We’re also adding the documentation field to Slack and webhook notifications, so teams using these notification methods can take advantage of these new features.

New variable substitution in alerting policy documentation

You can now include variables in the documentation section of your alerting policies. The contents of this field are also now included in Slack and webhook notifications, in addition to email.

The following syntax:

${varname}


will be formatted by replacing the expression ${varname} with the value of varname. We support only simple variable substitutions; more complex expressions, for example ${varname1 + varname2}, are not supported. We also support the use of $$ as an escape sequence (so that the literal text "${" may be written using "$${").
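To make the substitution rules concrete, here is a minimal Python sketch that mimics them locally. It is not Stackdriver's implementation: the `render` function and its regular expression are hypothetical, and leaving unknown variables and unsupported expressions untouched is an assumption of this sketch, since the behavior of the service in those cases isn't described here.

```python
import re

# Matches either the "$$" escape or a simple "${name}" reference.
# Names are limited to letters, digits, "_" and "." (an assumption of this sketch).
_TOKEN = re.compile(r"\$\$|\$\{([A-Za-z0-9_.]+)\}")


def render(template, values):
    """Replace ${name} with values[name]; turn "$$" into a literal "$"."""

    def replace(match):
        if match.group(0) == "$$":
            return "$"
        name = match.group(1)
        # Assumption: unknown variables are left as written.
        return values.get(name, match.group(0))

    return _TOKEN.sub(replace, template)


values = {"policy.display_name": "Backend processing utilization too high"}
print(render("Policy ${policy.display_name} triggered an incident", values))
print(render("Write $${varname} to get a literal placeholder", values))
```

Because "$$" is matched before "${", the escaped form "$${varname}" comes out as the literal text "${varname}" rather than being substituted.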

Variable                 Meaning
condition.name           The REST resource name of the condition (e.g. "projects/foo/alertPolicies/12345/conditions/5678")
condition.display_name   The display name for the triggering condition
metadata.user_label.key  The value of the metadata label "key" (replace "key" appropriately)
metric.type              The metric (e.g. "compute.googleapis.com/instance/cpu/utilization")
metric.display_name      The display name associated with this metric type
metric.label.key         The value of the metric label "key" (replace "key" appropriately)
policy.user_label.key    The value of the user label "key" (replace "key" appropriately)
policy.name              The REST resource name of the policy (e.g. "projects/foo/alertPolicies/12345")
policy.display_name      The display name associated with the alerting policy
project                  The project ID of the Stackdriver host account
resource.project         The project ID of the monitored resource of the alerting policy
resource.type            The type of the resource (e.g. "gce_instance")
resource.display_name    The display name of the resource
resource.label.key       The value of the resource label "key" (replace "key" appropriately)


Note: You can only set policy user labels via the Monitoring API.
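As an illustration of how these variables and user labels might come together, the Python sketch below builds a partial alerting policy body that you could send through the Monitoring API. The field names (`displayName`, `documentation`, `mimeType`, `userLabels`) reflect our understanding of the API's AlertPolicy resource and should be checked against the API reference; the helper `build_policy_patch` and the `team` label are hypothetical, and a complete policy would also need conditions.

```python
import json


def build_policy_patch(display_name, playbook_url, user_labels):
    """Return a partial policy body with documentation and user labels."""
    # The ${...} placeholders are sent as-is; they are filled in when a
    # notification is generated, not by this script.
    content = (
        "Policy ${policy.display_name} fired on ${resource.display_name} "
        "(${resource.type}).\n"
        "Team: ${policy.user_label.team}\n"
        "Playbook: " + playbook_url
    )
    return {
        "displayName": display_name,
        "documentation": {"content": content, "mimeType": "text/markdown"},
        # User labels can only be set via the API.
        "userLabels": dict(user_labels),
    }


body = build_policy_patch(
    "Backend processing utilization too high",
    "https://my.sample.playbook/cassweb1",
    {"team": "backend"},  # hypothetical label
)
print(json.dumps(body, indent=2))
```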

@mentions for Slack

Slack notifications now include the alerting policy documentation. This means that you can include customized Slack formatting and control sequences for your alerts. For the various options, please refer to the Slack documentation.

One useful feature is mentioning a user. For example, including this line in the documentation field

@backendoncall policy ${policy.display_name} triggered an incident


notifies the user backendoncall in addition to sending the message to the relevant Slack channel configured in the policy’s notification options.

Notification examples

Now, when you look at a Stackdriver notification, all notification methods (with the exception of SMS) include the following fields:

  • Incident ID/link: the incident that triggered the notification along with a link to the incident page 
  • Policy name: the name of the configured alerting policy
  • Condition name: the name of the alerting policy condition that is in violation

Email:


Slack:


Webhook:


{
   "incident":{
      "incident_id":"0.kmttg2it8kr0",
      "resource_id":"",
      "resource_name":"totally-new cassweb1",
      "started_at":1514931579,
      "policy_name":"Backend processing utilization too high",
      "condition_name":"Metric Threshold on Instance (GCE) cassweb1",
      "url":"https://app.google.stackdriver.com/incidents/0.kmttg2it8kr0?project=totally-new",
      "documentation":{
         "content":"CPU utilization sample. This might affect our backend processing.\u000AFollowing playbook here: https://my.sample.playbook/cassweb1",
         "mime_type":"text/markdown"
      },
      "state":"open",
      "ended_at":null,
      "summary":"CPU utilization for totally-new cassweb1 is above the threshold of 0.8 with a value of 0.994."
   },
   "version":"1.2"
}
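If you consume the webhook yourself, a small handler can pull out the fields listed above. The following Python sketch is one possible way to do it, assuming the payload shape shown in the example and assuming that "started_at" is a Unix timestamp in seconds; the function name `summarize_incident` is hypothetical.

```python
import json
from datetime import datetime, timezone


def summarize_incident(payload_text):
    """Extract the main incident fields from a webhook payload string."""
    payload = json.loads(payload_text)
    incident = payload["incident"]
    # Assumption: started_at is seconds since the Unix epoch.
    started = datetime.fromtimestamp(incident["started_at"], tz=timezone.utc)
    # The documentation field may be absent if the policy has none.
    documentation = incident.get("documentation") or {}
    return {
        "id": incident["incident_id"],
        "policy": incident["policy_name"],
        "condition": incident["condition_name"],
        "state": incident["state"],
        "started_at": started.isoformat(),
        "documentation": documentation.get("content", ""),
        "url": incident["url"],
    }
```

The returned dictionary can then be forwarded to a ticketing system or chat tool; the documentation content arrives with variables already substituted.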


Next steps

We’ll be rolling out these new features in the coming weeks as part of the regular updating process. There’s no action needed on your part, and the changes will not affect the reliability or latency of your existing alerting notification pipeline. Of course, we encourage you to give meaningful names to your alerting policies and conditions, and to add a “documentation” section to your alerting policies to help on-call engineers understand the notification when they receive it. And as always, please send us your requests and feedback, and thank you for using Stackdriver!