Tag Archives: observability

OpenTelemetry’s First Release Candidates

OpenTelemetry has hit another milestone with the tracing specification reaching release candidate status.

With the specification now ready to go, expect to see tracing release candidates of the official APIs and SDKs over the next few weeks, along with updated exporters for Cloud Trace. In the coming months the same will follow for the metrics specification, followed by metrics release candidates of the APIs and SDKs and Cloud Monitoring exporters, followed by the project’s general availability. At this point we’ll switch our default application metrics and distributed tracing instrumentation from OpenCensus to OpenTelemetry.

This is exciting news for Google Cloud customers, as OpenTelemetry will enable even better observability experiences, whether with Cloud Monitoring and Cloud Trace or with the third-party monitoring and operations tools of your choice.

Originally posted on the OpenTelemetry blog.


As we’ve discussed in past announcements, we’re hard at work building OpenTelemetry’s first GA quality release. Today marks another milestone in this journey, with the freezing and first release candidate of the tracing specification.

Tracing Spec Release Candidate

The tracing specification is now considered to be a release candidate (RC) and is frozen, and the OpenTelemetry APIs and SDKs have a stable specification to build their own release candidates against. This means:
  • API, SDK, and Collector release candidates will appear within the next few weeks.
  • No breaking spec changes are allowed between now and the final GA specification, beyond any showstopper (P1) issues that are revealed in the RC period. We don’t expect any of these to appear, but the purpose of the RC period is for us to validate that we have a GA-worthy spec.
  • Some non-breaking changes will be allowed during the RC period. Most of these are clarifications of existing behaviour or are pure editorial updates.
The release candidate sections of the specification include all tracing related dependencies, specifically the following sections: Trace, Baggage, Resource, Context Propagation, Environment Variables, Exporters (for traces). You can view the progress of each OpenTelemetry component’s implementation in the project status matrix.

What’s Coming Next?

Achieving a release candidate of the tracing specification has been the top priority of OpenTelemetry since releasing our beta in March. With this completed, our focus now shifts to tracing release candidates of the APIs, SDKs, Collector, and auto instrumentation components, and producing a release candidate of the metrics specification.

RC Tracing Implementations

Most OpenTelemetry APIs and SDKs are close to completing their tracing RC implementations, and we expect the first wave of these to arrive within the next two weeks. Contributors who are looking to provide instrumentation (for various web frameworks, storage clients, etc.) can start building against release candidate APIs once they arrive. While the APIs may change in response to issues discovered during RC usage and testing (which will result in multiple pre-GA release candidates for these components), any such changes will be extremely constrained.

Several SDKs will have two waves of release candidate milestones: the first will contain functionality from the tracing and context propagation sections of the specification, and the second will include release candidate implementations for baggage, exporters, resources, and environment variables.

Metrics

In parallel to the tracing RC component releases, we will apply the focus that we’ve had on tracing to the metrics specification. Starting this week, we will categorize which work items are required for GA, which can be optionally allowed in GA (non-breaking), and which will be shifted to post-GA. After completing this, we will track our burndown progress, and lock the metrics specification and publish a metrics specification release candidate once all P1 items are complete. Shortly after this, the APIs, SDKs, Collector, and other components will publish release candidates with RC-quality tracing and metrics functionality.

Productionization and GA Readiness Work

Once the metrics specification, SDKs, Collector, and other components reach release candidate status, we will focus on productionization tasks like writing documentation, producing a post-GA versioning strategy, building additional automated tests, etc. Once we are satisfied with each component’s adoptability and reliability, we will announce their general availability.

Overall Timeline

  1. Components (APIs, SDKs, Collector, auto instrumentation, etc.) issue release candidates with RC-quality tracing functionality.
  2. The metrics section of the specification achieves RC quality and is frozen.
  3. Components issue release candidates with RC-quality tracing and metrics functionality.
  4. Once we are satisfied with our metrics + tracing release candidates, OpenTelemetry goes GA.
  5. Logging enters beta, then issues an RC specification, followed by RC-quality logging functionality in each component, followed by a GA for logging.
We will have a better understanding of our GA release timeline in the coming weeks once outstanding work on the metrics specification is fully accounted for.

Tracking a Language’s Progress

As mentioned above, you can view the progress of a particular component (API, SDK, etc.) in the project status matrix. Each component’s implementation has its own timeline, though a core set (the JavaScript, Java, Go, Python, and .Net APIs + SDKs, the Collector, and Java auto instrumentation) is tracking well. Each component has its own GA burndown board.

FAQ

I want to use OpenTelemetry on my production services; what’s the impact of today’s announcement?

SDKs with release candidate quality tracing support will be available in a few weeks. Release candidates are not recommended for critical production services; however, they are functional and are intended to offer APIs that are compatible with their upcoming GA counterparts.

I want to write instrumentation for OpenTelemetry; what’s the impact of today’s announcement?

APIs with release candidate quality tracing support will be available shortly (prior to the SDKs). You can bind against these to produce traces that will be picked up by the OpenTelemetry SDKs or any other implementations that implement the OpenTelemetry APIs.
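
To illustrate the split between an API and an SDK described above, here is a minimal sketch in plain Python. It is not the OpenTelemetry API: every name in it (`NoOpTracer`, `RecordingTracer`, `set_tracer`, `get_tracer`, `fetch_user`) is hypothetical, chosen only to show how instrumentation can bind to an interface that does nothing until an implementation is registered.

```python
from contextlib import contextmanager


class NoOpTracer:
    """Default tracer: instrumentation can call it safely, nothing is recorded."""

    @contextmanager
    def start_span(self, name):
        yield None


class RecordingTracer:
    """Hypothetical stand-in for an SDK implementation that keeps span names."""

    def __init__(self):
        self.finished = []

    @contextmanager
    def start_span(self, name):
        try:
            yield name
        finally:
            # Record the span when the with-block exits, even on error.
            self.finished.append(name)


_tracer = NoOpTracer()  # hypothetical global, replaced when an "SDK" is installed


def set_tracer(tracer):
    global _tracer
    _tracer = tracer


def get_tracer():
    return _tracer


def fetch_user(user_id):
    """Library code: depends only on the API-level get_tracer()."""
    with get_tracer().start_span("fetch_user"):
        return {"id": user_id}


# Without an implementation installed, the call still works and records nothing.
fetch_user(1)

# The application installs an implementation; the library code is unchanged.
sdk = RecordingTracer()
set_tracer(sdk)
fetch_user(2)
print(sdk.finished)
```

The point of the design is that `fetch_user` never imports the recording implementation, so a library author can ship instrumentation without forcing a particular backend on their users.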

When will OpenTelemetry offer drop-in replacements for OpenCensus and OpenTracing?

Work is currently underway on bridge APIs that allow OpenTelemetry SDKs to seamlessly replace OpenCensus libraries or OpenTracing implementations. While the delivery date of this functionality is not tied to OpenTelemetry’s GA goals, we expect this to arrive between each API + SDK’s release candidate and GA milestones.

Wrapping Up

Producing a specification release candidate is an important milestone for the OpenTelemetry community, and it took significant effort on the part of our contributors to make this happen. We’d like to thank every person and every organization that was a part of this release, and to recognize that their contributions are laying the groundwork for the project's long term success.

If you haven’t been a part of the OpenTelemetry community but would like to join, now is the perfect time! OpenTelemetry is now in the top three CNCF projects by weekly and cumulative commits, and no matter your level of commitment (ha!) to the project, contributions are always welcome. If you have a particular area that you’re interested in (for example, the Python API + SDK), the best way to get involved is to join the relevant weekly SIG meetings or interact with other contributors on Gitter.

By Morgan McLean, Google Cloud

OpenTelemetry is now beta!

OpenTelemetry and OpenCensus have been a critical part of our goal of making platforms like Kubernetes more observable and more manageable. This has been a multi-year journey for us, from creating OpenCensus and growing it into a core part of major web services’ observability stack, to our announcement of OpenTelemetry last year and the rapid growth of the OpenTelemetry community.

Beta is a big milestone for OpenTelemetry, as developers can now use the SDKs, integrations, and Collector to capture distributed traces and metrics from their applications and send them to backends like Prometheus, Jaeger, Cloud Monitoring, Cloud Trace, and others for analysis. This is a great time to try out OpenTelemetry and get involved in the observability community, whether you’re looking to improve your visibility into production services, give your users performance data from client libraries that you maintain, or join a rapidly growing open source project!

To learn more, please read our official community announcement, which is copied below:

Co-authored by maintainers, community contributors, and members of the OpenTelemetry governance committee.

OpenTelemetry has just begun its first wave of beta releases, starting with the Collector and the Erlang, Go, Java, JavaScript, and Python SDKs, followed by the .Net SDK and Java auto-instrumentation agent. This means that you can begin integrating OpenTelemetry into your applications and client libraries to capture app-level metrics and distributed traces.

If you’re not already familiar with OpenTelemetry, the project provides a single set of language-specific APIs, SDKs, agents, and other components that you can use to collect distributed traces, metrics, and related metadata from your applications. In addition to its core capabilities, much of OpenTelemetry’s utility comes from integrations for HTTP and RPC libraries, storage clients, etc. that allow developers to capture critical observability data from their applications with almost zero effort. After capturing these signals, each OpenTelemetry component can export them to your backends of choice, including Prometheus, Jaeger, Zipkin, Azure Monitor, Dynatrace, Google Cloud Monitoring + Trace, Lightstep, New Relic, and Splunk.

This first beta release includes:
  • APIs and SDKs for Erlang, Go, Java, JavaScript, and Python, which include the interfaces and implementations that you need to define and create distributed traces and metrics, manage sampling and context propagation, etc. The .Net API + SDK will follow shortly.
  • Language-specific API integrations for at least one popular HTTP framework, gRPC, and at least one popular storage client, which can be enabled with one line of code, and will automatically capture relevant traces and metrics and handle context propagation.
  • Language-specific exporters that allow SDKs to send captured traces and metrics to any supported backends.
  • The OpenTelemetry Collector, which can receive data from OpenTelemetry SDKs and other sources, and then export this telemetry to any supported backend.
  • Auto-Instrumentation for Java that captures telemetry from 47 Java libraries and frameworks without requiring any modification to your application.
  • Documentation for each component including getting started guides.
As these and subsequent OpenTelemetry components enter beta (requirements and release plan), we are declaring that they are ready to start integrating with. This means that service developers can begin to include OpenTelemetry in their applications and that maintainers of storage, RPC, etc. clients should start testing the OpenTelemetry APIs to provide better observability for their users.
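
The relationship between an SDK and its exporters in the list above can be sketched as follows. This is an illustrative toy in plain Python, not the OpenTelemetry SDK: `Span`, `InMemoryExporter`, `ConsoleExporter`, and `Pipeline` are assumed names, and the only idea it tries to show is that one captured span can be handed to several interchangeable exporters.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    start: float
    end: float = 0.0
    attributes: dict = field(default_factory=dict)


class InMemoryExporter:
    """Keeps finished spans in a list (useful for tests)."""

    def __init__(self):
        self.spans = []

    def export(self, span):
        self.spans.append(span)


class ConsoleExporter:
    """Prints a one-line summary of each finished span."""

    def export(self, span):
        ms = (span.end - span.start) * 1000
        print(f"{span.name} took {ms:.2f} ms {span.attributes}")


class Pipeline:
    """Hands each finished span to every configured exporter."""

    def __init__(self, exporters):
        self.exporters = list(exporters)

    def record(self, name, func, **attributes):
        span = Span(name=name, start=time.perf_counter(), attributes=attributes)
        try:
            return func()
        finally:
            span.end = time.perf_counter()
            for exporter in self.exporters:
                exporter.export(span)


memory = InMemoryExporter()
pipeline = Pipeline([memory, ConsoleExporter()])
result = pipeline.record("compute", lambda: sum(range(1000)), component="demo")
```

Swapping a backend in this sketch means changing the list passed to `Pipeline`, while the code being measured stays the same.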

However, this does come with some caveats:
  • Each OpenTelemetry component will likely undergo several beta releases in the coming weeks — this is simply the first.
  • While functional, beta components have not gone through thorough testing or benchmarking and they are not intended for production workloads.
  • While we aim to avoid any major changes to the OpenTelemetry APIs between beta and GA release candidates, we cannot guarantee that there will not be any changes during this period.
  • Some functionality is still missing from the first beta and will be added in subsequent releases; this is documented in each component’s GitHub repository.
In the coming weeks, you can expect additional beta releases from the first wave of OpenTelemetry components and others. In particular, we expect the API + SDK for .Net and the Java auto-instrumentation agent to be ready soon. Eventually, components will reach a level of maturity and testing where we’ll feel confident in naming them a release candidate (RC), after which we will not make any breaking changes to the APIs for that component.

This beta milestone is a huge accomplishment for the OpenTelemetry community, and every contributor should be proud of the fact that OpenTelemetry is now working and ready to integrate with. This is a great opportunity for the maintainers of client libraries to begin integrating with the OpenTelemetry APIs, for end-users to start integrating it into their services, and for anyone interested in contributing to join our rapidly growing community by joining our mailing lists, Gitter chats, and the monthly community meeting!

By Morgan McLean, Product Manager

OpenTelemetry: The Merger of OpenCensus and OpenTracing

We’ve talked about OpenCensus a lot over the past few years, from the project’s initial announcement and roots at Google, to partners (Microsoft, Dynatrace) joining the project, to new functionality that we’re continually adding. The project has grown beyond our expectations and now sports a mature ecosystem with Google, Microsoft, Omnition, Postmates, and Dynatrace making major investments, and a broad base of community contributors.

We recently announced that OpenCensus and OpenTracing are merging into a single project, now called OpenTelemetry, which brings together the best of both projects and has a frictionless migration experience. We’ve made a lot of progress so far: we’ve established a governance committee, a Java prototype API + implementation, workgroups for each language, and an aggressive implementation schedule.

Today we’re highlighting the combined project at the keynote of Kubecon and announcing that OpenTelemetry is now officially part of the Cloud Native Computing Foundation! Full details are available in the CNCF’s official blog post, which we’ve copied below:

A Brief History of OpenTelemetry (So Far)

After many months of planning, discussion, prototyping, more discussion, and more planning, OpenTracing and OpenCensus are merging to form OpenTelemetry, which is now a CNCF sandbox project. The seed governance committee is composed of representatives from Google, Lightstep, Microsoft, and Uber, and more organizations are getting involved every day.

And we couldn't be happier about it – here’s why.

Observability, Outputs, and High-Quality Telemetry

Observability is a fashionable word with some admirably nerdy and academic origins. In control theory, “observability” measures how well we can understand the internals of a given system using only its external outputs. If you’ve ever deployed or operated a modern, microservice-based software application, you have no doubt struggled to understand its performance and behavior, and that’s because those “outputs” are usually meager at best. We can’t understand a complex system if it’s a black box. And the only way to light up those black boxes is with high-quality telemetry: distributed traces, metrics, logs, and more.

So how can we get our hands – and our tools – on precise, low-overhead telemetry from the entirety of a modern software stack? One way would be to carefully instrument every microservice, piece by piece, and layer by layer. While this would literally work, it’s also a complete non-starter – we’d spend as much time on the measurement as we would on the software itself! We need telemetry as a built-in feature of our services.

The OpenTelemetry project is designed to make this vision a reality for our industry, but before we describe it in more detail, we should first cover the history and context around OpenTracing and OpenCensus.

OpenTracing and OpenCensus

In practice, there are several flavors (or “verticals” in the diagram) of telemetry data, and then several integration points (or “layers” in the diagram) available for each. Broadly, the cloud-native telemetry landscape is dominated by distributed traces, timeseries metrics, and logs; and end-users typically integrate with a thin instrumentation API or via straightforward structured data formats that describe those traces, metrics, or logs.



For several years now, there has been a well-recognized need for industry-wide collaboration in order to amortize the shared cost of software instrumentation. OpenTracing and OpenCensus have led the way in that effort, and while each project made different architectural choices, the biggest problem with either project has been the fact that there were two of them. And, further, that the two projects weren’t working together and striving for mutual compatibility.

Having two similar-yet-not-identical projects out in the world created confusion and uncertainty for developers, and that made it harder for both efforts to realize their shared mission: built-in, high-quality telemetry for all.

Getting to One Project

If there’s a single thing to understand about OpenTelemetry, it’s that the leadership from OpenTracing and OpenCensus are co-committed to migrating their respective communities to this single and unified initiative. Although all of us have numerous ideas about how we could boil the ocean and start from scratch, we are resisting those impulses and focusing instead on preparing our communities for a successful transition; our priorities for the merger are clear:
  • Straightforward backwards compatibility with both OpenTracing and OpenCensus (via software bridges)
  • Minimizing the time where OpenTelemetry, OpenTracing, and OpenCensus are being co-developed: we plan to put OpenTracing and OpenCensus into “readonly mode” before the end of 2019.
  • And, again, to simplify and standardize the telemetry solutions available to developers.
In many ways, it’s most accurate to think of OpenTelemetry as the next major version of both OpenTracing and OpenCensus. Like any version upgrade, we will try to make it easy for both new and existing end-users, but we recognize that the main benefit to the ecosystem is the consolidation itself – not some specific and shiny new feature – and we are prioritizing our own efforts accordingly.
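
A "software bridge" of the kind listed above is essentially an adapter: it exposes the surface of the older API and forwards each call to the new one. The sketch below uses made-up interfaces to show the shape of that idea. `NewTracer`, `LegacySpan`, and `LegacyTracerBridge` are hypothetical, and the method names are only loosely modeled on tracing APIs in general, not taken from the real OpenTracing, OpenCensus, or OpenTelemetry signatures.

```python
class NewTracer:
    """Hypothetical stand-in for the new, unified tracer."""

    def __init__(self):
        self.spans = []

    def start_span(self, name, attributes=None):
        span = {"name": name, "attributes": dict(attributes or {}), "ended": False}
        self.spans.append(span)
        return span

    def end_span(self, span):
        span["ended"] = True


class LegacySpan:
    """Span wrapper exposing the legacy-style methods existing code expects."""

    def __init__(self, tracer, span):
        self._tracer = tracer
        self._span = span

    def set_tag(self, key, value):
        # Legacy "tags" are stored as attributes on the new span.
        self._span["attributes"][key] = value
        return self

    def finish(self):
        self._tracer.end_span(self._span)


class LegacyTracerBridge:
    """Accepts legacy-style calls and forwards them to the new tracer."""

    def __init__(self, new_tracer):
        self._new = new_tracer

    def start_span(self, operation_name, tags=None):
        return LegacySpan(self._new, self._new.start_span(operation_name, tags))


new_tracer = NewTracer()
legacy = LegacyTracerBridge(new_tracer)

# Existing code keeps calling the legacy-style surface unchanged.
span = legacy.start_span("checkout", tags={"user": "alice"})
span.set_tag("items", 3).finish()
```

Because the application only ever holds the bridge object, the underlying tracer can be replaced without touching existing instrumentation, which is the kind of migration path the bullet on backwards compatibility describes.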

How you can help

OpenTelemetry’s timeline is an aggressive one. While we have many open-source and vendor-licensed observability solutions providing guidance, we will always want as many end-users involved as possible. The single most valuable thing any end-user can do is also one of the easiest: check out the actual work we’re doing and provide feedback via GitHub, Gitter, email, or whatever feels easiest.

Of course we also welcome code contributions to OpenTelemetry itself, code contributions that add OpenTelemetry support to existing software projects, documentation, blog posts, and the rest of it. If you’re interested, you can sign up to join the integration effort by filling in this form.

By Ben Sigelman, co-creator of OpenTracing and member of the OpenTelemetry governing committee, and Morgan McLean, Product Manager for OpenCensus at Google since the project’s inception

OpenMetrics project accepted into CNCF Sandbox

For the past several months, engineers from Google Cloud, Prometheus, and other vendors have been aligning on OpenMetrics, a specification for metrics exposition. Today, the project was formally announced and accepted into the CNCF Sandbox, and we’re currently working on ways to support OpenMetrics in OpenCensus, a set of uniform tracing and stats libraries that work with multiple vendors’ services. This multi-vendor approach works to put architectural choices in the hands of developers.

OpenMetrics stems from the stats formats used inside of Prometheus and Google’s Monarch time-series infrastructure, which underpins both Stackdriver and internal monitoring applications. As such, it is designed to be immediately familiar to developers and capable of operating at extreme scale. With additional contributions and review from AppOptics, Cortex, Datadog, InfluxData, Sysdig, and Uber, OpenMetrics has begun the cross-industry collaboration necessary to drive adoption of a new specification.
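
To give a feel for what "metrics exposition" means in practice, here is a small sketch that renders a counter as text in the style of the Prometheus format that OpenMetrics builds on. It is a simplification written for illustration: the helper names `escape_label` and `render_counter` and the metric `http_requests` are assumptions, and the specification itself remains the authority on details such as naming, escaping, and ordering.

```python
def escape_label(value):
    """Escape backslashes, double quotes, and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter; samples is a list of (labels_dict, value) pairs."""
    lines = [f"# TYPE {name} counter", f"# HELP {name} {help_text}"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label(str(val))}"' for key, val in sorted(labels.items())
        )
        suffix = f"{{{label_str}}}" if label_str else ""
        # Counter samples carry a _total suffix in this style of exposition.
        lines.append(f"{name}_total{suffix} {value}")
    return "\n".join(lines)


text = render_counter(
    "http_requests",
    "Requests handled.",
    [({"method": "GET", "code": "200"}, 1027), ({"method": "POST", "code": "500"}, 3)],
)
# A complete exposition ends with an EOF marker line.
print(text + "\n# EOF")
```

A scraper that understands the format can read this text from any service, regardless of which library produced it, which is the interoperability a shared exposition specification is meant to provide.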

OpenCensus provides automatic instrumentation, APIs, and exporters for stats and distributed traces across C++, Java, Go, Node.js, Python, PHP, Ruby, and .Net. Each OpenCensus library allows developers to automatically capture distributed traces and key RPC-related statistics from their applications, add custom data, and export telemetry to their back-end of choice. Google has been a key collaborator in defining the OpenMetrics specification, and we’re now focusing on how to best implement this inside of OpenCensus.

“Google has a history of innovation in the metric monitoring space, from its early success with Borgmon, which has been continued in Monarch and Stackdriver. OpenMetrics embodies our understanding of what users need for simple, reliable and scalable monitoring, and shows our commitment to offering standards-based solutions,” said Sumeer Bhola, Lead Engineer on Monarch and Stackdriver at Google.

For more information about OpenMetrics, please visit openmetrics.io. For more information about OpenCensus and how you can quickly enable trace and metrics collection from your application, please visit opencensus.io.

By Morgan McLean, Product Manager for OpenCensus and Stackdriver APM

OpenCensus: A Stats Collection and Distributed Tracing Framework

Today we’re pleased to announce the release of OpenCensus, a vendor-neutral open source library for metric collection and tracing. OpenCensus is built to add minimal overhead and be deployed fleet wide, especially for microservice-based architectures.

The Need for Instrumentation & Observability 

At a startup, the focus is often on getting an initial version of the product out the door, then rapidly prototyping and iterating with customers. Most startups begin with a monolithic application, such as a simple model-view-controller (MVC) web application. As the customer base, code, and number of engineers increase, they migrate from a monolithic architecture to a microservices architecture. A microservices architecture has its advantages, but often makes debugging more challenging as traditional debugging and monitoring tools don’t always work in these environments or are designed for monolithic use cases. When operating multiple microservices with strict service level objectives (SLOs), you need insights into the root cause of reliability and performance problems.

Not having proper instrumentation and observability can result in lost engineering hours, violated SLOs and frustrated customers. Instead, diagnostic data should be collected from across the stack. This data can be used for incident management to identify and debug potential bottlenecks or for system tuning and performance improvement.

OpenCensus

At Google scale, an instrumentation layer with minimal overhead is a requirement. As Google grew, we realized the importance of having a highly efficient tracing and stats instrumentation library that could be deployed fleet wide.

OpenCensus is the open source version of Google’s Census library, written based on years of optimization experience. It aims to make the collection and submission of app metrics and traces easier for developers. It is a vendor neutral, single distribution of libraries that automatically collects traces and metrics from your app, displays them locally, and sends them to analysis tools. OpenCensus currently supports Prometheus, SignalFX, Stackdriver and Zipkin.

Developers can use this powerful, out-of-the-box library to instrument microservices and send data to any supported backend. For an Application Performance Management (APM) vendor, OpenCensus provides free instrumentation coverage with minimal work, and affords customers a simple setup experience.

Below are Stackdriver Trace and Stackdriver Monitoring screenshots showing traces generated from a demo app, which calls Google’s Cloud Bigtable API and uses OpenCensus.



We’d love to hear your feedback on OpenCensus. Try using it in your app, tell us about your success story, and help by contributing to our existing language-specific libraries, or by creating one for a not-yet-supported language. You can also help us integrate OpenCensus with new APM tools!

We hope you find this as useful as we have. Visit opencensus.io for more information.

By Pritam Shah, Census team