OpenTelemetry gains ground as an observability standard

(vs148 / Shutterstock)

Traditionally, organizations that needed application performance management (APM) capabilities turned to proprietary tools and technologies, such as Splunk, New Relic, and Dynatrace. But in the emerging world of observability, vendors are starting to pool their resources and jointly develop standard tools and technologies, such as OpenTelemetry, that not only deliver a better customer experience but also lower costs for the vendors themselves.

OpenTelemetry is one of the main open source technologies benefiting from this momentum. Created from the merger of the OpenCensus and OpenTracing projects in May 2019, OpenTelemetry sets a standard for how logs, traces, and metrics should be extracted from the servers, infrastructure, and applications that businesses need to monitor. The OpenTelemetry protocol (OTLP) handles the encoding and transport of this data to observability platforms, such as those offered by Splunk, New Relic, and Dynatrace, where users can consume and analyze it.
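To illustrate the pipeline described above, here is a minimal sketch of an OpenTelemetry Collector configuration that receives OTLP data from instrumented applications and forwards it to an observability backend. The backend endpoint is a placeholder, and exact keys can vary between Collector versions:

```yaml
# Sketch of a Collector config: receive OTLP, batch it, forward it.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # apps send OTLP here

processors:
  batch: {}                      # batch spans before export

exporters:
  otlp:
    endpoint: backend.example.com:4317   # placeholder observability backend

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Because both the receiving and exporting sides speak OTLP, the same collected data can be routed to any backend that accepts the protocol.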

The OpenTelemetry project has the participation of more than 500 developers from 220 companies, including Splunk, Dynatrace, Amazon, Google, Lightstep, Microsoft, and Uber. That makes it the second largest project in the Cloud Native Computing Foundation (CNCF), behind only Kubernetes in terms of number of contributors.

Splunk chief product officer Morgan McClean, who co-founded the OpenTelemetry project while working at Google, says the project has been a long time coming.

“Historically, many vendors in the space… offered their own proprietary agents,” says McClean. “If you go back a few years, if you used New Relic or Dynatrace, they would have an agent that you would install on all of your virtual machines, or whatever you used in your environment. And it would capture that data automatically.”

While vendors could optimize data collection for their own specific APM and observability applications, this approach introduced challenges for both customers and vendors. “First of all, you were very locked in [to the system],” says McClean. “If you use the New Relic agent, you’re going to be stuck on New Relic for a long time, because it’s hard to change. Ripping one out of your system and installing a new one is expensive.”

Second, there were gaps in language coverage. If, for example, Dynatrace provided software development kits (SDKs) for Java and .NET but you wanted to use it with your Python applications, you were out of luck. Adding support for another language is expensive, and a vendor needed a critical mass of customers requesting it before it could justify the expense.

“It sounds trivial, but getting this information out of an application is actually very difficult,” McClean tells Datanami. “Because you have to integrate with every web framework, every storage client, every language. Every little piece of software that’s in a back-end service (and there are hundreds or thousands of different permutations), you need to have integrations for, and you need to maintain them all the time.”

If the Apache HTTP server was updated and your APM vendor hadn’t taken the time to update its data collection mechanism, then sorry, you were out of luck.

“For a single vendor, it’s essentially impossible to keep building and maintaining all of these integrations, which is why historically you’ve seen very limited support for a small set of languages or a small set of technologies,” says McClean.

OpenTelemetry Reference Architecture (Source: OpenTelemetry)

The idea behind OpenTelemetry is to push this complexity out to the community and let the community collectively bear the burden of developing and maintaining the integration points. The job becomes much easier when there is a single standard that developers can write to. That, ultimately, is why so many vendors across the traditional APM and emerging observability sectors have jumped on the OpenTelemetry train.
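To make the “one standard that developers write to” idea concrete, here is a toy sketch in plain Python (not the real OpenTelemetry API): application code instruments itself once against a shared tracer interface, and each backend vendor only has to supply an exporter, rather than maintaining its own agent for every library.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Span:
    """A single unit of traced work (toy model of a trace span)."""
    name: str
    attributes: dict = field(default_factory=dict)

class Tracer:
    """Vendor-neutral instrumentation point: libraries emit spans here
    without knowing which observability backend will receive them."""
    def __init__(self) -> None:
        self._exporters: List[Callable[[Span], None]] = []

    def add_exporter(self, exporter: Callable[[Span], None]) -> None:
        self._exporters.append(exporter)

    def record(self, span: Span) -> None:
        # Fan the same span out to every registered backend.
        for export in self._exporters:
            export(span)

# A library instruments itself once, against the shared tracer...
def handle_request(tracer: Tracer, path: str) -> None:
    tracer.record(Span("http.request", {"path": path}))

# ...and each vendor only writes an exporter, not per-library agents.
received: List[str] = []
tracer = Tracer()
tracer.add_exporter(lambda s: received.append(f"backend-A:{s.name}"))
tracer.add_exporter(lambda s: received.append(f"backend-B:{s.name}"))
handle_request(tracer, "/checkout")
print(received)
```

Swapping observability vendors in this model means swapping an exporter, not re-instrumenting the application, which is exactly the lock-in problem McClean describes with per-vendor agents.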

“The reason they changed their tune so quickly is because they saw the light,” says McClean. “They said, sure, we’re giving up a bit of our moat, but at the same time, with a lot less effort, we can support virtually any software ever written. And so that’s the beauty of OpenTelemetry: we have all these different companies, including Splunk, contributing a lot to it. So we can all get that information now.”

The OpenTelemetry project itself is made up of several components, including a collection agent and various language-specific SDKs and agents. OpenTelemetry currently supports 11 languages, including Java, C#, C++, Go, JavaScript, Rust, Erlang/Elixir and, yes, Python. It also supports a host of libraries and databases, including MySQL, Redis, Django, Kafka, Jetty, Akka, RabbitMQ, Spring, Quarkus, Flask, net/http, gorilla/mux, WSGI, JDBC, and PostgreSQL.

It doesn’t cover the entire IT world, of course; the number of ways software components can be deployed is vast. McClean admits that OpenTelemetry errs on the side of the “modern stack,” which is to say the project is unlikely to develop a COBOL SDK that can extract application data from a mainframe system.

But because OpenTelemetry already has such a large base of supported software, the chances increase that some of it will end up finding its way into that old mainframe, even if no developer deliberately put it there.

“There are millions of permutations of different software, and maintaining all of those different connections would be very difficult,” says McClean. “I was on a customer call this morning with a large bank that is a Splunk customer, and they were impressed because they use Camel. They said, ‘Oh look, the Camel developers have actually adopted the OpenTelemetry APIs and are generating data.’ And so no one needed to maintain that integration. No one in the OpenTelemetry community had to do anything for it.”

Splunk, of course, is one of those proprietary APM and observability platforms that have traditionally drawn the ire of open source folks. That ire is part of what fueled the rise of the open source Elastic community and the raft of imitation log data platforms.

But even Splunk has seen the light and embraced OpenTelemetry. It is not yet supported in the company’s flagship product, Splunk Enterprise, but that is in the works, according to McClean.

OpenTelemetry is the primary data collection method used in Splunk Observability Cloud, the SaaS offering that resulted from Splunk’s 2019 acquisitions of Omnition and SignalFx. Later this year, Splunk Enterprise and Splunk Enterprise Cloud will adopt OpenTelemetry to capture Kubernetes data, McClean said. Eventually, all data collection in Splunk Enterprise and its cloud counterpart will be done through OpenTelemetry.

“It’s really just a question of timing,” says McClean. “We have a lot of momentum behind our universal forwarder, which is our main agent. But eventually you will see all of this replaced by OpenTelemetry.”

Not all components of the OpenTelemetry project are out of beta. The tracing component, for instance, was just declared generally available yesterday. But the project is moving quickly.

More on the future of OpenTelemetry at Splunk will be shared at the vendor’s upcoming .conf21 conference, which will be held virtually from October 19 to 20.

Related articles:

Who wins in the $17 billion AIOps and observability market

Splunk makes a news flurry at .conf20

The real cost of IT operations, the added value of AIOps

Margie D. Carlisle