OpenTSDB official documentation: understanding metrics and time series


2025-07-09 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/03 Report--

OpenTSDB is a time series database. A time series is a series of numeric data points of a particular metric over time. Each time series consists of a metric plus one or more tags associated with that metric (we'll cover tags in a bit). A metric is any particular piece of data you want to track over time (for example, hits to an Apache-hosted file).

OpenTSDB is also a data plotting system. OpenTSDB plots things a little differently from other systems. We will discuss plotting in more detail below, but for now it is important to know that, for OpenTSDB, the basis of any given plot is the metric. It takes that metric, finds all the time series for the selected time range, aggregates those time series together (for example, by summing them), and plots the result. The plotting mechanism is flexible and powerful and you can do much more than that, but for now let's talk about the key to time series: the metric.

In OpenTSDB, a metric is named with a string, like "http.hits". To be able to store the different values for all the places where the metric exists, you tag the data with one or more tags when you send it to the TSD. The TSD stores the timestamp, the value, and the tags. When you want to read this data back, the TSD retrieves all the values for the time range you supply, optionally applies a tag filter you supply, aggregates the values the way you ask, and plots a graph of the metric value over time.

We have introduced quite a few things so far. To help you understand how it all works, I'll start with a typical example. Suppose you have a bunch of web servers and you want to track two things: hits to the web server and the load average of the system. Let's make up metric names to express them. For load average, we call it "proc.loadavg.1min" (because on Linux you can easily get this data by reading /proc/loadavg). Many web servers offer a way to request a counter that represents the number of hits the server has served since it started. This is a convenient counter on which to base a metric we'll call "http.hits". I chose these two examples for the following two reasons:

First, we'll see how easily OpenTSDB handles both counters (values that increase monotonically over time, except when they are reset by a restart/reboot or an overflow) and normal values that go up and down, such as load average. One of the nice things about OpenTSDB is that you don't need to calculate the rate of your counters yourself; it will do this for you. Second, we can show you how to plot two metrics with different scales on the same graph, which is a great way to correlate different metrics.

Your first data points

Without going into detail about how a collector sends data to the TSD, you write a collector that periodically sends the current value of these data points for each server to the TSD. So that the TSD can aggregate data from multiple hosts, you tag each value with a "host" tag. So, if you have web servers A, B, C, etc., each of them periodically sends something like this to the TSD:

put http.hits 1234567890 34877 host=A

put proc.loadavg.1min 1234567890 1.35 host=A

Here "1234567890" is the current epoch time in seconds (date +%s). The next number is the value of the metric at that time. This is data from host A, so it is tagged host=A. Data from host B would be tagged host=B, and so on. Over time, you get a bunch of time series stored in OpenTSDB.
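As a minimal sketch of what a collector might do before sending, the snippet below builds lines in the format shown above. The helper name `format_put` is hypothetical and not part of OpenTSDB; how the lines actually reach the TSD is left out, as in the text.

```python
import time


def format_put(metric, value, tags, timestamp=None):
    """Build one line in the form: put <metric> <timestamp> <value> <tag>=<value> ..."""
    if not tags:
        # The text says every time series carries one or more tags.
        raise ValueError("each data point needs at least one tag")
    if timestamp is None:
        timestamp = int(time.time())  # epoch seconds, like `date +%s`
    tag_text = " ".join(f"{name}={val}" for name, val in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_text}"


# The two data points from host A used in the example above.
lines = [
    format_put("http.hits", 34877, {"host": "A"}, timestamp=1234567890),
    format_put("proc.loadavg.1min", 1.35, {"host": "A"}, timestamp=1234567890),
]
for line in lines:
    print(line)
```

Host B would call the same helper with `{"host": "B"}`, which is all it takes to create a separate time series for the same metric.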

Your first plot

Now, let's revisit what we talked about at the beginning. A time series is a series of data points of a particular metric (and its tags) over time. In this example, each host sends two time series to the TSD. If you have three boxes (hosts) each sending these two time series, the TSD collects and stores 6 time series. Now that we have the data, let's start plotting.
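One way to picture the count of 6 is to treat the metric name plus its set of tags as the identity of a time series. The sketch below is only an illustration of that idea; `series_key` is a hypothetical helper and says nothing about how the TSD stores data internally.

```python
def series_key(metric, tags):
    """Identity of a time series: the metric name plus its full set of tags."""
    return (metric, tuple(sorted(tags.items())))


# Three hosts, two metrics, two sampling times each (timestamps are illustrative).
data_points = [
    (metric, {"host": host}, timestamp)
    for host in ("A", "B", "C")
    for metric in ("http.hits", "proc.loadavg.1min")
    for timestamp in (1234567890, 1234567950)
]

# New timestamps add points to an existing series; they do not create a new one.
series = {series_key(metric, tags) for metric, tags, _ in data_points}
print(len(data_points), "data points in", len(series), "time series")
```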

To plot HTTP hits, you simply go to the UI, enter http.hits as the metric name, and enter the time range. Check the "Rate" box because this particular metric is a counter, and you get a plot of the rate of HTTP hits to your web servers over time.
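To make the "Rate" option more concrete, here is a rough sketch of how a per-second rate can be derived from successive counter samples. It illustrates the idea only and is not the TSD's actual implementation; `counter_rate` is a hypothetical name, and counter resets and overflows are deliberately not handled.

```python
def counter_rate(samples):
    """Turn (epoch_seconds, counter_value) samples, sorted by time,
    into (epoch_seconds, per_second_rate) points, one per pair of neighbors."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


# http.hits samples from one host; note the uneven sampling interval.
samples = [(1234567890, 34877), (1234567900, 44877), (1234567915, 59877)]
rates = counter_rate(samples)
print(rates)
```

Because each delta is divided by the real elapsed time, the uneven 10 and 15 second gaps still give the same rate of 1000 hits per second.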

Aggregator

By default, the UI aggregates the time series of each host by adding them together (sum). This means that the TSD takes the three time series with this metric (host=A, host=B and host=C) and adds their values together to give the total number of hits for all web servers at a given time. Note that you don't need to send your data points at exactly the same time; the TSD will figure it out. So, if each of your hosts serves 1000 hits per second at some point in time, the graph shows 3000.

What if you want to show how many hits each web server is serving? There are two ways. If you only care about the average number of hits served per web server, simply change the Aggregator from sum to avg. You can also try the others (max, min) to see the maximum or minimum value. More aggregation functions are in the works (percentiles, etc.). Aggregation is done on a per-interval basis, so if at some point in time one of your web servers is serving 50 QPS and the other two are serving 100 QPS, and later a different web server is serving 50 QPS and the other two are serving 100, the min for both points is 50. In other words, it does not work out which time series has the overall minimum and show you only that host's plot. The other way to see how many hits each web server is serving? That is where we look at the tag fields.
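The per-interval behaviour can be sketched in a few lines. This is a simplified illustration, assuming every host reports at the same timestamps (the text notes that the TSD copes with unaligned points, which this sketch does not attempt); `aggregate` and the sample numbers are made up for the example.

```python
from statistics import mean

AGGREGATORS = {"sum": sum, "avg": mean, "min": min, "max": max}


def aggregate(series_by_host, how):
    """Combine {host: {timestamp: value}} into one {timestamp: value} series,
    applying the aggregator separately at each timestamp."""
    fn = AGGREGATORS[how]
    timestamps = sorted(set().union(*(s.keys() for s in series_by_host.values())))
    return {
        ts: fn([s[ts] for s in series_by_host.values() if ts in s])
        for ts in timestamps
    }


# QPS per host at two points in time: a different host is the slow one each time.
qps = {
    "A": {0: 50, 60: 100},
    "B": {0: 100, 60: 50},
    "C": {0: 100, 60: 100},
}
print(aggregate(qps, "sum"))  # total across hosts at each point
print(aggregate(qps, "min"))  # lowest value at each point, whichever host it came from
```

The min series is 50 at both points even though it comes from host A first and host B later, which is the behaviour described above.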

Downsampling

To reduce the number of data points returned, you can specify a downsampling interval and method, such as 1h-avg or 1d-sum. This is also useful (for example, when using max and min) for finding the best-case and worst-case data points within a given period. Downsampling is most useful for making the graphing phase less intensive and the result more readable, especially when graphing more data points than there are screen pixels.
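A downsampling spec such as 1h-avg can be pictured as "group points into one-hour buckets, then average each bucket". The sketch below assumes buckets aligned to multiples of the interval since the epoch, which is an assumption of this example and not a statement about how the TSD aligns them; `downsample` is a hypothetical helper.

```python
from statistics import mean


def downsample(points, interval, fn):
    """Reduce (epoch_seconds, value) points to one point per interval.

    interval is in seconds; fn turns the list of values in a bucket into one value.
    """
    buckets = {}
    for ts, value in points:
        start = ts - ts % interval  # start of the bucket this point falls into
        buckets.setdefault(start, []).append(value)
    return [(start, fn(values)) for start, values in sorted(buckets.items())]


# Load average sampled every 20 minutes (illustrative values).
points = [(0, 1.0), (1200, 2.0), (2400, 3.0), (3600, 4.0), (4800, 6.0)]
hourly_avg = downsample(points, 3600, mean)  # comparable to 1h-avg
hourly_max = downsample(points, 3600, max)   # worst case per hour
print(hourly_avg)
print(hourly_max)
```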

Tag filters

In the UI, you will see that the TSD has filled in one or more "Tags" fields, the first of which is host. What the TSD is saying here is that, within this time range, it sees data tagged with a host tag. You can filter the graph so that it plots only one value of this tag. If you fill in A in the host row, only the values of host A over time are plotted. If you want to give a list of hosts to plot, fill in the hosts separated by the pipe symbol, such as A|B, which gives two plots instead of one, one for A and one for B. Finally, you can also specify the special character *, which means plot a line for each host.
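The filter forms described above (a single value, a pipe-separated list, `*`, or nothing at all) can be sketched as a function that decides which time series end up in which plotted line. `plot_groups` is a hypothetical helper written for this illustration; it mirrors the behaviour described in the text, not the UI's code.

```python
def plot_groups(series_tags, tag, tag_filter=""):
    """Decide which series go into which plotted line for one tag.

    series_tags: list of tag dicts, one per time series.
    tag_filter:  ""     no filter, all series are aggregated into one line
                 "*"    one line per value of the tag
                 "A|B"  one line per listed value (a single value gives one line)
    """
    if tag_filter == "":
        return {"all": list(series_tags)}
    wanted = None if tag_filter == "*" else set(tag_filter.split("|"))
    groups = {}
    for tags in series_tags:
        value = tags.get(tag)
        if value is None or (wanted is not None and value not in wanted):
            continue  # series lacks the tag or is filtered out
        groups.setdefault(value, []).append(tags)
    return groups


series_tags = [{"host": "A"}, {"host": "B"}, {"host": "C"}]
for tag_filter in ("", "A", "A|B", "*"):
    print(repr(tag_filter), "->", sorted(plot_groups(series_tags, "host", tag_filter)))
```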

Add more metrics

Getting fancy

Imagine that your servers actually run two web servers each, say one for static content and one for dynamic content. Instead of creating another metric, just tag the http.hits metric with the server instance. Have the collector send the following:

put http.hits 1234567890 34877 host=A webserver=static

put http.hits 1234567890 4357 host=A webserver=dynamic

put proc.loadavg.1min 1234567890 1.35 host=A

Why do this instead of creating another metric? Well, what if sometimes you care about plotting total HTTP hits, and sometimes you care about breaking out static versus dynamic hits? With a tag, that is easy to do. With this new tag, you will see a webserver tag appear in the UI when you plot this metric. You can leave it blank, in which case both values are aggregated into one plot (according to the Aggregator setting) and you see the total number of hits, or you can enter webserver=* to break out how much your static and dynamic instances are doing across your web servers. You can go even deeper and specify webserver=* and host=* to see the full breakdown.

Guidelines for creating metrics

Right now, it is not possible to combine two metrics into a single plot line. This means that you want a metric to be the largest possible aggregation point. If you want to drill down into specifics within a metric, use tags.

Tags vs. metrics

The metric should be a specific thing, such as "Ethernet packets", but should not be broken down into a particular instance of that thing. In general, you do not want to collect metrics such as net.bytes.eth0, net.bytes.eth2, etc. Collect net.bytes and tag the eth0 data points with iface=eth0, and so on. Don't bother creating separate "in" and "out" metrics either; add the tag direction=in or direction=out. This way, you can easily see the total network activity of a given box without having to plot a large number of metrics. It still gives you the flexibility to drill down and show only the activity of a particular interface, or only the activity in a particular direction.
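As a small sketch of this naming advice, a collector could keep a single net.bytes metric and move the interface and direction into tags. The function name `net_bytes_puts` and the shape of the `counters` input are assumptions made for this example.

```python
def net_bytes_puts(timestamp, host, counters):
    """Emit one net.bytes line per (interface, direction) pair.

    counters: {(iface, direction): byte_count}, a hypothetical collector reading.
    """
    return [
        f"put net.bytes {timestamp} {count} host={host} iface={iface} direction={direction}"
        for (iface, direction), count in sorted(counters.items())
    ]


counters = {("eth0", "in"): 1000, ("eth0", "out"): 2000, ("eth1", "in"): 30}
net_lines = net_bytes_puts(1234567890, "A", counters)
for line in net_lines:
    print(line)
```

All three lines share one metric name, so a plot of net.bytes with the tag fields left blank shows the whole box, while iface=eth0 or direction=in narrows it down.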

Counters and rates

If something is a counter, or is naturally something that is a rate, do not convert it to a rate before sending it to the TSD. There are two main reasons for this. First, doing your own rate calculation, reset/overflow handling, etc. is pointless, because the TSD can do it for you. You also don't have to worry about getting the units-per-second calculation right when the sampling interval is slightly inaccurate or changing. Second, if one or more data points are lost, sending the current counter value means you do not lose data; you only lose some resolution. The golden rule of TSD is that if the source data is a counter (as with some counters from /proc or SNMP), keep it that way and don't convert it. If you are writing your own collector (for example, one that counts how many times a particular error message appears while tailing a log), do not reset the counter at each sampling interval. Let the TSD do this work for you.
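The second reason can be checked with a little arithmetic. The sketch below compares what survives when one sample is lost, once for raw counter values and once for rates computed by the collector. All numbers and helper names are made up for the illustration.

```python
def total_from_counter(samples):
    """Hits served between the first and last surviving counter sample."""
    return samples[-1][1] - samples[0][1]


def total_from_rates(rate_samples, interval):
    """Hits reconstructed from per-second rates, one sample per interval."""
    return sum(rate * interval for _, rate in rate_samples)


interval = 10  # seconds between samples
counter = [(0, 0), (10, 1000), (20, 3000), (30, 4000)]  # raw counter values
rates = [(10, 100.0), (20, 200.0), (30, 100.0)]         # same data, pre-converted

# Lose the sample taken at t=20 in both versions.
counter_lossy = [s for s in counter if s[0] != 20]
rates_lossy = [s for s in rates if s[0] != 20]

print(total_from_counter(counter), total_from_counter(counter_lossy))
print(total_from_rates(rates, interval), total_from_rates(rates_lossy, interval))
```

With the counter, the total is still 4000 after the loss and only the detail between t=10 and t=30 is gone. With pre-computed rates, the 2000 hits from the missing interval are gone for good.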

Tags are your friend

In anything above a small environment, you probably have clusters or groups of machines doing the same thing. These change over time, though, and that's OK. Just use a tag when sending data to the TSD to pass this cluster information along. Add something like cluster=webserver to all data points sent by each of your web servers, cluster=db to those from all your databases, and so on.

Now, when you plot CPU activity for your web server cluster, you see all of it aggregated into one plot. Then, let's say you add a web server, or even change a machine from a web server to a database. All you have to do is make sure that the right tag is sent when its role changes, and that box's CPU activity is now counted toward the right cluster. What's more, all of your historical data is still correct! This is the real power of OpenTSDB. Not only do you never lose resolution of your data points over time, as you do with RRD-based systems, but historical data is not lost when a box changes purpose. You also don't have to put a lot of cluster or grouping awareness logic into your dashboards.

Details on metrics and tags

The maximum number of tags allowed on a data point is defined by a constant (Const.MAX_NUM_TAGS), which is 8 at the time of writing. Metric names, tag names, and tag values must consist of alphanumeric characters, "-", "_", "." and "/", as enforced by the package-private function Tags.validateString.
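A collector can apply the same limits before sending anything. The check below is a rough client-side sketch, assuming the allowed characters are ASCII letters and digits plus "-", "_", "." and "/" and that the tag limit is 8. It is not the code of Tags.validateString, and the names `MAX_NUM_TAGS` and `is_valid_point` are local to this example.

```python
import re

MAX_NUM_TAGS = 8  # assumed to match the limit quoted in the text
_ALLOWED = re.compile(r"[A-Za-z0-9\-_./]+")  # assumption: ASCII alphanumerics only


def is_valid_point(metric, tags):
    """Return True if the metric name and tags fit the rules described above."""
    if not 1 <= len(tags) <= MAX_NUM_TAGS:
        return False
    names = [metric, *tags.keys(), *tags.values()]
    return all(_ALLOWED.fullmatch(str(name)) is not None for name in names)


print(is_valid_point("http.hits", {"host": "A", "webserver": "static"}))
print(is_valid_point("http hits", {"host": "A"}))  # space is not allowed
print(is_valid_point("http.hits", {f"t{i}": "x" for i in range(9)}))  # too many tags
```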
