How to interpret OpenTSDB Design

2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to interpret the design of OpenTSDB.

OpenTSDB is an open source database for storing time series data, built on HBase. Strictly speaking, it is just an application of HBase, but the way it handles time series data is a useful reference for other systems. The following explores and discusses the design of this database.

This article is based on 1.0.0, OpenTSDB's earliest stable version. After downloading and deploying it, the first thing to understand is its database schema, which has two main tables: tsdb-uid and tsdb. The former stores metadata about metrics, while the latter stores the time series data itself. First, consider the concept of a "metric". Put simply, a metric is a data item that needs to be collected, but a metric alone cannot fully describe the context in which a piece of data was produced. For example, to track CPU utilization we can create a metric called proc.stat.cpu. If we collect CPU data from many different machines and users without identifying each data point, we cannot tell which data came from which machine and which user, so we also need "tags" to identify a data point. Strictly speaking, there is no necessary dependency between metrics and tags: the data of two different metrics may both carry a host tag indicating which host they came from. One thing is certain, though: a data point must contain at least one metric and one tag to be meaningful. Therefore, in OpenTSDB's table design, metrics and tags are both stored in the tsdb-uid table, in the format: RowKey (auto-incrementing ID, 3-byte array): name:metrics, name:tagk, name:tagv. The reverse mapping between them is stored as well.

In fact, a metric has a one-to-many parent-child relationship with data points, and so does a tag. The design OpenTSDB uses here is very typical, and it is a common pattern in HBase table design: flatten the relationships between tables and use the result of the JOIN as the RowKey under which data is stored, covering both the forward and the reverse association. (Refer to Figure 1 for details.)

Let's insert two metrics, proc.stat.cpu and proc.stat.mem, as well as one record, proc.stat.cpu 1297574486 54.2 host=foo type=user, and observe the structure of the tables:
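The record above has four parts: the metric, a Unix timestamp, a value, and one or more tagk=tagv pairs. The following is a minimal sketch of splitting such a line, not OpenTSDB's own parser; the names DataPoint and parse_line are hypothetical and exist only for this illustration.

```python
from typing import Dict, NamedTuple


class DataPoint(NamedTuple):
    metric: str
    timestamp: int          # Unix time in seconds
    value: float
    tags: Dict[str, str]    # tag key -> tag value


def parse_line(line: str) -> DataPoint:
    """Split 'metric timestamp value tagk=tagv ...' into its parts."""
    metric, timestamp, value, *tag_parts = line.split()
    if not tag_parts:
        # The article requires at least one tag per data point.
        raise ValueError("a data point needs at least one tag")
    tags = dict(part.split("=", 1) for part in tag_parts)
    return DataPoint(metric, int(timestamp), float(value), tags)


point = parse_line("proc.stat.cpu 1297574486 54.2 host=foo type=user")
print(point.metric, point.timestamp, point.value, point.tags)
```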

The first is the tsdb-uid table:

Figure 1

As can be seen from the records in the table:

1. The first record has rowkey \x00 and contains three fields: metrics, tagk, and tagv, whose values are the numbers of metrics, tag keys, and tag values added so far. This record is generated and maintained by the system. There are two metrics (cpu and mem), two tag keys (host and type), and two tag values (foo and user), so all three values in the row with rowkey \x00 are 2.

2. In OpenTSDB, each metric, tagk, or tagv is assigned a unique identifier, called a UID, when it is created, and UIDs can be combined into a sequence of UIDs, or TSUID. In OpenTSDB storage there is a counter starting at 0 for each of metric, tagk, and tagv, and the counter is incremented by 1 for each new metric, tagk, or tagv. UIDs are assigned automatically when a data point is written to the TSD. For metrics, automatic assignment requires auto metric to be set to true; otherwise the UID has to be assigned manually.
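The bookkeeping described in the two points above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not OpenTSDB code: the class UidTable and its auto_create flag are hypothetical, and plain dictionaries stand in for the HBase rows.

```python
class UidTable:
    """In-memory stand-in for the tsdb-uid table (hypothetical)."""

    KINDS = ("metrics", "tagk", "tagv")
    UID_WIDTH = 3  # bytes, as in the 3-byte RowKey described above

    def __init__(self, auto_create: bool = True):
        self.auto_create = auto_create
        # Plays the role of the \x00 row: one counter per kind.
        self.counters = {kind: 0 for kind in self.KINDS}
        self.name_to_uid = {}  # forward mapping: (kind, name) -> UID
        self.uid_to_name = {}  # reverse mapping: (kind, UID) -> name

    def get_or_create(self, kind: str, name: str) -> bytes:
        key = (kind, name)
        if key in self.name_to_uid:
            return self.name_to_uid[key]
        if not self.auto_create:
            raise KeyError(f"no UID assigned for {kind} {name!r}")
        self.counters[kind] += 1
        uid = self.counters[kind].to_bytes(self.UID_WIDTH, "big")
        self.name_to_uid[key] = uid
        self.uid_to_name[(kind, uid)] = name
        return uid


uids = UidTable()
for metric in ("proc.stat.cpu", "proc.stat.mem"):
    uids.get_or_create("metrics", metric)
for tagk, tagv in (("host", "foo"), ("type", "user")):
    uids.get_or_create("tagk", tagk)
    uids.get_or_create("tagv", tagv)

print(uids.counters)  # {'metrics': 2, 'tagk': 2, 'tagv': 2}
```

After inserting the two metrics and the one record from the example, all three counters are 2, which matches the \x00 row described in point 1.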

With regard to UID, let's look at another diagram:

Then we look at the tsdb table:

Figure 2

Let's look at the rowkey of a record in this table, which combines the metric and its tags:

metric UID + timestamp of the data point (rounded down to the hour) + UID of tag 1 key + UID of tag 1 value + ... + UID of tag N key + UID of tag N value
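As a rough sketch of this layout, the rowkey can be assembled by concatenating fixed-width byte strings. The function build_rowkey is hypothetical, the UID values are assumed (1 for the metric, 1/1 for host=foo, 2/2 for type=user), and sorting the tag pairs by tag key UID is an assumption made so that the same tag set always yields the same key.

```python
import struct


def build_rowkey(metric_uid: bytes, timestamp: int, tag_uids: dict) -> bytes:
    """Concatenate metric UID, hour-aligned time, and tag UID pairs."""
    base_time = timestamp - timestamp % 3600      # round down to the hour
    key = metric_uid + struct.pack(">I", base_time)  # 4-byte big-endian time
    # Assumption: order the pairs by tag key UID for a stable key.
    for tagk_uid, tagv_uid in sorted(tag_uids.items()):
        key += tagk_uid + tagv_uid
    return key


rowkey = build_rowkey(
    b"\x00\x00\x01",                    # assumed UID of proc.stat.cpu
    1297574486,
    {
        b"\x00\x00\x01": b"\x00\x00\x01",  # assumed UIDs for host=foo
        b"\x00\x00\x02": b"\x00\x00\x02",  # assumed UIDs for type=user
    },
)
print(rowkey.hex())  # 0000014d576550000001000001000002000002
```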

Let's take the record in the figure as an example, focusing on how time is handled:

1297574486 = 2011-02-13 13:21:26 (UTC+8)

MWeP = 01001101 01010111 01100101 01010000 = 1297573200 = 2011-02-13 13:00:00 (the timestamp truncated to the hour)

Pk = 01010000 01101011, whose upper 12 bits, 010100000110, equal 1286 (the offset in seconds from the top of the hour to the recording time; 1286 seconds is 21 minutes 26 seconds). The lower 4 bits, 1011, are flags describing the stored value.

1297573200 + 1286 = 1297574486

Pk, that is, the offset in seconds within the hour, is used as the column qualifier.
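The arithmetic above can be checked with a few lines of code. This is a sketch, not OpenTSDB's implementation: split_timestamp and encode_qualifier are hypothetical names, and treating the low 4 bits of the two-byte column qualifier as flags (here 0b1011) is an assumption that matches the bit pattern 01010000 01101011 shown above.

```python
import struct


def split_timestamp(timestamp: int):
    """Return (hour-aligned base time, offset in seconds within the hour)."""
    base_time = timestamp - timestamp % 3600
    return base_time, timestamp - base_time


def encode_qualifier(delta: int, flags: int) -> bytes:
    """Pack a 12-bit second offset and 4 flag bits into two bytes."""
    return struct.pack(">H", (delta << 4) | flags)


base_time, delta = split_timestamp(1297574486)
print(base_time, delta)                  # 1297573200 1286
print(struct.pack(">I", base_time))      # b'MWeP'
print(encode_qualifier(delta, 0b1011))   # b'Pk'
```

Because an hour has 3600 seconds, the offset always fits in 12 bits (maximum 4095), which is what leaves 4 bits free in the two-byte qualifier.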

Some design techniques:

1. Strategies for coping with hot spots

OpenTSDB deals with typical time series data and is bound to face the hotspot problem. Its handling of hotspots is specifically mentioned in HBase's official documentation, http://hbase.apache.org/book/rowkey.design.html:

However, the difference is that the timestamp is not in the lead position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types. Thus, even with a continual stream of input data with a mix of metric types, the Puts are distributed across various points of regions in the table.

In general, if time is used in the rowkey, a hash field must be placed in front of it (that is, salting). OpenTSDB has no dedicated hash field, and it handles the problem cleverly: first, the time field is not placed at the beginning of the rowkey; second, the rowkey starts with a business field that suits the purpose naturally, the metric, instead of a hash field.

From OpenTSDB's approach we can draw one conclusion: when dealing with time series data, if the system has a natural field that can play the hashing role, it should be preferred as the leading component of the rowkey, followed by the time field. If no such field can be found, an artificial hash field should be added.
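The two options can be contrasted in a short sketch. Both functions are hypothetical illustrations, not HBase or OpenTSDB APIs; the MD5-based salt byte and the bucket count of 16 are arbitrary choices made for the example.

```python
import hashlib
import struct


def salted_rowkey(timestamp: int, buckets: int = 16) -> bytes:
    """Artificial hash field: one salt byte in front of the time field."""
    digest = hashlib.md5(struct.pack(">I", timestamp)).digest()
    return bytes([digest[0] % buckets]) + struct.pack(">I", timestamp)


def metric_led_rowkey(metric_uid: bytes, timestamp: int) -> bytes:
    """Natural field: the metric UID leads, the time field follows."""
    return metric_uid + struct.pack(">I", timestamp)


# Six consecutive seconds of writes, spread over three assumed metric UIDs.
metric_uids = [b"\x00\x00\x01", b"\x00\x00\x02", b"\x00\x00\x03"]
timestamps = range(1297573200, 1297573206)
keys = [
    metric_led_rowkey(metric_uids[i % 3], ts)
    for i, ts in enumerate(timestamps)
]
prefixes = {key[:3] for key in keys}
print(len(prefixes))  # 3
```

With the metric in the lead, writes that arrive at the same moment land under as many key prefixes as there are metrics, so no salt byte is needed. A salt spreads writes as well, but a time range scan then has to visit every bucket.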

2. Rowkey design ideas

I. To retrieve the data points for a particular metric, tag name, and tag value, the obvious approach is to encode the metric, tag name, and tag value into the rowkey. However, using them directly to make up the rowkey has two obvious problems:

1. It takes up a lot of storage space (because these values are repeated in many rowkeys).

2. Since the metric, tag key, and tag value are all of variable length, they cannot be located directly by byte offset. (Otherwise delimiters would be needed, and to avoid parsing errors when a delimiter appears in the input, every delimiter in the input would have to be escaped.)

A performance metric comes with a variety of additional attributes (tags) that explain and describe it, so queries on a metric are naturally based on these tags and tag values, and the rowkey of a metric record must include them. However, tags and tag values are of variable length, which complicates rowkey design. It is therefore necessary to assign a fixed-length ID to each tag and tag value and refer to them by ID in the rowkey. This normalizes the rowkey and makes it easy to slice the needed parts directly out of it.
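With fixed-width IDs, every component of the rowkey sits at a known byte offset, so no delimiters or escaping are needed. The following sketch assumes 3-byte UIDs and a 4-byte hour-aligned timestamp as described earlier; parse_rowkey is a hypothetical helper, and the sample key uses assumed UID values.

```python
import struct

UID_WIDTH = 3   # bytes per metric, tag key, or tag value UID
TIME_WIDTH = 4  # bytes for the hour-aligned timestamp


def parse_rowkey(rowkey: bytes):
    """Slice a rowkey into metric UID, base time, and tag UID pairs."""
    metric_uid = rowkey[:UID_WIDTH]
    time_end = UID_WIDTH + TIME_WIDTH
    (base_time,) = struct.unpack(">I", rowkey[UID_WIDTH:time_end])
    tags = {}
    # Each tag occupies exactly two UID widths: key UID, then value UID.
    for offset in range(time_end, len(rowkey), 2 * UID_WIDTH):
        tagk_uid = rowkey[offset:offset + UID_WIDTH]
        tagv_uid = rowkey[offset + UID_WIDTH:offset + 2 * UID_WIDTH]
        tags[tagk_uid] = tagv_uid
    return metric_uid, base_time, tags


sample = bytes.fromhex("0000014d576550000001000001000002000002")
metric_uid, base_time, tags = parse_rowkey(sample)
print(metric_uid.hex(), base_time, len(tags))  # 000001 1297573200 2
```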

II. The combination of Tall-Narrow and Wide-Flat table design styles
