OpenTSDB official documentation: query filters

Updated 2025-04-11. From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)06/03 Report--

A key function of any database system is filtering: retrieving a subset of the full data set. OpenTSDB has provided filtering since version 1.x, with expanded capabilities starting in 2.2. Filters currently operate only on tag values, which means that the metric and the tag keys must be specified exactly as they appear in the database when fetching data.

Sample data

The following data set is used in the examples for each filter described below. It consists of a single metric with multiple time series defined on various tags. For simplicity, only one data point per series, at time T1, is given.

TS#  Metric          Tags                   Value @ T1
1    sys.cpu.system  dc=dal host=web01      3
2    sys.cpu.system  dc=dal host=web02      2
3    sys.cpu.system  dc=dal host=web03      10
4    sys.cpu.system  host=web01             1
5    sys.cpu.system  host=web01 owner=jdoe  4
6    sys.cpu.system  dc=lax host=web01      8
7    sys.cpu.system  dc=lax host=web02      4

Grouping

Grouping, or group by, is the process of combining multiple time series into one using a chosen aggregation function and filters. By default, OpenTSDB groups everything by metric, so if a query returns 10 time series and uses the sum aggregator, all 10 series are added together into a single series over time. For more on how time series are merged, see the aggregation documentation.

To avoid grouping and fetch each underlying time series without any aggregation, use the none aggregator included in version 2.2. Alternatively, in OpenTSDB 2.2 and later you can disable grouping on a per-filter basis. See the API documentation for details.
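
As a rough illustration of the default behavior, the sketch below sums the T1 values of the sample data set and contrasts that with returning every series untouched, as the none aggregator does. The dictionary and function names are made up for this example and are not part of OpenTSDB.

```python
# T1 values of the seven sample series (series number -> value), from the sample data.
VALUES_AT_T1 = {1: 3, 2: 2, 3: 10, 4: 1, 5: 4, 6: 8, 7: 4}


def aggregate_sum(series):
    """Default grouping: all series of the metric collapse into one summed value."""
    return sum(series.values())


def aggregate_none(series):
    """Mimics the `none` aggregator: every series is returned individually."""
    return dict(series)


print(aggregate_sum(VALUES_AT_T1))        # one value for the whole metric
print(len(aggregate_none(VALUES_AT_T1)))  # one entry per underlying series
```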

OpenTSDB 1.x-2.1

In the original OpenTSDB releases, only two types of filters were available, and both were implicitly configured to group by. The two operators allowed were:

*: The asterisk (or wildcard) returns a separate result for each unique tag value detected. For example, if the tag key host has the values web01 and web02, two groups are emitted: one for web01 and one for web02.
|: The pipe (or literal_or) returns a separate result only for the exact tag values specified. That is, it matches only time series with one of the given tag values and aggregates each matching group.

Multiple filters can be supplied in a single query. They are combined with AND, so only series that satisfy all of them are returned. These filters remain available in version 2.x and later.
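
The semantics of the two v1 operators can be sketched in a few lines of Python applied to the sample data. This is only a model of the behavior described here, assuming the sum aggregator; SERIES and query_v1 are hypothetical names, not OpenTSDB code.

```python
# Sample series: (series number, tags, value at T1)
SERIES = [
    (1, {"dc": "dal", "host": "web01"}, 3),
    (2, {"dc": "dal", "host": "web02"}, 2),
    (3, {"dc": "dal", "host": "web03"}, 10),
    (4, {"host": "web01"}, 1),
    (5, {"host": "web01", "owner": "jdoe"}, 4),
    (6, {"dc": "lax", "host": "web01"}, 8),
    (7, {"dc": "lax", "host": "web02"}, 4),
]


def query_v1(filters):
    """Sum-aggregate SERIES under v1 filter rules.

    filters maps a tag key to "*", a single literal, or "a|b" alternatives.
    All filters are ANDed. Returns {group key: (series numbers, summed value)}.
    """
    groups = {}
    for sid, tags, value in SERIES:
        matched = all(
            k in tags and (spec == "*" or tags[k] in spec.split("|"))
            for k, spec in filters.items()
        )
        if not matched:
            continue
        # One group per distinct combination of filtered tag values.
        key = tuple(sorted((k, tags[k]) for k in filters))
        ids, total = groups.get(key, ([], 0))
        groups[key] = (ids + [sid], total + value)
    return groups


for key, (ids, total) in query_v1({"dc": "dal|lax"}).items():
    print(key, ids, total)
```

Note that a filter on host alone also matches series 4 and 5, which have no dc tag. That is the source of the unexpected aggregation the v1 filters are known for.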

Example

The following examples use the v1 HTTP URI syntax, in which the m parameter consists of the aggregator, a colon, and then the metric followed by the tag filters in curly braces, each written as key=value.
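
A minimal sketch of how such a URI could be assembled is shown below. The helper name build_v1_query_url is hypothetical, and the string is built without percent-encoding so that it mirrors the way the URIs are written in this document. A real client should encode special characters.

```python
def build_v1_query_url(base, start, aggregator, metric, tags):
    """Assemble a v1-style query URI: m=<aggregator>:<metric>{k=v,...}."""
    tag_part = ",".join(f"{k}={v}" for k, v in tags.items())
    m = f"{aggregator}:{metric}{{{tag_part}}}"
    return f"{base}/q?start={start}&m={m}"


print(build_v1_query_url("http://host:4242", "1h-ago", "sum",
                         "sys.cpu.system", {"host": "*", "dc": "dal"}))
```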

Example 1: http://host:4242/q?start=1h-ago&m=sum:sys.cpu.system{host=web01}

Time series included  Tags        Aggregated tags  Value @ T1
1, 4, 5, 6            host=web01  (none)           16

In this case, the aggregated tags set is empty because time series 4 and 5 have tags that are not common to the entire set.

Actual behavior with the newer API (inconsistent with the documentation):

Query:

{"start": "23h-ago", "end": "10h-ago", "queries": [{"metric": "sys.cpu.system", "rate": "false", "aggregator": "sum", "tags": {"host": "web01"}}]}

Result:

[{"metric": "sys.cpu.system", "dps": {"1521689251": 8, "1521689113": 3, "1521689167": 1, "1521689211": 4}, "aggregateTags": ["dc"], "tags": {"owner": "jdoe", "host": "web01"}}]

Example 2: http://host:4242/q?start=1h-ago&m=sum:sys.cpu.system{host=web01,dc=dal}

Time series included  Tags               Aggregated tags  Value @ T1
1                     dc=dal host=web01  (none)           3

Actual behavior with the newer API (consistent with the documentation):

Query:

{"start": "1h-ago", "queries": [{"metric": "sys.cpu.system", "aggregator": "sum", "tags": {"host": "web01", "dc": "dal"}}]}

Result:

[{"metric": "sys.cpu.system", "dps": {"1521773430": 3}, "aggregateTags": [], "tags": {"host": "web01", "dc": "dal"}}]

Example 3: http://host:4242/q?start=1h-ago&m=sum:sys.cpu.system{host=*,dc=dal}

Time series included  Tags               Aggregated tags  Value @ T1
1                     dc=dal host=web01  (none)           3
2                     dc=dal host=web02  (none)           2
3                     dc=dal host=web03  (none)           10

This time we supply the * wildcard for the host tag and an explicit match for the dc tag. The query groups on the host tag key and returns one time series per unique host value, three series in this case.

Actual behavior with the newer API (consistent with the documentation):

Query:

{"start": "1h-ago", "queries": [{"metric": "sys.cpu.system", "aggregator": "sum", "tags": {"host": "*", "dc": "dal"}}]}

Result:

[{"metric": "sys.cpu.system", "dps": {"1521773430": 3}, "aggregateTags": [], "tags": {"host": "web01", "dc": "dal"}}, {"metric": "sys.cpu.system", "dps": {"1521773431": 2}, "aggregateTags": [], "tags": {"host": "web02", "dc": "dal"}}, {"metric": "sys.cpu.system", "dps": {"1521773432": 10}, "aggregateTags": [], "tags": {"host": "web03", "dc": "dal"}}]

Example 4: http://host:4242/q?start=1h-ago&m=sum:sys.cpu.system{dc=dal|lax}

Time series included  Tags    Aggregated tags  Value @ T1
1, 2, 3               dc=dal  host             15
6, 7                  dc=lax  host             12

Here the | operator matches only the dc tag values given in the query, so the TSD groups together the time series that have each of those values. The host tag moves to the aggregated tags list because every time series in each set has a host tag and the tag takes multiple values.

Actual behavior with the newer API (consistent with the documentation):

Query:

{"start": "1h-ago", "queries": [{"metric": "sys.cpu.system", "aggregator": "sum", "tags": {"dc": "dal|lax"}}]}

Result:

Warning

Because these filters are limited, if users write time series like #1, #4, and #5, queries can return unexpected results: time series that share one tag but differ in their other tags are aggregated together. This problem is partly addressed in 2.3 by explicit tags.

OpenTSDB 2.2

OpenTSDB 2.2 added a more flexible filtering framework that allows grouping to be disabled and introduces additional filter types such as regular expressions and wildcards. The framework is pluggable, so filters can tie into external systems such as asset management or provisioning systems.

Multiple filters on the same tag key are allowed. When processed, they are combined with AND. For example, with the two filters host=literal_or(web01) and host=literal_or(web02), the query always returns an empty result. If a tag key has two or more filters and one of them has group by enabled while another does not, group by is effectively true for all filters on that tag key.
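
The two rules above can be modeled as follows. This is a sketch of the described behavior only: TagFilter, literal_or, matches, and effective_group_by are illustrative names, not classes or functions from OpenTSDB.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TagFilter:
    tagk: str                         # tag key the filter applies to
    predicate: Callable[[str], bool]  # test applied to the tag value
    group_by: bool = False


def literal_or(*values):
    """Case-sensitive match against a fixed list of values."""
    allowed = set(values)
    return lambda tagv: tagv in allowed


def matches(tags, filters):
    """A series matches only if every filter passes (filters are ANDed)."""
    return all(f.tagk in tags and f.predicate(tags[f.tagk]) for f in filters)


def effective_group_by(filters, tagk):
    """Group by is in effect if any filter on the tag key enables it."""
    return any(f.group_by for f in filters if f.tagk == tagk)


filters = [
    TagFilter("host", literal_or("web01"), group_by=True),
    TagFilter("host", literal_or("web02")),
]
candidates = [{"host": "web01"}, {"host": "web02"}]
print([t for t in candidates if matches(t, filters)])  # no value can equal both
print(effective_group_by(filters, "host"))
```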

Warning

Some filter types can make queries run more slowly than others, particularly the regexp, wildcard, and case-insensitive filters. Before data is fetched from storage, filters are processed to build a database filter based on UIDs. Using the case-sensitive literal_or filter is therefore always faster than regexp, because the strings can be resolved to UIDs and sent to the storage system for filtering. By contrast, if you ask for a regular expression or a wildcard with prefix, suffix, or infix matching, the TSD must retrieve all rows with the tag key UID from storage, resolve the UIDs back to strings for each unique row, and then run the filter over the results. In addition, filter sets with a large list of literals are processed after storage to avoid building a massive filter for the backing store. This limit defaults to 4096 and can be configured with the tsd.query.filter.expansion_limit parameter.
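
The rule of thumb in this warning, together with the efficiency notes given for the individual built-in filters in this document, can be condensed into a small classifier. This is a simplified reading of the text, not the TSD's actual decision logic. The function name is hypothetical, and the exact way the expansion limit is counted (here, the number of literals in one filter, inclusive of the limit) is an assumption.

```python
# Default of tsd.query.filter.expansion_limit, as stated in the text.
EXPANSION_LIMIT = 4096


def can_prefilter_in_storage(filter_type, value, limit=EXPANSION_LIMIT):
    """Guess whether a filter can be resolved to UIDs and applied by storage."""
    if filter_type in ("literal_or", "not_literal_or"):
        # Assumption: the limit applies to the number of literals in the list.
        return len(value.split("|")) <= limit
    if filter_type == "wildcard":
        # A lone asterisk only requires the tag key to be present.
        return value == "*"
    # regexp, case-insensitive filters, and other wildcards are post-processed.
    return False


for ftype, fvalue in [("literal_or", "web01|web02"), ("regexp", "web.*"),
                      ("wildcard", "*"), ("wildcard", "web*")]:
    print(ftype, fvalue, can_prefilter_in_storage(ftype, fvalue))
```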

Explicit tags

As of version 2.3, if you know all of the tag keys for a given metric, query latency can be greatly reduced by using the explicitTags feature. It has two benefits:

For high-cardinality metrics, the backend can switch to a more efficient query that fetches a smaller subset of data from storage.
For metrics whose series carry differing tags, it avoids aggregating time series that should not be included in the final result.

With explicit tags, the underlying storage query fetches only rows with the given tag keys. This lets the database skip irrelevant rows and respond faster.

Examples:

Example 1: http://host:4242/q?start=1h-ago&m=sum:explicit_tags:sys.cpu.system{host=web01}

Time series included  Tags        Aggregated tags  Value @ T1
4                     host=web01  (none)           1

This solves the problem of inconsistent tag keys: only time series #4 is selected.

Example 2: http://host:4242/q?start=1h-ago&m=sum:explicit_tags:sys.cpu.system{host=*}{dc=*}

Time series included  Tags               Aggregated tags  Value @ T1
1, 6                  host=web01         dc               11
2, 7                  host=web02         dc               6
3                     host=web03,dc=dal  (none)           10

This query uses the v2 URI syntax, placing the dc tag key in a second set of curly braces so that it is not used for grouping. Only time series that carry both the host and dc tag keys are selected, and they are grouped by the value of host alone. Time series #4 and #5 are skipped.
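
The effect of explicit tags on the sample data can be reproduced with a short simulation. It assumes, as the two examples suggest, that a series is selected only when its tag keys are exactly the keys named in the query. SERIES and query_explicit are hypothetical names used for illustration.

```python
# Sample series: (series number, tags, value at T1)
SERIES = [
    (1, {"dc": "dal", "host": "web01"}, 3),
    (2, {"dc": "dal", "host": "web02"}, 2),
    (3, {"dc": "dal", "host": "web03"}, 10),
    (4, {"host": "web01"}, 1),
    (5, {"host": "web01", "owner": "jdoe"}, 4),
    (6, {"dc": "lax", "host": "web01"}, 8),
    (7, {"dc": "lax", "host": "web02"}, 4),
]


def query_explicit(group_filters, plain_filters=None):
    """Sum-aggregate SERIES with explicit tags.

    group_filters: filters from the first brace set (grouped).
    plain_filters: filters from the second brace set (not grouped).
    """
    all_filters = {**group_filters, **(plain_filters or {})}
    groups = {}
    for sid, tags, value in SERIES:
        if set(tags) != set(all_filters):
            continue  # tag keys must be exactly those named in the query
        if not all(spec == "*" or tags[k] in spec.split("|")
                   for k, spec in all_filters.items()):
            continue
        key = tuple(sorted((k, tags[k]) for k in group_filters))
        ids, total = groups.get(key, ([], 0))
        groups[key] = (ids + [sid], total + value)
    return groups


print(query_explicit({"host": "web01"}))
print(query_explicit({"host": "*"}, {"dc": "*"}))
```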

Note:

When using HBase (0.98 or later) or Bigtable, make sure that tsd.query.enable_fuzzy_filter is enabled in the configuration (it is enabled by default). It gives the backend a special filter that lets it skip ahead to the rows the query needs, instead of iterating over every row key and comparing it against a regular expression.

Note:

As of version 2.4, TSDB can issue multiple get requests to the backend instead of a single scan. This can reduce query time several times over, particularly for high-cardinality time series. However, the filters must consist only of literal_or filters.

V2.2 built-in filter

The filters listed below are built into OpenTSDB. Additional filters can be loaded as plugins. Each heading is the type name used in URI or JSON queries. In a URI query, place the filter name to the right of the equals sign after the tag key and put the filter value in parentheses, for example {host=regexp(web[0-9]+.lax.mysite.com)}. In a JSON query, use the filter name as the type parameter and the filter value as the filter parameter.
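
A sketch of what such a JSON query could look like when built from Python is given below. The type and filter fields come from the description above. The tagk and groupBy field names and the overall body layout are assumptions based on the OpenTSDB 2.x HTTP query API and should be checked against the API documentation. The helper names are hypothetical.

```python
import json


def make_filter(filter_type, tagk, value, group_by=False):
    """One entry of the "filters" list in a JSON sub-query."""
    return {"type": filter_type, "tagk": tagk, "filter": value, "groupBy": group_by}


def make_query_body(start, aggregator, metric, filters):
    """Serialize a query with a single sub-query to a JSON string."""
    body = {
        "start": start,
        "queries": [
            {"aggregator": aggregator, "metric": metric, "filters": filters},
        ],
    }
    return json.dumps(body, indent=2)


print(make_query_body(
    "1h-ago", "sum", "sys.cpu.system",
    [make_filter("regexp", "host", "web[0-9]+.lax.mysite.com")],
))
```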

The examples below use URI syntax.

literal_or

Takes a single literal value or a list of values separated by the | pipe character and returns the time series that match, case-sensitively. This is a very efficient filter because it resolves the strings to UIDs and sends them to the storage layer for pre-filtering. It is similar to SQL's IN or = predicates.

Examples:

host=literal_or(web01|web02|web03), similar to SQL: WHERE host IN ('web01', 'web02', 'web03')
host=literal_or(web01), similar to SQL: WHERE host = 'web01'

iliteral_or

(Note: the official documentation spells this ilteral_or, which is a typo.)

Same as literal_or, but case-insensitive. It is not as efficient as literal_or because it must post-process all rows from storage (that is, TSDB fetches all rows from storage and then filters them).

not_literal_or

Case-sensitive like literal_or, but returns the time series that do NOT match the given list of values. It is efficient because it can be pre-processed by storage.

not_iliteral_or

Same as not_literal_or, but case-insensitive.
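
The matching rules of the four literal filters can be summarized in one small Python function. This is a sketch of the semantics described above, not OpenTSDB's implementation. The function name and its keyword arguments are made up for this example.

```python
def literal_match(spec, tagv, case_insensitive=False, negate=False):
    """Model literal_or and its variants.

    spec: pipe-separated list of values, e.g. "web01|web02".
    case_insensitive=True models iliteral_or / not_iliteral_or.
    negate=True models not_literal_or / not_iliteral_or.
    """
    values = spec.split("|")
    if case_insensitive:
        values = [v.lower() for v in values]
        tagv = tagv.lower()
    hit = tagv in values
    return not hit if negate else hit


print(literal_match("web01|web02|web03", "web02"))        # literal_or
print(literal_match("web01", "WEB01"))                    # literal_or is case-sensitive
print(literal_match("web01", "WEB01", case_insensitive=True))  # iliteral_or
print(literal_match("web01", "web02", negate=True))       # not_literal_or
```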

wildcard

Provides case-sensitive prefix, suffix, infix, and multi-infix filtering. The wildcard character is the asterisk "*". If only an asterisk is given, the filter effectively returns every time series that has the tag key (and is an efficient filter that can be pre-processed). In SQL terms it is similar to the LIKE predicate, but more flexible.

Examples:

host=wildcard(*mysite.com), similar to SQL: WHERE host LIKE '%mysite.com'
host=wildcard(web*)
host=wildcard(web*mysite.com)
host=wildcard(web*mysite*)
host=wildcard(*), which is equivalent to the v1 basic group by operator and is very efficient.

iwildcard

Same as wildcard, but case-insensitive.
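
One way to model wildcard and iwildcard matching is to translate the pattern into a regular expression in which every asterisk matches any run of characters and everything else is literal. The sketch below uses Python's re module for that purpose. wildcard_match is a hypothetical helper, not how the TSD implements the filter.

```python
import re


def wildcard_match(pattern, tagv, case_insensitive=False):
    """Match tagv against a pattern where "*" stands for any characters."""
    # Escape the literal pieces so that "." and similar characters are not special.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    flags = re.IGNORECASE if case_insensitive else 0
    return re.fullmatch(regex, tagv, flags) is not None


print(wildcard_match("*mysite.com", "web01.mysite.com"))      # suffix match
print(wildcard_match("web*", "web01"))                        # prefix match
print(wildcard_match("web*mysite.com", "web01.lax.mysite.com"))  # infix match
print(wildcard_match("web*", "WEB01", case_insensitive=True))    # iwildcard
```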

regexp

Rows are fetched from storage and then filtered with a POSIX-style regular expression. This filter uses Java's built-in regular expression support. Depending on how the query is submitted, take care to escape special characters.
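
Escaping matters most in URI queries, where characters such as "+", "[", and "{" have special meaning or are not allowed unencoded. In particular, an unencoded "+" in a query string is commonly decoded as a space. The sketch below percent-encodes the whole m parameter with Python's urllib.parse.quote. The helper name regexp_m_param is hypothetical.

```python
from urllib.parse import quote, unquote


def regexp_m_param(aggregator, metric, tagk, pattern):
    """Build a percent-encoded m=... parameter with a regexp filter."""
    m = f"{aggregator}:{metric}{{{tagk}=regexp({pattern})}}"
    # safe="" forces every reserved character, including "+", to be encoded.
    return "m=" + quote(m, safe="")


param = regexp_m_param("sum", "sys.cpu.system", "host", "web[0-9]+.lax.mysite.com")
print(param)
print(unquote(param))  # decoding restores the original expression
```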

example:

regexp(web.*), similar to SQL: WHERE host REGEXP 'web.*'
regexp(web[0-9].mysite.com)

Loaded filters

To see which filters are loaded in OpenTSDB 2.2 and later, call the HTTP API endpoint /api/config/filters. It lists the loaded plugins along with a description and example usage for each.
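
A minimal sketch of calling that endpoint from Python is shown below, using only the standard library. The base URL is a placeholder, the helper names are hypothetical, and the response is assumed to be JSON. Its exact layout is not described here, so the function simply returns the decoded document.

```python
import json
from urllib.request import urlopen


def filters_url(base_url):
    """URL of the loaded-filters endpoint for a TSD at base_url."""
    return base_url.rstrip("/") + "/api/config/filters"


def fetch_loaded_filters(base_url, timeout=5):
    """Fetch and decode the filter listing (requires a reachable TSD)."""
    with urlopen(filters_url(base_url), timeout=timeout) as response:
        return json.load(response)


print(filters_url("http://host:4242"))
# Against a running TSD one could then call:
# fetch_loaded_filters("http://host:4242")
```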

Plugins

Plugins will be listed here as developers contribute them.

To develop a plugin, extend the net.opentsdb.query.filter.TagVFilter class, build a JAR as described in the plugin documentation, and place it in the plugin directory. When the TSD starts, it searches for the plugin and loads it. If the plugin fails to load, the TSD will not start and will log the exception.