Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Chinese version of OpentTsdb official document-downsampling

2025-01-30 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/03 Report--

   downsampling (or in signal processing, decimation) is a process that reduces the data sampling rate or resolution. For example, suppose the temperature sensor sends data to the OpenTSDB system every second. If users query data within an hour, they will get 3600 data points, which can be plotted fairly easily. But now, if users ask for a whole week of data, they will get 604800 data points, and suddenly the graphics can become very confusing. Using the drop sampler, multiple data points of a single time series within a time range are aggregated into a single value with a mathematical function in an aligned timestamp. In this way we can reduce the quantity from 604800 to 168s.

The    downsampler requires at least two components:

Time interval (interval)-A time range (or bucket) used to aggregate these values. For example, we can aggregate multiple values for 1 minute or 1 hour or even a whole day. The interval is specified in format, for example, 1 hour for 1 hour or 30 minutes for 30m. Starting with 2. 3, you can now use "all" to reduce all results in the time range to a single value. For example, 0all-sum summarizes all values from the beginning to the end of the query. Note that a numeric value is still required, but it can be zero or any other value. Aggregate function-A mathematical function that determines how to merge values in an interval. Consistent with the aforementioned aggregator.

   examples: the following time series An and B. The data point covers a time range of 70 seconds, with a value every 10 seconds. Suppose we want to reduce it to 30 seconds because the user is looking at a graph with a wider time span. In addition, we use the sum aggregator to group the two sequences into one. We can specify a drop sampler 30s-sum, which creates a 30-second bucket and accumulates all the data points in each bucket. This will provide us with three data points for each sequence:

Time series T0T0+10sT0+20sT0+30sT0+40sT0+50sT0+60sA5510152051A sum downsampling 5 "5" 10 "20    15" 20 "40    1B10520151005B sum downsampling 10" 5 "20" 35    15 "10" 25    5sum aggregation results 55    65    6

   as you can see, for each time series, we generate a standardized interval boundary (every 30 seconds), so we have to merge the values of the series in the timestamp t0minute 30s and t0room60s. Each interval or bucket will contain a data point that contains a bucket timestamp (start), and does not include a bucket timestamp (end), that is, the [start, end) half-open and semi-closed interval. In this case, the first bucket extends from t0 to t0values 29.9999s, merging all values into a new value using the provided aggregator. For example, for sequence A, we sum the values of t0century t0distinct 10s and t0exact 20s to get a new value of 20 at t0. Finally, the query is grouped using sum so that we can accumulate two composite time series. At this point, OpenTSDB always performs packet aggregation after downsampling is performed.

Note:

   for earlier versions of OpenTSDB, the actual timestamp of the new data point will be the average of the timestamp of each data point in the interval range. Starting with version 2.1 and later, the timestamp of each point is aligned with the beginning of the bucket based on the module of the current time and the downsampling interval.

The    downsampling timestamp is normalized based on the rest of the original data point timestamp (difference) divided by the following sampling interval (in milliseconds, that is, modulus). The code in Java is: timestamp-(timestamp% interval_ms). For example, given a timestamp of 1388550980000 or a UTC,1 interval of 04:36:20 in 2014 (equivalent to 3600000 milliseconds), the resulting timestamp will be rounded to 1388548800000. All data points between 4 and 5 UTC will end in 4 AM barrels. If you query data downsampling for a day at an hour interval, you will receive 24 data points (assuming data is available for all 24 hours).

When    uses the "0all -" interval, the start time of the query becomes the timestamp of the result.

   normalization (normalization) is very effective for common queries, such as downsampling a day's data to 1 minute or 1 hour. However, if you try to downsample at odd intervals (such as 36 minutes), the timestamp may look a little strange due to the nature of modulus calculations. Given a 36-minute interval and our example above, the interval is 2160000 milliseconds, resulting in a timestamp of 1388549520 or 04:12:00 UTC. All data points between 04:12 and 04:48 will end up in one bucket.

Calendar boundary

   starts with OpenTSDB 2.3, where users can specify a calendar-based downsampling method instead of a quick sampling method. This is more useful for reporting purposes, such as looking at values related to human readable time, such as months, weeks, or days. In addition, downsampling can take into account the time zone and include daylight saving time offset and time zone offset.

For    to use calendar boundaries, please check the interface document you are querying. For example, the V2 version of the URI interface has parameters that specify the specific time zone to use, such as & timezone=Asia/Kabul, and calendar-based downsampling can be enabled such as & m=sum:1dc-sum:my.metric by appending c to the interval unit. For JSON queries, separate fields timezone and useCalendar Boolean identifiers are used at the top level. If no time zone is provided, the calendar uses UTC time.

   downsamples through the calendar, capturing the first interval to 00:00:00 on January 1 in the query year of the specified time zone. From there, the interval bucket is calculated until the end of the query. Each bucket is marked with the timestamp of the beginning of the bucket (inclusive, closed interval) and includes all values until the next bucket starts.

Filling strategy

   downsampling is typically used to align (adjust) timestamps to avoid interpolation when grouping is performed. Because OpenTSDB does not impose constraints on time alignment or when values exist, these constraints must be specified at query time. When using downsampling to perform grouping aggregation, if all sequences lack the value of the expected interval, no data is emitted. For example, if a sequence writes data at intervals from t0 to t0room6m per minute, but for some reason the source fails to write data at t0room3m and only five values will be serialized, the user may want to have six values. In 2.2 and later padding strategies, you can now choose to emit any value at t0times3m, and the user (or application) will see the missing value of a specific timestamp without having to find out which timestamp is missing. As long as the sample bucket is empty, the filling policy simply emits a predefined value.

The policies available to    include:

None (none)-default behavior that does not emit missing values during serialization and performs linear interpolation (or other specified interpolation) when aggregating the sequence. NaN (nan)-when all values in the sequence are missing, NaN is emitted in the serialized output. Skip the sequence in the aggregation when the value is missing, instead of converting the entire group calculation to a NaN group. Null (null)-it has the same behavior as NaN, except that it emits a null instead of NaN during serialization. Zero (zero)-replace with 0 when a timestamp is missing. The zero value is merged into the aggregate result.

To use a padding policy for   , append the policy name (the term in parentheses) to the end of the downsampling aggregate function separated by a hyphen. Such as 1h-sum-nan or 1m-avg-zero.

   in this example, we report the data every 10 seconds, and we want to execute the query reported for 10 seconds by downsampling every 10 seconds and filling the missing values with NaN-time policy 10s-sum-nan: time series T0T0+10sT0+20sT0+30sT0+40sT0+50sT0+60sA155   B10201520A sum downsampling NaNNaNNaN15NaN5NaNB sum downsampling 10NaN2015NaNNaN20sum aggregation result 10NaN 10NaN 2015NaN   520

   if we ask for output without populating the policy, no value or timestamp will be issued at t0room20s or t0room40s. In addition, the values in the B sequence at t0values 30s and 50s will be linearly interpolated to fill the values to be added to sequence A.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report