What is the DTW algorithm?

2025-04-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains what the DTW algorithm is. The editor found it practical and is sharing it here as a reference.

The DTW (dynamic time warping) algorithm is based on the idea of dynamic programming (DP) and solves the problem of matching templates whose utterances differ in length. It is an early, classic algorithm in speech recognition and is still widely used for isolated word recognition.

Many similarity or distance functions exist for time series data, and DTW is one of the most prominent. In isolated word speech recognition, the simplest effective method is the DTW (dynamic time warping) algorithm. Because it is built on dynamic programming (DP), it can match templates even when the same word is pronounced at different lengths. The HMM algorithm needs a large amount of speech data in the training stage, and its model parameters are obtained only through repeated, iterative computation, whereas the DTW algorithm needs almost no additional computation for training. For this reason, DTW is still widely used in isolated word speech recognition.
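The dynamic programming idea can be sketched in a few lines of Python. This is a minimal illustration, not the exact formulation used by any particular recognizer: it assumes the textbook step pattern (a cell may be reached from its left, lower, or diagonal neighbour), Euclidean frame distances, and no slope constraint or path-length normalization. The names `dtw_distance` and `recognize` are hypothetical helpers introduced here.

```python
import math

def dtw_distance(test, ref, frame_dist=math.dist):
    """Accumulated DTW distance between two sequences of feature vectors.

    Assumption: textbook step pattern, no slope constraint, no normalization.
    """
    n_frames, m_frames = len(test), len(ref)
    inf = float("inf")
    # cost[n][m] = cheapest accumulated distance aligning test[:n] with ref[:m]
    cost = [[inf] * (m_frames + 1) for _ in range(n_frames + 1)]
    cost[0][0] = 0.0
    for n in range(1, n_frames + 1):
        for m in range(1, m_frames + 1):
            d = frame_dist(test[n - 1], ref[m - 1])
            cost[n][m] = d + min(cost[n - 1][m],      # test frame stretched
                                 cost[n][m - 1],      # reference frame stretched
                                 cost[n - 1][m - 1])  # one-to-one step
    return cost[n_frames][m_frames]

def recognize(test, templates):
    """Return the label of the reference template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))

# Toy one-dimensional "feature vectors"; real systems would use e.g. MFCCs.
templates = {"yes": [[0.0], [1.0], [2.0]], "no": [[5.0], [5.0], [4.0]]}
utterance = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # "yes" spoken more slowly
best = recognize(utterance, templates)
```

Note that nothing is trained here: the reference templates are simply stored, which is the property the comparison with HMM relies on.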

In both the training stage (when templates are built) and the recognition stage, an endpoint detection algorithm is first used to determine the start and end of the speech. Each entry stored in the template library is called a reference template and can be expressed as R = {R(1), R(2), ..., R(m), ..., R(M)}, where m is the time index of a training speech frame, m = 1 is the first speech frame, and m = M is the last speech frame, so M is the total number of speech frames in the template and R(m) is the speech feature vector of the m-th frame. The speech of an input entry to be recognized is called a test template and can be expressed as T = {T(1), T(2), ..., T(n), ..., T(N)}, where n is the time index of a test speech frame, n = 1 is the first speech frame, and n = N is the last speech frame, so N is the total number of speech frames in the template and T(n) is the speech feature vector of the n-th frame. The reference template and the test template generally use the same type of feature vector (such as MFCC or LPC coefficients), the same frame length, the same window function, and the same frame shift.
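To make the roles of frame length, window function, and frame shift concrete, here is a small sketch of how a signal could be cut into frames before feature extraction. The function name `split_into_frames` and the choice of a Hamming window are assumptions for illustration; the text does not say which window is used, and the step that turns each frame into an MFCC or LPC feature vector is omitted.

```python
import math

def split_into_frames(samples, frame_length, frame_shift):
    """Split a 1-D signal into overlapping, Hamming-windowed frames.

    Assumes frame_length > 1. Trailing samples that do not fill a whole
    frame are dropped.
    """
    # Hamming window coefficients (an assumed choice of window function)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_length - 1))
              for i in range(frame_length)]
    frames = []
    for start in range(0, len(samples) - frame_length + 1, frame_shift):
        frame = samples[start:start + frame_length]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

signal = [1.0] * 10
frames = split_into_frames(signal, frame_length=4, frame_shift=2)
frame_count = len(frames)  # this would be N for a test template, M for a reference
```

Using the same `frame_length`, `frame_shift`, and window for both templates keeps T(n) and R(m) directly comparable.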

Let T and R denote the test template and the reference template, respectively. To compare their similarity, we can calculate the distance D[T, R] between them: the smaller the distance, the higher the similarity. Computing this distortion distance starts from the distances between individual frames of T and R. Let n and m be arbitrarily chosen frame indices in T and R, respectively, and let d[T(n), R(m)] denote the distance between the feature vectors of those two frames. The distance function depends on the distance measure actually chosen; the DTW algorithm usually uses the Euclidean distance.
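As a small illustration of the frame-level distance, the sketch below computes d[T(n), R(m)] for every pair of frames using the Euclidean distance. The helper name `frame_distance_matrix` is hypothetical, and Python indices start at 0, whereas the text counts frames from 1.

```python
import math

def frame_distance_matrix(test, ref):
    """Return d, where d[n][m] is the Euclidean distance between test[n] and ref[m]."""
    return [[math.dist(t, r) for r in ref] for t in test]

# Two-dimensional toy feature vectors
T = [[0.0, 0.0], [3.0, 4.0]]              # N = 2 frames
R = [[0.0, 0.0], [0.0, 4.0], [3.0, 4.0]]  # M = 3 frames
d = frame_distance_matrix(T, R)
# d == [[0.0, 4.0, 5.0], [5.0, 3.0, 0.0]]
```

The overall distance D[T, R] is then built by combining entries of this N-by-M table along some alignment of the two sequences.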

If N = M, the distance can be calculated directly; otherwise, T(n) and R(m) must first be aligned. Alignment can be done by linear expansion, if N
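The two cases can be sketched as follows. The mapping used here is only one plausible reading of linear alignment and is an assumption: each test frame is matched to the reference frame at the same relative position, by scaling its index and rounding. The function name `linear_distance` is hypothetical. When N = M the mapping reduces to comparing frame n with frame n.

```python
import math

def linear_distance(test, ref):
    """Sum of frame distances after linearly scaling test indices onto ref.

    Assumption: test frame n is paired with the reference frame nearest to
    the same relative position; both sequences must be non-empty.
    """
    n_frames, m_frames = len(test), len(ref)
    total = 0.0
    for n in range(n_frames):
        # Linear index mapping; with N == M this gives m == n
        m = round(n * (m_frames - 1) / (n_frames - 1)) if n_frames > 1 else 0
        total += math.dist(test[n], ref[m])
    return total

same_length = linear_distance([[0.0], [1.0], [2.0]], [[0.0], [1.0], [2.0]])
stretched = linear_distance([[0.0], [1.0], [2.0], [3.0]], [[0.0], [3.0]])
```

A purely linear mapping ignores the fact that different parts of a word are stretched by different amounts when spoken, which is the limitation that a non-linear, dynamic-programming alignment addresses.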
