Tucker Tensor Decomposition in Python and Its Applications

2025-01-28 Update, from SLTechnology News&Howtos (shulou.com), Internet Technology section


Shulou (Shulou.com) 06/02 Report

This article shows how to perform Tucker tensor decomposition in Python and what it can be used for. The content is concise and easy to follow, and it should brighten your eyes. I hope you get something out of this detailed introduction.

Tensor Decomposition

Artificial intelligence, deep learning, convolutional neural networks, reinforcement learning: these are revolutionary advances in machine learning that have gradually made many previously impossible tasks possible. Despite these strengths, there are also shortcomings and limitations. For example, because neural networks require large training sets, they are prone to overfitting. These algorithms are usually designed for specific tasks, and their capabilities do not transfer well to others.

One of the main challenges in computer vision is the amount of data involved: an image is typically represented as a matrix with millions of elements, and a video contains thousands of such images. On top of that, this kind of data is often noisy. Unsupervised methods that reduce the dimensionality of the data are therefore an essential tool for improving many algorithms.

Against this backdrop, tensor decomposition is very useful for high-dimensional data. Implementing tensor decomposition in Python to analyze video extracts the important information in the data, and the result can serve as preprocessing for other methods.

High-Dimensional Data

High-dimensional data analysis brings a set of problems, one of which is that the number of features can exceed the number of observations. In many applications (such as regression), this leads to speed and model-fitting problems, including overfitting or even the inability to fit a model at all. This is common in computer vision, materials science, and even business, given how much data is captured on the Internet.

One solution is to find a low-dimensional representation of the data and use it in place of the original observations when training the model, since dimensionality reduction alleviates the problems above. The lower-dimensional space can usually retain most of the information in the original data, so the reduced data is a good substitute for it. Splines, regularization, and tensor decomposition are all examples of this approach; here we study the last of these and put one of its applications into practice.

[Figure: projecting 3D data onto a 2D plane. Image source: May Morrison.]

Mathematical Concepts

The core concept of this article is the tensor. Simply put, a tensor is a multi-dimensional array: a number is a 0-dimensional tensor, a vector is a 1-dimensional tensor, and a matrix is a 2-dimensional tensor.

In what follows, the article will refer directly to the dimension (order) of a tensor.

This data structure is particularly useful for storing images or videos. In the traditional RGB model [1], a single image can be represented by a three-dimensional tensor:

Each color channel (red, green, blue) has its own matrix, and the value of a given pixel in the matrix encodes the intensity of the color channel.

Each pixel has coordinates in the matrix, and the size of the matrix depends on the resolution of the image.

[Figure: representation of a 3D tensor; for an image, the first two dimensions correspond to the image resolution. Image source: Kolda, Tamara G., and Brett W. Bader.]
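As a minimal illustration of this representation (a hypothetical toy image, not data from the article), a small RGB image can be built as a NumPy array of shape (height, width, channel):

```python
import numpy as np

# A toy 4x4 RGB image as a 3D tensor: one matrix per color channel,
# with each entry encoding that channel's intensity at a pixel.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[0, 0] = [255, 0, 0]              # top-left pixel: pure red
red_channel = image[:, :, 0]           # the red intensity matrix
print(image.shape, red_channel[0, 0])  # (4, 4, 3) 255
```

Each `image[:, :, c]` slice is exactly the per-channel matrix described above.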

Going further, a video is just a sequence of frames, each of which is an image. This is difficult to visualize, but it can be stored in a 4D tensor: three dimensions store a single frame, and the fourth encodes the passage of time.

[Figure: each slice is a 3D tensor representing a frame, with multiple slices along the time axis. Image source: Kamran Paynabar.]

To be more concrete, take a 60-second video at 60 frames per second with a resolution of 800x600. The video can be stored in an 800x600x3x3600 tensor, which has about 5 billion elements! That is far too large to build a reliable model on. This is where tensor decomposition comes to the rescue.
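Checking the arithmetic behind that figure:

```python
# 60 s at 60 fps -> 3600 frames; 800x600 resolution, 3 color channels.
width, height, channels, frames = 800, 600, 3, 60 * 60
elements = width * height * channels * frames
print(elements)  # 5184000000, i.e. about 5 billion
```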

There is a large literature on tensor decomposition; I recommend the review by Kolda and Bader to interested readers [2]. In particular, Tucker decomposition has many applications, such as tensor regression with a tensor as the target [3] or predictor [4] variable. The key point is that it lets you extract a core tensor, a compressed version of the original data. If this reminds you of PCA, that is no accident: one of the steps of Tucker decomposition is in fact an extension of SVD, namely the higher-order singular value decomposition (HOSVD).

Existing algorithms extract the core tensor and the factor matrices (which are not used in our application). The hyperparameter is the rank n. The main idea, unsurprisingly, is that the higher n is, the more accurate the decomposition. The rank n also determines the size of the core tensor. If n is small, the reconstructed tensor may not match the original exactly, but the data dimension is lower: the trade-off depends on the application at hand.

[Figure: A, B, and C are factor matrices, while G is the core tensor whose dimensions are specified by n. Image source: Kolda, Tamara G., and Brett W. Bader.]

Extracting this core tensor is very useful, as the practical example below will show.

Application

As a toy example, I captured three 10-second videos on my phone: cars driving on the highway during an afternoon commute; the terrace of my favorite cafe; a parking lot. I uploaded them, along with the notebook code, to GitHub. The main goal is to determine whether we can produce a strict similarity ranking of the video pairs, on the premise that the parking lot and commute videos should be the most similar.

Before the analysis, the data is loaded and processed with the OpenCV Python library. The steps are as follows: create a VideoCapture object for each video and extract its number of frames.

I used the length of the shortest video to truncate the other two for a better comparison.

# Import libraries
import cv2
import numpy as np
import random
import tensorly as tl
from tensorly.decomposition import tucker

# Create VideoCapture objects
parking_lot = cv2.VideoCapture('parking_lot.MOV')
patio = cv2.VideoCapture('patio.MOV')
commute = cv2.VideoCapture('commute.MOV')

# Get number of frames in each video
parking_lot_frames = int(parking_lot.get(cv2.CAP_PROP_FRAME_COUNT))
patio_frames = int(patio.get(cv2.CAP_PROP_FRAME_COUNT))
commute_frames = int(commute.get(cv2.CAP_PROP_FRAME_COUNT))
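The article does not show the step that builds tensors such as parking_lot_tensor from the VideoCapture objects. A typical approach reads frames in a loop and stacks them along a new leading axis; below is a sketch with synthetic frames standing in for cv2's read() output (the frame sizes and names here are illustrative, not from the article):

```python
import numpy as np

# With OpenCV the loop would look roughly like:
#   frames = []
#   while True:
#       ret, frame = parking_lot.read()  # ret is False after the last frame
#       if not ret:
#           break
#       frames.append(frame)
# Each frame is a (height, width, 3) array; stacking them yields a
# (n_frames, height, width, 3) tensor like the ones subset below.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, size=(6, 8, 3), dtype=np.uint8)
          for _ in range(10)]
video_tensor = np.stack(frames, axis=0)
print(video_tensor.shape)  # (10, 6, 8, 3)
```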

Randomly sample 50 frames from these tensors to speed up the later operations:

# Set the seed for reproducibility
random.seed(42)
random_frames = random.sample(range(0, commute_frames), 50)

# Use these random frames to subset the tensors
subset_parking_lot = parking_lot_tensor[random_frames, :]
subset_patio = patio_tensor[random_frames, :]
subset_commute = commute_tensor[random_frames, :]

# Convert the three tensors to double
subset_parking_lot = subset_parking_lot.astype('d')
subset_patio = subset_patio.astype('d')
subset_commute = subset_commute.astype('d')

After completing these steps, we have three 50x1080x1920x3 tensors.

Results

To determine how similar these videos are, we can rank them. The L2 norm of the difference between two tensors is a common measure of similarity: the smaller the value, the higher the similarity. Mathematically, the norm of an $N$-dimensional tensor $\mathcal{X}$ can be written as

$$\|\mathcal{X}\| = \sqrt{\sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} x_{i_1 i_2 \cdots i_N}^2}$$

where each $I_n$ is the size of a given dimension and $x_{i_1 \cdots i_N}$ is a given element.

The norm of the difference is therefore analogous to the Euclidean distance.
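As a quick sanity check of that analogy (on toy tensors, not the article's videos), the norm of a tensor difference computed straight from the formula equals the Euclidean distance between the flattened tensors, which is the quantity tl.norm returns at its default order of 2:

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((5, 4, 3))
B = rng.standard_normal((5, 4, 3))

# Norm of the difference, straight from the formula above...
diff_norm = np.sqrt(np.sum((A - B) ** 2))
# ...equals the Euclidean distance between the flattened tensors.
euclidean = np.linalg.norm((A - B).ravel())
print(np.isclose(diff_norm, euclidean))  # True
```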

Performing this operation on the full tensors gives unsatisfactory results:

# Parking and patio
parking_patio_naive_diff = tl.norm(subset_parking_lot - subset_patio)

# Parking and commute
parking_commute_naive_diff = tl.norm(subset_parking_lot - subset_commute)

# Patio and commute
patio_commute_naive_diff = tl.norm(subset_patio - subset_commute)

Look at the similarity:

Not only is there no clear-cut ranking among the videos, but the parking lot and patio videos appear to be the most similar, in sharp contrast to the original hypothesis.

All right, let's see whether Tucker decomposition improves the results. The TensorLy library makes it relatively easy to decompose a tensor, albeit slowly: all we need is the tensor and its rank n. Although the AIC criterion is a common way to choose this parameter, it does not need to be optimal in this particular case, since the purpose is comparison. We need a common rank n for all three variables, so we choose n-rank = [2, 2, 2, 2], a good trade-off between precision and speed. Incidentally, n-rank = [5, 5, 5, 5] exceeded the capacity of LAPACK (the underlying linear algebra package), which also shows how computationally expensive these methods are. Once the core tensors are extracted, the same comparison can be made.

# Get core tensor for the parking lot video
core_parking_lot, factors_parking_lot = tucker(subset_parking_lot, rank=[2, 2, 2, 2])

# Get core tensor for the patio video
core_patio, factors_patio = tucker(subset_patio, rank=[2, 2, 2, 2])

# Get core tensor for the commute video
core_commute, factors_commute = tucker(subset_commute, rank=[2, 2, 2, 2])
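To make the core-extraction step concrete without the article's videos, here is a pure-NumPy sketch of truncated HOSVD, the SVD extension mentioned above. TensorLy's tucker refines this with an iterative algorithm, so its core will differ numerically, but the shapes and the compression are the same:

```python
import numpy as np

def unfold(tensor, mode):
    # Mode-n unfolding: move axis `mode` to the front, flatten the rest.
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_core(tensor, rank):
    # One factor matrix per mode from a truncated SVD of each unfolding,
    # then project the tensor onto them to obtain the compressed core.
    factors = []
    for mode, r in enumerate(rank):
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = tensor
    for mode, U in enumerate(factors):
        moved = np.moveaxis(core, mode, 0)          # bring mode to front
        core = np.moveaxis(np.tensordot(U.T, moved, axes=1), 0, mode)
    return core, factors

rng = np.random.default_rng(42)
X = rng.standard_normal((50, 10, 8, 3))             # stand-in "video" tensor
core, factors = hosvd_core(X, rank=[2, 2, 2, 2])
print(core.shape)  # (2, 2, 2, 2)
```

The 4D input collapses to a 2x2x2x2 core, which is what makes the pairwise norm comparison below cheap.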

# Compare core parking lot and patio
parking_patio_diff = tl.norm(core_parking_lot - core_patio)
int(parking_patio_diff)

# Compare core parking lot and commute
parking_commute_diff = tl.norm(core_parking_lot - core_commute)
int(parking_commute_diff)

# Compare core patio and commute
patio_commute_diff = tl.norm(core_patio - core_commute)
int(patio_commute_diff)

Look at the similarity again.

These results make sense: while the patio video differs from both the parking lot and commute videos, the latter two are an order of magnitude more similar to each other.

In this article, I showed how an unsupervised learning method can provide insight into data. Only after reducing the dimension with Tucker decomposition, extracting a core tensor from each video, did the comparison become meaningful: we confirmed that the parking lot video is most similar to the commute video.

As video becomes an increasingly common data source, this technique has many potential applications. The first that comes to mind (given my passion for television and for how video streaming services use data) is improving existing recommendation systems by checking the similarity between trailers or key scenes of movies and TV shows. The second is materials science, where heated metals could be classified by the similarity between their infrared videos and a benchmark.

To make these methods fully scalable, the computational cost has to be addressed: on my computer, Tucker decomposition ran very slowly, even with only three short 10-second videos. Parallelization is one potential way to speed it up.

Beyond these direct applications, this technique can also be combined with some of the methods mentioned in the introduction. Using core tensors instead of full images or videos as training points in a neural network could help with overfitting and speed up training, strengthening those methods by addressing both of their main problems.

The above covers Tucker tensor decomposition in Python and its applications. Have you picked up some knowledge or skills? If you want to learn more or enrich your knowledge, you are welcome to follow the Internet Technology channel.
