
Which deep learning architectures are suitable for small amounts of data in big data?

2025-03-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

In this issue, the editor looks at the deep learning architectures that are suitable for small amounts of data in big data. The article is rich in content and approaches the topic from a professional point of view; I hope you get something out of it after reading.

Introduction

This article introduces and compares some of the most commonly used few-shot learning approaches.

Traditional CNNs (AlexNet, VGG, GoogLeNet, ResNet, DenseNet, …) perform well when the dataset contains a large number of samples for each class. Unfortunately, they usually do not work well when the dataset is small. In many real-world scenarios, however, collecting data is challenging: in face recognition systems there are usually only a few images of each person, and in the medical field the number of cases of rare diseases is very limited.

So, what can deep learning offer when each class in your dataset has only five samples, or even a single one? This problem is called few-shot learning. It is an active research field, and many successful methods are available. In this article I will mention only some of the most promising architectures.

This article does not explain the architectures in depth, since that would make it very long. Instead, I cover only the main idea of each architecture, so that anyone who wants to work with small datasets can get a general understanding of these models.

Siamese Neural Networks

The structure of Siamese Neural Networks

A Siamese neural network takes two samples as input and outputs the probability (or a loss) reflecting whether the two inputs belong to the same class. The input samples pass through the same network (shared weights), and their embeddings are compared in the loss function (usually a metric based on the embedding difference). During training, the network learns to encode the inputs in a more robust way. First, the model is trained on the support set to learn same/different pairings (the verification step). Then, each test sample is compared with every sample in the training set, and its similarity to each class is computed from the learned encodings (the one-shot task). It was one of the first successful models in the field of few-shot learning and became the basis for other models.

Steps for Siamese Neural Networks
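Below is a minimal PyTorch sketch of the idea: two inputs go through one shared encoder, and the absolute difference of their embeddings is mapped to a same/different probability. The small CNN encoder, the weighted-L1-style comparison, and the input sizes are illustrative assumptions, not the architecture from any specific paper.

```python
import torch
import torch.nn as nn

class SiameseNetwork(nn.Module):
    """Minimal Siamese network: both inputs pass through the same encoder (shared weights)."""

    def __init__(self, embedding_dim: int = 64):
        super().__init__()
        # Shared encoder (illustrative small CNN).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.LazyLinear(embedding_dim),
        )
        # Maps the element-wise embedding difference to a same/different probability.
        self.head = nn.Linear(embedding_dim, 1)

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        e1, e2 = self.encoder(x1), self.encoder(x2)           # same weights for both inputs
        return torch.sigmoid(self.head(torch.abs(e1 - e2)))   # P(same class)

# Training pairs are labeled 1 (same class) or 0 (different class); at test time a query
# is compared against every support sample and assigned the class of the most similar one.
model = SiameseNetwork()
p_same = model(torch.randn(4, 1, 28, 28), torch.randn(4, 1, 28, 28))
```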

Triplet Network and Triplet Loss

Triplet Networks

The Triplet network is an extension of the Siamese network. Instead of two samples, it takes three samples as input: a positive, an anchor, and a negative sample. The positive and anchor samples come from the same class, while the negative sample comes from a different class. The triplet loss pulls the anchor embedding close to the positive embedding and pushes it away from the negative embedding, so the network learns more robust embeddings. Triplet networks have been applied to face recognition datasets and show very good performance.

Triplet Loss
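A minimal sketch of the triplet loss in PyTorch, assuming the embeddings have already been computed and using Euclidean distance with an illustrative margin value:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin: float = 1.0):
    """Pull the anchor toward the positive embedding and push it away from the negative one."""
    d_pos = F.pairwise_distance(anchor, positive)   # distance anchor <-> same-class sample
    d_neg = F.pairwise_distance(anchor, negative)   # distance anchor <-> different-class sample
    return F.relu(d_pos - d_neg + margin).mean()    # zero once d_neg exceeds d_pos by the margin

# PyTorch also ships an equivalent built-in: torch.nn.TripletMarginLoss(margin=1.0)
a, p, n = torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64)
loss = triplet_loss(a, p, n)
```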

Matching Networks

Matching Networks

The matching network combines embedding and classification to form an end-to-end differentiable nearest-neighbor classifier. The model's prediction ŷ is a weighted sum of the labels y_i in the support (training) set, ŷ = Σ_{i=1..k} a(x̂, x_i) · y_i, where the weights are given by a pairwise similarity function a(x̂, x_i) between the query (test) sample x̂ and each support (training) sample x_i. The key to the matching network is that this similarity function is differentiable.

The attention a(x̂, x_i) is a softmax over the cosine similarity of the embeddings: a(x̂, x_i) = exp(c(f(x̂), g(x_i))) / Σ_{j=1..k} exp(c(f(x̂), g(x_j))), where c is the cosine similarity function, k is the total number of samples in the training set, and f and g are embedding functions. In other words, the similarity is computed between the embedding of the test sample and the embedding of each training-set sample. The main innovation of this work is optimizing the embedding functions for maximum classification accuracy.
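A minimal sketch of this prediction rule in PyTorch, assuming the query embeddings f(x̂) and support embeddings g(x_i) have already been computed; the shapes and class count are illustrative:

```python
import torch
import torch.nn.functional as F

def matching_predict(query_emb, support_emb, support_labels, num_classes):
    """ŷ = Σ_i a(x̂, x_i) · y_i with softmax-over-cosine-similarity attention."""
    # Cosine similarity between each query and every support embedding.
    sims = F.cosine_similarity(query_emb.unsqueeze(1), support_emb.unsqueeze(0), dim=-1)
    attn = F.softmax(sims, dim=-1)                      # a(x̂, x_i), sums to 1 over the support set
    one_hot = F.one_hot(support_labels, num_classes).float()
    return attn @ one_hot                               # class probabilities per query

q = torch.randn(5, 64)                                  # 5 query embeddings f(x̂)
s = torch.randn(20, 64)                                 # 20 support embeddings g(x_i)
y = torch.randint(0, 4, (20,))                          # support labels (4-way task)
probs = matching_predict(q, s, y, num_classes=4)
```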

Prototypical Networks

Prototypical Networks

The prototypical network does not compare the test sample with all training samples, but with a class prototype (the mean class embedding). The key assumption is that there exists an embedding space in which the samples of each class cluster around a single prototype representation c_k. The paper shows that its performance is better than that of the matching network.
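A minimal sketch of prototype-based classification in PyTorch, assuming the embeddings have already been computed; the prototype is the mean support embedding per class, and squared Euclidean distance is used as the metric:

```python
import torch

def prototypical_predict(query_emb, support_emb, support_labels, num_classes):
    """Classify queries by distance to per-class prototypes (mean support embeddings)."""
    # c_k: mean embedding of the support samples of class k.
    prototypes = torch.stack(
        [support_emb[support_labels == k].mean(dim=0) for k in range(num_classes)]
    )
    # Negative squared Euclidean distance acts as the class score.
    dists = torch.cdist(query_emb, prototypes) ** 2
    return torch.softmax(-dists, dim=-1)

q = torch.randn(5, 64)                     # query embeddings
s = torch.randn(20, 64)                    # support embeddings
y = torch.arange(4).repeat_interleave(5)   # 4 classes x 5 shots
probs = prototypical_predict(q, s, y, num_classes=4)
```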

Meta-Learning

Model-Agnostic Meta-Learning

Meta-learning means learning to learn. Meta-learning tries to train the model's parameters so that they perform well on new tasks after only one or a few gradient steps (much like humans). The model's parameters are updated based on the updated task-specific parameters, so that each task reaches high performance after a single step.

The purpose of model-agnostic meta-learning (MAML) is to learn a general model that can easily be fine-tuned to many tasks with only a few iterative steps. For each task in the meta-batch, a model is initialized with the weights of the base model, and the task-specific weights are updated with stochastic gradient descent (SGD). Then the sum of the task losses, evaluated with the updated weights, is used to update the meta-learner's weights. The goal is that these parameters yield a small loss across several different tasks.

Model-Agnostic Meta-Learning algorithm
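A minimal sketch of the MAML inner/outer loop on a toy linear-regression model; the task sampler, model, learning rates, and meta-batch size are hypothetical, chosen only to keep the example self-contained:

```python
import torch

def sample_task(n: int = 10):
    """Toy task sampler: each task is a random linear function (hypothetical data)."""
    slope, intercept = torch.randn(1), torch.randn(1)
    x = torch.randn(2 * n, 1)
    y = slope * x + intercept
    return x[:n], y[:n], x[n:], y[n:]

def forward(x, w, b):
    return x * w + b

# Shared initialization (the "meta" parameters).
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
meta_opt = torch.optim.SGD([w, b], lr=1e-2)
inner_lr = 0.1

for step in range(100):
    meta_loss = 0.0
    for x_tr, y_tr, x_val, y_val in [sample_task() for _ in range(4)]:  # meta-batch of tasks
        # Inner loop: one SGD step on the task's training split, keeping the graph
        # so gradients can flow back to the shared initialization.
        loss_tr = ((forward(x_tr, w, b) - y_tr) ** 2).mean()
        g_w, g_b = torch.autograd.grad(loss_tr, (w, b), create_graph=True)
        w_task, b_task = w - inner_lr * g_w, b - inner_lr * g_b
        # Outer objective: loss of the adapted parameters on the task's validation split.
        meta_loss = meta_loss + ((forward(x_val, w_task, b_task) - y_val) ** 2).mean()
    meta_opt.zero_grad()
    meta_loss.backward()   # gradients w.r.t. the shared initialization
    meta_opt.step()
```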

Bonus: MetaFGNet

MetaFGNet

In addition to the target-task network, MetaFGNet also trains the network with auxiliary data. The two networks share their initial layers (the base network) to learn general information; this approach is also known as multi-task learning. Training on the auxiliary data (S) together with the target data (T) has a regularizing effect on the target training. MetaFGNet additionally uses a process called sample selection: each sample in the auxiliary data is passed through the network and scored against the target classifier as well as the source classifier, and samples that are more similar to the target task receive higher scores. Only samples above a score threshold are selected for training. The main assumption is that the auxiliary data S should have a distribution similar to that of the target set T. The results show that this process improves overall performance, and training with the meta-learning method improves it further. A sketch of the shared-base setup follows below.
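A minimal sketch of the shared-base, two-head idea in PyTorch; the layer sizes, class counts, and auxiliary loss weight are assumptions for illustration, and the sample-selection step is omitted:

```python
import torch
import torch.nn as nn

class SharedBaseTwoHeads(nn.Module):
    """Shared base network with separate target and auxiliary (source) classifier heads.
    Illustrative layer sizes, not the architecture from the MetaFGNet paper."""

    def __init__(self, num_target_classes: int, num_source_classes: int, feat_dim: int = 128):
        super().__init__()
        self.base = nn.Sequential(               # layers shared by both tasks
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(32 * 4 * 4, feat_dim), nn.ReLU(),
        )
        self.target_head = nn.Linear(feat_dim, num_target_classes)
        self.source_head = nn.Linear(feat_dim, num_source_classes)

    def forward(self, x):
        feats = self.base(x)
        return self.target_head(feats), self.source_head(feats)

model = SharedBaseTwoHeads(num_target_classes=10, num_source_classes=100)
criterion = nn.CrossEntropyLoss()

# Joint objective: target loss plus auxiliary loss acting as a regularizer
# (the 0.5 weight is a hypothetical hyperparameter).
x_t, y_t = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
x_s, y_s = torch.randn(8, 3, 32, 32), torch.randint(0, 100, (8,))
logits_t, _ = model(x_t)
_, logits_s = model(x_s)
loss = criterion(logits_t, y_t) + 0.5 * criterion(logits_s, y_s)
```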

The above covers the deep learning architectures in big data that are suitable for small amounts of data, as shared by the editor. If you have similar questions, you can refer to the analysis above. If you want to learn more, you are welcome to follow the industry information channel.
