StartDT AI Lab | Visual intelligence engine: Re-ID enables customer digitization in offline scenes

Updated 2025-07-09. Source: SLTechnology News&Howtos


The "people, goods, place" model is the core of the new-retail digital chain. People are the starting point of the whole business life cycle, so the primary goal of an image algorithm is to extract "people" from images. The previous article covered the development of Face ID, which helps merchants build offline user profiles and brings visual computing to industries such as payment and security.

Admittedly, there are many ways to analyze people. Among pedestrian-analysis algorithms, face algorithms have a unique advantage: they are the most accurate at verifying identity (accurate enough for payment verification). They are far from optimal for counting, however. In many scenes, face captures are affected by lighting and occlusion and their quality is uneven, so demand for recognition based on the pedestrian's body keeps growing.

We often say that an algorithm system is a set of trade-offs: as the proverb goes, you cannot have both the fish and the bear's paw. Here the fish is precision, identifying people accurately (face technology), and the bear's paw is recall, capturing all pedestrians (body technology). Only when face and body technology are tightly combined can the analysis cover the whole scene and make the multi-dimensional analysis of people more valuable. This article therefore focuses on the full technical pipeline of pedestrian recognition.
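For reference, the two sides of this trade-off can be written down as a short calculation. The sketch below uses the standard definitions of precision and recall; the function name and all counts are hypothetical illustrations, not measurements from StartDT AI Lab.

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall) computed from raw counts.

    precision = TP / (TP + FP): of the people reported, how many were correct.
    recall    = TP / (TP + FN): of the people present, how many were found.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for 100 pedestrians walking past a camera.
# A face-based pipeline: few mistakes, but many faces are hidden or badly lit.
face_like = precision_recall(true_positives=60, false_positives=1, false_negatives=40)
# A body-based pipeline: catches most pedestrians, with more false alarms.
body_like = precision_recall(true_positives=90, false_positives=15, false_negatives=10)

print("face-like (precision, recall):", face_like)
print("body-like (precision, recall):", body_like)
```

With these made-up counts, the face-like pipeline has the higher precision and the body-like pipeline the higher recall, which is the pattern the fish and bear's paw comparison describes.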

01 Human body detection

The first step is usually human body detection, which uses computer vision to extract structured information about people from visual signals. Because the products and systems enabled by Singularity Cloud are deployed across a wide range of commercial scenarios, human detection must be highly robust. In unconstrained visual scenes, the main problems it faces are:

Large scale variation: people differ in size, and adults and children differ greatly in proportion. A pedestrian's distance from the camera also causes large changes in scale. When the two effects combine, detection becomes even harder.

Large posture variation: walking upright, bending over to work, sitting down to rest, walking in small groups, and so on all change the apparent shape of the human body considerably.

Camera-induced distortion: the scenarios where pedestrian detection is deployed, and the need for commercial profitability, put tight limits on the cost of the related hardware. Image distortion caused by the camera's imaging quality and by how it is installed is therefore very common, and it interferes heavily with human detection.

Image blur: because pedestrians appear in such a wide range of scenes, background regions used as negative samples during training often have shapes, textures, and appearances similar to human bodies, and under some lighting angles certain negatives are easily mistaken for people.

Occlusion: in crowded scenes, pedestrians block one another as they move through the scene, so a camera in a fixed position often cannot see whole bodies. This also makes human detection much harder.

Speed requirements: human detection is often used in surveillance-style scenarios where the volume of data is huge. To meet the real-time requirements of some services, the detection model must run fast. Because human detection is difficult and the models are large, improving speed without losing accuracy is technically demanding.

StartDT AI Lab has carried out targeted work on the above series of problems:

In terms of data preparation, the lab has invested heavily. Through in-house labeling and curation it has built a sample library of millions of images, with an especially rich accumulation of pedestrian annotations from retail scenarios.

In terms of the model algorithm, the lab draws on both the mainstream anchor-box approach and the more recent keypoint-based detection methods. Through continuous iteration and experiments, the precision and recall of the current algorithm meet the needs of human detection in current business scenarios.

In terms of inference speed, StartDT AI Lab reduces the computational complexity of the model in two ways. First, it compresses the backbone network, shrinking it while preserving feature-extraction performance as far as possible. Second, it optimizes the detector head so that the detector's overall performance does not drop. After successive versions, the current model is less than one tenth the size of the first-generation model, and its processing efficiency on the same computing resources has improved greatly.
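To make the anchor-box idea mentioned above concrete, here is a minimal sketch of how anchors of several scales and aspect ratios can be laid at one feature-map location and compared with a ground-truth box by intersection over union (IoU). It is a generic illustration and not StartDT AI Lab's detector; the function names, scales, ratios, and the example box are all assumptions.

```python
import math


def make_anchors(center_x, center_y, scales, aspect_ratios):
    """Return anchor boxes (x1, y1, x2, y2) centred on one location.

    Each anchor has area scale**2 and height/width equal to the aspect ratio,
    so tall ratios (2.0, 3.0) suit standing pedestrians.
    """
    anchors = []
    for scale in scales:
        for ratio in aspect_ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            anchors.append((center_x - width / 2, center_y - height / 2,
                            center_x + width / 2, center_y + height / 2))
    return anchors


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical settings: three scales to cover near and far pedestrians,
# three aspect ratios to cover different postures.
anchors = make_anchors(100.0, 100.0, scales=(32, 64, 128), aspect_ratios=(1.0, 2.0, 3.0))

# Hypothetical ground-truth box for a small, distant pedestrian.
child_box = (84.0, 55.0, 116.0, 145.0)
best_iou = max(iou(anchor, child_box) for anchor in anchors)
print(len(anchors), "anchors, best IoU with the example box:", round(best_iou, 3))
```

Covering several scales and ratios at every location is one common way to cope with the scale and posture variation listed earlier; keypoint-based detectors avoid predefined boxes altogether.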

02 Pedestrian recognition

Pedestrian re-identification grows out of metric learning and, like face recognition, solves a retrieval problem. Through retrieval, we want to associate and cluster a pedestrian's spatial and temporal information, which is what "re-identification" means. If a person seen in one camera can be found again in another, we have completed one cross-camera track.
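The retrieval step can be sketched as ranking a gallery of pedestrian embeddings by their similarity to a query embedding. The example below is a minimal illustration under stated assumptions: the three-dimensional vectors, camera names, and function names are hypothetical, and a real Re-ID model would produce much higher-dimensional features.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, gallery, top_k=3):
    """Rank gallery entries (camera_id, embedding) by similarity to the query.

    Returns the top_k (similarity, camera_id) pairs, most similar first.
    """
    scored = [(cosine_similarity(query, embedding), camera_id)
              for camera_id, embedding in gallery]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings of pedestrian crops captured by three cameras.
gallery = [
    ("cam_entrance", [0.90, 0.10, 0.20]),
    ("cam_checkout", [0.10, 0.80, 0.50]),
    ("cam_aisle_3", [0.85, 0.15, 0.25]),
]
# Hypothetical embedding of the query photo.
query = [0.88, 0.12, 0.20]

ranked = retrieve(query, gallery, top_k=2)
for similarity, camera_id in ranked:
    print(camera_id, round(similarity, 4))
```

Because metric learning trains the model so that the same person's crops land close together, the highest-ranked cameras are the most likely places where the queried person appeared.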

Imagine that you are separated from your child at Disneyland, an airport, or a college campus. Besides the passive approach of a public announcement ("xx, your parents are waiting for you in the broadcast room"), we could open a Marauder's Map and find the child directly. Re-identification makes such a live map possible: submit a photo of the child, search the current frames from cameras at different positions, and find the camera that has captured the child. Combined with that camera's location, this pinpoints where the child is. The same application can be used to find thieves, protect VIPs, and so on. The possibilities are exciting, but such a futuristic picture is not achieved overnight. A great deal of supporting technology at StartDT AI Lab is what allows pedestrian recognition to play its proper role:

1. Body tracking mechanism: pedestrian tracking in video structuring can be reduced to a multi-target tracking problem. We mainly use filtering and a greedy algorithm to consolidate the information associated with each pedestrian ID. Over a short time span, tracking can quickly match a given pedestrian's bounding boxes using the correlation between consecutive frames. This has two advantages. First, it adds spatial continuity: a pedestrian's spatial information is spread across consecutive frames, and tracking unifies it. Second, it saves computation: only one representative frame needs to be analyzed over the whole track, so the overall information can be managed at a higher level.

2. Human skeleton point analysis: computer vision is used to obtain the skeleton points of the human body, and these keypoints provide important prior knowledge for pedestrian recognition. First, not every detected pedestrian is suitable for re-identification; incomplete pedestrians and low-resolution pedestrians degrade the model. Skeleton points help filter out such dirty data, since the number of skeleton points gives a rough measure of how complete a pedestrian is. Second, skeleton positions are the key to pedestrian alignment: pedestrians in different postures and positions are aligned through their skeleton keypoints. Aligning pedestrian features reduces the misalignment of body parts, which has a large effect on the result.

3. Person re-identification: pedestrian re-identification uses a Re-ID model to extract features from pedestrian images in surveillance video. In this feature space, images of the same pedestrian lie close together and images of different pedestrians lie far apart. This high-dimensional embedding helps us find the same pedestrian under different cameras. The technology is new and advanced, but in real scenarios incomplete pedestrians inevitably appear in the images we analyze. Filtering them out directly would introduce systematic errors at the higher statistical level and would hurt recall considerably. When pedestrians are incomplete, the business therefore requires us to compare incomplete bodies. We deliberately add such noisy data to model training and improve the algorithm's robustness to incomplete pedestrians by aligning body features in an unsupervised manner.
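The first two steps above can be sketched in a few lines: greedy association of existing tracks with new detections by box overlap, and a completeness check based on the number of confidently detected skeleton points. This is a simplified illustration, not StartDT AI Lab's implementation. It omits the motion filter the text mentions and matches against each track's last box directly, and the thresholds and the 17-keypoint layout are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, detections, min_iou=0.3):
    """Pair tracks with detections greedily, highest IoU first.

    tracks: dict of track_id -> last known box.
    detections: list of boxes from the current frame.
    Returns dict of track_id -> index into detections.
    """
    pairs = sorted(
        ((iou(box, det), track_id, index)
         for track_id, box in tracks.items()
         for index, det in enumerate(detections)),
        reverse=True,
    )
    matches, used = {}, set()
    for score, track_id, index in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same person
        if track_id in matches or index in used:
            continue
        matches[track_id] = index
        used.add(index)
    return matches


def is_complete(keypoint_scores, min_visible=12, min_score=0.5):
    """Keep a crop only if enough skeleton points are confidently visible."""
    return sum(score >= min_score for score in keypoint_scores) >= min_visible


# Hypothetical boxes: two tracked pedestrians and three new detections.
tracks = {1: (10, 10, 50, 110), 2: (200, 20, 240, 120)}
detections = [(205, 22, 245, 122), (12, 11, 52, 111), (400, 400, 440, 500)]
matches = greedy_match(tracks, detections)
print(matches)  # detection 2 stays unmatched and would start a new track

# Hypothetical confidences for an assumed 17-keypoint skeleton.
print(is_complete([0.9] * 17), is_complete([0.9] * 8 + [0.1] * 9))
```

Once a track is formed, a single representative crop that passes the completeness check can be sent to the Re-ID model, which is where the computation saving described in step 1 comes from.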

03 Sample generation

Data samples are the foundation of artificial intelligence, but accumulating data is extremely time-consuming, laborious, and expensive. Some public datasets are sizeable, but they suffer from problems such as unbalanced sample distribution and poor sample diversity. In addition, data distributions differ between application scenarios, which seriously reduces a model's ability to generalize. It is therefore necessary to label data from the target domain, Re-ID samples in particular.

In this project there is a large domain gap between the pedestrian data from the on-site cameras and the public datasets, so a Re-ID model trained on public data has low accuracy in this scene and cannot meet practical requirements. To solve this, we use a generative adversarial network (GAN) to convert pedestrians in the public datasets into the image style of the real scene. After retraining, the accuracy of the model improves by more than 50%. We also use a GAN to change pedestrian poses and so increase the diversity of the dataset, and we use an attention mechanism to strengthen the learning of pedestrian features other than clothing (head, limbs, and so on), which addresses the accuracy drop caused by pedestrians changing clothes.

Style transfer (figures omitted): before and after style transfer.

Pedestrian clothing change (figure omitted).

With the techniques presented above, readers should have a fuller picture of pedestrian re-identification (Re-ID) and of Singularity Cloud's commitment to technology. Pedestrian recognition folds the sparse behavior of users outside the VIP system into the overall customer-flow profile, which makes customer path analysis and heat map analysis possible. It also makes up for Face ID's inability to work at the level of general statistics. By combining these algorithms, StartDT AI Lab redefines the customer-flow system and gives merchants new dimensions of analysis.
