How to analyze semantic SLAM and depth cameras

2025-01-18 Update From: SLTechnology News&Howtos

In this article, the editor looks at how to analyze semantic SLAM and depth cameras. The content is rich and is analyzed and described from a professional point of view; I hope you get something out of it after reading.

On the Application of Semantics in SLAM

SLAM: Simultaneous Localization And Mapping (simultaneous positioning and map building)

▪ SLAM can be divided into four parts: initialization, tracking, local mapping, and global optimization

The main knowledge involved in visual SLAM

▪ Multi-view geometry: projective geometry, camera models

▪ Image processing: feature extraction, feature point tracking

▪ Optimization: nonlinear optimization algorithms (e.g., the Levenberg-Marquardt method)

▪ INITIALIZATION

▪ The camera pose at the moment the first image enters the system is generally taken as the reference coordinate frame of the V-SLAM system.

▪ After a subsequent input image is selected, its corners are matched against those of the first image and triangulated to obtain depth, generating a candidate initialization map.

▪ The reprojection error is computed; if it is too large, another image is selected.

▪ Once the error is below the threshold, one round of optimization yields the initialization map (a code sketch follows this list).
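
Below is a minimal sketch of this initialization flow, assuming OpenCV is available, that kp1 and kp2 are already-matched corner coordinates (N x 2 float arrays) from the first and the newly selected image, that K is the 3x3 intrinsic matrix, and that the pixel-error threshold is purely illustrative:

# Monocular initialization sketch: recover the relative pose from matched
# corners, triangulate a candidate map, and check the mean reprojection error.
import cv2
import numpy as np

def try_initialize(kp1, kp2, K, err_thresh_px=1.0):
    E, _ = cv2.findEssentialMat(kp1, kp2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, kp1, kp2, K)

    # Triangulate with the first camera as the reference coordinate frame.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P1, P2, kp1.T, kp2.T)   # 4 x N homogeneous points
    X = (X_h[:3] / X_h[3]).T                            # N x 3 candidate map points

    # Reproject into the second view and measure the mean pixel error.
    proj = (P2 @ np.vstack([X.T, np.ones(len(X))])).T
    proj = proj[:, :2] / proj[:, 2:3]
    err = np.linalg.norm(proj - kp2, axis=1).mean()
    return (R, t, X) if err < err_thresh_px else None   # None -> reselect another image

A real system would also keep only the RANSAC inliers, check parallax and point depths, and run one pass of bundle adjustment on the surviving map before accepting it.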

▪ TRACKING

▪ The relationship between the current image and a previous, similar image is computed (feature point matching, optical flow, edge tracking, etc.).

▪ From this relationship, the approximate pose of the current camera is estimated with the corresponding algorithm (see the sketch after this list).

▪ The "direct method" merges these first two steps into a single step.
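
As a rough illustration of the feature-based variant (not the direct method), the sketch below assumes prev_img and cur_img are consecutive grayscale frames, prev_pts are the previously tracked pixel locations, and map_pts are their associated 3D map points; pyramidal optical flow plus RANSAC PnP then gives the current camera pose:

# Track feature points with pyramidal Lucas-Kanade optical flow, then estimate
# the current camera pose from the surviving 2D-3D correspondences with PnP.
import cv2
import numpy as np

def track_pose(prev_img, cur_img, prev_pts, map_pts, K):
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_img, cur_img, prev_pts.astype(np.float32), None)
    ok = status.ravel() == 1                            # keep successfully tracked points
    found, rvec, tvec, _ = cv2.solvePnPRansac(
        map_pts[ok].astype(np.float32), cur_pts[ok], K, None)
    if not found:
        return None                                     # tracking lost; relocalize instead
    R, _ = cv2.Rodrigues(rvec)                          # rotation vector -> rotation matrix
    return R, tvec                                      # approximate current camera pose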

▪ LOCAL MAPPING

▪ A sliding window, or a set of selected keyframes, is used to build the local map.

▪ When an image is selected as a keyframe according to the keyframe policy, it is added to the local map's keyframe queue.

▪ The point cloud (map points) held in the local map is managed.

▪ Local Bundle Adjustment (BA) is performed (see the sketch after this list).

▪ Finally, the keyframes themselves are managed.
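
A minimal local bundle adjustment sketch, assuming SciPy is available; the data layout (rotation-vector poses and observation index arrays) is hypothetical and chosen only to keep the residual function short:

# Local BA as nonlinear least squares: minimize the reprojection error over a
# small window of keyframe poses and the map points they observe. SciPy's
# least_squares supports the Levenberg-Marquardt method (method='lm') named above.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(params, n_kf, n_pts, K, obs_kf, obs_pt, obs_uv):
    poses = params[:n_kf * 6].reshape(n_kf, 6)          # per keyframe: rotvec + translation
    points = params[n_kf * 6:].reshape(n_pts, 3)        # 3D map points
    res = []
    for k, j, uv in zip(obs_kf, obs_pt, obs_uv):        # one residual per observation
        R = Rotation.from_rotvec(poses[k, :3]).as_matrix()
        p_cam = R @ points[j] + poses[k, 3:]            # world -> camera
        proj = K @ p_cam
        res.append(uv - proj[:2] / proj[2])             # observed minus projected pixel
    return np.concatenate(res)

# obs_kf[i], obs_pt[i], obs_uv[i]: keyframe index, point index, and measured
# pixel of the i-th observation; x0 stacks the initial poses and points.
# result = least_squares(reprojection_residuals, x0,
#                        args=(n_kf, n_pts, K, obs_kf, obs_pt, obs_uv))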

▪ GLOBAL OPTIMIZATION

▪ BoW is used to select candidate loop closure frames.

▪ Each candidate loop frame is verified to make sure it is a correct loop closure.

▪ The accumulated drift error is computed from the loop frame.

▪ Finally, global optimization is carried out (a naive correction is sketched below).
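
The snippet below is only a naive illustration of distributing the accumulated drift over the trajectory once a loop is verified; real systems such as ORB-SLAM instead run pose graph optimization and global BA. The 2D [x, y, yaw] pose layout is assumed purely for brevity:

# Naive loop closure correction: spread the drift measured at the loop frame
# linearly along the earlier trajectory (rotation composition is ignored on purpose).
import numpy as np

def distribute_drift(traj, loop_idx, loop_pose):
    """traj: (N, 3) array of [x, y, yaw]; loop_idx: keyframe that closes the loop;
    loop_pose: its pose implied by the verified loop constraint."""
    drift = loop_pose - traj[loop_idx]                  # accumulated drift error
    corrected = traj.copy()
    for i in range(loop_idx + 1):
        w = i / max(loop_idx, 1)                        # 0 at the start, 1 at the loop frame
        corrected[i] += w * drift                       # linear distribution of the error
    return corrected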

Semantic SLAM

▪ Why does SLAM need semantics?

▪ Robustness to the environment (dynamic environments)

▪ Prior information about the map for higher precision (adding semantic constraints)

▪ Better loop closure detection

▪ Human-computer interaction (such as CAD drawings)

Traditional loop detection

▪ At present, mainstream loop detection methods generally rely on the BoW (bag of visual words) approach (a toy scoring sketch follows this list).

▪ Open source library: DBoW2: https://github.com/dorian3d/DBoW2 (and, of course, DBoW3 and fbow)

▪ Consecutive frame matching: DLoopDetector: https://github.com/dorian3d/DLoopDetector

▪ ORB-SLAM, VINS, and others all use DBoW2.

▪ Other retrieval methods include LSH (Locality-Sensitive Hashing) and LLC (Locality-constrained Linear Coding).
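
The toy example below shows the idea behind BoW image scoring; it is not the DBoW2 API. It assumes a vocabulary of K ORB-descriptor centroids has been trained offline, and for simplicity it treats the binary descriptors as float vectors, whereas real DBoW2 builds a hierarchical vocabulary and compares descriptors by Hamming distance:

# Quantize each image's ORB descriptors against a visual vocabulary, build a
# normalized word histogram, and compare images by cosine similarity.
import cv2
import numpy as np

orb = cv2.ORB_create()

def bow_vector(image, vocabulary):
    """vocabulary: (K, 32) float array of descriptor centroids (assumed pre-trained)."""
    _, desc = orb.detectAndCompute(image, None)
    dists = np.linalg.norm(
        desc[:, None, :].astype(np.float32) - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)                        # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float32)
    return hist / (np.linalg.norm(hist) + 1e-9)

def bow_score(v1, v2):
    return float(v1 @ v2)                               # cosine similarity of unit vectors

# Candidate loop frames are keyframes whose bow_score with the current frame
# exceeds a threshold; they are then verified geometrically before closing the loop.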

The benefits that semantics can bring to SLAM systems

▪ Support for medium- and long-term tracking

▪ Better adaptability to the environment (robustness)

▪ Potential for human-computer interaction

Comparison between semantic SLAM systems and traditional SLAM systems

Because we humans have seen a great number of images, we have formed a natural intuition, a sense of distance (sense of space), for most scenes, which helps us judge which objects in an image are near and which are far. A monocular camera lacks such a prior, so:

1. Depth can be calculated only after translation.

2. The true scale cannot be determined.

The reason is that depth cannot be determined from a single image (illustrated below).
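
A small numerical illustration of the scale issue, assuming a pinhole model with made-up intrinsics: scaling the whole scene and the camera translation by the same factor leaves every projected pixel unchanged, so the image alone cannot reveal the true scale.

# Scaling the scene and the translation together does not change the image.
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])                         # illustrative intrinsics

def project(X, t):
    p = K @ (X + t)                                     # identity rotation for simplicity
    return p[:2] / p[2]

X = np.array([1.0, 0.5, 4.0])                           # a 3D point
t = np.array([0.2, 0.0, 0.0])                           # camera translation
for s in (1.0, 2.0, 10.0):
    print(project(s * X, s * t))                        # the same pixel for every scale s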

The most important feature of a depth camera (also called an RGB-D camera) is that, like a laser sensor, it can measure the distance between an object and the camera using infrared structured light or the Time-of-Flight (ToF) principle, by actively emitting light toward the object and receiving the returned light. Since this is solved by physical measurement, it does not require much computation.

ToF

The basic principle is to continuously emit light pulses (usually invisible light) at the observed object, receive the light returned from the object with a sensor, and obtain the distance to the target by measuring the round-trip flight time of the light pulses.
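
In other words, distance equals the speed of light times the round-trip time, divided by two; a tiny sketch (the function name is just illustrative):

# ToF range relation: the light travels to the object and back, so halve the path.
C = 299_792_458.0                                       # speed of light, m/s

def tof_distance(round_trip_time_s):
    return C * round_trip_time_s / 2.0                  # distance in meters

print(tof_distance(20e-9))                              # a 20 ns round trip is about 3 m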

Structured light

Structured light ranging uses a light source (usually infrared) to project a certain pattern onto an object, and then uses the camera to collect the deformed pattern for depth calculation.

The advantage of this method over pure binocular matching is that the reference image is not a captured photo but a specially designed pattern, so the feature points are known and easier to extract from the test image. Structured light uses triangular parallax ranging: the longer the baseline (the distance between the light source and the optical center of the lens), the higher the accuracy. Because it relies on active light, it cannot be used outdoors.
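
The triangulation relation behind this (shared with stereo, below) is depth = focal length times baseline divided by disparity; the numbers in the sketch are illustrative, roughly Kinect-like:

# Depth from triangular parallax: a longer baseline improves accuracy at range.
def depth_from_disparity(f_px, baseline_m, disparity_px):
    return f_px * baseline_m / disparity_px             # depth in meters

print(depth_from_disparity(580.0, 0.075, 12.0))         # about 3.6 m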

Whether general structured light or speckle structured light is used, the drawback is that the coded pattern emitted by the laser is easily washed out by sunlight, the working range is short, and it does not work outdoors in sunlight.

The laser in a structured light scheme has a short lifetime, making it hard to meet 24/7 operating requirements, and it is easily damaged by continuous long-term operation. Because the camera and the laser must be accurately calibrated against each other, re-calibration after replacing a damaged laser is very difficult, so the whole module usually has to be replaced.

At present, most RGB-D cameras still have many problems: narrow measurement range, high noise, small field of view, susceptibility to sunlight interference, inability to measure transparent (light-transmitting) materials, and so on. For SLAM they are mainly used indoors and are difficult to apply outdoors.

Stereo vision

A binocular (stereo) camera estimates distance by comparing the left and right images and does not depend on other sensing devices, so it can be used both indoors and outdoors.

Binocular stereo vision is a purely visual method that requires pixel-by-pixel matching; to guarantee robust matching results, many error-elimination strategies are added to the algorithm, so the computational load is high. A disparity computation is sketched below.
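
A minimal disparity computation with OpenCV's semi-global block matcher; the image file names and calibration values are placeholders, and the depth conversion reuses the relation given in the structured light section:

# Compute a disparity map from a rectified stereo pair, then convert to depth.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)     # placeholder rectified pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

sgbm = cv2.StereoSGBM_create(minDisparity=0,
                             numDisparities=64,         # must be a multiple of 16
                             blockSize=9)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0  # SGBM output is fixed-point

f_px, baseline_m = 700.0, 0.12                          # placeholder calibration values
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f_px * baseline_m / disparity[valid]     # per-pixel depth in meters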

VSLAM (Mono)

Advantages: low cost, easy to build

Disadvantages:

1) Special initialization is required

2) Scale problem

From a single picture it is impossible to determine the true size of an object: it may be a large but distant object, or a small but very close one.

3) Depth calculation has shortcomings:

a. The fake 3D image problem

b. The training-sample problem of machine learning

VSLAM (Stereo)

Advantages:

1) No special initialization is required

2) Depth can be calculated

3) Can be used both indoors and outdoors

Disadvantages:

1) Calibration is complicated.

2) Disparity computation consumes significant resources and needs the assistance of a GPU/FPGA or dedicated ASIC chips.

The above is how to analyze semantic SLAM and depth cameras, as shared by the editor. If you happen to have similar doubts, you may refer to the above analysis. If you would like to know more, you are welcome to follow the industry information channel.
