"Technical Review" chatting about image segmentation

2025-04-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Lately a lot of people have asked me how image segmentation technology is coming along, and whether it has made it into real products yet. I just want to say that it has not merely landed; it is being ground into the floor by everyday use. As usual, I will cover it in a few parts.

1 Chatting about the past

Back when I was in college, my classmates went in and out of the library carrying books on calculus, the Ming Dynasty, and so on. For several years my own reading was all Adobe Photoshop, Adobe Premiere, Adobe After Effects, and Adobe Flash; in short, the whole Adobe suite. While others played games in Internet cafes, I would often spend seven or eight hours there playing with Premiere, Nero, and the like.

If I had not been so short of money, I would have gone to Wu Dudu for a double degree in animation; I had even done well in Japanese. By now I might be working in animation instead of still writing code here.

I seem to have wandered off topic. I only want to say that this is where my lasting connection with audio and video began.

Let's get back to business. When I was learning the Adobe tools, my favorite pastime was matting, and I was once completely hooked on the Magnetic Lasso tool. That was my introduction to image segmentation.

Since then, from traditional methods to CNNs, I have never been far from it.

This time, instead of advanced techniques, we will talk about history and stories, keep things light, and keep it as short as possible; after all, long articles are tiring to read. I will hold a few things back for next time.

2 Those years without deep learning

Image processing algorithms all started out as traditional methods, which we can simply take to mean non-deep-learning methods.

What is image segmentation? Put academically, it divides an image into several sub-regions, each of which is semantically consistent in the way you want. Take the segmentation task in autonomous driving: road is road, car is car, and tree is tree.

These sub-regions together cover the whole image and do not overlap one another. Image segmentation can be regarded as a pixel-by-pixel image classification problem.

There was no deep learning in those years, yet many image segmentation techniques were developed. Let's pick out the key ones and keep a long story short.

2.1 Edge and threshold method

Simple edge detection has also been used for image segmentation, but it needs complex post-processing and overlaps with other methods, so we will skip it here and start with the threshold method.

The basic idea of the threshold method is to compute one or more gray thresholds from the gray-level characteristics of the image, compare the gray value of each pixel with the thresholds, and assign the pixel to a class according to the result.

The most widely used and representative of these is Otsu's method [1], which segments grayscale images; its core idea is to choose the threshold that maximizes the between-class variance.

The method is very simple, but it requires the color and texture of the target to be fairly uniform, with small intra-class variance, so it is only suitable for text-like images such as license plates and fingerprints.
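
To make "maximize the between-class variance" concrete, here is a minimal pure-Python sketch of Otsu's method. The function name `otsu_threshold` and the tiny pixel list are made up for illustration, and the sketch assumes integer gray values in the range 0 to `levels - 1`; in practice you would normally call a library routine instead.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form the low class, the rest the high class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the low class
    sum0 = 0  # sum of gray values in the low class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue          # low class still empty
        if w1 == 0:
            break             # high class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Hypothetical example: four dark pixels and four bright pixels.
pixels = [10, 12, 11, 13, 200, 210, 205, 198]
t = otsu_threshold(pixels)          # 13: the first level that splits the groups
mask = [p > t for p in pixels]      # True marks the bright class
```

Every threshold between 13 and 197 separates the two groups equally well here; the sketch simply keeps the first one it finds.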

If you are interested, there are plenty of survey papers to read. Simple as it is, I used it for a paper and a patent when I was in graduate school.

2.2 Region growing and splitting

The fatal flaw of the threshold method is that it is too crude. Even an adaptive local threshold cannot separate targets with large intra-class variance. It also makes poor use of the spatial information of pixels, so the results are highly sensitive to noise and often show broken edges that need post-processing.

This led to the region growing method, which starts from seed points and expands each region according to a similarity criterion until it reaches the boundary of its class, so the segmented regions are connected.
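
As a rough illustration of seed points plus a similarity criterion, here is a small pure-Python sketch. The name `region_grow`, the 4-connected neighbourhood, and the criterion "gray value within `tol` of the seed value" are all assumptions chosen for the example; real systems use many different criteria.

```python
from collections import deque


def region_grow(image, seed, tol):
    """Grow a region from `seed` = (row, col).

    A 4-connected neighbour joins the region when its gray value differs
    from the seed's gray value by at most `tol`.
    """
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tol:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Hypothetical 3x4 image: a dark area on the left, a bright area on the right.
image = [
    [10, 11, 90, 92],
    [12, 10, 91, 95],
    [11, 13, 12, 93],
]
dark = region_grow(image, seed=(0, 0), tol=5)    # 7 connected dark pixels
bright = region_grow(image, seed=(0, 2), tol=5)  # 5 connected bright pixels
```

Because the region only ever spreads to adjacent pixels, the result is always one connected piece, which is exactly what thresholding alone cannot guarantee.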

Region splitting is the reverse process and will not be discussed in detail. The best-known region growing method is the watershed algorithm [2].

The watershed algorithm is a mathematical-morphology segmentation method based on topology. The gray value of each pixel is treated as the altitude of that point, each local minimum together with the area it influences is called a catchment basin, and the boundaries between catchment basins form the watershed. There are many implementations of the watershed algorithm; flooding (immersion) simulation is the most common.
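
The flooding idea can be sketched in a few lines with a priority queue. This is a simplified, marker-based version written for illustration, not the algorithm of [2]: the basins are seeded by hand through a hypothetical `markers` dictionary (labels must be non-zero), minima are not detected automatically, and watershed lines are not marked explicitly; the boundary is simply where the labels change.

```python
import heapq


def flood_watershed(height, markers):
    """Label every pixel with the basin whose water reaches it first.

    height:  2-D list of gray values, read as altitude.
    markers: dict {(row, col): label} with one non-zero label per basin.
    """
    rows, cols = len(height), len(height[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not flooded yet"
    heap = []
    counter = 0  # tie-breaker: equal water levels are handled in FIFO order
    for (r, c), label in markers.items():
        labels[r][c] = label
        heapq.heappush(heap, (height[r][c], counter, r, c))
        counter += 1

    while heap:
        level, _, r, c = heapq.heappop(heap)  # lowest water level first
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]
                # Water cannot drop below the level it has already reached.
                new_level = max(level, height[nr][nc])
                heapq.heappush(heap, (new_level, counter, nr, nc))
                counter += 1
    return labels


# Hypothetical one-row relief: two valleys separated by a ridge of height 9.
relief = [[1, 2, 5, 9, 4, 2, 1]]
basins = flood_watershed(relief, {(0, 0): 1, (0, 6): 2})
# basins == [[1, 1, 1, 2, 2, 2, 2]]
```

The ridge pixel ends up in basin 2 because the water on the right reaches it from altitude 4, before the water on the left gets there from altitude 5.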

The watershed algorithm responds well to weak edges, so it is often used to segment images of materials and to generate superpixels that make other methods more efficient. During my master's degree, I worked with a senior labmate on segmenting semiconductor materials with it, and the results were not bad.

Speaking of which, superpixels are also very important and can, to some extent, be counted as an image segmentation method. SLIC, Mean Shift, and others are classics; see [3] for a more detailed treatment. I have used them in many practical projects, and they pair perfectly with the methods discussed below.

2.3 Graph cut

The basic idea of graph cut is to build a graph whose vertices are the image pixels or superpixels. The optimization goal is then to find a cut that disconnects the graph into subgraphs, which gives the segmentation, such that the total weight of the removed edges is minimal.

Later, the graph cut method developed from Markov random fields (MRF) to conditional random fields (CRF). It usually has two optimization objectives. One measures similarity within a region and is called the regional energy term, or piecewise energy. The other measures similarity across the edges that are cut and is called the edge energy term, or pairwise energy. The method seeks high similarity within regions and low similarity across the cut: the more alike the pixels inside a region and the less alike the regions are to each other, the better.
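
A toy example may help connect the two energy terms to an actual cut. The sketch below segments a one-dimensional "image" into foreground and background with an s-t minimum cut, found here by a plain Edmonds-Karp max-flow. Everything specific is an assumption for illustration: the name `min_cut_segment`, the squared distance to a given foreground or background mean as the regional term, and a constant penalty `smooth` between neighbours as the edge term.

```python
from collections import deque


def min_cut_segment(values, fg_mean, bg_mean, smooth):
    """Return one label per pixel: 1 = foreground, 0 = background."""
    n = len(values)
    s, t = n, n + 1  # source stands for foreground, sink for background
    cap = [[0] * (n + 2) for _ in range(n + 2)]
    for p, v in enumerate(values):
        cap[s][p] = (v - bg_mean) ** 2  # cut (paid) if p becomes background
        cap[p][t] = (v - fg_mean) ** 2  # cut (paid) if p becomes foreground
    for p in range(n - 1):
        # Edge term: paid when neighbours p and p + 1 get different labels.
        cap[p][p + 1] = cap[p + 1][p] = smooth

    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in range(n + 2):
                if cap[u][v] > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # no path left: the flow is maximal, the cut is minimal
        # Find the bottleneck capacity along the path, then push flow.
        flow, v = float("inf"), t
        while parent[v] is not None:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    # Pixels still reachable from the source lie on the foreground side.
    return [1 if p in parent else 0 for p in range(n)]


# Hypothetical row of pixels: bright object on the left, one noisy pixel (4).
row = [9, 8, 4, 9, 1, 2, 1]
no_smoothing = min_cut_segment(row, fg_mean=9, bg_mean=1, smooth=0)
with_smoothing = min_cut_segment(row, fg_mean=9, bg_mean=1, smooth=10)
# no_smoothing   == [1, 1, 0, 1, 0, 0, 0]
# with_smoothing == [1, 1, 1, 1, 0, 0, 0]
```

With `smooth=0` each pixel is labelled on its own, so the noisy pixel drops out of the object. With `smooth=10`, cutting the two edges around it costs more than accepting its worse regional fit, so the object stays in one piece.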

Graph cut is very general and also works well on images with complex texture. Its disadvantage is high time and space complexity, so superpixels are usually used to speed up the computation; the watershed algorithm above can generate them.

GrabCut [5], the iterative version of graph cut, is even more useful. Its basic idea is to replace graph cut's grayscale model with a Gaussian mixture model (GMM). The initial GMMs are built from user interaction: the user only needs to mark the region that is definitely background, usually by drawing a box around the object.

Many years on, the results are still impressive, with very clean edges.

Graph cut is very handy, and every student who wants to work on image segmentation should master it. I have used it ever since my master's thesis and my first internship project.

2.4 Contour Model

Most people may be less familiar with this one. The basic idea of the contour model is to represent the target contour with a continuous curve and to define an energy functional whose argument is that curve, which turns segmentation into the problem of minimizing the energy functional. Numerically, this is done by solving the Euler-Lagrange equation associated with the functional. The family includes parametric active contour models, represented by the snake model, and geometric active contour models, represented by the level set method.

When the energy reaches its minimum, the curve sits on the target contour.

This kind of segmentation method has several notable properties: (1) because the energy functional is formulated in the continuous domain, the final contour can be highly accurate; (2) by constraining the target contour to be smooth and adding other prior information about the target's shape, the algorithm can be quite robust; (3) because a smooth closed curve represents the object contour, a complete contour is obtained, which avoids the pre- and post-processing needed by traditional image segmentation methods.

However, the drawbacks are just as obvious: the method is rather sensitive, for example to initialization, and easily falls into local extrema.

In my master's thesis [6] I segmented tumors with the level set method; the tumor is the white region in the figure. This family of methods has a strong mathematical flavor. Since we are only chatting, I will leave out the formulas; after all, there is still a lot to cover.

There are many more traditional methods than these, but it is time to move on to deep learning.

3 After deep learning

The paper generally recognized as the first to use deep learning for image segmentation is FCN [7].

Its architecture figure has been shown many times. The idea is that, while upsampling from the lowest-resolution feature map, the network fuses in the feature maps of the same resolution produced earlier in the convolution path. This fusion of shallow and deep information is essential for segmentation networks. As for network structure, SegNet [8] makes the point even better: symmetrical, elegant, and more to my taste.

After FCN, all kinds of methods showed what they could do: different upsampling methods; dilated (atrous) convolution and other ways of enlarging the receptive field; multi-scale fusion of images and features; and post-processing such as CRFs.
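
To see how dilated convolution enlarges the receptive field without adding weights, here is a one-dimensional pure-Python sketch. The function `dilated_conv1d` is a made-up helper for illustration (no padding, stride 1); deep learning frameworks expose the same idea through a dilation parameter on their convolution layers.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1-D convolution whose kernel taps are `dilation` samples apart.

    As in most deep learning frameworks, the kernel is not flipped.
    """
    span = (len(kernel) - 1) * dilation  # receptive field is span + 1
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


signal = list(range(10))  # 0, 1, ..., 9
dense = dilated_conv1d(signal, [1, 1, 1], dilation=1)
holes = dilated_conv1d(signal, [1, 1, 1], dilation=2)
# dense == [3, 6, 9, 12, 15, 18, 21, 24]   each output sees 3 samples
# holes == [6, 9, 12, 15, 18, 21]          each output spans 5 samples
```

Both calls use the same three weights, but with `dilation=2` each output covers five input positions instead of three.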

The final segmentation result usually depends on the representational power of the network, on how far the problem can be simplified, and on well-labeled data. I do not have much competition experience, so I will not say much about competition tricks.

I will set the details aside for now and devote the next installment to them. If you can't wait, see the survey [9]. I will close this part with one last picture, just for fun.

4 Segmentation is not just a matter of classification

So far we have treated segmentation as a classification problem: each pixel is assigned to exactly one category.

However, that is not the only end goal of segmentation. Take matting for background replacement, for example.

If this is treated as a binary classification problem, the foreground and background cannot blend seamlessly, and the result does not stand up to close inspection. We therefore need a segmentation with a transparency channel, or we segment first and then blend the edges with techniques such as Poisson blending.

This is an image matting problem. It can be written as a simple mathematical expression, $I = \alpha F + (1 - \alpha) B$, where $F$ is the foreground, $B$ is the background, and $\alpha$ is the transparency. An image can thus be regarded as a linear blend of the foreground and the background, controlled by the transparency map.

However, solving this problem is ill-posed. For a three-channel RGB image there are only three equations per pixel, but seven unknowns to solve for: three each for the foreground and background colors, plus the transparency. Methods such as closed-form matting therefore have to impose constraints, such as colors staying nearly constant within a small local region, in order to obtain an analytical solution.
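
The blend I = αF + (1 − α)B is easy to try out for a single pixel. The helper `composite` below is a made-up name for illustration; it applies the formula to each RGB channel and then shows why recovering the inputs from the result is ambiguous.

```python
def composite(alpha, fg, bg):
    """Blend one RGB pixel: I = alpha * F + (1 - alpha) * B, per channel."""
    return tuple(alpha * f + (1 - alpha) * b for f, b in zip(fg, bg))


# A pixel that is 25% foreground and 75% background.
pixel = composite(0.25, fg=(200, 100, 0), bg=(0, 100, 200))
# pixel == (50.0, 100.0, 150.0)

# Two different explanations of the same observed gray pixel:
half_on_black = composite(0.5, fg=(100, 100, 100), bg=(0, 0, 0))
fully_opaque = composite(1.0, fg=(50, 50, 50), bg=(7, 7, 7))
# half_on_black == fully_opaque == (50.0, 50.0, 50.0)
```

Going forward is trivial, but the last two lines give the same observed color from different foregrounds, backgrounds, and transparencies, which is why matting needs extra constraints or learned priors.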

Image matting has likewise developed from traditional methods to deep learning. There is a comparison of the mainstream methods at www.alphamatting.com; go and take a look.

The topic drew attention again because Deep Image Matting [10], the end-to-end approach that Adobe published a few years ago, rekindled everyone's interest.

Building on this, cutting out your own head to make a meme [11] is not out of reach, although the results in practice still fall short.

Of course, technology does not stop there. Anyone following this field will have noticed that someone made waves at SIGGRAPH 2018 [12]. Sure enough, it was MIT and Adobe; who but Adobe could pull this off?

Let's look at just a few key terms from this paper: spectral segmentation, Laplacian matrix, soft transitions and layers, SLIC, and so on. Spectral segmentation and the Laplacian matrix are the core of Normalized Cut, the normalized version of graph cut; soft transitions and layers are core ideas in Photoshop; and SLIC is a superpixel method used to reduce the amount of computation.

Combined with deep learning, the result is a comprehensive and powerful image segmentation method that brings all of these together.

Finally, traditional methods come together with deep learning. It's time for our chat to come to an end, and we'll talk about the technical details next time.

References

[1] Otsu N. A Threshold Selection Method from Gray-Level Histograms [J]. IEEE Transactions on Systems, Man, and Cybernetics, 1979, 9(1): 62-66.

[2] L. Vincent and P. Soille, "Watersheds in digital spaces: an efficient algorithm based on immersion simulations," IEEE Trans. Patt. Anal. Mach. Intell., vol. 13, pp. 583-598, 1991.

[3] Stutz D, Hermans A, Leibe B. Superpixels: an evaluation of the state-of-the-art [J]. Computer Vision and Image Understanding, 2018: 1-27.

[4] Boykov Y Y, Jolly M P. Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images [C]// IEEE International Conference on Computer Vision. IEEE Computer Society, 2001: 105.

[5] Rother C, Kolmogorov V, Blake A. GrabCut: interactive foreground extraction using iterated graph cuts [C]// ACM SIGGRAPH. ACM, 2004: 309-314.

[6] Long Peng. A new method for MRI medical image enhancement and segmentation [D]. University of Chinese Academy of Sciences, 2015.

[7] Shelhamer E, Long J, Darrell T. Fully Convolutional Networks for Semantic Segmentation [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(4): 640-651.

[8] Kendall A, Badrinarayanan V, Cipolla R. Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding [J]. arXiv preprint arXiv:1511.02680, 2015.

[9] Garcia-Garcia A, Orts-Escolano S, Oprea S, et al. A review on deep learning techniques applied to semantic segmentation [J]. arXiv preprint arXiv:1704.06857, 2017.

[10] Xu N, Price B L, Cohen S, et al. Deep Image Matting [C]// CVPR. 2017, 2: 4.

[11] Zhu B, Chen Y, Wang J, et al. Fast Deep Matting for Portrait Animation on Mobile Phone [C]// Proceedings of the 2017 ACM on Multimedia Conference. ACM, 2017: 297-305.

[12] Aksoy Y, Oh T H, Paris S, Pollefeys M, Matusik W. Semantic Soft Segmentation [J]. ACM Transactions on Graphics, 2018, 37(4).

Https://www.toutiao.com/a6706310240607928835/
