Architect group meeting: each session picks one of the hottest technical topics for practitioners to share hands-on experience.
This session invited Polarr co-founder Gong Enhao, Sogou Big Data Director Gao Jun, and Qiniu Cloud AI Lab Director Peng Xun to exchange views on deep learning framework selection and future trends.
Free discussion
Polarr Gong Enhao
I'm Gong Enhao. I'm studying at Stanford now, mainly doing deep learning research, especially on medical imaging. At the same time I'm at a start-up that in China is known as Pola Xiutu and in the United States as Polarr; we work on image big data, some of it in the cloud, some on mobile phones, and some on PC. Our app collects data and builds optimization algorithms. In the cloud we provide image organization, image selection, and image information recognition; we also compress the trained models and put the whole deep learning pipeline into the phone app, so that image rating, recognition, selection, and rendering run on the device. I am mainly responsible for all of the AI parts.
Sogou Gao Jun
I am Gao Jun. I am currently in charge of precision advertising algorithm research and big-data-related R&D at Sogou. On the user-facing side, Sogou has two products strongly tied to deep learning: speech recognition, used in the input method for voice-to-text, and image search. In my own team, deep learning is mainly applied to advertising, for example CTR estimation, ad retrieval, and ad relevance evaluation. Going forward we hope to do some valuable work on NLU, and also to make progress on network compression.
Qiniu Peng Xun
I am Peng Xun, head of the Qiniu Cloud AI Lab. Qiniu is a cloud storage company; there is a wide range of image, video, and audio data on our cloud platform and many rich-media customers, so the main responsibility of our AI lab is to analyze large amounts of rich-media data and build applications such as content review and recognition in related fields to serve the customers on our platform.
Topic discussion
Q
Could you share your thoughts on network compression?
Polarr Gong Enhao: Network compression has several parts.
The first part is finding the most suitable architecture, which I personally think should be determined by the specific application and its performance requirements.
The second part is network compression itself: reducing the model's parameters as much as possible without changing its accuracy. A classmate of mine did the work called Deep Compression, and I have also joined him in some new deep learning algorithm research that builds on Deep Compression to optimize models further. Recent studies have found that a deep model can often be compressed by tens or even hundreds of times, which shows there is a great deal of redundancy. Based on this you can choose an appropriate trade-off that significantly improves performance while keeping the model size in check. For example, a network that starts dense can be pruned into a sparse network and then grown back into a dense one, optimizing step by step; you can imagine the network getting fat and thin repeatedly until it reaches a better balance of size and performance. What I mainly work on is the statistical analysis of this method.
The third part is model coding. Our company wants to put the image-recognition network on the phone, so first of all it depends on model compression. The concrete approach is: at each iteration, set a small fraction of the smallest weights to zero, then keep iterating and optimizing; with a few further optimizations the final model becomes much smaller. The network's weights can also be re-encoded on the phone: in my experiments the weights can be converted from 32-bit floats to 16-bit, which is less than half the size, or to 8-bit, which makes the encoding much smaller still. Combining these steps (optimizing the model structure, thresholding weights so the model becomes sparse, and coding to reduce storage) you can keep compressing. But it mainly depends on the requirements: in the cloud the model may not need to be compressed very far, but when we move to the mobile end we have to compress it a lot, otherwise the app becomes too big. And after compressing, you also have to do decompression work, which takes a certain amount of time.
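As a rough illustration of the two steps described above, here is a minimal NumPy sketch (an assumption about the general technique, not the pipeline Polarr actually ships) of magnitude pruning plus low-bit weight coding:

```python
# Magnitude pruning (zero out the smallest weights, iteratively) and coding
# the surviving weights at lower precision (32-bit float -> 16-bit or 8-bit).
import numpy as np

def prune_smallest(weights, fraction):
    """Zero out the `fraction` of entries with the smallest magnitude."""
    threshold = np.quantile(np.abs(weights), fraction)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize(weights, bits=8):
    """Uniform quantization; returns the dequantized values used at inference."""
    levels = 2 ** bits - 1
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / levels
    codes = np.round((weights - w_min) / scale)   # small integer codes to store
    return codes * scale + w_min

w = np.random.randn(256, 256).astype(np.float32)
for step in range(1, 6):        # in practice, retraining happens between prunes
    w = prune_smallest(w, fraction=0.1 * step)
w16 = w.astype(np.float16)      # half the storage
w8 = quantize(w, bits=8)        # roughly a quarter, on top of the zeros
print("nonzero fraction:", np.count_nonzero(w) / w.size)
```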
Sogou Gao Jun: This year I saw papers talking about the teacher-student approach, and I have a hypothesis: the feature space involved in advertising is very large, so with this teacher-student way of thinking, could we take a network on the order of more than ten million parameters and find a way to shrink it to the order of millions, while keeping its performance at a good level?
Polarr Gong Enhao: Let me say first that I don't think a smaller model is necessarily faster; it may have more to do with the architecture. I think you can try it with some simple examples. First take a look at smaller and faster models that have already been validated by others, see whether the approach makes sense for your case and whether it can meet your needs. If not, sacrifice some accuracy, but that also depends on how much accuracy and other aspects of performance your application can afford to give up.
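To make the teacher-student idea Gao Jun raised above concrete, here is a minimal PyTorch-style distillation sketch; the layer sizes, temperature, and loss weights are illustrative assumptions, not Sogou's setup:

```python
# Knowledge distillation: a small "student" is trained to match the softened
# outputs of a large "teacher", plus the usual hard-label loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(1000, 2048), nn.ReLU(), nn.Linear(2048, 2))
student = nn.Sequential(nn.Linear(1000, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 4.0  # temperature: softens the teacher's output distribution

x = torch.randn(32, 1000)          # stand-in for a batch of ad features
y = torch.randint(0, 2, (32,))     # stand-in for click labels

with torch.no_grad():
    soft_targets = F.softmax(teacher(x) / T, dim=1)

logits = student(x)
distill = F.kl_div(F.log_softmax(logits / T, dim=1), soft_targets,
                   reduction="batchmean") * T * T
hard = F.cross_entropy(logits, y)
loss = 0.7 * distill + 0.3 * hard   # weighted mix of soft and hard losses
optimizer.zero_grad()
loss.backward()
optimizer.step()
```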
Q
Dr. Gong, you are doing model compression mainly for mobile phones. After compression, can the computational energy consumption be reduced while keeping a certain level of accuracy?
Polarr Gong Enhao: If you use the framework directly, the computing energy consumption is actually the same. But you can hack it further and it can improve; for example, low-precision multiplication can be used.
I think iOS's Metal is very good. For example, AlexNet can run on a mobile phone at roughly 30 to 42 fps, and an Inception model at about 10 fps. Some of these have only just been optimized, so I think a lot of companies will use mobile devices to solve problems in the future, which is very promising.
Q
Which framework are you using?
Polarr Gong Enhao: It's very pragmatic. On iOS the Metal framework has to be used. The rest is the backend, where many frameworks get used; I have had some contact with Caffe and TensorFlow.
Q
Does Sogou's advertising recommendation mainly work with structured data or unstructured data?
Sogou Gao Jun: Both. There are search advertising problems and display advertising problems. For search advertising there is an explicit query term, which you can regard as structured. If text is understood as structured, then display advertising is very complicated: to improve online CTR you need to identify users' interests, and the data used to model those interests varies greatly. You will certainly use search behavior, but also browsing behavior on the customer's site; for example, we get data from the customer's own site, so the sources of the data are very diverse. Therefore, for display advertising, basically all the processed data is heterogeneous, which can be understood as an unstructured problem.
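One common way to feed such heterogeneous signals into a single model, sketched below purely as an assumption and not as Sogou's actual system, is to hash every raw feature into a shared id space, look up embeddings, and score CTR with a small MLP:

```python
# Hash arbitrary feature strings (queries, visited pages, ad ids...) into a
# shared bucket space, pool their embeddings, and predict a click probability.
import torch
import torch.nn as nn

HASH_BUCKETS = 100_000   # illustrative size of the shared feature space

class TinyCTRModel(nn.Module):
    def __init__(self, buckets=HASH_BUCKETS, dim=16):
        super().__init__()
        self.embed = nn.EmbeddingBag(buckets, dim, mode="mean")
        self.mlp = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, feature_ids, offsets):
        return torch.sigmoid(self.mlp(self.embed(feature_ids, offsets)))

def hash_features(raw_features):
    """Map raw feature strings to bucket ids."""
    return [hash(f) % HASH_BUCKETS for f in raw_features]

model = TinyCTRModel()
ids = torch.tensor(hash_features(["query:flowers", "site:florist.example", "ad:123"]))
offsets = torch.tensor([0])        # one example in this mini-batch
print(model(ids, offsets))         # predicted click probability
```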
Q
What are the applications of deep learning in advertising?
Sogou Gao Jun: In fact, work in this field is quite different from images, largely because academia pays little attention to advertising. Another important reason is that there is not that much advertising data available, so you rarely see papers focusing on applying deep learning to advertising, and practice in industry is a matter of crossing the river by feeling for the stones. In my case, deep learning, at least on ranking problems, gives at least 10 points of improvement over our existing strategy baseline. In advertising, Baidu adopted deep learning relatively early; now Alibaba is developing rapidly and has many applications in product recommendation. So from an application perspective I feel there is a payoff, but at present the investment is not proportional to the return.
In advertising, what we see is that GPU machines have never shown an advantage in speedup, probably because, unlike in vision, we do not have much CNN in advertising. I have compared some speedup problems on a small scale in advertising, and GPU machines had no advantage, so I have always had a question in my mind: why do people in image and speech consider GPU machines at all? Is it because of convolutional networks? And do they not consider CPU at all?
Host: The core is that these functions and equations boil down to a large amount of matrix computation, and for matrix computation the CPU certainly has no advantage, because the GPU can process many pieces of data simultaneously, so its advantage is very obvious. So in image and speech, including NLP processing, the advantage of the GPU is very clear and the computational contribution of the CPU is very small. Many advertising workloads, by contrast, are not matrix computation, so a low speedup is normal, and they may not even be as fast as on the CPU.
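A quick way to see the effect the host describes is to time a large dense matrix multiplication on CPU and, when one is available, on GPU; this PyTorch sketch is only illustrative and assumes PyTorch is installed:

```python
# Time a dense matmul on CPU and on GPU (if present); dense matrix math is
# exactly the workload the GPU parallelizes well.
import time
import torch

def time_matmul(device, n=2048, repeats=5):
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)                      # warm-up, not timed
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.time() - start) / repeats

print("CPU seconds per matmul:", time_matmul("cpu"))
if torch.cuda.is_available():
    print("GPU seconds per matmul:", time_matmul("cuda"))
```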
Qiniu Peng Xun: Actually, I have also run some tests on CPU. Some customers used our porn-detection system before and said at the beginning that they could not purchase GPU machines, so I tested it on CPU for them. The efficiency was very low; a single GPU was roughly 20 times faster than the CPU.
Sogou Gao Jun: I still have a small question. I don't know how serious the problems are when you do deep learning in parallel across a cluster of machines. At least when we were doing multi-machine parallelism, we moved from TensorFlow to MXNet because we found there seemed to be an efficiency problem with TensorFlow. I don't know whether the industry has anything better for multi-machine, multi-GPU training that can effectively improve the speedup. Dr. Gong, are you aware of any new developments on parallelism in the United States?
Polarr Gong Enhao: I saw a Caffe-on-Spark deep learning variant for CPU clusters before, but I didn't pay much attention to it afterwards. I think it might be feasible; Spark is used a bit more in data processing. I personally haven't dealt with multi-machine, multi-GPU training for the time being, but since Amazon is pushing MXNet so hard, I think they will definitely come up with better multi-machine, multi-GPU support.
Qiniu Peng Xun: I investigated multi-machine, multi-GPU setups before, including TensorFlow and Caffe. TensorFlow itself does not provide a good parameter-server design out of the box; the framework leaves it to you to design the parameter server according to your application. However, I think Caffe Poseidon provides a good parameter-server design, including how to transmit when it synchronizes the matrices and how to transform a matrix to make it smaller, so that synchronization can be done more efficiently.
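As a toy illustration of the parameter-server idea (not Poseidon's actual implementation), the sketch below has several simulated workers compute gradients on their own data shards while a central copy of the parameters aggregates them:

```python
# Synchronous parameter-server sketch: workers compute gradients on shards,
# the server averages them and updates the shared parameters.
import numpy as np

def worker_gradient(params, x_shard, y_shard):
    """Least-squares gradient on this worker's data shard."""
    pred = x_shard @ params
    return x_shard.T @ (pred - y_shard) / len(y_shard)

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 20))
true_w = rng.normal(size=20)
y = x @ true_w + 0.01 * rng.normal(size=1000)

params = np.zeros(20)                         # the "parameter server" state
shards = np.array_split(np.arange(1000), 4)   # 4 simulated workers
lr = 0.1
for step in range(200):
    grads = [worker_gradient(params, x[idx], y[idx]) for idx in shards]
    params -= lr * np.mean(grads, axis=0)     # server aggregates and updates
print("parameter error:", np.linalg.norm(params - true_w))
```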
Do you feel that training with TensorFlow is much slower than with MXNet and Caffe? Have you encountered that problem?
Sogou Gao Jun: Yes, and the gap is very large on multiple machines, so we also modified a small part of the code involving the multi-machine parallel strategy, though the changes were not large; we tried it on a CPU basis at the time and the effect was pretty good.
Qiniu Peng Xun: Has anyone here used Torch? I heard some friends say that Torch gets better convergence and accuracy than Caffe when running the same dataset and network, which may be because it has some tricks in the underlying algorithms.
Polarr Gong Enhao: When I did the DSD research before, I used Torch, building on the ResNet Torch code. My first feeling about Torch is that it is troublesome, because too few people use it and it is not easy to get questions answered. But it has some advantages: for example, if I want to change the regularization or modify weights during the training iterations, that is relatively convenient in Torch, because many of its underlying operations are more exposed, which is more convenient than in Caffe. For example, if we want to make an adjustment at every step and pick up the latest adjustment, we can do that in Torch. Relatively speaking it is similar to Python and fairly easy to implement; that's my impression.
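A minimal PyTorch analogue of the kind of per-step weight manipulation described here (the original DSD work used Lua Torch; this sketch only assumes the general dense-sparse retraining idea): prune the smallest weights once, then re-apply the sparsity mask after every optimizer step.

```python
# Sparse-phase retraining: build a magnitude-based mask, then keep re-imposing
# it after each update so the pruned weights stay at zero.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Mask that keeps only the largest 50% of weights in each weight matrix.
masks = {}
for name, p in model.named_parameters():
    if p.dim() > 1:                        # prune weight matrices, not biases
        threshold = p.abs().flatten().median()
        masks[name] = (p.abs() >= threshold).float()

for step in range(100):
    x = torch.randn(32, 256)               # stand-in data
    y = torch.randint(0, 10, (32,))
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                  # re-impose sparsity after the update
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```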
Q
What do you think of the development of deep learning applications?
Qiniu Peng Xun: Content auditing, such as porn detection, which identifies offending images and video and greatly simplifies the work of human reviewers. There is also content tagging: especially for social networking sites, live streaming, and short video, we generate tags to help customers understand the content of their images and videos.
Sogou Gao Jun: Let me ask a small question. You just mentioned doing work for social networking sites; is that work in the direction of video understanding?
Qiniu Peng Xun: For example, according to our customers' needs, we did face detection to check whether uploaded photos contain a face; if a user's uploaded photos contain no face, the user is likely a bad user. As another example, the pictures we collect from a social networking site are disorganized, so we built an application that tags all of them, including face clustering and scene recognition. Social customers can then organize photo albums according to our tags and do some data analysis, for example analyzing the number of selfies each user posts on the site. Essentially it is crowd analysis from images.
Sogou Gao Jun: A more interesting deep learning application I heard about this year is video recommendation. Traditional video recommendation relies on text, but Kuaishou has very little text, since users simply upload videos, so this year they used deep learning to understand the content of the videos and then make recommendations, which is also quite interesting.
Qiniu Peng Xun: I think this amounts to tagging unstructured data for customers. Once things are tagged, there is actually a lot that can be done: sorting, search, recommendation, and more. You can even tag every slice, for example one video slice every 10 seconds, and then a great deal becomes possible. Take editing a news program: every segment of the program gets labeled. For example, in one piece of news the host appears, then the system reads the caption of the following topic with OCR, and it labels those news segments one by one, which makes editing and post-processing much more convenient.
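As a rough sketch of per-slice video tagging (assumed structure only, not Qiniu's system), the code below uses OpenCV to grab one frame every 10 seconds and passes it to placeholder tagging and OCR functions:

```python
# Slice a video every `slice_seconds` and tag each slice; `tag_frame` and
# `ocr_frame` are hypothetical stand-ins for real models.
import cv2

def tag_frame(frame):
    # Stand-in for a real scene classifier / face detector.
    return ["placeholder-tag"]

def ocr_frame(frame):
    # Stand-in for a real OCR engine reading on-screen captions.
    return ""

def tag_video(path, slice_seconds=10):
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = int(fps * slice_seconds)
    results, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            results.append({
                "time": frame_idx / fps,
                "tags": tag_frame(frame),
                "text": ocr_frame(frame),
            })
        frame_idx += 1
    cap.release()
    return results

# Example (the file name is a placeholder): tags = tag_video("news_clip.mp4")
```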
Sogou Gao Jun: So Qiniu's AI mainly does to-B services, that is, helping enterprises solve their internal needs using machine learning?
Qiniu Peng Xun: We started with a content-audit system for porn detection, and then went on to build a variety of tagging systems and customized recognition applications.
Sogou Gao Jun: Under this Qiniu AI model, do you regard this as a long-term business model? I have been in contact with some companies in Beijing, even large ones such as China Merchants Bank, and so far I have not seen a strong willingness to pay; it is very difficult for them to frame their problems as machine learning problems, and it is also very difficult to put a value on it. I have always been curious about this: can this model really become a genuinely profitable one?
Qiniu Peng Xun: It depends on the customer group. Porn detection, for example, saves customers a lot of cost: they used to need a lot of basic manpower, and the labor cost is very high, so they are actually very happy to pay for this. Moreover, being a porn reviewer is a hard job; a reviewer has to become skilled and then quits after six months or a year, so the labor cost really is very high. For other applications, we likewise commit to those that save large amounts of labor.
Q
What is Qiniu's AI strategy?
Qiniu Peng Xun: Going forward we will mainly work in the video direction, including video analysis and general video detection. We will devote our research investment to solving the practical problems of the customers on our platform, mainly in video analysis, because we store a lot of video, and fine-grained video detection is also one of our key directions.
Q
What are your expectations for the future of deep learning?
Sogou Gao Jun: Let me ask a small open question. After deep learning took off, Amazon built the Echo. In five years' time, will there really be a family secretary like the one in Iron Man, in the same way the original iPhone redefined all phones? Will something like that happen five years from now? What does everyone think?
Amazon's Echo now provides a very full API, connecting to home devices and to various app functions. I am thinking that if this trend continues, it is likely to become a necessary device for the family. In that scenario it can spawn a lot of services: it can connect to cameras, it has voice, it can become omnipotent, and everything we do now could be subsumed by it. Because it could completely change my life, I have been wondering whether this will happen.
Host: I think if it is just smart home control, it should be fine. If you are a real geek who puts smart lights or robots at home, there is no problem. But many people are more concerned about privacy protection and may not be happy to put such a device at home. So on a small scale I think there is no problem, but at large scale there will still be issues.
Polarr Gong Enhao: The Echo has been quite popular recently, but I think in the future everyone can have this kind of service on their mobile phone, which is actually more direct, and there now seem to be a lot of startups working on personal assistants. Their main idea is to become AI assistants: for example, call a taxi for me so I don't have to do it myself. Mobile phone assistants can connect to Internet services through AI. I think all of this can be realized in the near future.
Sogou Gao Jun: are there many companies starting a business in this direction in the United States?
Polarr Gong Enhao: I have seen some recently, including in China. I had a classmate who went back to China to build a personal assistant; the eventual goal is definitely to do it with speech recognition and artificial intelligence, although at the very beginning the service may be delivered manually. I think it is a direction that has only just started, and the goal is to do it with AI.
Sogou Gao Jun: I remember there are similar teams in China, very similar to the Amazon Echo. Even some people making car rearview mirrors seem to be aiming in this direction.
I usually use Microsoft XiaoIce, and sometimes I use it to handle a few small tasks.
Host: It comes down to personal habits. Among the general public, how many people will actually use these things? I think they may not use them much.
Polarr Gong Enhao: I think there are mainly a few problems. One is recognition accuracy. Another is integration with other things: for example, sending something on WeChat for you is a function it cannot yet achieve. And for some things it currently charges.
Host: I think chatbots are worth discussing. At the moment I don't think there is a particularly good application, and the algorithms may not be particularly mature.
Sogou Gao Jun: A friend told me about chatbots before. He mentioned that the corpus is a very troublesome thing. I don't know how you have handled it.
Host: The core is building the knowledge graph. For a chat robot the technology is not the problem; the real problem is the means of production, that is, how to build the chat knowledge graph for a professional domain, which is what separates a good chatbot from a bad one. How to go deep into a particular industry's applications is the future trend. There is no technical threshold: any few people can start a chatbot company.
Sogou Gao Jun: If you build automatic question answering for a vertical domain, with a domain-level knowledge base, that may help a great deal with these problems. Something like XiaoIce, by contrast, is very broad. I have always been curious about one question: there is a huge amount of dialogue in movies and TV dramas, so is it really valuable to use that dialogue to make the chatbot's algorithm better? From a pure QA perspective it takes a lot of manpower to collect these matching pairs, but sometimes the chatbot just needs to make people feel it is a person. So why can't you mine a lot of dialogue from TV dramas and movies?
Qiniu Peng Xun: I think it is relatively easy to build a customer-service robot, but much harder to make it seem like a person. I came across an example before in which the robot was made to learn from everyday chat content. For example, for "I am sick, I am not feeling well today", several candidate answers were labeled manually, one of which was "What's wrong?". When several groups of people were asked to label, most chose "What's wrong?". In fact "What's wrong?" is a reply that fits almost any scene, so the robot gives you that answer for everything. In other words, it still has not reached the point where it can understand everything in context.
Q
Could you talk about some new areas you are exploring?
Polarr Gong Enhao: Apart from the company, my personal research is mainly in medical imaging, which is a relatively new application: for example using deep learning to help doctors make diagnoses, or to see findings that people cannot see, which is closely tied to the quality of the images provided. At the same time I think NLP can also be used in this kind of medical diagnosis; recently some people have started using all kinds of unstructured data to make predictions. This is something I am personally interested in and may make some smaller attempts at.
Sogou Gao Jun: When I was a doctoral student my lab was a CV lab, and many of my labmates are doing image-related startups. The medical imaging Dr. Gong just mentioned is something I am currently following and am genuinely interested in, because there are also several small startups in China, such as DeepCare, that seem to be doing similar work. Many seem to be following IBM's Watson approach, and a number of domestic companies are doing that. There is another group of companies that use NLP methods for disease assessment and triage, all in medicine. I feel there are many startups now, but I have not heard anything on the pharmaceutical side yet, so once the advertising work is done I usually pay attention to medical imaging; I talk to labmates in this industry and listen to their ideas, because this seems to have great commercial value.
Qiniu Peng Xun: These kinds of projects are generally very large, and they are really about solving some very general problems: as long as you solve part of the problem in one department, such as medical imaging, you are in fact solving a very general problem.
Sogou Gao Jun: However, at present I am not optimistic about doing this in China, because one of my basic judgments is that expecting hospitals to hand over serious, usable data is not very reliable. They told me about a case: they obtained tens of thousands of cases with related data, and after cleaning, only a few thousand records were usable. That is what the industry feels like; never mind deep learning, you cannot even run a logistic regression, the amount of data is too small. In practice it is very hard in China to get the time to do this properly, so it may be easier in the United States, but I think it will take a long time before there is a real opportunity in China.
Polarr Gong Enhao: In fact, there are still many such opportunities in China, mainly because hospitals and universities can cooperate; Tsinghua, for example, has a lot of resources in this area. If you want to do this in the future, it starts from individual patient data, and basically the number of patients China sees in little more than a week is about the same as the number the United States sees in a month or even a year.
Qiniu Peng Xun: Yes, medical research institutes like Tsinghua and Zhejiang Jiaotong University have many affiliated hospitals, and the data volumes are still very large. As Dr. Gong just said, some of it goes to universities, and the imaging centers at universities can release it, so there are actually a lot of opportunities here. The question is which diseases to take as the breakthrough point, which can be explored further.
Polarr Gong Enhao: Recently, with progress in CNN-based segmentation, a lot of medical applications have become possible.
Sogou Gao Jun: But for something like medical imaging, even if we can get good data at present, it is unlikely to become the primary tool, is it? I think so: after all it has an error rate, and if a machine gives the main advice and an accident happens, that becomes very troublesome. So I think this kind of tool will only serve as a reference for doctors in the future. I don't know whether people have higher expectations for it.
Polarr Gong Enhao: Because of medical ethics and regulatory issues, no matter how well you do, at present you cannot have a machine take responsibility; in the end someone has to sign off and be accountable. But for a doctor who, for example, needs to look at many layers of images, many different slices of a scan, if you can tell him "look at this layer, this is the key one", that reduces his workload, and in essence that is very valuable. I talked to a medical school teacher about this the other day, and he thought he needed it.