How does Letv Video deal with big data with the help of Open Source Technology

2025-02-24 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains how Letv Video uses open source technology to handle big data. The explanation is kept simple and clear so that it is easy to learn and understand.

From a content perspective, the super-IP strength Letv has built on big data is real. In 2013, Letv was the first to use big data for film promotion, producing an accurate prediction and analysis for the upcoming "Tiny Times" and running a well-executed marketing campaign that opened the era of big-data film marketing. Domestic film and television then set off a wave of IP, and 2015 was the hottest year for IP speculation. At the end of that year Letv delivered: its self-produced "Legend of Miyue", dubbed the drama king of the decade, set a record with a total broadcast volume of more than 20 billion views, and the online drama "The Promotion of the Crown Princess" was streamed nearly 1.5 billion times. Big data plus super content IP has given Letv real momentum as it enters the global market.

But that is not the whole point: "content plus" is the focus of Letv's ecosystem. After "Legend of Miyue" aired, Letv simultaneously launched customized products in a classic "Miyue" edition, including a super TV, smartphones, wine, phone cases, and ringtones; partnered with Tmall to open a flagship store for "Legend of Miyue" merchandise; and released a mobile game. This is a 360-degree IP layout with no blind spots, and Letv's super IP has formed a complete closed loop.

If IP is the software, then Letv Cloud is the hardware; in the era of big data, nothing is more indispensable than the cloud. Letv Cloud is a cloud computing platform focused on video. In 2016 it reached a two-year global strategic cooperation with Dell and joined hands with Equinix, a leading global data-center operator, as well as top carriers including Orange, France's largest telecom operator; Telstra, Australia's largest telecommunications company; Spain's Telefónica, a leading integrated international carrier; and Hutchison Global Communications of Hong Kong, to break down data silos and accelerate the construction of its video ecosystem. Letv's big data reach will extend into more industries in the future.

Looking at market capitalization: LeEco was founded in 2004 and listed in 2010, when its market cap was only 5 billion. Over the past five years, relying on its big data platform, Letv has dug deep into vertical fields such as Internet video, film and television production, smart terminals, and e-commerce; its total market cap is now close to 110 billion. And judging from the global posture of Letv's overall upgrade, its story seems to have only just begun.

Bai Dexin now works on data mining for Letv Super TV. He said that the original architecture could not keep up with the current business, so it had to evolve. His main job is building the data analysis platform that provides data mining services for LeEco's super TV: how the system grew from the initial business to today, including real-time analysis and offline user mining on super TV, and providing data mining support to many business departments.

Bai Dexin mentioned that he is a Google fan and worked on the first batch of super TV data mining. In the early cloud-video days, while the team was still figuring out the player, there were only tens of thousands of records, so he computed some simple daily metrics, such as the number of plays, directly inside the business system, with the calculation running on the data nodes.

Later, performance got worse and worse as the data volume kept growing, eventually reaching tens of millions of rows per day. It was time to try some new technology, so he used Cassandra for storage: the daily data was lightly processed and split, loaded into Cassandra, computed with Hadoop, and the results stuffed into MySQL. The daily computation produced intermediate data from which reports could be built. There were many data combinations: at first just boxes and applications, later video content as well, and the team began loading the daily analysis into MySQL with Kettle. But after three or four months this was replaced again, this time with Kafka, Storm, Hadoop, HBase, Hive, Oozie, and Sqoop. Since then the only changes have been a few major upgrades made by following the open source community, trying to stay consistent with it.
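The first stage of that evolution, daily batch aggregation whose results land in MySQL for reporting, can be sketched as a minimal job. The record shape and field names below are invented for illustration; the real system used Cassandra and Hadoop for this step.

```python
from collections import Counter
from datetime import date

# Hypothetical log record shape: (device_id, video_id, event_date)
logs = [
    ("dev1", "vidA", date(2013, 6, 1)),
    ("dev2", "vidA", date(2013, 6, 1)),
    ("dev1", "vidB", date(2013, 6, 2)),
]

def daily_play_counts(records):
    """Batch step: count plays per (day, video) — the kind of daily
    intermediate aggregate the article says was loaded into MySQL
    so that reports could be built on top of it."""
    return dict(Counter((d, v) for _, v, d in records))

print(daily_play_counts(logs))
```

The same shape scales up: Hadoop does the grouping and counting across splits, and only the small aggregate reaches MySQL.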

The starting point of LeEco's big data

Bai Dexin said that at the beginning there was only one data analyst, who grabbed some data and did the analysis. For the TV and the box, the data covered when the TV was turned on, which TV programs were watched, and, since LeEco does video content, which video programs were clicked and how long they played. Play duration is tracked through a heartbeat, one every three minutes, recorded on the terminal. A playback session has a start event and heartbeats; on the TV there is usually a clean end event, but some box users simply cut the power while watching, so the session just disappears and the end has to be dug back out of the heartbeats.
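Recovering watch time from that heartbeat stream can be sketched in a few lines. The event-tuple shape is an assumption for illustration; the three-minute interval is from the article.

```python
HEARTBEAT_SECS = 180  # the article's three-minute heartbeat interval

def watch_seconds(events):
    """Estimate playback duration for one session.
    `events` is a list of (timestamp_secs, kind) with kind in
    {"start", "heartbeat", "end"}.  If the box lost power there is
    no "end" event, so we fall back to the last heartbeat seen."""
    start = next(t for t, k in events if k == "start")
    ends = [t for t, k in events if k == "end"]
    if ends:
        return ends[-1] - start
    beats = [t for t, k in events if k == "heartbeat"]
    # Best estimate: the true end lies within one interval of this.
    return (beats[-1] - start) if beats else 0

# A session where the box lost power: no "end" event was recorded.
session = [(0, "start"), (180, "heartbeat"), (360, "heartbeat")]
print(watch_seconds(session))  # falls back to the last heartbeat: 360
```

The estimate undercounts by at most one heartbeat interval, which is the best the terminal-side recording described above can do.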

Since the release of super TV: at first the box was expensive and nobody bought it, but once it sold at 299 it sold a lot. Data volume follows the three-minute heartbeat, and with hundreds of thousands of users watching video, each playback producing heartbeats, the volume became overwhelming. Letv was running four machines at the time, so the team watched how the four sets of data came out, produced the output, analyzed it, and looked for what was wrong.

At the time one person was handling all of this. Then he left, and nobody was left who knew the Cassandra side; the handover was the end of it, which had a great impact on the system. When he left, the technology left with him: one radish, one hole, and the hole is not easy to refill. Bai Dexin wanted to find someone more skilled to help Letv take it over, but after more than a month of searching had found no one. The system still had to run and the data was still growing wildly, so there was no choice but to remove Cassandra and put the data into MySQL.

Put another way, the analysts had plenty of complaints. A query such as comparing today's device power-on count with yesterday's, broken down by time period, took an analyst two hours. He said the system crashed all the time; Bai Dexin's view was that the queries were simply too slow and the system needed to scale up.

Changes in data volume at the present stage

Bai Dexin said that Letv's data volume went from doubling every three months at the beginning of the year to doubling every week, and now stands at 100 gigabytes a day; super TVs and boxes are selling very fast. The analysis shifted rapidly from device behavior to user behavior, as people began to think about the business the Internet way: what do users do with the box, do they watch movies or TV dramas? A lot of user behavior is analyzed at this point. The TV and the box now each ship a new version every week, and whether users accept each update is also analyzed here at Letv.
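"Doubling every week" compounds very quickly, which is why the old architecture broke. A quick back-of-envelope projection, taking the 100 GB/day figure above as the starting point:

```python
daily_gb = 100             # current volume, per the article
def projected_daily_gb(weeks):
    """Daily data volume after `weeks` weeks of weekly doubling."""
    return daily_gb * 2 ** weeks

for w in (0, 4, 8):
    print(w, "weeks:", projected_daily_gb(w), "GB/day")
# After just 8 weeks of weekly doubling, 100 GB/day becomes 25600 GB/day.
```

No one expects exponential growth to hold forever, but even a few months of it forces the kind of re-architecture the article describes.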

Letv also ran some tests in the box: the UI team would put up a poster recommendation today, and an analysis the next day would show whether user numbers went up or down.

User data was growing rapidly and there were not enough people. On one hand, Bai Dexin asked capable outside developers to help Letv solve the problem; on the other, he had to solve it himself. For data analysis, part of the team was transferred from inside the company and one analyst was recruited; another very strong person was poached from elsewhere in the company. Starting from the beginning of the year, the group grew from the original one person to its current shape, with two people on the R&D team.

It took almost half a year to build the new platform. The new platform collects data through Kafka from many business systems — VOD, third parties, various logs — covering both stored data and the user data that needs analysis, plus some metadata that gets its own processing. After integration, all front-end requests come through here. Storm handles the latest data in real time; in addition, jobs were written for Hadoop, and once the volume grew large the company started building its own database layer rather than the one it had first chosen. On top of Hadoop the team built data services with ad-hoc real-time query; the open data platform also does queries, reports, and some real-time analysis, with a portal that provides data services to each business and decides which data goes where. Operators, for example, need to know the video-on-demand volume in Sichuan Telecom or another telecom branch; all of that is mined from here.
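The platform described above funnels everything through Kafka and then fans out: Storm consumes immediately for real-time views, while the batch store feeds nightly Hadoop jobs and per-business queries. A toy dispatcher showing that fan-out shape (the event fields and the Sichuan example query are illustrative, modeled on the article's example):

```python
# Toy fan-out mirroring Kafka -> {Storm real-time, Hadoop batch}.
realtime_path, batch_path = [], []

def ingest(event):
    """Every event lands in both paths: Storm-style consumers see it
    immediately; the batch store keeps it for nightly jobs and
    per-business queries."""
    realtime_path.append(event)   # stand-in for a Storm spout feed
    batch_path.append(event)      # stand-in for HDFS/Hive storage

for e in ({"type": "vod", "region": "sichuan"},
          {"type": "heartbeat", "device": "box-1"}):
    ingest(e)

# A per-business view, e.g. VOD volume for one telecom region:
sichuan_vod = sum(1 for e in batch_path
                  if e["type"] == "vod" and e.get("region") == "sichuan")
print(sichuan_vod)
```

The point of the shape is decoupling: the ingest side never needs to know which downstream views exist, so new business queries can be added without touching the pipeline.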

Letv's front-end data sources grew from three to six, and all the data is funneled here. The advantage of upgrading the system along with the open source community is that it can keep receiving data without affecting the business. The data is lightly processed, put into Storm, computed in real time, and then split. Pig is not used for now; workflow orchestration is done with Oozie, which combines multiple tasks into one flow and produces a final result — it is a business process management tool.

After the data is produced it can be queried, provided to other teams, and fed back to the front end; this is called the matrix business. A test was run: two servers, each with a 4-core CPU and 6 GB of RAM, reached about 380,000 messages per second of effective data. That is a little below the 500,000 figure on the official website, because Letv's machines perform considerably worse than the ones used there.

What the two nodes did in the test: one node sent, synchronously, with message sizes of 30, 50, and 200 bytes. With 30-byte messages the rate was 380,000 per second; for the other sizes it was roughly 300,000. These are single-node figures. Letv's Spout cluster then does some business-level splitting, for example separating data that needs organizing: on-demand events versus heartbeats. Messages arrive in a somewhat random order even though they are sent continuously; each machine sends its data out and writes it in, and values such as playback duration per play are computed from the data within each time window.
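The benchmark numbers above imply fairly modest network payload per node; quick arithmetic makes that concrete (1 MB taken as 10^6 bytes):

```python
def throughput_mb_per_s(msgs_per_sec, msg_bytes):
    """Network payload implied by a message rate (1 MB = 1e6 bytes)."""
    return msgs_per_sec * msg_bytes / 1e6

# The article's single-node rates: 380k/s at 30 bytes, ~300k/s at 50 and 200.
for rate, size in ((380_000, 30), (300_000, 50), (300_000, 200)):
    print(size, "bytes:", throughput_mb_per_s(rate, size), "MB/s")
# Even the 200-byte case is only 60 MB/s of payload per node,
# well within gigabit-era hardware — the bottleneck is per-message cost.
```

That the rate barely drops between 30 and 50 bytes but holds near 300k/s even at 200 bytes suggests the limit is message-handling overhead rather than bandwidth.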

In a picture borrowed from the official website, the version used was above 0.9. The standard replica count was one, but Letv was afraid of losing data, so it kept two copies; replicating the data was actually fairly simple. On the storage side there is not much to say, only some simple optimization at the time: the default automatic garbage collection is turned off, because you do not want it recycling on its own, so some manual handling is needed. Then there is Sqoop, which imports the original stored data into the cluster; Letv used it to extract and integrate the data.

Not shown above is the query side, including the intermediate results of data processing; those intermediate results are carried through to the end rather than being consumed stage by stage.

ROI analysis

The data volume doubles every month, and by now it is more than that; Letv's original architecture cannot meet the new needs. It used to produce a daily report, but now there is a lot of real-time data every day, and Kettle-based data integration was taking longer and longer, so the Hadoop-plus-Storm scheme was adopted. It does not affect data mining much, resources are a little richer, and when the cluster is not enough you add machines. Batch data runs once a day, collected every night to generate reports, while real-time queries still take a fairly long time, about five minutes. Intel recommended a solution to Letv, but its memory requirements are too high to adopt for now: the current servers all have 6 GB of memory, and the hardware has to support the choice.

There is still a lot of work to do. Letv chose different solutions at different stages: in the early days it sold tens of thousands of boxes a year, and with the first 3,100 boxes it made no sense to build a cluster of a dozen-plus nodes. The loss of people means the loss of technology, so technical reserves and internal referrals work faster than trying to hire stars; recruiting is very hard right now, and the Hadoop circle is small. New business platforms should be treated with care, otherwise problems become hard to solve. The data side is in good shape, but without the front end the impact on the business would be severe.

Data security is very important: Letv has more than 40 TB of data, kept in two backup copies. More servers are always welcome; when four are not enough, add eight, and the computation gets much faster.
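Those storage figures are easy to sanity-check, assuming "two backup copies" means two full replicas spread evenly across the nodes:

```python
def raw_storage_tb(logical_tb, replicas):
    """Raw disk needed for `logical_tb` of data kept in `replicas` copies."""
    return logical_tb * replicas

def per_node_tb(logical_tb, replicas, nodes):
    """Raw storage each node holds if the data is spread evenly."""
    return raw_storage_tb(logical_tb, replicas) / nodes

print(raw_storage_tb(40, 2))   # 40 TB in two copies: 80 TB raw
print(per_node_tb(40, 2, 4))   # across 4 nodes: 20 TB each
print(per_node_tb(40, 2, 8))   # doubling to 8 nodes halves it: 10 TB each
```

Doubling the node count halves both the per-node storage burden and, for embarrassingly parallel batch jobs, roughly halves the runtime, which is the scaling logic behind "four is not enough, add eight".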

Thank you for reading. The above is the content of "how Letv Video uses open source technology to deal with big data". After studying this article, you should have a deeper understanding of the topic; the specific techniques still need to be verified in practice.
