2025-04-06 Update From: SLTechnology News & Howtos
Shulou(Shulou.com)06/03 Report--
By Jian Feng
"everyone's time is limited, and it becomes particularly important to choose a technology that is worth investing in in a limited time."
I have been working for 12 years, since 2008, dealing with data all along the way. I have done extensive development on the kernels of big data's underlying frameworks (Hadoop, Pig, Hive, Tez, Spark), as well as years of work on upper-layer data computing frameworks (Livy, Zeppelin) and data applications, including data processing, data analysis, and machine learning. I am now an Apache Member and a PMC member of multiple Apache projects. In 2018 I joined Alibaba's real-time computing team to focus on the research and development of Flink.
Today I want to draw on my career experience to talk about how to evaluate whether a technology is worth learning. My whole career has been spent on big data computing engines: from the initial Hadoop, to the Hadoop ecosystem projects Pig, Hive, and Tez, then to the new generation of computing engine Spark, and most recently Flink. I have been lucky to be working on the most popular technology at every stage. At the time I chose technologies mostly by interest and intuition; in retrospect, I think a technology should be assessed along the following three dimensions: 1. technical depth; 2. ecological breadth; 3. evolutionary ability.
Technical depth
Technical depth refers to whether the foundation of a technology is solid, whether its moat is wide and deep enough, and whether it can be easily replaced by other technologies. Put simply: does this technology solve a valuable problem that other technologies cannot solve? There are two main points here:
1. No one else could solve this problem, and this technology solved it first. 2. Solving this problem brings great value.

Take Hadoop, which I learned at the start of my career. When it first came out, Hadoop was a revolutionary technology, because no company in the industry had a complete solution for massive data, except Google, which claimed to run GFS and MapReduce internally. As Internet technology developed, data volumes grew by the day, so the ability to process massive data became extremely urgent, and the birth of Hadoop met that urgent need.

As the technology matured, Hadoop's ability to handle large volumes of data came to be taken for granted, while its shortcomings (poor performance, the complexity of writing MapReduce jobs, and so on) drew constant criticism. Spark emerged at just that moment and solved the stubborn problems of the Hadoop MapReduce engine: its computing performance far surpassed Hadoop's, and its elegant, simple API met the needs of users at the time, so it was warmly welcomed by big data engineers.

Now I work on Flink at Alibaba, mainly because I see the industry's demand for real-time processing and Flink's dominance in real-time computing. The biggest challenge big data faced before was sheer data scale (hence the name "big data"). After years of industry effort and practice, the scale problem has largely been solved. Over the next few years, the bigger challenge will be speed, that is, real-time processing. Real-time big data does not mean merely real-time data transfer or real-time data processing; it means real-time end to end. If any step is slow, it drags down the real-time performance of the whole system. In Flink's view, everything is a stream.
Flink's stream-centric architecture is unique in the industry. The features that flow from it, such as superior performance, high scalability, and end-to-end exactly-once semantics, make Flink the undisputed king of stream computing. At present there are three mainstream stream computing engines: Flink, Storm, and Spark Streaming.
Note: on Google Trends, "Spark Streaming" could only be compared as a search term rather than a topic, which is not strictly rigorous. But since we care about the shape of the trend curves rather than absolute values, the impact should be small.
The Google Trends curves above show that Flink is in a period of rapid growth, interest in Storm is declining year by year, and Spark Streaming has almost plateaued. This reflects Flink's deep foundation in stream computing: at present, no one can challenge Flink's dominant position in the field.
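The "everything is a stream" view above can be made concrete with a toy sketch (plain Python, not Flink code): a batch is just a finite stream, so the same incremental operator handles both cases.

```python
from collections.abc import Iterable

def word_count(stream: Iterable[str]) -> dict[str, int]:
    """Incrementally count words; the logic is identical whether `stream`
    is a finite batch of lines or an unbounded stream of them."""
    counts: dict[str, int] = {}
    for line in stream:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

# A bounded "batch" is simply a stream that happens to end:
batch = ["to be or not to be"]
print(word_count(batch))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

This is only an illustration of the unifying idea; a real Flink job adds the hard parts, such as state management, event time, and exactly-once fault tolerance.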
Ecological breadth
Technical depth alone is not enough, because a single technology can only focus on doing one thing well. To solve complex real-world problems it must integrate with other technologies, and that requires broad ecological reach. Ecological breadth can be measured along two dimensions: 1. Upstream and downstream ecology: the upstream and downstream of data, seen from the perspective of data flow. 2. Vertical-domain ecology: integration with a specific subdivided domain or application scenario.
When Hadoop first came out, it had only two basic components, HDFS and MapReduce, which solved massive storage and distributed computing respectively. As the field developed, the problems to be solved grew more complex, and HDFS and MapReduce could not easily handle them. Other Hadoop ecosystem projects such as Pig, Hive, and HBase then emerged to meet the need, solving, from the vertical-domain perspective, problems that Hadoop itself could not solve well or at all. The same is true of Spark. At first, Spark set out to replace the MapReduce computing engine. Later it developed multiple language interfaces and upper-layer frameworks, such as Spark SQL, Spark Structured Streaming, MLlib, and GraphX, which greatly enriched Spark's usage scenarios and expanded its vertical-domain ecology. Spark's support for a wide variety of data sources connected the compute engine with storage systems, building a strong upstream and downstream ecosystem and laying the foundation for end-to-end solutions.
The ecology of the Flink project I work on is still in its infancy. I joined Alibaba not only because of Flink's dominance as a stream computing engine, but also because of the opportunity in Flink's ecosystem. If you look at my career, you will notice a shift: I started out focused on big data's core framework layer and gradually moved toward the surrounding ecosystem projects. One main reason is my judgment about the whole industry: the first half of the big data battle was fought over the underlying frameworks, and it is now drawing to a close. There will no longer be so many new technologies and frameworks at the bottom of the big data stack; each subdivided field will see survival of the fittest, maturing and consolidating. The second half of the battle moves from the bottom layer up to the ecosystem. Past big data innovation leaned toward IaaS and PaaS; in the future you will see more SaaS-style big data products and innovations.
Whenever I talk about the big data ecosystem, I bring out the diagram above. It covers essentially all the big data scenarios you deal with every day: from the leftmost data producers, to data collection, to data processing, and on to data applications (BI + AI). You will find that Flink can play a role at every step, spanning not only big data but also AI. Flink's strength, however, lies in stream processing; its ecology in the other areas is still in its infancy. What I personally work on is improving Flink's end-to-end capability across this chart.

Evolutionary ability
If a technology has both depth and ecological breadth, that at least shows it is worth learning right now. But investing in a technology must also be weighed over time: you certainly don't want what you learn to become obsolete soon, forcing you to pick up something new every year. A technology worth investing in must therefore be able to keep evolving. More than ten years after I first learned Hadoop, it is still widely used. Although many public cloud vendors are seizing the Hadoop market, you have to admit that when a company sets up a big data department, the first thing it does is build a Hadoop cluster. When we talk about Hadoop today, it is no longer the original Hadoop; it is more a general name for the Hadoop ecosystem. When you have time, take a look at this article by Cloudera CPO Arun Murthy [1]; I very much agree with its views.
[1]: https://medium.com/@acmurthy/hadoop-is-dead-long-live-hadoop-f22069b264ac

Not to mention the Spark project. After its breakout in 2014, Spark has now entered a stable period, but it is still evolving and embracing change. Spark on K8s is the best proof that Spark embraces cloud native, and the currently hot Delta Lake and MLflow in the Spark community testify to Spark's strong ability to evolve. Today's Spark is no longer just the Spark that replaced MapReduce, but a general-purpose computing engine suited to a wide variety of scenarios. It has been about a year and a half since I joined Alibaba in 2018, and in that time I have witnessed Flink's evolutionary ability first-hand.
First, over several major releases Flink has integrated most of Blink's functionality, greatly improving the capability of Flink SQL.
Second, Flink's support for K8s, Python, and AI workloads all demonstrate its strong ability to evolve.
A few tips
In addition to the three dimensions above, I would like to share a few tips I use when evaluating a new technology.
1. Use Google Trends. Google Trends reflects the momentum of a technology well. The trend chart mentioned above compares the three stream computing engines Flink, Spark Streaming, and Storm, and the conclusion is not hard to draw: Flink is the king of stream computing.
2. Check the awesome list on GitHub. One indicator of a technology's popularity is its awesome list on GitHub; look at that list's star count. It is also worth spending a weekend going through the list's contents, since they are essentially the distilled essence of the technology, from which you can roughly judge its value.
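The star-count check can be scripted. A minimal sketch, assuming the public GitHub REST API's `/repos/{owner}/{repo}` endpoint, which returns a `stargazers_count` field; the trimmed sample payload below stands in for a live response, and the repository name is only an example:

```python
import json

def star_count(repo_json: str) -> int:
    """Extract the star count from a GitHub /repos/{owner}/{repo} JSON response."""
    return json.loads(repo_json)["stargazers_count"]

# In practice you would fetch, e.g.:
#   https://api.github.com/repos/sindresorhus/awesome
# Here a trimmed sample payload stands in for the live response:
sample = '{"full_name": "sindresorhus/awesome", "stargazers_count": 300000}'
print(star_count(sample))  # 300000
```

Raw star counts only compare like with like (a niche tool will never match a general framework), so read them as one signal alongside the trend curves above.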
3. See whether technical evangelists on technology sites endorse the technology (I personally read medium.com often). In every technology circle there is a group of people who are dedicated to technology and have good taste. If a technology is really good, some evangelists will endorse it for free and share their experience of using it.
Summary
Everyone's time is limited, so choosing a technology worth investing in within that limited time is particularly important.
These are my thoughts on how to evaluate whether a technology is worth learning, and also a small summary and review of the technology choices in my own career. I hope they are helpful to yours.
© 2024 shulou.com SLNews company. All rights reserved.