The great changes in the big data field over the past six years

2025-01-21 Update From: SLTechnology News&Howtos

For the past six years, the author of this article has been following Data Eng Weekly (formerly Hadoop Weekly), an important source of content on big data and data engineering that covers a wide range of technical articles, product announcements, and industry news.

This year, as a personal project, the author set out to analyze the trends and changes in big data over the past six years by examining the Data Eng Weekly archives, which date back to January 2013.

To this end, the author crawled and cleaned more than 290 issues with a Python crawler, keeping the article snippets related to technology, news, and announcements. He then applied some basic natural language processing and filtering to those snippets to extract keywords, which produced the results below.
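The keyword step described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the author's actual pipeline: the keyword set, the alias table, and the function names `extract_keywords` and `count_mentions` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical keyword and alias lists; the article does not publish the real ones.
KEYWORDS = {"hadoop", "spark", "kafka", "kubernetes", "flink"}
ALIASES = {"k8s": "kubernetes"}

def extract_keywords(snippet):
    """Lowercase and tokenize a snippet, then keep only the tracked keywords."""
    tokens = re.findall(r"[a-z0-9]+", snippet.lower())
    tokens = [ALIASES.get(token, token) for token in tokens]
    return [token for token in tokens if token in KEYWORDS]

def count_mentions(snippets):
    """Count how often each tracked keyword appears across all snippets."""
    counts = Counter()
    for snippet in snippets:
        counts.update(extract_keywords(snippet))
    return counts

# Made-up snippets, only to show the shape of the input.
sample = [
    "Spark 1.0 released with a new SQL module",
    "Running Kafka on K8s: lessons learned",
    "Hadoop YARN scheduling and Spark on YARN",
]
mentions = count_mentions(sample)
print(mentions.most_common())
```

Mapping aliases such as "K8s" onto one canonical keyword before counting keeps a single technology from being split across several spellings.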

The main trends of the past six years

The author computes the monthly rolling average of the number of times specific keywords are mentioned and plots the results on the same chart. The chart below shows roughly when these technologies gained popularity.
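A monthly rolling average of this kind could be computed along these lines. The sketch assumes each mention is available as a `(date, keyword)` pair; the window size, the sample data, and the helper names `monthly_counts`, `series_for`, and `rolling_average` are assumptions, since the article does not say how the averaging was done.

```python
from collections import defaultdict, deque

def monthly_counts(mentions):
    """Group (ISO date 'YYYY-MM-DD', keyword) pairs into per-keyword monthly counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for date, keyword in mentions:
        counts[keyword][date[:7]] += 1  # 'YYYY-MM' identifies the month
    return counts

def series_for(keyword, counts, months):
    """Return the counts for one keyword in month order, filling gaps with 0."""
    return [counts[keyword].get(month, 0) for month in months]

def rolling_average(values, window=3):
    """Trailing moving average; the first points average over fewer values."""
    recent = deque(maxlen=window)
    averaged = []
    for value in values:
        recent.append(value)
        averaged.append(sum(recent) / len(recent))
    return averaged

# Made-up mentions, only to show the shape of the input.
mentions = [
    ("2013-01-06", "hadoop"), ("2013-01-13", "hadoop"), ("2013-02-03", "hadoop"),
    ("2013-02-10", "spark"), ("2013-03-03", "spark"), ("2013-03-10", "spark"),
    ("2013-03-17", "spark"),
]
months = ["2013-01", "2013-02", "2013-03"]
counts = monthly_counts(mentions)
hadoop_trend = rolling_average(series_for("hadoop", counts, months), window=2)
spark_trend = rolling_average(series_for("spark", counts, months), window=2)
print(hadoop_trend, spark_trend)
```

The resulting lists can be passed to any plotting library to draw both keywords on one chart; smoothing over a few months makes the crossover between two technologies easier to see than raw monthly counts.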

Hadoop and Spark

From the moment Spark took over from Hadoop in 2013, Hadoop began to decline steadily.

Hadoop and Kafka

Kafka has become the main building block of all big data technology stacks.

Hadoop and Kubernetes

The rise of Kubernetes: although Data Eng Weekly does not pay much attention to DevOps, it has still reflected the broad adoption of Kubernetes across many areas since 2017.

Hot keywords of the year

The author simply plots the 10 keywords mentioned most often in each year.
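Picking the most-mentioned keywords per year can be sketched as below. This assumes mentions are available as `(date, keyword)` pairs; the function name `top_keywords_by_year` and the sample data are hypothetical and stand in for whatever the author actually used.

```python
from collections import Counter, defaultdict

def top_keywords_by_year(mentions, n=10):
    """Return the n most-mentioned keywords for each year, most frequent first."""
    per_year = defaultdict(Counter)
    for date, keyword in mentions:
        per_year[date[:4]][keyword] += 1  # 'YYYY' identifies the year
    return {
        year: [keyword for keyword, _ in counter.most_common(n)]
        for year, counter in sorted(per_year.items())
    }

# Made-up mentions, only to show the shape of the input.
mentions = [
    ("2013-01-06", "hadoop"), ("2013-02-03", "hadoop"), ("2013-03-10", "hdfs"),
    ("2014-05-30", "spark"), ("2014-06-08", "spark"), ("2014-07-13", "hadoop"),
]
ranking = top_keywords_by_year(mentions, n=2)
print(ranking)
```

With the real data, `n=10` would give the yearly top ten that the sections below comment on.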

2013: the golden age of Hadoop!

All the original Hadoop projects are here: HDFS, YARN, MR, Pig... plus the two major distributions, CDH and HDP, and nothing else!

2014: the rise of Spark!

Hadoop largely kept its dominance, but the first version of Spark, released this year, became the hottest topic of 2014!

2015: here comes Kafka!

Spark took first place from Hadoop, and Kafka entered the top three. Most of the old projects (HDFS, YARN, MR, Pig...) did not make it into the top ten.

2016: streaming is hot!

2016 was the year of streaming: Kafka replaced Hadoop in second place, while Spark (streaming) continued to dominate.

2017: everything is streaming!

The lineup is the same as in 2016, except that Flink joined it.

2018: get back to basics!

With Kubernetes making its debut, we went back to basics, trying to figure out how to manage (K8s), schedule (Airflow), and run (Spark, Kafka, storage...) our streams.

2019: …

It's too early to draw any conclusions for 2019, but it looks like K8s will be mainstream in 2019!
