
Big Data Engineer Micro-Position: Sharing My Learning Experience

Updated 2025-01-21. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/03.

Article source: Big data micro-position ~ Lin's personal center (https://blog.51cto.com/battosai/1962958)

With the rapid growth of data in every industry, the requirements for data storage, analysis, processing and mining keep rising. The IT industry is gradually turning into a "DT" (data technology) industry, and the future will be data-driven. So I think big data will be a mainstream direction, and understanding and learning it will help us in our future work and life. I recently studied the big data engineer micro-position course and passed all of its exams. Let me get to the point and share my learning experience; for reasons of length, I won't go into specific knowledge points.

This course leans toward big data analysis and basically does not cover the development of big data components, so it leaves out the lengthy Java course. Because it does involve building a big data platform, you need some Linux foundation. That foundation can be picked up quickly, so there is no need to work through a complete Linux curriculum before starting on big data. Of course, a background in Java or Oracle will make your learning more efficient.

1. Building the big data platform. Focus on how Linux memory is organized and managed, which you can relate to the characteristics of the JVM. Become familiar with the options of common file system commands, since you can later compare them with HDFS. You also need to master the order in which Linux environment variables are loaded, as well as time configuration.

2. MapReduce. Learn its computing framework: how MapReduce and YARN schedule resources and process jobs, how a MapReduce program is executed, and what components such as the partitioner and reducer do along the way.

3. HDFS. Understand the architecture of the HDFS distributed file system, the relationship between data and metadata, and its security model, and master how HDFS and ZooKeeper together implement high availability (HA). Then learn to build a Hadoop cluster, including system preparation and initialization, hardware selection, parameter configuration and cluster fault diagnosis. Finally, you can study how to optimize HDFS.
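To make the MapReduce flow from item 2 concrete, here is a minimal pure-Python simulation of a word-count job: the map step emits (word, 1) pairs, a partitioner assigns each key to a reducer, the shuffle groups values by key, and each reducer sums its values. It only illustrates the data flow on one machine and is not Hadoop's API; the CRC32-based partitioner is an assumption standing in for Hadoop's default hash partitioner, and all function names are hypothetical.

```python
import zlib
from collections import defaultdict


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def partition(key, num_reducers):
    """Partitioner: choose the reducer for a key.
    Stable CRC32 modulo the reducer count (a stand-in for Hadoop's hash partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_phase(key, values):
    """Reducer: sum all counts collected for one word."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    # Shuffle: one bucket per reducer, each grouping values by key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for key, value in map_phase(line):
            buckets[partition(key, num_reducers)][key].append(value)
    # Each reducer handles its keys in sorted order, mirroring the sort phase.
    results = {}
    for bucket in buckets:
        for key in sorted(bucket):
            word, total = reduce_phase(key, bucket[key])
            results[word] = total
    return results


if __name__ == "__main__":
    print(run_job(["the quick brown fox", "the lazy dog", "The fox"]))
```

Because the partitioner is deterministic, every occurrence of a word reaches the same reducer, which is what lets each reducer produce a final total on its own.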
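For item 3, the storage arithmetic behind HDFS is worth working through once. The sketch below assumes the common defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); your cluster may configure different values. It splits a file into blocks, computes the raw disk it occupies across DataNodes, and shows a simplified, hypothetical shape for the metadata the NameNode keeps, to illustrate the split between data and metadata. The helper names are hypothetical.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Sizes of the HDFS blocks a file of file_size bytes is cut into.
    Only the last block may be smaller than block_size."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def raw_storage(file_size, replication=3):
    """Raw bytes used across DataNodes: each block is stored `replication` times,
    and a partial last block only uses as much disk as the data it holds."""
    return file_size * replication


def namenode_entry(path, file_size, block_size=128 * MB, replication=3):
    """Simplified metadata for one file: the bytes live on DataNodes,
    while the NameNode only tracks the file's block list and replication."""
    blocks = split_into_blocks(file_size, block_size)
    return {
        "path": path,
        "replication": replication,
        "blocks": [{"id": i, "length": size} for i, size in enumerate(blocks)],
    }


if __name__ == "__main__":
    size = 300 * MB
    print([b // MB for b in split_into_blocks(size)])  # [128, 128, 44]
    print(raw_storage(size) // MB)                      # 900
```

A 300 MB file therefore becomes three blocks and, at replication 3, occupies about 900 MB of raw disk, while the NameNode only holds three block entries for it.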

4. HBase. Many companies may not use HBase specifically; it depends on the scenario. Start by systematically learning its concepts and basic operations, and at the same time understand the data models and characteristics of NoSQL and distributed databases, along with some typical application scenarios.

5. Flume and Kafka. We hear a lot about stream computing, but we may not be clear about the details. Here you can learn the framework of stream computing, and worked examples make it easy to see how Flume and Kafka work together to build a real-time application log analysis system. When you learn Spark Streaming, which processes data in micro-batches, you can also compare it with true streaming engines such as Storm and Flink in terms of application scenarios and their respective strengths and weaknesses.

6. Hive. Learn why Hive was created and how using it compares with writing traditional SQL. What functions does Hive provide, and what complex data types does it support? Learn how to query and analyze data with Hive: for example, creating databases and tables, loading data from HDFS into Hive, and importing MySQL data into Hive tables with Sqoop. You also need to know how to use partitioned tables and how to optimize and extend Hive.

7. Spark. Understand the motivation behind Spark SQL and the RDD principles underlying Spark, as well as single-machine and cluster deployments. Learn the relationship between RDD, DataFrame and Dataset and how they evolved. Focus on how a Spark program runs, including the concepts of parallel processing and data locality. Finally, master the common Spark performance optimization techniques, along with broadcast variables and accumulators.
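Item 4 mentions HBase's data model. As a rough mental model, an HBase table behaves like a sorted, multi-versioned map: row key, then "family:qualifier", then timestamp, then value. Column families are fixed when the table is created, while qualifiers can be added freely. The class below is a toy in-memory sketch of that logical model only; the class and method names are hypothetical, not the HBase client API, and it ignores regions, storage files and everything else that makes HBase distributed.

```python
from collections import defaultdict


class ToyHBaseTable:
    """Logical model only: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise ValueError(f"unknown column family: {family}")
        self.rows[row][column][ts] = value  # a new timestamp adds a version

    def get(self, row, column):
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield rows in sorted row-key order within [start_row, stop_row)."""
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                cells = self.rows[row]
                yield row, {col: vers[max(vers)] for col, vers in cells.items()}


if __name__ == "__main__":
    table = ToyHBaseTable(["info"])
    table.put("user1", "info:name", "Ann", ts=1)
    table.put("user1", "info:name", "Anna", ts=2)
    print(table.get("user1", "info:name"))  # Anna
```

Sorted row keys are why row-key design matters so much in HBase: range scans over adjacent keys are cheap, while lookups by anything else are not.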
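For item 5, the core of how Flume and Kafka cooperate in a log analysis pipeline can be sketched in a few lines: a collector (Flume's role) ships log lines into a topic, the topic is split into append-only partitions, and a consumer remembers one offset per partition so each poll returns only new records. Everything here is a simplified, hypothetical in-memory model, not the Kafka or Flume API; in particular, CRC32 stands in for Kafka's murmur2-based default partitioner.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """A topic made of append-only partitions; a record's offset is its index."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        # Same key -> same partition, which preserves per-key ordering.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


class ToyConsumer:
    """Keeps one committed offset per partition and reads only new records."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = [0] * len(topic.partitions)

    def poll(self):
        batch = []
        for p, log in enumerate(self.topic.partitions):
            batch.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)  # commit: next poll starts here
        return batch


def count_errors(records):
    """The 'real-time analysis' step: count ERROR lines per host."""
    counts = defaultdict(int)
    for host, line in records:
        if "ERROR" in line:
            counts[host] += 1
    return dict(counts)


if __name__ == "__main__":
    topic = ToyTopic(3)
    for host, line in [("web1", "INFO ok"), ("web1", "ERROR disk full"), ("web2", "ERROR timeout")]:
        topic.send(host, line)  # the collector's job
    consumer = ToyConsumer(topic)
    print(count_errors(consumer.poll()))
```

Keying records by host keeps each host's lines in order within one partition, and the stored offsets are what let a restarted consumer resume where it left off.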
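The partitioned tables in item 6 pay off through partition pruning: Hive stores each partition in its own key=value directory, so a query that filters on the partition column only reads the matching directories. The toy class below (hypothetical names, plain Python rather than HiveQL) shows the effect by counting how many rows a query actually scans. /user/hive/warehouse is Hive's default warehouse directory, which your installation may override.

```python
from collections import defaultdict


class ToyPartitionedTable:
    """Rows grouped by one partition column, mimicking Hive's
    one-directory-per-partition layout (e.g. .../logs/dt=2017-09-01)."""

    def __init__(self, name, partition_col):
        self.name = name
        self.partition_col = partition_col
        self.partitions = defaultdict(list)

    def load(self, partition_value, rows):
        """Roughly what loading data into one partition does: append its rows."""
        self.partitions[partition_value].extend(rows)

    def path_of(self, partition_value):
        return f"/user/hive/warehouse/{self.name}/{self.partition_col}={partition_value}"

    def select(self, partition_values, predicate=lambda row: True):
        """Scan only the requested partitions (partition pruning).
        Returns (matching rows, number of rows actually read)."""
        scanned, result = 0, []
        for value in partition_values:
            for row in self.partitions.get(value, []):
                scanned += 1
                if predicate(row):
                    result.append(row)
        return result, scanned


if __name__ == "__main__":
    logs = ToyPartitionedTable("logs", "dt")
    logs.load("2017-09-01", [{"level": "ERROR"}, {"level": "INFO"}])
    logs.load("2017-09-02", [{"level": "ERROR"}] * 1000)
    rows, scanned = logs.select(["2017-09-01"], lambda r: r["level"] == "ERROR")
    print(len(rows), scanned)  # 1 2
```

Filtering on dt skipped the 1,000 rows of the other day entirely; a filter on a non-partition column alone would have had to read every partition.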
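Broadcast variables and accumulators from item 7 are easiest to understand through what each task may do with them: tasks read a broadcast value but never modify it, and tasks only add to an accumulator while the driver reads the total. The sketch below simulates that contract in plain Python, processing partitions on a thread pool; it is an illustration under those assumptions, not PySpark code, and the function names are hypothetical. In real PySpark the counterparts are SparkContext.broadcast and SparkContext.accumulator.

```python
from concurrent.futures import ThreadPoolExecutor


def process_partition(rows, country_names):
    """One task: look up each row in the broadcast table.
    Returns (joined rows, this task's local count of unmatched codes)."""
    joined, misses = [], 0
    for user, code in rows:
        name = country_names.get(code)  # read-only use of the broadcast value
        if name is None:
            misses += 1  # plays the role of accumulator.add(1)
        else:
            joined.append((user, name))
    return joined, misses


def run_tasks(partitions, lookup):
    # "Broadcast": one shared read-only copy of the small table for all tasks.
    broadcast = dict(lookup)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda part: process_partition(part, broadcast), partitions))
    # "Accumulator": the driver merges the per-task additions into one total.
    joined = [row for rows, _ in results for row in rows]
    unmatched = sum(misses for _, misses in results)
    return joined, unmatched


if __name__ == "__main__":
    parts = [[("a", "CN"), ("b", "US")], [("c", "XX")]]
    print(run_tasks(parts, {"CN": "China", "US": "United States"}))
```

Shipping a small lookup table to every task like this is the idea behind a broadcast (map-side) join, which avoids shuffling the large dataset.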

8. Data science and machine learning. You still need some mathematical foundations here, such as statistics, probability theory and linear algebra. You can master the "3C" of machine learning (collaborative filtering, clustering and classification), the principles of recommendation systems, and how Spark's MLlib component can support a recommendation system. This is where you will really feel how important it is to have learned math well.
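For the recommendation systems in item 8, the sketch below shows user-based collaborative filtering: users are compared by the cosine similarity of their rating vectors, and unseen items are scored by the similarity-weighted ratings of the most similar users. This is deliberately a different, simpler technique from Spark MLlib's built-in collaborative filtering, which uses ALS matrix factorization; the data and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse rating dicts {item: rating}."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, user, k=2, n=3):
    """Score items the user has not rated, using the k most similar users."""
    others = [(cosine(ratings[user], r), u) for u, r in ratings.items() if u != user]
    neighbours = sorted(others, reverse=True)[:k]
    scores, weights = {}, {}
    for sim, other in neighbours:
        if sim <= 0:
            continue  # ignore users with no positive similarity
        for item, rating in ratings[other].items():
            if item in ratings[user]:
                continue  # only recommend unseen items
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:n]]


if __name__ == "__main__":
    ratings = {
        "alice": {"A": 5, "B": 3},
        "bob": {"A": 5, "B": 3, "C": 4},
        "carol": {"A": 1, "D": 5},
    }
    print(recommend(ratings, "alice", k=1))  # ['C']
    print(recommend(ratings, "alice", k=2))  # ['D', 'C']
```

With k=2, Carol's single rating of D ranks first even though she is only weakly similar to Alice, because the weighted average divides the similarity back out; weaknesses like this are one reason practical systems turn to larger neighbourhoods or matrix factorization.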

I suggest taking notes while you study; otherwise the knowledge points are too scattered to remember. Then reproduce the operations shown in the demonstrations yourself, since data analysis skills are only consolidated through hands-on work. Mastering many of these topics also depends on your own continued practice: these technologies are updated quickly, so we cannot rely entirely on the teacher's explanations. It is best to read the official documentation and learn about new and old features and their application scenarios. This write-up is fairly rough, but I hope it helps your study a little. Finally, I wish you all a rewarding experience with this course.
