How to enter the hot field of big data, and what is the learning route?

2025-01-18 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/03 Report--

Big data is not an academic major or a programming language; it is a combination of a whole series of technologies. One popular definition expresses it as an equation:

Big data = programming skills + data structures and algorithms + analytical capabilities + database skills + mathematics + machine learning + NLP + OS + cryptography + parallel programming

The equation is long and there is a lot to learn, but the effort is proportional to the reward, at least in terms of salary. Because there is so much to learn, getting the learning order right is very important.

While getting started with big data, I ran into difficulties with study methods and industry knowledge, and with the lack of a systematic learning route and plan. You are welcome to join my big data learning group: 529867072. The group files include my big data learning manual, development tools, PDF documents and books, which you can download yourself.

Here is a professional learning path for big data, which I hope will help you avoid detours. It is divided into seven stages: introductory knowledge → Java basics → Scala basics → Hadoop technology module → Hadoop hands-on project → Spark technology module → big data hands-on project. Stages 1 to 5 are free courses.

Stage 1: introductory knowledge. This part is aimed mainly at beginners, who need basic database knowledge before going further. MySQL is a DBMS (database management system) and one of the most popular relational database management systems. A relational database is built on the relational model and uses set algebra and related concepts and methods to process its data. MongoDB is a very popular non-relational (NoSQL) database in the IT industry, and its flexible data storage model is favored by practitioners. Redis is an open-source, network-enabled, in-memory key-value store. Both are well worth understanding.

To learn big data, we first have to learn the Java language and the Linux operating system, which are the foundation for everything that follows.

Java: As everyone knows, Java comes in three editions: JavaSE, JavaEE and JavaME. Which one should you learn for big data? The standard edition, JavaSE, is enough. Servlet, JSP, Tomcat, Struts, Spring, Hibernate and MyBatis are JavaEE technologies that see little use in big data, so a passing familiarity is sufficient. You do still need to know how Java connects to a database, so JDBC must be mastered.

Some students ask: Hibernate and MyBatis can also connect to a database, so why not learn them? I am not saying they are bad to learn, only that they can take a lot of time and are rarely used in the final job; I have never seen anyone process big data with either of them. Of course, if you have energy to spare, study how Hibernate or MyBatis work internally rather than just their APIs. That will deepen your understanding of how Java operates on databases, because the core of both is Java reflection plus various uses of JDBC.

Linux: Because big data software runs on Linux, you should learn Linux thoroughly. A solid grasp of Linux helps you pick up big data technologies quickly: you will better understand the runtime and network configuration of Hadoop, Hive, HBase, Spark and similar software, and you will avoid many pitfalls. Learning shell lets you read scripts, which makes it easier to understand and configure a big data cluster. It will also help you learn new big data technologies faster in the future.

With the basics covered, let's talk about the other big data skills you need. You can learn them in the order I have written them.

Hadoop: This popular big data processing platform has almost become synonymous with big data, so it is a must. Hadoop consists of several components: HDFS, MapReduce and YARN. HDFS is where the data is stored; like the hard disk in your computer, it holds the files. MapReduce does the processing and computation. Its defining trait is that no matter how big the data is, it will get through all of it given enough time, though not necessarily quickly, which is why it is called batch processing.
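To make the MapReduce idea concrete, here is a minimal sketch of the classic word count in plain Python. It does not use Hadoop at all; the function names `map_phase`, `shuffle_phase` and `reduce_phase` are made up for illustration, and everything runs in memory on one machine, whereas a real job distributes each phase across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key (the framework does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is stored on HDFS"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can be split across many machines, which is what lets the same program scale to very large inputs.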

YARN is the component that embodies the idea of Hadoop as a platform. With it, other software in the big data ecosystem can run on Hadoop, so we can make better use of HDFS's large storage and save resources. For example, we no longer have to build a separate Spark cluster; Spark can simply run on the existing Hadoop YARN.

In fact, once you have learned these Hadoop components you can already process big data, although you may not yet have a clear idea of how big "big data" really is. Take my word for it and don't worry about this for now. Later, at work, you will often face tens or hundreds of terabytes of data, and at that point you will stop thinking that bigger is better: the bigger the data, the bigger the headache. Don't be afraid of data at that scale, though, because handling it is exactly where your value lies, and it is something people working in JavaEE, PHP, HTML5 or as DBAs can only envy.

Remember: reaching this point can serve as a milestone in your big data studies.

ZooKeeper: This is a jack-of-all-trades. It is used when setting up Hadoop HA, and HBase will use it later as well. It generally stores coordination information, which is small, typically no more than 1 MB, and all the software that uses it depends on it. For our purposes, it is enough to install it correctly and keep it running normally.

MySQL: We have covered the processing of big data; next comes MySQL, a tool for processing small data, because it will be needed when we install Hive. How well do you need to know MySQL? You should be able to install it on Linux, run it, configure simple permissions, change the root password, and create a database. The main thing here is to learn SQL syntax, because Hive's syntax is very similar.
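As a small illustration of the kind of SQL worth practising, the sketch below creates a table, inserts a few rows and runs an aggregate query. To stay runnable without a server it uses Python's built-in `sqlite3` module as a stand-in for MySQL; the `orders` table and its columns are invented for the example. The `SELECT ... GROUP BY ... ORDER BY` statement is standard SQL, and Hive accepts queries of essentially this shape.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
)

# Aggregate per customer: the bread-and-butter query shape in both MySQL and Hive.
query = """
    SELECT customer, COUNT(*) AS order_count, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for customer, order_count, total in rows:
    print(customer, order_count, total)

conn.close()
```

Dialects differ in details such as data types and auto-increment syntax, so treat the `CREATE TABLE` line in particular as SQLite-specific.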

Sqoop: This is used to import data from MySQL into Hadoop. You can also do without it by exporting the MySQL table to a file and then putting that file on HDFS. Either way, in a production environment pay attention to the load you put on MySQL.

Hive: For anyone who knows SQL syntax this is a godsend. It makes processing big data very simple, and you no longer have to bother writing MapReduce programs. Some people ask: what about Pig? Pig is similar to Hive; mastering one of the two is enough.

Oozie: Now that you have learned Hive, I'm sure you need this. It can manage your Hive, MapReduce and Spark scripts, check whether your programs ran correctly, send you an alert if something goes wrong, and retry the program for you. Most importantly, it lets you configure dependencies between tasks. I'm sure you will love it; otherwise, wouldn't you feel miserable looking at that pile of scripts and the dense crontab entries?

HBase: This is the NoSQL database of the Hadoop ecosystem. Its data is stored as key-value pairs, and each key is unique, so it can be used for deduplication. It can store a much larger amount of data than MySQL, so it is often used as the storage destination once big data processing is complete.
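The deduplication point follows directly from key uniqueness, which a toy model can show. The `KeyValueTable` class below is hypothetical and is not the HBase client API; it only mimics two properties of an HBase table: writing to an existing row key overwrites the old value, and rows come back sorted by key. The row-key format `user#date` is also just an assumption for the example.

```python
class KeyValueTable:
    """Toy in-memory table with unique row keys (illustrative only, not the HBase API)."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, value):
        # Writing an existing key replaces its value, so repeated records collapse into one.
        self._rows[row_key] = value

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan(self):
        # HBase stores rows ordered by row key; sorted() mimics that ordering here.
        return sorted(self._rows.items())


# The same event arrives twice, for example because an upstream job was retried.
events = [
    ("user42#2025-01-18", "login"),
    ("user07#2025-01-18", "login"),
    ("user42#2025-01-18", "login"),
]

table = KeyValueTable()
for row_key, value in events:
    table.put(row_key, value)

deduplicated = table.scan()
print(deduplicated)
```

The design choice that matters is the row key: whatever fields you put into it define what counts as "the same record".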

Kafka: This is a fairly easy-to-use queuing tool. What is a queue for? You know how people line up to buy tickets? When there is too much data, it also needs to line up to be processed, so that the colleagues you work with don't cry out, "Why are you giving me so much data (say, files of hundreds of gigabytes)? How am I supposed to handle it?" Don't blame him, because he doesn't work in big data. You can tell him that you have put the data in a queue and he can take it one item at a time as he needs it. He will stop complaining and go off at once to optimize his program, because failing to keep up is now his problem, not a problem with what you delivered.

We can also use this tool to load online real-time data into a database or into HDFS. For that you can pair it with a tool called Flume, which is designed to do simple data processing and write to a variety of data receivers (such as Kafka).
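The decoupling described above can be sketched with Python's standard `queue` and `threading` modules. This is not the Kafka API: Kafka is a distributed, persistent log that many consumers can read, while the example is a single in-process queue. The names `producer`, `consumer` and `SENTINEL` are chosen for the illustration.

```python
import queue
import threading

buffer = queue.Queue()   # stands in for a Kafka topic
SENTINEL = None          # marker telling the consumer that no more data will arrive
processed = []


def producer(records):
    """Push every record into the queue without waiting for the consumer."""
    for record in records:
        buffer.put(record)
    buffer.put(SENTINEL)


def consumer():
    """Take records one at a time, at the consumer's own pace."""
    while True:
        record = buffer.get()
        if record is SENTINEL:
            break
        processed.append(record.upper())


worker = threading.Thread(target=consumer)
worker.start()
producer([f"record-{i}" for i in range(5)])
worker.join()

print(processed)
```

The producer never calls the consumer directly, so a slow consumer only makes the queue grow; it does not block or crash the side that generates the data.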

Spark: This makes up for the slow processing speed of MapReduce-based computation. Its defining trait is that it loads data into memory for computation instead of repeatedly reading from slow hard drives. It is especially well suited to iterative computation, so algorithm people are particularly fond of it. It is written in Scala, and it can be driven from either Java or Scala because both run on the JVM.
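Why in-memory data helps iterative work can be shown without Spark itself. In the sketch below, the hypothetical `SlowSource` class stands in for a dataset on disk and counts how often it is read in full, and `mean_by_iteration` is a made-up iterative algorithm (gradient descent toward the mean) that needs one full pass over the data per step. None of these names come from the Spark API.

```python
class SlowSource:
    """Stand-in for a dataset on disk; counts how many times it is read in full."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._values)


def mean_by_iteration(load, steps=20, rate=0.5):
    """Gradient descent toward the mean; every step makes one full pass over the data."""
    estimate = 0.0
    for _ in range(steps):
        data = load()
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= rate * gradient
    return estimate


source = SlowSource([1.0, 2.0, 3.0, 4.0])

# MapReduce style: every iteration goes back to the slow source.
uncached_result = mean_by_iteration(source.load)
reads_without_cache = source.reads

# Spark style: read once, keep the data in memory, iterate over the cached copy.
source.reads = 0
cached = source.load()
cached_result = mean_by_iteration(lambda: cached)
reads_with_cache = source.reads

print(reads_without_cache, reads_with_cache)
```

Both runs compute the same estimate; the only difference is how many times the expensive read happens, which is the cost that grows with the number of iterations.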

Once you have mastered all of this, you are a professional big data development engineer, and a monthly salary of 20,000 is a trifle.

Follow-up improvement: Of course, there is still plenty of room to grow. For example, learn Python and you can use it to write web crawlers. That way you can build your own data sets, downloading all kinds of data from the web to your cluster for processing.
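The core step of a crawler, finding the links on a page, can be sketched with the standard library alone. To keep the example runnable offline, the page is a hard-coded string rather than a real download (fetching would typically use `urllib.request.urlopen`); the `LinkExtractor` class and the example.com URLs are invented for the illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page they appear on.
                self.links.append(urljoin(self.base_url, value))


# Hard-coded page so the sketch runs without network access.
page = (
    '<html><body>'
    '<a href="/data/a.csv">A</a> '
    '<a href="https://example.org/b.json">B</a> '
    '<a name="top">no link here</a>'
    '</body></html>'
)

extractor = LinkExtractor("https://example.com/index.html")
extractor.feed(page)
print(extractor.links)
```

A real crawler would add a queue of URLs to visit, a set of URLs already seen, polite rate limiting, and respect for each site's robots.txt.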

Finally, learn the principles of recommendation, classification and other algorithms so that you can better communicate with algorithm engineers.

That way your company will find it even harder to do without you, and everyone will like you very much.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

© 2024 shulou.com SLNews company. All rights reserved.
