

Big data learning directions: once you know these, you will know what you can do.

2025-02-24 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

What kind of talent do enterprises need?

Enterprises need two kinds of big data talent: data platform builders and data mining/application specialists. Big data shows its value only in combination with real applications — for example, applying big data technology in finance, meteorology and public administration, or building personal credit scoring and health care services on top of it.

Three kinds of ability a big data talent needs

First, technical skills: IT, systems, hardware and software. Second, quantitative skills: statistics, mathematics, modeling and algorithms. Third, business skills: domain knowledge in a particular field. Building big data storage itself requires technical ability, but analyzing the data once it is stored requires quantitative ability.


Popular occupations in the big data era

1. Data planner

Before a product is designed, the data planner provides key data support for enterprise decisions, maximizes the value of the company's data, enables differentiated competition, and helps the enterprise seize the initiative in that competition.

2. Data engineer

Data engineers are the designers, builders and managers of big data infrastructure. They develop architectures that can ingest and analyze data according to the needs of the enterprise, and they ensure those systems run smoothly.

3. Data architect

Data architects are good at dealing with scattered, seemingly unrelated data. They are proficient in statistical methods, able to obtain raw data from monitoring systems, and able to interpret that data from a statistical point of view.

4. Data analyst

Their responsibility is to turn data, through analysis, into information the enterprise can use. They find problems in the data, pinpoint the causes of those problems, and identify the key points for the next round of improvement.

5. Data application specialist

They bring data back into the product for the product to use. They can express the information contained in the data in language that ordinary people understand, and they drive adjustments inside the enterprise based on the conclusions of data analysis.

6. Data scientist

Data scientists are the leaders of the big data field. They combine cross-disciplinary scientific and business skills to turn data and technology into business value for the enterprise.

For big data you only need to learn the standard edition of Java, JavaSE. Technologies such as Servlet, JSP, Tomcat, Struts, Spring, Hibernate and MyBatis belong to the JavaEE direction and are not used much in big data work; a passing familiarity is enough. You do, however, need to know how Java connects to a database — JDBC is a must. Some students ask: Hibernate and MyBatis can also connect to databases, why not learn them? I am not saying they are bad to learn, but they take a lot of time to master and are rarely used in the end — I have never seen anyone doing big data work with those two. If you have spare energy, study the principles behind Hibernate or MyBatis rather than just their APIs; that deepens your understanding of how Java operates on databases, since the core of both is Java reflection plus various uses of JDBC.
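The JDBC pattern mentioned above can be sketched in a few lines. This is only an illustration — the connection URL, database name, credentials and table are hypothetical, and without a MySQL driver jar on the classpath (or a running server) the program falls through to the catch block:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JdbcSketch {
    public static void main(String[] args) {
        // Hypothetical connection details -- replace with your own host, database and credentials.
        String url = "jdbc:mysql://localhost:3306/testdb";
        String sql = "SELECT id, name FROM users WHERE id = ?";
        // try-with-resources closes the connection and statement automatically.
        try (Connection conn = DriverManager.getConnection(url, "root", "secret");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, 42);                     // bind the ? placeholder
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                }
            }
        } catch (SQLException e) {
            // Without the MySQL driver jar on the classpath (or a reachable server),
            // DriverManager cannot connect and execution ends up here.
            System.out.println("connection unavailable: " + e.getMessage());
        }
    }
}
```

The same getConnection / PreparedStatement / ResultSet loop is what Hibernate and MyBatis ultimately generate under the hood via reflection.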

Linux: because big data software runs on Linux, learn Linux solidly. A good grasp of Linux will help you master big data technologies quickly, and will help you understand the runtime and network environment of software such as Hadoop, Hive, HBase and Spark, letting you sidestep many pitfalls. Learning shell scripting will make it much easier to understand and configure a big data cluster, and to pick up new big data technologies faster in the future.

Okay, with the basics covered, let's talk about which big data technologies still need to be learned. You can learn them in the order written below.

Hadoop: this is the popular big data processing platform — it has almost become synonymous with big data, so it is a must. Hadoop includes several components: HDFS, MapReduce and YARN. HDFS is where data is stored, like your computer's hard disk — files live on it. MapReduce does the computation over that data; its characteristic is that no matter how big the data is, given enough time it will finish the job, though perhaps not quickly, which is why it is called batch processing.

YARN is the important component that embodies the idea of Hadoop as a platform. Thanks to it and the big data ecosystem around it, other software can run on Hadoop, letting us exploit HDFS's large storage while sharing resources — for example, we no longer need to build a separate Spark cluster; Spark can simply run on the existing Hadoop YARN. In fact, with these Hadoop components alone you can already do big data processing, though you may not yet have a clear sense of how big "big data" really is. Bear with me and don't worry about that for now.
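The map-then-reduce idea behind MapReduce can be seen in miniature in plain Java, before any cluster is involved. This is only a single-machine sketch of the concept, not Hadoop's actual API: each line is mapped into words, and the counts per word are then reduced by summing:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    public static void main(String[] args) {
        // Stand-in for lines of a file that would live on HDFS.
        List<String> lines = Arrays.asList("big data", "big deal", "data data");

        // Map phase: split each line into words, emitting (word, 1).
        // Reduce phase: sum the 1s per word -- here done by merge().
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        System.out.println(counts);  // {big=2, data=3, deal=1}
    }
}
```

Hadoop runs exactly this shape of computation, except the lines are spread across many machines and the per-word sums are shuffled between them.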

Later, at work, you will meet plenty of scenarios with tens or even hundreds of terabytes of data. By then you will no longer think big data sounds so great — the bigger it gets, the bigger the headache. But don't be afraid of data at that scale: handling it is exactly where your value lies, and it is what makes the JavaEE, PHP, HTML5 and DBA people envious.

Zookeeper: this is a jack-of-all-trades. You will use it when installing Hadoop's HA, and again later with HBase. It is generally used to store small pieces of coordination information — usually no more than 1 MB each — that the software using it depends on. For our purposes, installing it correctly and keeping it running normally is enough.

MySQL: having covered big data processing, we next learn MySQL, a small-data tool, because installing Hive requires it. To what level do you need to master MySQL? Be able to install it on Linux, run it, configure simple permissions, change the root password, and create a database. The main thing is to learn SQL syntax, because Hive's syntax is very similar to it.

Sqoop: this is used to import data from MySQL into Hadoop. You can also do without it — export the MySQL tables to files yourself and put the files on HDFS — but either way, in a production environment, watch the load you put on MySQL.

Hive: for anyone who knows SQL syntax, this thing is a godsend. It lets you process big data easily without writing MapReduce programs. Some people ask: what about Pig? Learn one of the two — once you know Hive, you can master Pig almost immediately.

Oozie: now that you have learned Hive, I'm sure you will need this. It manages your Hive, MapReduce or Spark scripts, checks whether your programs executed correctly, alerts you when something goes wrong, retries failed jobs, and — most importantly — lets you configure the dependencies between tasks. I'm sure you will love it; otherwise, staring at that pile of scripts and a dense crontab will make you miserable.

HBase: this is the NoSQL database of the Hadoop ecosystem. Its data is stored as key-value pairs, and keys are unique, so it can be used for de-duplication. It can store a far larger amount of data than MySQL, which is why it is often used as the destination store once big data processing completes.
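Why unique keys give you de-duplication for free can be shown with the JDK's own sorted map — a stand-in for the idea, not the HBase client API (the row keys and values here are made up): writing the same key again overwrites the old value instead of adding a duplicate row.

```java
import java.util.Map;
import java.util.TreeMap;

public class KeyValueDedupSketch {
    public static void main(String[] args) {
        // Row key -> value, as in an HBase table: keys are unique,
        // so re-writing a key overwrites rather than duplicates.
        Map<String, String> table = new TreeMap<>();
        table.put("user:1001", "first visit");
        table.put("user:1002", "first visit");
        table.put("user:1001", "second visit");  // same key: overwritten, not appended

        System.out.println(table.size());        // 2 rows, not 3
        System.out.println(table.get("user:1001"));
    }
}
```

Load the same records twice and the table still holds one row per key — that is the de-duplication property the paragraph above refers to, at cluster scale.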

Kafka: this is an easy-to-use queuing tool. What is a queue for? Think of lining up to buy tickets: when there is too much data, it too has to line up to be processed. That way the colleague you work with won't cry out, "Why are you sending me so much data (say, hundreds of gigabytes of files)? How am I supposed to handle it?" Don't blame him — he doesn't work with big data. Just tell him you have put the data in a queue and he can take it item by item as he uses it. He will stop complaining and go straight off to optimize his own program, because not keeping up is now his problem, not a problem with the data you gave him.

Of course, you can also use Kafka to buffer online real-time data on its way into storage or HDFS, often together with a tool called Flume, which is designed for simple data collection and for writing data to various receivers (such as Kafka).
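The decoupling a queue provides can be sketched with the JDK's own BlockingQueue — an in-process stand-in for the idea, not for Kafka itself (Kafka plays this role durably, between machines): the producer hands off records at its own pace, and the consumer takes them one at a time when it is ready.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueSketch {
    public static void main(String[] args) throws InterruptedException {
        // A bounded in-process queue; the producer and consumer never call
        // each other directly -- they only see the queue.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);

        // Producer thread: pushes records without waiting for the consumer.
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= 5; i++) {
                try {
                    queue.put("record-" + i);  // blocks only if the queue is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();
        producer.join();

        // Consumer: drains records one at a time, at its own pace.
        while (!queue.isEmpty()) {
            System.out.println(queue.take());
        }
    }
}
```

If the consumer is slow, records simply wait in the queue instead of overwhelming him — which is exactly the conversation described above.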

Spark: it is used to make up for the slow speed of MapReduce-based data processing. Its characteristic is loading data into memory for computation rather than repeatedly reading from slow hard disks. It is especially suited to iterative computation, so the algorithm crowd is particularly fond of it. It is written in Scala, but either Java or Scala can drive it, since both run on the JVM.
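Why keeping data in memory helps iterative jobs can be seen in a tiny single-machine sketch — this is the access pattern, not Spark's API: the dataset is loaded once and then re-scanned on every iteration, where a MapReduce-style job would re-read it from disk each pass. Here the iterative algorithm is a toy 1-D k-means with two centroids over made-up numbers:

```java
import java.util.Arrays;
import java.util.List;

public class IterativeSketch {
    public static void main(String[] args) {
        // Loaded once and kept in memory -- the analogue of caching a dataset in Spark.
        List<Double> data = Arrays.asList(1.0, 2.0, 10.0, 11.0, 12.0);

        // 1-D k-means with two centroids: every iteration re-scans the SAME
        // in-memory data, which is exactly what Spark makes fast.
        double c1 = 0.0, c2 = 5.0;
        for (int iter = 0; iter < 10; iter++) {
            double sum1 = 0, sum2 = 0;
            int n1 = 0, n2 = 0;
            for (double x : data) {
                if (Math.abs(x - c1) <= Math.abs(x - c2)) { sum1 += x; n1++; }
                else                                      { sum2 += x; n2++; }
            }
            if (n1 > 0) c1 = sum1 / n1;   // move each centroid to the mean
            if (n2 > 0) c2 = sum2 / n2;   // of the points assigned to it
        }
        System.out.println(c1 + " " + c2);  // centroids settle at 1.5 and 11.0
    }
}
```

Ten passes over an on-disk file would mean ten full reads; over an in-memory dataset they are nearly free — that is Spark's advantage for iterative algorithms.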
