Broadly, the development of big data can be divided into three important stages: an embryonic period, a maturation period, and a period of large-scale application. The embryonic period saw the emergence and early adoption of business intelligence tools and knowledge management technologies. The first decade of the 21st century was the maturation period, marked by the gradual maturing of big data solutions: the two core technologies of parallel computing and distributed systems took shape, big data technologies such as Google's GFS and MapReduce drew wide attention, and the Hadoop platform became popular. The years after 2010 form the period of large-scale application, marked by data applications penetrating all kinds of industries, data-driven decision-making, and a rapid rise in the intelligence of the information society.
The arrival of the data era has also driven the growth of the data industry, as enterprises use data to extract value, and it has prompted a large number of people to take up data-related study. Learning big data requires mastering certain basics, which I will briefly explain here from my own perspective.
Learning big data means understanding the concepts at the initial stage and studying the data technologies later on. The main topics include:
1. The concept of big data
2. The influence of big data
3. The applications of big data
4. The big data industry
5. The big data processing architecture Hadoop
6. The key technologies of big data
7. Big data computing models
The last three of these topics are somewhat more involved, so they can be broken down further:
1. The big data processing architecture Hadoop: the characteristics of Hadoop, the Hadoop ecosystem, and the installation and use of Hadoop
2. The key technologies of big data: data collection, data storage and management, data processing and analysis, and data privacy and security
3. Big data computing models: batch computing, stream computing, graph computing, and query-analysis computing
To learn big data well, you need to master the following skills:
1. Java programming technology
Java programming is the foundation of big data learning. Java is a strongly typed language with strong cross-platform ability; it can be used to write desktop applications, web applications, distributed systems, and embedded applications, and it is the programming tool big data engineers favor most. If you want to learn big data well, a solid Java foundation is essential.
2. Linux commands
Big data development is usually carried out in a Linux environment. Compared with Linux, Windows is a closed operating system, and the open-source big data software available on it is very limited. If you want to work in big data development, you therefore need to master basic Linux commands.
3. Hadoop
Hadoop is an essential framework for big data development. Its core is HDFS and MapReduce: HDFS provides storage for massive data, and MapReduce provides computation over it, so both need to be mastered thoroughly. Beyond that, you also need to master related techniques and operations such as Hadoop clusters, Hadoop cluster management, YARN, and advanced Hadoop administration. A sketch of the classic word-count job follows.
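To make the division of labor between HDFS and MapReduce concrete, here is the classic word-count job written against the Hadoop MapReduce Java API, as a minimal sketch; the input and output HDFS paths are supplied as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar, such a job would typically be submitted with `hadoop jar wordcount.jar WordCount /input /output`.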
4. Hive
Hive is a data warehouse tool built on Hadoop. It can map structured data files to database tables, provides simple SQL querying, and translates SQL statements into MapReduce tasks to run, which makes it well suited to statistical analysis over a data warehouse. You need to master the installation, application, and advanced operation of Hive.
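As an illustration of how Hive exposes SQL over data in Hadoop, here is a minimal sketch that queries HiveServer2 through the standard Hive JDBC driver; the host, port, credentials, and the `logs` table are assumptions for the example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Requires the hive-jdbc driver on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            // Hive translates this SQL into MapReduce (or Tez/Spark) tasks.
            ResultSet rs = stmt.executeQuery(
                "SELECT category, COUNT(*) FROM logs GROUP BY category");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```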
5. Avro and Protobuf
Avro and Protobuf are both data serialization systems. They provide rich data structure types, are very suitable for data storage, and can also be used for data exchange between programs written in different languages. To learn big data, you need to master their concrete usage.
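As a small taste of what a serialization system does, the sketch below defines an Avro schema at runtime, then writes and reads a record with Avro's generic Java API; the `User` schema and file name are made up for the example.

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroExample {
    public static void main(String[] args) throws Exception {
        // Avro schemas are plain JSON; this one describes a two-field record.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");

        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");
        user.put("age", 30);

        // Serialize to an Avro container file (the schema travels with the data).
        File file = new File("users.avro");
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<>(schema))) {
            writer.create(schema, file);
            writer.append(user);
        }

        // Deserialize: the reader picks the schema up from the file itself.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<>())) {
            for (GenericRecord rec : reader) {
                System.out.println(rec.get("name") + " / " + rec.get("age"));
            }
        }
    }
}
```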
6. ZooKeeper
ZooKeeper is an important component of Hadoop and HBase. It is software that provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization, and group services. In big data development, you need to master ZooKeeper's common commands and how to implement these functions with it.
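To show what "configuration maintenance" looks like in practice, here is a minimal sketch using the ZooKeeper Java client to publish and read back a shared configuration value; the connection string and znode path are assumptions for the example.

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkConfigExample {
    public static void main(String[] args) throws Exception {
        // Block until the session is actually established before making calls.
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // Configuration maintenance: publish a value under a znode...
        if (zk.exists("/app/config", false) == null) {
            zk.create("/app/config", "batch.size=100".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        // ...and read it back; every client of the ensemble sees the same value.
        byte[] data = zk.getData("/app/config", false, null);
        System.out.println(new String(data));
        zk.close();
    }
}
```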
7. HBase
HBase is a distributed, column-oriented open-source database. Unlike typical relational databases, it is better suited to storing unstructured data, and it is a highly reliable, high-performance, column-oriented, and scalable distributed storage system. Big data development requires mastering HBase fundamentals, applications, architecture, and advanced usage.
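The sketch below writes and reads a single cell with the HBase Java client to show the row-key / column-family / qualifier data model; the `user` table and its `info` column family are assumed to already exist.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBasePutGetExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath for cluster addresses.
        try (Connection conn =
                 ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("user"))) {
            // Write one cell: row key, column family "info", qualifier "name".
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                    Bytes.toBytes("alice"));
            table.put(put);
            // Read it back by row key.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}
```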
8. Phoenix
Phoenix is an open-source SQL engine, written in Java, that operates on HBase through the JDBC API. It offers features such as dynamic columns, hash loading, a query server, tracing, transactions, user-defined functions, secondary indexes, namespace mapping, statistics collection, row timestamp columns, paged queries, skip scan, views, and multi-tenancy. Big data development requires mastering its principles and usage.
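Since Phoenix presents HBase through the JDBC API, working with it looks like ordinary SQL, as in this minimal sketch; the ZooKeeper quorum address and the `users` table are assumptions. Note the UPSERT statement and the explicit commit.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixExample {
    public static void main(String[] args) throws Exception {
        // The Phoenix thick driver connects through the ZooKeeper quorum.
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS users ("
                    + "id BIGINT PRIMARY KEY, name VARCHAR)");
            // Phoenix uses UPSERT rather than separate INSERT/UPDATE.
            stmt.executeUpdate("UPSERT INTO users VALUES (1, 'alice')");
            conn.commit(); // Phoenix connections default to autoCommit=false
            ResultSet rs = stmt.executeQuery("SELECT id, name FROM users");
            while (rs.next()) {
                System.out.println(rs.getLong(1) + "\t" + rs.getString(2));
            }
        }
    }
}
```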
9. Redis
Redis is a key-value storage system. It makes up for many of the shortcomings of key/value stores such as memcached and can complement relational databases in some scenarios. It provides clients for Java, C/C++, C#, PHP, JavaScript, Perl, Objective-C, Python, Ruby, Erlang, and more, which makes it very convenient to use. Big data development requires mastering Redis installation, configuration, and usage.
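A minimal sketch with the Jedis Java client shows the basic key/value operations plus the richer data structures that set Redis apart from memcached; the local server address and key names are made up for the example.

```java
import redis.clients.jedis.Jedis;

public class RedisExample {
    public static void main(String[] args) {
        // Assumes a Redis server running locally on the default port 6379.
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Plain key/value, as memcached would offer...
            jedis.set("page:views", "0");
            jedis.incr("page:views");
            // ...plus richer structures such as lists and hashes.
            jedis.lpush("recent:users", "alice", "bob");
            jedis.hset("user:1", "name", "alice");
            System.out.println(jedis.get("page:views"));        // "1"
            System.out.println(jedis.lrange("recent:users", 0, -1));
        }
    }
}
```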
10. Flume
Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive volumes of log data. It supports customizing all kinds of data senders in a logging system to collect data, and it can also perform simple processing on data and write it to a variety of (customizable) data receivers. Big data development requires mastering its installation, configuration, and usage.
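Flume agents are wired together in a properties file rather than in code. The minimal sketch below, closely following the canonical getting-started example, defines one agent whose netcat source listens on a port, buffers events in a memory channel, and writes them to a logger sink; the agent name and port are illustrative.

```properties
# example.conf: one source -> one memory channel -> one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: read lines from a TCP port.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffer events in memory between source and sink.
a1.channels.c1.type = memory

# Sink: write events to the log (stands in for HDFS, Kafka, etc.).
a1.sinks.k1.type = logger

# Wire the pieces together.
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Such an agent would typically be started with `bin/flume-ng agent --conf conf --conf-file example.conf --name a1`.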
11. SSM
The SSM framework is the integration of three open-source frameworks, Spring, Spring MVC, and MyBatis, and is often used for web projects with relatively simple data sources. Big data development requires mastering Spring, Spring MVC, and MyBatis individually, and then using SSM to integrate them.
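As a small sketch of how the pieces of SSM divide the work, here a MyBatis mapper interface is bound to SQL by annotation and injected into a Spring MVC controller; the `User` class, `users` table, and URL are illustrative, and the surrounding Spring configuration is omitted.

```java
import java.util.List;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Select;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// MyBatis: generates the data-access implementation from the annotation.
@Mapper
interface UserMapper {
    @Select("SELECT id, name FROM users")
    List<User> findAll();
}

// Spring MVC: maps the HTTP request; Spring injects the mapper.
@RestController
class UserController {
    private final UserMapper userMapper;

    UserController(UserMapper userMapper) { // constructor injection
        this.userMapper = userMapper;
    }

    @GetMapping("/users")
    public List<User> listUsers() {
        return userMapper.findAll();
    }
}

class User {
    public long id;
    public String name;
}
```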
12. Kafka
Kafka is a high-throughput distributed publish-subscribe messaging system. In big data development and applications it serves both to unify online and offline message processing through Hadoop's parallel loading mechanism and to provide real-time messaging across a cluster. Big data development requires mastering the principles of Kafka's architecture, the roles and usage of its components, and how to implement the related functions.
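The sketch below publishes one message with the Kafka Java producer client; the broker address and the `logs` topic are assumptions for the example.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish to the "logs" topic; any subscribed consumer group
            // receives the message, online or for later offline processing.
            producer.send(new ProducerRecord<>("logs", "host1", "user login event"));
        }
    }
}
```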
13. Scala
Scala is a multi-paradigm programming language, and Spark, an important big data framework, is written in Scala. If you want to learn the Spark framework well, a Scala foundation is essential, so big data development requires mastering the basics of Scala programming.
14. Spark
Spark is a fast, general-purpose computing engine designed for large-scale data processing. It provides a comprehensive, unified framework for handling big data workloads across different datasets and data sources. Big data development requires mastering Spark fundamentals, Spark jobs, Spark RDDs, job deployment and resource allocation, Spark shuffle, Spark memory management, Spark broadcast variables, Spark SQL, Spark Streaming, and Spark ML.
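For contrast with the Hadoop section, here is the same word count as a minimal Spark sketch using the Java RDD API, which can keep intermediate results in memory across operations; the local master setting and command-line input path are illustrative.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("word count")
                .setMaster("local[*]"); // run locally for the example
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile(args[0]); // input path argument
        JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey(Integer::sum); // the shuffle happens here

        counts.collect().forEach(t ->
                System.out.println(t._1() + "\t" + t._2()));
        sc.stop();
    }
}
```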
15. Azkaban
Azkaban is a batch workflow task scheduler that can be used to run a set of jobs and processes in a specified order. Azkaban can be used for big data task scheduling, so big data development requires mastering Azkaban's configuration and syntax rules.
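Azkaban jobs are plain `.job` property files, zipped together and uploaded through its web UI. The minimal sketch below chains two jobs so that the second runs only after the first succeeds; the job names and commands are made up for the example.

```properties
# extract.job -- first step; "type=command" runs a shell command
type=command
command=echo "pull data from the source systems"
```

```properties
# report.job -- runs only after extract.job completes successfully
type=command
dependencies=extract
command=echo "build the daily report"
```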
16. Python and data analysis
Python is an object-oriented programming language with rich libraries; it is easy to use and widely adopted. In the big data field it is used mainly for data collection, data analysis, and data visualization, so learning some Python is also necessary.
Only after mastering the skills above can you be considered a qualified big data developer; only then can you take on big data development work with real confidence, and promotions and raises will follow naturally.