2025-04-02 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
I. Big data
Big data: the technology for handling massive amounts of data. Big data consists of very large data sets that can be merged and analyzed to reveal additional information and relationships within the data.
Big data refers to data collections whose content cannot be captured, managed, and processed with conventional software tools within an acceptable period of time.
Big data technology refers to the ability to quickly obtain valuable information from many types of data. Technologies suited to big data include massively parallel processing (MPP) databases, data mining grids, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.
Characteristics of big data:
Volume: data sizes range from hundreds of TB to hundreds of PB or even EB
Variety: big data includes data in many different forms
Velocity (timeliness): data must be processed within a certain time limit
Veracity (accuracy): processing results must be sufficiently accurate
Value: big data contains a great deal of deep value. Analyzing, mining, and using it can bring great commercial value.
II. Hadoop
Hadoop is a software platform for analyzing and processing massive amounts of data. It is open source software, developed in Java, that provides a distributed infrastructure.
Hadoop features: high reliability, high scalability, high efficiency, high fault tolerance, low cost
Common components of Hadoop:
-HDFS (Hadoop distributed file system)
-MapReduce (distributed computing framework)
-ZooKeeper (distributed coordination service)
-HBase (distributed column-oriented database)
-Hive (Hadoop-based data warehouse)
-Sqoop (data synchronization tool)
-Pig (Hadoop-based data flow system)
-Mahout (data mining algorithm library)
-Flume (log collection tool)
Hadoop core components:
-HDFS: distributed file system
-YARN: cluster resource management system
-MapReduce: distributed computing framework
HDFS roles and concepts
-NameNode: the master node. It manages the HDFS namespace and block mapping information, configures the replica policy, and handles all client requests.
-Secondary NameNode: periodically merges fsimage and the edit log (fsedits) and pushes the result to the NameNode. In an emergency, it can help recover the NameNode.
-DataNode: the data storage node. It stores the actual data and reports storage information to the NameNode.
-Client: splits files, accesses HDFS, interacts with the NameNode to obtain file location information, and interacts with DataNodes to read and write data.
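To make the block and replica bookkeeping above concrete, here is a minimal arithmetic sketch. It is not Hadoop code: the function names are hypothetical, and it assumes the Hadoop 2.x default block size of 128 MB unless another size is passed in.

```shell
#!/bin/sh
# Hypothetical helper: number of HDFS blocks needed for a file.
# $1 = file size in MB, $2 = block size in MB (assumed default: 128).
hdfs_block_count() {
    size_mb=$1
    block_mb=${2:-128}
    # Ceiling division: a partly filled final block still counts as a block.
    echo $(( (size_mb + block_mb - 1) / block_mb ))
}

# Hypothetical helper: block replicas stored across all DataNodes.
# $1 = file size in MB, $2 = replication factor, $3 = optional block size in MB.
hdfs_replica_count() {
    blocks=$(hdfs_block_count "$1" "$3")
    echo $(( blocks * $2 ))
}
```

For example, `hdfs_block_count 300` treats a 300 MB file as three blocks, and `hdfs_replica_count 300 2` doubles that for a replication factor of 2. The NameNode tracks where each of those replicas lives; the DataNodes hold the bytes.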
Hadoop has three deployment modes:
-stand-alone
-pseudo-distributed (all roles are installed on one machine)
-fully distributed (different roles are installed on different machines)
III. Stand-alone mode
1. Get the software
http://hadoop.apache.org
Download: hadoop-2.7.6.tar.gz
Decompress: tar -xf hadoop-2.7.6.tar.gz
Installation: mv hadoop-2.7.6
2. Install the Java environment and the jps tool
yum -y install java-1.8.0-openjdk
yum -y install java-1.8.0-openjdk-devel
3. Set environment variables
vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-11.b12.el7.x86_64/jre"
export HADOOP_CONF_DIR="/usr/local/hadoop/etc/hadoop"
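The JAVA_HOME path above is specific to one OpenJDK build, so it may differ on your host. As a hedged sketch, the value can be derived from the resolved location of the java binary instead of being typed by hand. The function name is hypothetical, and the usage line assumes GNU `readlink -f` is available.

```shell
#!/bin/sh
# Hypothetical helper: derive a JAVA_HOME value from the fully resolved
# path of a java binary ($1).
java_home_from_bin() {
    # Strip the trailing /bin/java; what remains is the Java home directory.
    echo "${1%/bin/java}"
}
```

On the host itself this could be called as `java_home_from_bin "$(readlink -f "$(command -v java)")"`, and the printed path copied into hadoop-env.sh.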
Count how many times each word appears (oo is the input directory; xx is the output directory, which must not already exist)
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount oo xx
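To see what the wordcount job computes, the following local pipeline mirrors its map, shuffle, and reduce phases with standard Unix tools. It is an illustration only, not the Hadoop example's code; the function name is hypothetical, and it assumes words are separated by whitespace.

```shell
#!/bin/sh
# Hypothetical local stand-in for the wordcount job.
# $1 = input directory containing plain-text files.
local_wordcount() {
    # map: emit one word per line
    # shuffle: sort so that identical words become adjacent
    # reduce: uniq -c counts each group; awk prints "word<TAB>count"
    cat "$1"/* | tr -s '[:space:]' '\n' | grep -v '^$' |
        LC_ALL=C sort | uniq -c | awk '{ print $2 "\t" $1 }'
}
```

Running `local_wordcount oo` on a small input directory gives a quick reference result to compare against the files Hadoop writes to the output directory.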
IV. Fully distributed mode
-Distributed file system: the physical storage resources managed by the file system are not necessarily attached directly to the local node; they are reached through the computer network. The design is based on the client/server model. A distributed file system addresses the problem of storing and managing data at scale: a file system fixed at one location is extended to any number of locations and file systems, and many nodes together form a file system network. The nodes can be located in different places and communicate and transfer data over the network.
Requirements for forming a cluster:
ALL: every host can ping every other host (configure /etc/hosts)
ALL: install java-1.8.0-openjdk-devel
NN1: can log in to all cluster hosts, including itself, over SSH without a password (and without being prompted to type yes)
Passwordless SSH login: deploy an SSH key
To avoid typing yes: edit /etc/ssh/ssh_config
Add StrictHostKeyChecking no at line 60
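The key deployment step can be sketched as follows. The script only prints the commands rather than running them, so it is safe to try anywhere. The host list is taken from the host names used in this article; the root account, the RSA key type, and the empty passphrase are assumptions that you should adjust for your environment.

```shell
#!/bin/sh
# Hosts in this article's cluster: nn01 plus the three DataNode hosts.
CLUSTER_HOSTS="nn01 node01 node02 node03"

# Print the commands that would create a key on nn01 and copy its public
# half to every cluster host (including nn01 itself).
print_key_deploy_commands() {
    # Assumption: root account, RSA key, empty passphrase (-N '').
    echo "ssh-keygen -t rsa -b 2048 -N '' -f /root/.ssh/id_rsa"
    for host in $CLUSTER_HOSTS; do
        echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@$host"
    done
}
```

Review the output of `print_key_deploy_commands` first; piping it to `sh` on nn01 would then execute the commands, prompting once for each host's password.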
Configuration file format
Configuration reference: http://hadoop.apache.org
cd /usr/local/hadoop/etc/hadoop
1. Configure the environment variable file hadoop-env.sh (see III, 3)
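2. Core configuration file core-site.xml
vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop</value>
  </property>
</configuration>
Create /var/hadoop on all hosts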
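2. Fully distributed configuration file hdfs-site.xml
vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>nn01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>nn01:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>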
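Before copying the configuration to other hosts, it can help to confirm what value a file actually sets. The sketch below is a hypothetical helper, not part of Hadoop. It assumes the usual layout of Hadoop XML configuration files, where each `<name>` element appears before its `<value>` element and neither spans several lines.

```shell
#!/bin/sh
# Hypothetical helper: print the value a Hadoop XML config file assigns
# to a property. $1 = XML file, $2 = property name.
get_property() {
    awk -v want="$2" '
        # Remember the most recent property name.
        /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
        # When a value follows the wanted name, print it and stop.
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == want) { print v; exit } }
    ' "$1"
}
```

For instance, `get_property hdfs-site.xml dfs.replication` prints the replication factor set in that file. On a host with Hadoop installed, `hdfs getconf -confKey dfs.replication` reports the value the running configuration resolves to, which is the more authoritative check.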
3. Configure slaves
vim slaves
node01
node02
node03
4. Synchronize configuration to all hosts
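The article does not say how to synchronize the files, so the following is only one possible approach. It reads the slaves file and prints one rsync command per host instead of running anything. The function name is hypothetical, and it assumes the root account and that rsync is installed on every host.

```shell
#!/bin/sh
# Directory that holds the configuration files edited in the steps above.
CONF_DIR=/usr/local/hadoop/etc/hadoop

# Print one rsync command per host listed in the slaves file ($1).
print_sync_commands() {
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while read -r host || [ -n "$host" ]; do
        # Skip blank lines and comment lines.
        case "$host" in ''|'#'*) continue ;; esac
        echo "rsync -a $CONF_DIR/ root@$host:$CONF_DIR/"
    done < "$1"
}
```

Calling `print_sync_commands /usr/local/hadoop/etc/hadoop/slaves` on nn01 lists the commands for review; piping the output to `sh` would run them.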
5. Format the NameNode (run on nn01)
./bin/hdfs namenode -format
6. Start the cluster (run on nn01)
./sbin/start-dfs.sh
Use ./sbin/stop-dfs.sh to stop the cluster
7. Verify the roles with jps (run on all hosts)
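The jps check can be scripted with a small filter. This is a sketch with a hypothetical function name; it relies only on jps printing one process ID and process name pair per line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the jps output on stdin lists the Java
# process named in $1. jps prints "<pid> <name>" on each line.
has_role() {
    awk -v role="$1" '$2 == role { found = 1 } END { exit !found }'
}
```

With the configuration in this article, one would expect `jps | has_role NameNode` and `jps | has_role SecondaryNameNode` to succeed on nn01, and `jps | has_role DataNode` to succeed on node01, node02, and node03.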
8. Verify that the cluster was set up successfully (run on nn01)
./bin/hdfs dfsadmin -report
Service startup log path: /usr/local/hadoop/logs