This article introduces how to build a Spark Streaming platform on the Hadoop ecosystem. Many people run into problems when trying this in practice, so the steps below walk through the setup in detail. I hope you read it carefully and get something out of it.
Operating system: CentOS 7
The frameworks used:
Flume 1.8.0
Hadoop 2.9.0
Kafka 2.11-1.0.0
Spark 2.2.1
HBase 1.2.6
ZooKeeper 3.4.11
Maven 3.5.2
The overall development environment is based on JDK 1.8 and Scala, so prepare the Java and Scala environments in advance, and then start building the basic platform:
1. Configure the development environment
Download and extract JDK 1.8 and Scala, and configure the profile file:
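For reference, the relevant profile entries might look like this (the Scala install path is an assumption; the JDK path matches the one used later in this article):

export JAVA_HOME=/usr/local/java/jdk1.8.0_144
export SCALA_HOME=/usr/local/scala-2.11.8
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin

Run source /etc/profile afterwards so the variables take effect.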
2. Configure the ZooKeeper and Maven environment
Download and extract ZooKeeper and Maven and configure the profile file.
Then edit the relevant settings in zoo.cfg, specifying the dataDir directory and so on.
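A minimal zoo.cfg for a single node might look like this (the dataDir path is an assumption):

tickTime=2000
dataDir=/opt/data/zookeeper
clientPort=2181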
Start zookeeper:
/usr/local/zookeeper-3.4.11/bin/zkServer.sh start
If no error is reported, run jps to check whether the process started successfully.
3. Install and configure Hadoop
The installation and configuration of Hadoop were covered in a previous article (portal). For convenience in the following steps, here is a brief description of a single-node configuration:
Download and extract Hadoop, and configure the environment:
First configure hadoop-env.sh and yarn-env.sh, setting JAVA_HOME to the JDK installation directory /usr/local/java/jdk1.8.0_144.
Create a working directory for hadoop
mkdir /opt/data/hadoop
Edit core-site.xml, hdfs-site.xml, yarn-site.xml and the other relevant configuration files; please see the previous article for the specific settings. Remember to execute hadoop namenode -format after the configuration is complete, otherwise HDFS will report an error on startup. After startup, visit port 50070 in a browser and you should see the Hadoop page.
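For reference only, a minimal single-node configuration might contain entries like the following (tsk1 is the hostname used later in this article; the port 9000 is an assumption, and the tmp directory reuses the working directory created above):

<!-- core-site.xml -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://tsk1:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/data/hadoop</value>
</property>

<!-- hdfs-site.xml -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>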
4. Install and configure kafka
Again, download and extract Kafka first, and then configure it:
Enter Kafka's config directory and configure server.properties, specifying the log.dirs and zookeeper.connect parameters; also configure the dataDir of ZooKeeper in the zookeeper.properties file (a minimal example is shown below). After the configuration is complete, start Kafka:
kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
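For reference, the key server.properties entries for a single node might look like this (the log directory and addresses are assumptions):

broker.id=0
listeners=PLAINTEXT://localhost:9092
log.dirs=/opt/data/kafka-logs
zookeeper.connect=localhost:2181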
You can use jps to check whether the Kafka process is there, and then test whether Kafka can send and receive messages normally. Open two terminals, one as a producer and one as a consumer to receive messages. First, create a topic:
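For example, the test topic could be created like this (assuming ZooKeeper listens on localhost:2181):

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic testTopic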
Then enter the command in the first terminal:
kafka-console-producer.sh --broker-list localhost:9092 --topic testTopic
Enter the command in the second terminal:
kafka-console-consumer.sh --zookeeper 127.0.0.1:2181 --topic testTopic
If the startup is normal, then the two terminals will enter the blocking listening state, and any message entered in the first terminal will be received by the second terminal.
5. Install and configure HBase
Download and extract HBase:
Modify the configuration files under hbase. First modify hbase-env.sh, mainly JAVA_HOME and the related parameters. The parameter HBASE_MANAGES_ZK deserves a note: because we use our own ZooKeeper, it is set to false; otherwise HBase will start a ZooKeeper of its own.
Then modify hbase-site.xml. We store the HBase files in HDFS, so set the HDFS address, where tsk1 is the hostname of the machine on which I installed Hadoop; the hbase.zookeeper.quorum parameter is the address where ZooKeeper is installed. For the various addresses here it is better to use machine names.
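A sketch of hbase-site.xml along those lines (tsk1 is the hostname used in this setup; the HDFS port 9000 is an assumption and must match fs.defaultFS):

<property>
    <name>hbase.rootdir</name>
    <value>hdfs://tsk1:9000/hbase</value>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>tsk1</value>
</property>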
Start hbase after the configuration is complete, and enter the command:
start-hbase.sh
If no errors show up when you check the log, test HBase with hbase shell:
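For example, a quick smoke test might look like this (the table and column family names are just examples, but they match the Spark code later in this article):

create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'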
At this point HBase has been built successfully. Visit the Hadoop page mentioned above and browse the file system (menu bar Utilities -> Browse the file system); you can see that the HBase files have been written into the Hadoop file system.
6. Install Spark
Download Spark and extract it.
7. Testing
At this point, the environment is basically built. What is described above is only part of the server production environment; server details, specific tuning and cluster construction are not covered here. Let's write some code to test receiving Kafka messages in Spark Streaming, processing them, and writing the results to HBase. First write an HBase connection class, HBaseHelper:
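The original code is not reproduced here, but a minimal sketch of such a helper might look like the following Scala object, assuming the HBase 1.x client API and ZooKeeper reachable at tsk1:2181:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Put, Table}
import org.apache.hadoop.hbase.util.Bytes

object HBaseHelper {
  @volatile private var connection: Connection = _

  // Lazily create one shared Connection per JVM (driver or executor)
  def getConnection: Connection = {
    if (connection == null || connection.isClosed) {
      synchronized {
        if (connection == null || connection.isClosed) {
          val conf = HBaseConfiguration.create()
          // ZooKeeper address of the cluster; tsk1 is the hostname used in this setup
          conf.set("hbase.zookeeper.quorum", "tsk1")
          conf.set("hbase.zookeeper.property.clientPort", "2181")
          connection = ConnectionFactory.createConnection(conf)
        }
      }
    }
    connection
  }

  // Write a single cell into the given table
  def put(tableName: String, rowKey: String, family: String,
          qualifier: String, value: String): Unit = {
    val table: Table = getConnection.getTable(TableName.valueOf(tableName))
    try {
      val p = new Put(Bytes.toBytes(rowKey))
      p.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier), Bytes.toBytes(value))
      table.put(p)
    } finally {
      table.close()
    }
  }
}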
Then write a test class, KafkaRecHbase, to be submitted with spark-submit.
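Again as a sketch rather than the original code: using the spark-streaming-kafka-0-10 integration, a minimal KafkaRecHbase that writes every Kafka message into the 'test' table might look like this (broker address, topic, table and column names are assumptions for the demo):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaRecHbase {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("KafkaRecHbase")
    // 5-second micro-batches
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Consumer settings; the broker address is an assumption for this demo
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "kafkaRecHbase",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val topics = Array("testTopic")
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

    // Write every message into the 'test' table, column family 'cf';
    // the timestamp row key is only good enough for a demo
    stream.map(_.value).foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        partition.foreach { line =>
          HBaseHelper.put("test", System.currentTimeMillis().toString, "cf", "line", line)
        }
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}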
Compile it, upload it to the server, and execute the command:
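For example (the jar name and master are assumptions; the Kafka integration and HBase client dependencies must be packaged into the jar or passed with --jars):

spark-submit --class KafkaRecHbase --master local[2] kafkaRecHbase.jar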
If it runs without errors, start the Kafka producer again, enter a few lines of data, and you can see the result in HBase.
Install Flume to collect Nginx logs in real time and write them to Kafka
Flume is a framework for log collection; it is easy to install and configure and supports multiple data sources and outputs. For more details, please refer to the Flume documentation.
Download Flume and configure the environment
Write a configuration file for Flume in the conf directory of flume:
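For example, an agent that tails the Nginx log mentioned below and pushes it to the flumeKafka topic might look like this (the agent and component names, channel sizes and broker address are assumptions):

agent.sources = nginxSource
agent.channels = memChannel
agent.sinks = kafkaSink

# tail the Nginx log file as the source
agent.sources.nginxSource.type = exec
agent.sources.nginxSource.command = tail -F /opt/data/nginxLog/nginxLog.log
agent.sources.nginxSource.channels = memChannel

agent.channels.memChannel.type = memory
agent.channels.memChannel.capacity = 10000
agent.channels.memChannel.transactionCapacity = 1000

# push events to the flumeKafka topic
agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.kafka.bootstrap.servers = localhost:9092
agent.sinks.kafkaSink.kafka.topic = flumeKafka
agent.sinks.kafkaSink.channel = memChannel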
Create a Kafka topic named flumeKafka to receive the data, and then launch Flume:
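A possible command sequence, assuming the configuration above was saved as nginx-kafka.conf and the agent is named agent:

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic flumeKafka
flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/nginx-kafka.conf --name agent -Dflume.root.logger=INFO,console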
If there is no error, Flume starts collecting the logs generated in /opt/data/nginxLog/nginxLog.log and pushes them to Kafka in real time; then write a Spark Streaming processing class, as above, to handle them accordingly.
This is the end of "how to build a Hadoop ecological Spark Streaming platform". Thank you for reading. If you want to learn more about the industry, you can follow the site, where more practical articles will be published.