2025-01-16 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 Report
This article walks through an example installation of a big data crawler step by step. It is detailed and should be a useful reference for anyone interested.
Prerequisites: the big data platform is installed, and ZooKeeper, Redis, Elasticsearch, MySQL, and the other components have been installed and started successfully.
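As a quick sanity check before proceeding, one hypothetical way to confirm each prerequisite service is up is to probe its default port. The port numbers and localhost are assumptions; adjust them to your cluster:

```shell
# Probe each required service's default port. "nc -z" only tests whether
# the port accepts connections; it sends no data.
for svc in "zookeeper:2181" "redis:6379" "elasticsearch:9200" "mysql:3306"; do
  name=${svc%%:*}   # text before the colon
  port=${svc##*:}   # text after the colon
  if nc -z localhost "$port" 2>/dev/null; then
    echo "$name OK (port $port)"
  else
    echo "$name NOT reachable on port $port" >&2
  fi
done
```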
1. Edit the crawler installation configuration files (preferably offline, then upload the result to the platform).
2. Edit the crawler\dkcrw\jdbc.properties configuration file (change only the items shown in the screenshot and leave everything else at its defaults).
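For orientation, the file being edited in step 2 is an ordinary Java properties file along the following lines. Every key layout, host name, port, and password below is a placeholder for illustration, not a value from this article; take the real values from your own cluster:

```properties
# Hypothetical example values only - use your cluster's real settings.
hbase.zookeeper.quorum=dk1,dk2,dk3
jdbc.url=jdbc:mysql://dk1:3306/numysql?useUnicode=true&characterEncoding=utf8
jdbc.username=root
jdbc.password=123456
redis.host=dk1
redis.port=6379
```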
The address entered for hbase.zookeeper.quorum should be taken from the DKM monitoring platform:
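If the DKM page is not handy, one alternative sketch is to probe candidate hosts directly with ZooKeeper's four-letter "ruok" command; a live quorum member answers "imok". The dk1/dk2/dk3 host names are assumptions:

```shell
# Ask each candidate host whether its ZooKeeper server is running.
# A healthy member replies "imok"; silence means it is not serving.
for host in dk1 dk2 dk3; do
  printf 'checking %s: ' "$host"
  echo "ruok" | nc "$host" 2181
  echo
done
```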
For the Redis-related configuration, see the following screenshot:
3. Copy the modified jdbc.properties from crawler\dkcrw\ over the unmodified copy under crawler\dkcrw-tomcat-7.0.56\webapps\ROOT\WEB-INF\classes (the copy there is simply replaced).
After the edits, compress the crawler directory into an archive.
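Steps 2 and 3 can be sketched as the following commands, assuming the directory layout described above; the archive name crawler.zip is an assumption:

```shell
# Replace the unmodified jdbc.properties under the tomcat webapp with the
# edited copy, then compress the whole crawler directory for upload.
cp crawler/dkcrw/jdbc.properties \
   crawler/dkcrw-tomcat-7.0.56/webapps/ROOT/WEB-INF/classes/jdbc.properties
zip -r crawler.zip crawler/   # archive name is an assumption
```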
4. Upload the archive to the platform's master node and decompress it (how to upload is not covered here; this example uploads to the root user's home directory, but the installation package can go in any directory).
Decompress the archive with the unzip command; after decompression there will be a new crawler folder.
Use the cd crawler command to enter the crawler folder
Use the command mysql -uroot -p123456 < numysql.sql to create the numysql.sql database.
5. Distribute the crawler files. Every node needs the dkcrw directory; the dkcrw-tomcat-7.0.56 directory may be placed on only one node, and not on the master node (a slave node is recommended).
Command: scp -r {one or more files to distribute; add a path if you are not in their directory} {target server IP or hostname:target path}
For example:
cd /opt/dkh
scp -r dkcrw dk2:/opt/dkh/
scp -r dkcrw dkcrw-tomcat-7.0.56/ dk2:/opt/dkh/
6. On the node that received dkcrw-tomcat-7.0.56, grant permissions on the files.
Command: chmod -R 755 {files to grant permissions on}
For example:
cd /opt/dkh
chmod -R 755 dkcrw dkcrw-tomcat-7.0.56/
7. Start the crawler UI.
Command:
cd /opt/dkh/dkcrw-tomcat-7.0.56/bin/
./startup.sh
After startup, enter that node's IP in a browser to open the crawler UI and confirm it started (the account and password are the defaults).
8. Start dkcrw.jar on every node.
On the master node, run:
cd /opt/dkh/dkcrw/
nohup java -jar dkcrw.jar master > dkcrw.log 2>&1 &
On each slave node, run:
cd /opt/dkh/dkcrw/
nohup java -jar dkcrw.jar slave > dkcrw.log 2>&1 &
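Step 8 can be wrapped in a small helper that starts the jar on each node over SSH, assuming passwordless SSH between nodes; the dk1/dk2/dk3 node names and role assignments are assumptions for illustration:

```shell
# Hypothetical helper: start dkcrw.jar with the given role on a remote node.
start_dkcrw() {
  node=$1
  role=$2   # "master" or "slave"
  ssh "$node" "cd /opt/dkh/dkcrw && nohup java -jar dkcrw.jar $role > dkcrw.log 2>&1 &"
}

start_dkcrw dk1 master
start_dkcrw dk2 slave
start_dkcrw dk3 slave
```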
Note: you can first start the crawler in the foreground to make sure it runs without errors.
The foreground launch command is java -jar dkcrw.jar master (or slave).
That is the whole of "Sample Analysis of Big Data Crawler Installation". Thank you for reading! We hope the content is helpful; for more related knowledge, follow the industry information channel.
© 2024 shulou.com SLNews company. All rights reserved.