Data crawler technology example: Daxuai online crawler installation tutorial

2025-01-16 Update From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/03

The online crawler is an important part of the Daxuai big data integration development framework. This article explains how to install it.

Before installing the crawler: the big data platform must already be installed, and ZooKeeper, Redis, Elasticsearch, MySQL and the other components must be installed and running.
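A quick way to confirm that the required services are up is to probe their ports. The sketch below is not part of the original tutorial: it assumes each service listens on its stock default port (ZooKeeper 2181, Redis 6379, Elasticsearch 9200, MySQL 3306) and that bash and the coreutils timeout command are available. Adjust the ports if the platform was configured differently.

```shell
# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
check_port() {
    # $1 = host, $2 = port; /dev/tcp is a bash feature, so bash is invoked explicitly
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Print one status line per required service.
# The ports are the usual defaults and are an assumption, not taken from the tutorial.
check_services() {
    # $1 = host to probe
    for entry in zookeeper:2181 redis:6379 elasticsearch:9200 mysql:3306; do
        name=${entry%%:*}
        port=${entry##*:}
        if check_port "$1" "$port"; then
            echo "$name ($port): reachable"
        else
            echo "$name ($port): NOT reachable"
        fi
    done
}
```

Running check_services with the name or IP of a node prints four status lines, one per service. A reachable port only shows that something is listening there, not that the service is healthy.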

1. Modify the crawler installation configuration files (preferably edit them locally and then upload them to the platform)

2. Modify the crawler\dkcrw\jdbc.properties configuration file (change only the items shown in the picture and leave everything else at its default)

The address to enter for hbase.zookeeper.quorum should be looked up on the DKM monitoring platform:

For the Redis-related settings, see the following interface:
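After editing, it can help to read a value back from jdbc.properties and compare it with what the monitoring platform shows. The helper below is a minimal sketch under the assumption that the file uses plain key=value lines. get_prop is a hypothetical name, and hbase.zookeeper.quorum is the only key taken from this article.

```shell
# Print the value of one key from a Java-style .properties file.
# If the key appears more than once, the last occurrence wins.
# If the key is missing, an empty line is printed.
get_prop() {
    # $1 = properties file, $2 = exact key name
    awk -F= -v key="$2" '
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }
        k == key {
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            val = v
        }
        END { print val }
    ' "$1"
}
```

For example, get_prop crawler/dkcrw/jdbc.properties hbase.zookeeper.quorum prints the configured ZooKeeper address list so it can be checked against the DKM monitoring platform.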

3. Replace the jdbc.properties file under crawler\dkcrw-tomcat-7.0.56\webapps\ROOT\WEB-INF\classes with the modified one from crawler\dkcrw\ (simply overwrite the existing, unmodified copy)

After making the changes, compress the modified crawler folder into an archive
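The copy in step 3 is easy to forget or to do in the wrong direction, so it can be scripted. This is only a sketch: it assumes a Unix-like shell (the article's own paths use Windows backslashes), and sync_jdbc_props is a hypothetical helper name.

```shell
# Copy the edited jdbc.properties into the Tomcat webapp and confirm both copies match.
sync_jdbc_props() {
    # $1 = path to the unpacked crawler folder
    src="$1/dkcrw/jdbc.properties"
    dst="$1/dkcrw-tomcat-7.0.56/webapps/ROOT/WEB-INF/classes/jdbc.properties"
    # cmp -s exits 0 only when the two files are byte-for-byte identical
    cp "$src" "$dst" && cmp -s "$src" "$dst"
}
```

A possible use is sync_jdbc_props crawler && zip -r crawler.zip crawler, which creates the archive only if the copy succeeded. The archive name crawler.zip is an assumption.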

4. Upload the archive to the platform's master node and extract it (how to upload is not covered here; in this example the package is uploaded to the root directory, but any directory can be used)

Extract it with the unzip command; after extraction there will be a new crawler folder

Use cd crawler to enter the crawler folder

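Import the numysql.sql database with the following command:

mysql -uroot -p123456 < numysql.sql

5. Distribute the crawler files

Every node needs the dkcrw folder. The dkcrw-tomcat-7.0.56 folder may be placed on only one node, and that node must not be the master node (a slave node is recommended).

Command:

scp -r {files to distribute; several may be listed, and a path must be added for any that are not in the current directory} {target server IP or hostname:target path}

For example:

cd /opt/dkh

scp -r dkcrw dk2:/opt/dkh/

scp -r dkcrw dkcrw-tomcat-7.0.56/ dk2:/opt/dkh/

6. On the node that received dkcrw-tomcat-7.0.56, grant permissions on the files

Command:

chmod -R 755 {files or folders that need the permissions}

For example:

cd /opt/dkh

chmod -R 755 dkcrw dkcrw-tomcat-7.0.56/

7. Start the crawler interface

Command:

cd /opt/dkh/dkcrw-tomcat-7.0.56/bin/

./startup.sh

Once it has started, enter the IP address of that node in a browser to open the crawler interface and confirm that it started successfully (the account and password are the defaults).

8. Start dkcrw.jar on every node

Command:

On the master node, run:

cd /opt/dkh/dkcrw/

nohup java -jar dkcrw.jar master > dkcrw.log 2>&1 &

On each slave node, run: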

cd /opt/dkh/dkcrw/

nohup java -jar dkcrw.jar slave > dkcrw.log 2>&1 &
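With more than one slave node, the scp commands from step 5 (distributing the crawler files) become repetitive. The sketch below only prints the commands so they can be reviewed before being run by hand. The node name dk3 and the function name plan_distribution are assumptions. dk2, /opt/dkh and the folder names come from the example in this article.

```shell
DKH_DIR=/opt/dkh
TOMCAT_DIR=dkcrw-tomcat-7.0.56

# Print (do not execute) the scp commands for distributing the crawler files.
plan_distribution() {
    # $1 = the single node that hosts the Tomcat web interface
    # remaining arguments = every slave node that needs the dkcrw folder
    web_node=$1
    shift
    echo "cd $DKH_DIR"
    for node in "$@"; do
        if [ "$node" = "$web_node" ]; then
            # only this node also receives the Tomcat folder
            echo "scp -r dkcrw $TOMCAT_DIR/ $node:$DKH_DIR/"
        else
            echo "scp -r dkcrw $node:$DKH_DIR/"
        fi
    done
}

plan_distribution dk2 dk2 dk3
```

Printing instead of executing keeps the sketch safe to try anywhere. The printed lines can be pasted into a shell on the master node once they look right.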

Note: You can start the crawler in the foreground first to make sure that it runs without errors.

Foreground startup command: java -jar dkcrw.jar master (on the master node) or java -jar dkcrw.jar slave (on a slave node)
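Once the crawler runs in the background, its output goes to dkcrw.log, so the same check can be made against the log. The helper below is a rough heuristic and not part of the original tutorial: it assumes that problems show up as lines containing "Exception" or "ERROR", which is common for Java programs but is not confirmed for dkcrw.jar.

```shell
# Succeed (exit 0) if the log contains a line that looks like an error.
log_has_errors() {
    # $1 = log file, for example /opt/dkh/dkcrw/dkcrw.log
    grep -qE 'Exception|ERROR' "$1"
}
```

For example, log_has_errors /opt/dkh/dkcrw/dkcrw.log && echo "check dkcrw.log" prints a reminder only when a suspicious line is present.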
