Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to climb the online battery car information of Zhongguancun by node.js

2025-01-20 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/01 Report--

This article mainly introduces the relevant knowledge of "how node.js crawls Zhongguancun online battery car information". The editor shows you the operation process through an actual case. The operation method is simple, fast and practical. I hope this article "how node.js crawls Zhongguancun online battery car information" can help you solve the problem.

Steps

The first step is to introduce the required libraries

Var cheerio = require ('cheerio'); var fetch = require (' node-fetch'); / / cheerio is a browser-like jQuery used to parse HTML's / / fetch to send requests

The second step is to set the initial crawling entrance (I am in Hangzhou, so the area chose Hangzhou?)

/ / initial urlvar url = "http://detail.zol.com.cn/convenienttravel/hangzhou/#list_merchant_loc"// because there is a relative path under each a tag, a root address is needed to concatenate, as follows: var urlRoot =" http://detail.zol.com.cn" / / stores all url. Set is used to prevent repeated crawls to var urls = new Set () / / store all data var data = []

Train of thought:

Each time you get 48 links on the current page and click in, get the name and price of the battery car (other information can be obtained in the same way, just change it yourself?)

When the first page is complete, turn to the next page and continue climbing until the end of the last page.

First of all, we define a function as follows

/ / this is to get 48 links for each page and start sending the request function ad (arg) {/ / parameter arg regardless of / / localize the link to be crawled let url2 = arg | | url / / request the first page of the page, after getting the data, copy it to appvar app = await fetch (url2) .then (res= > res.text ()) / / then pretend to parse var $= cheerio.load (app) / / get the a tag var ele = $("# J_PicMode a.pic") of all the battery cars on the current page / / store the crawled url Prevent repeated crawling var old_urls = [] var urlapp = [] / / after getting all the a tag addresses, it is stored in the array and the for to start climbing later (let I = 0) I

< ele.length; i++) {old_urls.push(fetch(urlRoot+$(ele[i]).attr('href')).then(res=>

Res.text ())} / / throw a piece of URL to promise to process urlapp = await Promise.all (old_urls) / / after processing is complete, add jQuery?for (let I = 0; I) in the loop

< urlapp.length; i++) {let $2 = cheerio.load(urlapp[i],{decodeEntities: false})data.push({name:$2(".product-model__name").text(),price:$2(".price-type").text()})}// 至此,一页的数据就爬完了// console.log(data);// 然后开始爬取下一页var nextURL = $(".next").attr('href')// 判断当前页是不是最后一页if (nextURL){let next = await fetch(urlRoot+nextURL).then(res=>

Res.text () / / get the label on the next page, get the address, and go to ad (urlRoot+nextURL)} return data} ad ()

The complete code is as follows

Var cheerio = require ('cheerio'); var fetch = require (' node-fetch'); var url = "http://detail.zol.com.cn/convenienttravel/hangzhou/#list_merchant_loc"var urlRoot =" http://detail.zol.com.cn"// var url = "http://localhost:3222/app1"var urls = new Set () var data = [] async function ad (arg) {let url2 = arg | | url Var app = await fetch (url2) .then (res= > res.text ()) var $= cheerio.load (app) var ele = $("# J_PicMode a.pic") var old_urls = [] var urlapp = [] for (let I = 0; I

< ele.length; i++) {old_urls.push(fetch(urlRoot+$(ele[i]).attr('href')).then(res=>

Res.text ())} urlapp = await Promise.all (old_urls) for (let I = 0; I

< urlapp.length; i++) {let $2 = cheerio.load(urlapp[i],{decodeEntities: false})data.push({name:$2(".product-model__name").text(),price:$2(".price-type").text()})}var nextURL = $(".next").attr('href')if (nextURL){let next = await fetch(urlRoot+nextURL).then(res=>

Res.text () ad (urlRoot+nextURL)} return data} ad () about "how node.js crawls Zhongguancun's online battery car information" ends here. Thank you for reading. If you want to know more about the industry, you can follow the industry information channel. The editor will update different knowledge points for you every day.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report