
How to crawl website pictures with Node.js

2025-07-12 Update From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/01

This article explains how to crawl pictures from a website with Node.js. Many people are unsure how to do this in day-to-day work, so the editor consulted a range of materials and put together a simple, practical approach. Follow along to work through it step by step.

Knowledge points involved

Developing a small crawler involves the following:

The https module, used to fetch network resources such as web page source and image files.

The cheerio module, used to parse HTML source and to look up the content of HTML nodes.

The fs module, used for file read and write operations such as saving pictures and logs.

Closures, used in asynchronous operations to isolate and protect per-iteration state.
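The closure point can be illustrated with a small sketch. It is not part of the crawler itself, and the function names and file names below are made up for this example. It shows why callbacks created in a loop with var need a closure to keep their own copy of the loop index.

```javascript
// Illustrative sketch: isolating a loop variable with a closure (IIFE).
// All names here are hypothetical and are not taken from the crawler code.
var urls = ['page_1.html', 'page_2.html', 'page_3.html'];

// Without a closure: every callback shares the same `i`, which has
// already reached urls.length by the time any callback runs.
function withoutClosure() {
  var callbacks = [];
  for (var i = 0; i < urls.length; i++) {
    callbacks.push(function () { return urls[i]; });
  }
  return callbacks.map(function (cb) { return cb(); });
}

// With a closure: the immediately invoked function gives each
// callback its own `num`.
function withClosure() {
  var callbacks = [];
  for (var i = 0; i < urls.length; i++) {
    (function (num) {
      callbacks.push(function () { return urls[num]; });
    })(i);
  }
  return callbacks.map(function (cb) { return cb(); });
}

console.log(withoutClosure()); // [ undefined, undefined, undefined ]
console.log(withClosure());    // [ 'page_1.html', 'page_2.html', 'page_3.html' ]
```

In current JavaScript, declaring the loop variable with let gives the same per-iteration isolation without the wrapper function.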

Introduction to cheerio

What is cheerio?

Cheerio is a fast, flexible and lean implementation of core jQuery, designed specifically for the server. It is mainly used to parse HTML on the server side. Its features are as follows:

Easy to use: the syntax follows jQuery, with the DOM inconsistencies and browser-specific cruft of the jQuery library removed.

Fast parsing: the cheerio README has cited preliminary benchmarks suggesting it is about eight times faster than JSDOM.

Flexible: cheerio wraps a forgiving HTML parser, so it can parse almost any HTML or XML document.

Install cheerio

On the command line, change to the program directory and run the install command:

cnpm install cheerio

(With the standard npm client, the equivalent command is npm install cheerio.)


Preparatory work

Before writing a crawler, first analyze the target content. The goal this time is to crawl the pictures in one category (celebrity pictures) of a certain website. Inspecting the page shows that every picture is an img inside an a tag inside each li under a ul. So it is enough to parse the src attribute of each img to get the download path of the image.

Core code

Based on the analysis above, the Node.js code is divided into two steps. The first step obtains the URLs of all images by parsing the src attribute of every target img element. The second step downloads each picture and saves it.

Load the required modules, as follows:

var https = require('https');
var cheerio = require('cheerio');
var fs = require('fs');

Fetch and parse the contents of the HTML page, as follows:

// URLs to crawl (the host name is masked in the source)
var addrs = [
  'https://www.*****.com/topic/show_27202_1.html',
  'https://www.******.com/topic/show_27202_2.html',
  'https://www.*****.com/topic/show_27202_3.html'
];
// Log file in append mode; ./download must exist before the stream opens it
var logger = fs.createWriteStream('./download/log.txt', { flags: 'a+', autoClose: true });
for (var i in addrs) {
  (function (num) {
    var addr = addrs[num];
    // Create the download directory if it does not exist
    var p1 = new Promise(function (resolve, reject) {
      fs.access('./download', function (err) {
        if (err) {
          fs.mkdir('./download', function (e) {
            // Another iteration may already have created the directory
            if (e && e.code !== 'EEXIST') {
              console.log('creation failed');
              reject(e);
            } else {
              resolve('success');
            }
          });
        } else {
          resolve('success');
        }
      });
    });
    p1.then(function (datas) {
      var html = '';
      var p2 = new Promise(function (resolve, reject) {
        https.get(addr, function (res) {
          res.on('data', function (data) {
            html += data.toString();
          });
          res.on('end', function () {
            resolve('success');
          });
        }).on('error', reject);
      });
      p2.then(function (data) {
        // After the page has been downloaded, parse it
        const $ = cheerio.load(html);
        var lis = $('#img-list-outer').find('li');
        for (var j = 0; j < lis.length; j++) {
          // The source listing is cut off here; the loop header is
          // reconstructed and the original loop body is not available.
        }
      });
    });
  })(i);
}
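The listing above breaks off inside the loop over the li elements, so the article's own download code is not available. As a rough idea of what the second step (download and save) could look like, here is a minimal sketch using only the https, fs and path modules. It is not the author's code: the function names resolveImageUrl, localPathFor and downloadImage are hypothetical, and the sketch assumes the images are served over HTTPS without redirects.

```javascript
// Illustrative sketch of the download step; not the article's original code.
const https = require('https');
const fs = require('fs');
const path = require('path');

// Resolve a possibly relative src attribute against the page URL.
function resolveImageUrl(pageUrl, src) {
  return new URL(src, pageUrl).href;
}

// Build a local file path from the image URL. If the URL path has no
// file name, fall back to a name based on `index`.
function localPathFor(imageUrl, dir, index) {
  const name = path.basename(new URL(imageUrl).pathname) || ('image_' + index);
  return path.join(dir, name);
}

// Stream one image to disk. Resolves with the file path once the
// write stream has finished; rejects on network or file errors.
function downloadImage(imageUrl, filePath) {
  return new Promise(function (resolve, reject) {
    https.get(imageUrl, function (res) {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error('Unexpected status ' + res.statusCode + ' for ' + imageUrl));
        return;
      }
      const file = fs.createWriteStream(filePath);
      res.pipe(file);
      file.on('finish', function () { resolve(filePath); });
      file.on('error', reject);
    }).on('error', reject);
  });
}
```

Inside the loop, each src value read from an img element could be passed through resolveImageUrl and then to downloadImage, with the result written to the log stream. Piping the response into a write stream avoids holding a whole image in memory.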
