2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to use Node.js to crawl and download more than 10,000 pictures. The steps are short and easy to follow: first collect the image URLs from a page, then look at three ways to download a file, and finally put the pieces together.
Crawl a picture
First, initialize the project and install axios and cheerio:
npm init -y && npm i axios cheerio
axios is used to fetch the web page. cheerio provides a jQuery-style API on the server side; we use it to read the image addresses out of the DOM.
const axios = require('axios')
const cheerio = require('cheerio')

// Fetch a page and collect the src of the first <img> inside each container element
async function getImageUrl(target_url, containerElement) {
  const res = await axios.get(target_url)
  const html = res.data
  const $ = cheerio.load(html)
  const result_list = []
  // cheerio passes (index, element) to the each() callback
  $(containerElement).each((index, element) => {
    result_list.push($(element).find('img').attr('src'))
  })
  return result_list
}
This gives you the image URLs on the page. The next step is to download each image from its URL.
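The src values scraped from a page are often relative paths or protocol-relative URLs, and some img tags have no src at all, so the list usually needs cleaning before anything is downloaded. A minimal sketch of that step, assuming a hypothetical helper named normalizeImageUrls (it is not part of the original code) and Node's built-in URL class:

```javascript
// Hypothetical helper (not in the original article): turn raw `src` values
// into absolute, de-duplicated URLs relative to the page they came from.
function normalizeImageUrls(srcList, pageUrl) {
  const seen = new Set()
  for (const src of srcList) {
    if (!src) continue // an <img> without a src attribute yields undefined
    try {
      // new URL(relative, base) resolves relative and protocol-relative paths
      seen.add(new URL(src, pageUrl).href)
    } catch (err) {
      // Skip values that cannot be parsed as a URL
    }
  }
  return [...seen] // a Set keeps insertion order
}
```

It could be applied to the return value of getImageUrl, passing the same page URL as the second argument.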
How to download files using Node.js
Files can be downloaded in Node.js with the built-in modules or with third-party libraries.
Method 1: use the built-in modules 'https' and 'fs'
https.get() sends a GET request for the file to download. fs.createWriteStream() creates a writable stream; its only required parameter is the path where the file is saved. pipe() reads data from a readable stream and writes it to a writable stream.
const fs = require('fs')
const https = require('https')

// URL of the image (shortened in the source; use a full https:// URL here)
const url = 'GFG.jpeg'

https.get(url, (res) => {
  // Image will be stored at this path (the files directory must already exist)
  const path = `${__dirname}/files/img.jpeg`
  const filePath = fs.createWriteStream(path)
  res.pipe(filePath)
  filePath.on('finish', () => {
    filePath.close()
    console.log('Download Completed')
  })
})
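The callback version does not check the HTTP status or report errors, which matters once many files are downloaded in a loop. A hedged sketch of a Promise-based variant, using only built-in modules: the helper names fileNameFromUrl and downloadToFile and the fallback name download.bin are assumptions made for this illustration, and stream/promises requires Node.js 15 or later.

```javascript
const fs = require('fs')
const path = require('path')
const https = require('https')
const { pipeline } = require('stream/promises')

// Hypothetical helper: derive a local file name from the last URL path segment.
function fileNameFromUrl(fileUrl, fallback = 'download.bin') {
  const name = path.posix.basename(new URL(fileUrl).pathname)
  return name || fallback
}

// Hypothetical helper: download one file over HTTPS and resolve with the saved path.
function downloadToFile(fileUrl, destDir) {
  return new Promise((resolve, reject) => {
    https
      .get(fileUrl, (res) => {
        if (res.statusCode !== 200) {
          res.resume() // discard the body so the socket is released
          reject(new Error(`Unexpected status ${res.statusCode} for ${fileUrl}`))
          return
        }
        fs.mkdirSync(destDir, { recursive: true })
        const target = path.join(destDir, fileNameFromUrl(fileUrl))
        // pipeline() forwards stream errors and closes the file when done
        pipeline(res, fs.createWriteStream(target)).then(() => resolve(target), reject)
      })
      .on('error', reject)
  })
}
```

A call such as downloadToFile(url, `${__dirname}/files`) can then be awaited inside an async function, with failures handled by try/catch.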
Method 2: DownloaderHelper
npm install node-downloader-helper
The following code downloads a picture from a website. An object dl is created from the class DownloaderHelper, which takes two parameters:
The URL of the image to be downloaded.
The path where the image is saved after download.
The file variable holds the URL of the image to be downloaded, and the filePath variable holds the path where the file is saved.
const { DownloaderHelper } = require('node-downloader-helper')

// URL of the image (shortened in the source; use a full URL here)
const file = 'GFG.jpeg'
// Path at which image will be downloaded
const filePath = `${__dirname}/files`

const dl = new DownloaderHelper(file, filePath)
dl.on('end', () => console.log('Download Completed'))
dl.start()
Method 3: use download
It comes from sindresorhus, one of the most prolific authors on npm, and it is very easy to use.
npm install download
The following code downloads a picture from a website. The download function takes the file URL and the destination directory, and returns a Promise.
const download = require('download')

// URL of the image (shortened in the source; use a full URL here)
const file = 'GFG.jpeg'
// Path at which image will get downloaded
const filePath = `${__dirname}/files`

download(file, filePath).then(() => {
  console.log('Download Completed')
})

Final code
I originally wanted to crawl Baidu wallpapers, but the resolution was too low and the images carried watermarks. Later, someone in my group chat found an API, apparently the backend of a mobile HD wallpaper app, that returns download URLs directly, so I used that instead.
Here is the complete code
const download = require('download')
const axios = require('axios')

let headers = {
  'User-Agent':
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
}

function sleep(time) {
  return new Promise((resolve) => setTimeout(resolve, time))
}

async function load(skip = 0) {
  const data = await axios
    .get('http://service.picasso.adesk.com/v1/vertical/category/4e4d610cdf714d2966000000/vertical', {
      headers,
      params: {
        limit: 30, // each page returns a fixed 30 items
        skip: skip,
        first: 0,
        order: 'hot',
      },
    })
    .then((res) => {
      return res.data.res.vertical
    })
    .catch((err) => {
      console.log(err)
      return [] // keep going with an empty page if the request fails
    })
  await downloadFile(data)
  await sleep(3000)
  if (skip < 1000) {
    load(skip + 30)
  } else {
    console.log('Download finished')
  }
}

async function downloadFile(data) {
  for (let index = 0; index < data.length; index++) {
    const item = data[index]
    // Path at which image will get downloaded
    const filePath = `${__dirname}/美女`
    await download(item.wp, filePath, {
      filename: item.id + '.jpeg',
      headers,
    }).then(() => {
      console.log(`Download ${item.id} Completed`)
      return
    })
  }
}

load()
In the code above, a User-Agent header is set and a 3 s delay is added after each page. Without them, the server may block the crawler and return 403 directly.
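The recursive load() call can also be written as a plain loop over page offsets, which makes the number of pages explicit and keeps one failed page from stopping the run. A sketch under stated assumptions: pageOffsets, crawl, fetchPage and handlePage are hypothetical names introduced here, while the page size of 30 and the 3 s delay are taken from the code above.

```javascript
// Hypothetical helper: list the `skip` offsets needed to fetch `total` items
// when every request returns `pageSize` items.
function pageOffsets(total, pageSize = 30) {
  const offsets = []
  for (let skip = 0; skip < total; skip += pageSize) {
    offsets.push(skip)
  }
  return offsets
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

// Visit every page in order, pausing between requests.
// `fetchPage(skip)` and `handlePage(items)` are supplied by the caller.
async function crawl({ total, pageSize = 30, delayMs = 3000, fetchPage, handlePage }) {
  for (const skip of pageOffsets(total, pageSize)) {
    try {
      const items = await fetchPage(skip)
      await handlePage(items)
    } catch (err) {
      // Log the failure and continue with the next page
      console.error(`Page at skip=${skip} failed: ${err.message}`)
    }
    await sleep(delayMs) // stay polite to the server between pages
  }
}
```

With a page size of 30, pageOffsets(10000) yields 334 requests, so reaching 10,000 pictures means raising the bound well beyond the skip < 1000 check. The existing downloadFile function could be passed as handlePage.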
Run node index.js and the pictures will be downloaded automatically.
That covers how to use Node.js to crawl and download more than 10,000 pictures. Thank you for reading!