
How to grab the chapters of a novel with node

2025-03-29 Update. From: SLTechnology News&Howtos


Shulou (Shulou.com) 05/31 Report

In this article, the editor explains in detail how to use Node to grab the chapters of a novel. The steps are laid out clearly, and I hope the article helps answer your questions.

If you plan to build a novel-reading tool with Electron as a practice project, the first thing to solve is the data, that is, the text of the novel.

Here we use Node.js to crawl a novel website. Since this is only a trial run on a single novel, the data will not be stored in a database; plain .txt files are used for storage for now.
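As a minimal sketch of the txt storage idea, assuming one file per chapter named after the chapter title: the helper `saveChapterText` below is hypothetical (it is not part of the original code). It replaces characters that common file systems reject in file names before writing, since chapter titles can contain characters such as `?` or `:`.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: write one chapter to <dir>/<title>.txt as UTF-8 text.
function saveChapterText(dir, title, content) {
  // Replace characters that Windows/macOS/Linux file systems reject or treat specially.
  const safeTitle = title.replace(/[\\/:*?"<>|]/g, '_').trim();
  fs.mkdirSync(dir, { recursive: true }); // create the folder if it is missing
  const file = path.join(dir, `${safeTitle}.txt`);
  fs.writeFileSync(file, content, 'utf8');
  return file;
}
```

For example, `saveChapterText('./hasGetfile', 'Chapter 1: Start?', text)` writes `./hasGetfile/Chapter 1_ Start_.txt`.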

To request web pages in Node, you can use the built-in http and https modules, each of which provides a request method.

Example:

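```javascript
const https = require('https');

const TestUrl = 'https://www.shuquge.com/txt/147032/'; // assumed: the book's index page
const request = https.request(TestUrl, (res) => {
  res.setEncoding('utf8'); // decode chunks as UTF-8 strings ("encoding" is not a request option)
  let chunks = '';
  res.on('data', (chunk) => { chunks += chunk; });
  res.on('end', () => {
    console.log('request end');
  });
});
request.end(); // https.request does not send the request until end() is called
```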

But this only fetches the raw HTML text; it does not let us extract the elements inside it (regular expressions could do that, but it quickly gets too complex).
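For comparison, here is roughly what the regular-expression route looks like. It is a sketch that assumes chapter links are written as `<a ... href="...">title</a>` with a double-quoted `href` and plain text inside the tag; single-quoted attributes or nested tags already break it, which is why a real HTML parser is the better choice. `extractLinksWithRegex` is a hypothetical name.

```javascript
// Extract { name, href } pairs from raw HTML with a regular expression.
// Assumes double-quoted href values and plain text (no nested tags) inside each <a>.
function extractLinksWithRegex(html) {
  const linkPattern = /<a\b[^>]*?\bhref="([^"]*)"[^>]*>([^<]*)<\/a>/gi;
  const links = [];
  for (const match of html.matchAll(linkPattern)) {
    links.push({ name: match[2].trim(), href: match[1] });
  }
  return links;
}

const sample =
  '<dd><a href="1.html">Chapter 1</a></dd><dd><a class="x" href="2.html">Chapter 2</a></dd>';
console.log(extractLinksWithRegex(sample));
```

The same function returns an empty array for `<a href='3.html'>C3</a>`, because the pattern only expects double quotes.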

I saved the fetched data with fs.writeFile, but that is only the HTML of the whole web page.
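A promise-based version of this request-and-save step might look like the sketch below. `fetchBuffer` and `savePage` are hypothetical names; the code keeps the response as a raw Buffer (so the page's encoding can be dealt with later), picks the `http` or `https` module from the URL's protocol, and writes the whole page with `fs.writeFile`.

```javascript
const http = require('http');
const https = require('https');
const fs = require('fs');

// Hypothetical helper: GET a URL and resolve with the raw response body as a Buffer.
function fetchBuffer(url) {
  const client = url.startsWith('https:') ? https : http;
  return new Promise((resolve, reject) => {
    const req = client.get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`Request failed with status ${res.statusCode}`));
        return;
      }
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => resolve(Buffer.concat(chunks)));
      res.on('error', reject);
    });
    req.on('error', reject); // DNS failures, refused connections, etc.
  });
}

// Hypothetical helper: fetch a page and store the whole HTML with fs.writeFile.
function savePage(url, file) {
  return fetchBuffer(url).then(
    (body) =>
      new Promise((resolve, reject) => {
        fs.writeFile(file, body, (err) => (err ? reject(err) : resolve(file)));
      })
  );
}
```

Usage would be something like `savePage('https://www.shuquge.com/txt/147032/', './hasGetfile/index.html')`; the `./hasGetfile` folder has to exist first.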

What I actually want is the content of each chapter, so I need to collect the chapter hyperlinks into a list and then crawl each of them.
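When forming the hyperlink list, relative `href` values must be turned into absolute URLs. The sketch below uses the standard `URL` class instead of string concatenation, so `1.html`, `/txt/147032/1.html` and full URLs all resolve correctly; `buildChapterList` is a hypothetical helper, and the base URL is the book's index page used in this article.

```javascript
// Hypothetical helper: turn raw { name, href } pairs into absolute, de-duplicated links.
function buildChapterList(rawLinks, baseUrl) {
  const seen = new Set();
  const list = [];
  for (const { name, href } of rawLinks) {
    // new URL resolves relative and root-relative hrefs against the base page
    const absolute = new URL(href, baseUrl).href;
    if (seen.has(absolute)) continue; // skip a chapter that is linked more than once
    seen.add(absolute);
    list.push({ name: name.trim(), href: absolute });
  }
  return list;
}

const base = 'https://www.shuquge.com/txt/147032/';
const list = buildChapterList(
  [
    { name: ' Chapter 1 ', href: '1.html' },
    { name: 'Chapter 1', href: '/txt/147032/1.html' },
    { name: 'Chapter 2', href: '2.html' },
  ],
  base
);
```

Both forms of the Chapter 1 link resolve to the same absolute URL, so `list` ends up with two entries.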

Cheerio library

The cheerio documentation includes examples you can run and experiment with.

Parsing HTML using cheerio

Once cheerio has parsed the HTML, you select DOM nodes with jQuery-like syntax.

Inspect the HTML of the book's index page to find the DOM nodes that hold the data you want.

```javascript
const fs = require('fs');
const cheerio = require('cheerio');
// introduce the file read/write helpers
const { getFile, writeFun } = require('./requestNovel');

// hypothetical selector: adjust it to match the chapter <a> tags on the index page
const CHAPTER_SELECTOR = 'dd a';

let hasIndexPromise = getFile('./hasGetfile/index.html');
let bookArray = [];

hasIndexPromise.then((res) => {
  let htmlstr = res;
  let $ = cheerio.load(htmlstr);
  $(CHAPTER_SELECTOR).each((index, item) => {
    let name = $(item).text(),
      href = 'https://www.shuquge.com/txt/147032/' + $(item).attr('href');
    // links with index 0 to 11 are skipped; keep the rest
    if (index > 11) {
      bookArray.push({ name, href });
    }
  });
  // console.log(bookArray)
  writeFun('./hasGetfile/hrefList.txt', JSON.stringify(bookArray), 'w');
});
```

Print bookArray to check the result.

The code above also stores this list in ./hasGetfile/hrefList.txt at the same time.
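Because the list is written out as JSON text, it can be read back later without crawling the index page again. A sketch, assuming the same `./hasGetfile/hrefList.txt` layout (an array of `{ name, href }` objects); `saveChapterList` and `loadChapterList` are hypothetical names.

```javascript
const fs = require('fs');

// Hypothetical helper: write the chapter list as JSON text.
function saveChapterList(file, list) {
  fs.writeFileSync(file, JSON.stringify(list), 'utf8');
}

// Hypothetical helper: read the list back; returns [] if the file does not exist yet.
function loadChapterList(file) {
  if (!fs.existsSync(file)) return [];
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}
```

On a later run, `bookArray = loadChapterList('./hasGetfile/hrefList.txt')` restores the list, and the index page only needs to be crawled when the result is empty.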

Now that you have the chapter list and each chapter's link, you can fetch the chapter contents.

Crawling all chapters in bulk would eventually require IP proxies, so for now I only write a method that fetches the content of a single chapter.
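Proxies are outside the scope of this article, but if you do fetch several chapters in a row, spacing the requests out at least keeps the load on the site low. The sketch below calls a fetch function for each chapter index one at a time with a pause in between; `crawlInSequence`, the `delayMs` value and the `fetchOne` callback (which could be a single-chapter method such as the one in this article) are assumptions for illustration, and this does not replace an IP proxy.

```javascript
// Hypothetical helper: call fetchOne(i) for each index in order, waiting delayMs between calls.
// Failures are recorded instead of stopping the whole batch.
async function crawlInSequence(indexes, fetchOne, delayMs) {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  const results = [];
  for (let k = 0; k < indexes.length; k++) {
    const i = indexes[k];
    try {
      results.push({ index: i, ok: true, value: await fetchOne(i) });
    } catch (err) {
      results.push({ index: i, ok: false, error: String(err) });
    }
    if (k < indexes.length - 1) await sleep(delayMs); // pause before the next request
  }
  return results;
}
```

Running the requests sequentially rather than with `Promise.all` is the deliberate choice here: only one request is in flight at any time.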

Crawling the content of a chapter is actually relatively simple:

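```javascript
const https = require('https');
// bookArray, cheerio and writeFun come from the module of the previous step

// crawl the content of one chapter
function getOneChapter(n) {
  return new Promise((resolve, reject) => {
    if (n >= bookArray.length) {
      reject('not found');
      return; // stop here: bookArray[n] does not exist
    }
    let name = bookArray[n].name;
    const request = https.request(bookArray[n].href, (res) => {
      let chunks = [];
      res.on('data', (chunk) => { chunks.push(chunk); });
      res.on('end', () => {
        // the page is GBK-encoded: decode the raw bytes once they have all arrived
        // (TextDecoder('gbk') needs a Node build with full ICU, the default for official binaries)
        let html = new TextDecoder('gbk').decode(Buffer.concat(chunks));
        let $ = cheerio.load(html);
        let content = $('#content').text();
        if (content) {
          // write as txt
          writeFun(`./hasGetfile/${name}.txt`, content, 'w');
          resolve(content);
        } else {
          reject('not found');
        }
      });
    });
    request.on('error', reject);
    request.end();
  });
}

getOneChapter(10);
```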
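The chapter body should be decoded only after all chunks have arrived, because a GBK character takes two bytes and a network chunk can end in the middle of one. The sketch below, which assumes a Node.js build with full ICU so that `TextDecoder` supports `'gbk'`, simulates such a split using the GBK bytes of 中文 and compares decoding each chunk separately with decoding the concatenated Buffer.

```javascript
// GBK bytes for "中文": 中 = D6 D0, 文 = CE C4
const gbkBytes = Buffer.from([0xd6, 0xd0, 0xce, 0xc4]);

// Simulate a response that arrives in two chunks, split inside the character 文
const chunks = [gbkBytes.subarray(0, 3), gbkBytes.subarray(3)];

const decoder = new TextDecoder('gbk');

// Wrong: decoding each chunk on its own breaks the character that straddles the split
const perChunk = chunks.map((c) => decoder.decode(c)).join('');

// Right: join the raw bytes first, then decode once
const whole = decoder.decode(Buffer.concat(chunks));

console.log(perChunk === '中文'); // false
console.log(whole); // 中文
```

The same problem affects the `html += chunk` pattern, which turns every chunk into a string as soon as it arrives.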

Building on this method, you can expose a calling interface: pass in a chapter number and get back the data of that chapter.
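The chapter number arrives in the query string as text and is 1-based, while `bookArray` is indexed from 0. A small validation helper makes that conversion explicit and rejects values such as `0`, `abc` or numbers past the last chapter before any request is made; `toChapterIndex` is a hypothetical name, and returning `null` for bad input is just one possible choice.

```javascript
// Hypothetical helper: convert the 1-based "n" query parameter into a 0-based array index.
// Returns null when n is missing, not a positive integer, or beyond the last chapter.
function toChapterIndex(n, chapterCount) {
  if (typeof n !== 'string' || !/^\d+$/.test(n)) return null;
  const chapter = Number(n);
  if (chapter < 1 || chapter > chapterCount) return null;
  return chapter - 1;
}

console.log(toChapterIndex('1', 100)); // 0
console.log(toChapterIndex('100', 100)); // 99
console.log(toChapterIndex('0', 100)); // null
console.log(toChapterIndex('abc', 100)); // null
```

Checking `typeof n` also covers a repeated parameter such as `?n=1&n=2`, which some query parsers turn into an array.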

```javascript
const express = require('express');
const IO = express();
const { getAllChapter, getOneChapter } = require('./readIndex');

// get the chapter hyperlink list
getAllChapter();

IO.use('/book', function (req, res) {
  // query parameters
  let query = req.query;
  if (query.n) {
    // get one chapter; n is 1-based, the array index is 0-based
    let promise = getOneChapter(parseInt(query.n, 10) - 1);
    promise.then(
      (d) => { res.json({ d: d }); },
      (d) => { res.json({ d: d }); }
    );
  } else {
    res.json({ d: 404 });
  }
});

// port the local server listens on (hypothetical value)
const PORT = 3000;
IO.listen(PORT);
```

That covers how to grab novel chapters with Node. To really master these points, you still need to practice them yourself.
