How to crawl an entire site's novels with Python

Updated 2025-04-02. From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com), 06/01 report

This article explains how to crawl all the novels on a site with Python. The explanation is short and easy to follow, so let's work through it step by step.

Development environment:

Version: Anaconda 5.2.0 (Python 3.6.5)

Editor: PyCharm Community Edition

Let's start with the code:

1. Import tools

import requests
import parsel

2. Simulate a browser environment

headers = {
    # "Cookie": "bcolor=; font=; size=; fontcolor=; width=; Hm_lvt_3806e321b1f2fd3d61de33e5c1302fa5=1596800365,1596800898; Hm_lpvt_3806e321b1f2fd3d61de33e5c1302fa5=1596802442",
    "Host": "www.shuquge.com",
    "Referer": "http://www.shuquge.com/txt/8659/index.html",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36",
}

3. Analyze the website and crawl the novels

def download_one_chapter(url_chapter, book):
    """Crawl one chapter of a novel and append it to the book's text file."""
    # The request details were worked out in the browser's developer tools
    response = requests.get(url_chapter, headers=headers)
    # apparent_encoding detects the encoding from the response body;
    # the detection is right in the vast majority of cases
    response.encoding = response.apparent_encoding
    # print(response.text)

    # Extract data. Possible tools: bs4, parsel, xpath, css, re
    # Convert the HTML into a Selector object for extraction
    sel = parsel.Selector(response.text)
    h2 = sel.css('h2::text')
    title = h2.get()
    print(title)
    content = sel.css('#content ::text').getall()
    # print(content)
    # text = "".join(content)
    # print(text)

    # Write data. Mode 'a' appends each chapter to the same book file;
    # mode 'w' would overwrite the file on every chapter.
    # with open(title + '.txt', mode='w', encoding='utf-8') as f:
    with open(book + '.txt', mode='a', encoding='utf-8') as f:
        f.write(title)
        f.write('\n')
        for line in content:
            f.write(line.strip())
            f.write('\n')


# A novel has many chapters
# download_one_chapter('http://www.shuquge.com/txt/8659/2324752.html')
# download_one_chapter('http://www.shuquge.com/txt/8659/2324753.html')


def download_one_book(book_url):
    """Crawl every chapter listed on a novel's catalog page."""
    response = requests.get(book_url, headers=headers)
    response.encoding = response.apparent_encoding
    html = response.text
    sel = parsel.Selector(html)
    title = sel.css('h3::text').get()
    # Relative chapter links taken from the catalog page
    index_s = sel.css('body > div.listmain > dl > dd > a::attr(href)').getall()
    print(index_s)
    for index in index_s:
        # Drop the trailing 'index.html' (10 characters) to get the base URL
        one_chapter_url = book_url[:-10] + index
        print(one_chapter_url)
        download_one_chapter(one_chapter_url, title)
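The slice book_url[:-10] only works when the catalog URL ends in exactly index.html. As an alternative sketch (not what the article's code does), urljoin from the standard library resolves a relative chapter link against the catalog URL. The helper name chapter_url is hypothetical.

```python
from urllib.parse import urljoin


def chapter_url(book_url, href):
    """Resolve a chapter link from the catalog page against the catalog URL.

    Hypothetical helper, not part of the original script.
    """
    return urljoin(book_url, href)


catalog = 'http://www.shuquge.com/txt/8659/index.html'
print(chapter_url(catalog, '2324752.html'))
# prints http://www.shuquge.com/txt/8659/2324752.html
```

The same call also handles a catalog URL that ends in a slash, or a link that is already absolute, which the fixed-length slice would not.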

Two things the code above leaves out:

1. Exception handling: the requests are not wrapped in try/except.

2. Error retry: after an error, try again, or record the failure and re-request it later.
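The two points above could be combined in a small wrapper. This is a minimal sketch under assumptions, not the article's code: the helper name retry and its parameters retries, delay and on_failure are all hypothetical, and it catches Exception broadly, where real code would more likely catch requests.RequestException.

```python
import time


def retry(func, retries=3, delay=1.0, on_failure=None):
    """Call func() up to `retries` times and return its first successful result.

    Hypothetical helper, not part of the original script.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return func()
        except Exception as error:  # assumption: narrow this in real use
            last_error = error
            if attempt < retries:
                # Wait a little before trying again
                time.sleep(delay)
    # Every attempt failed: record the failure so it can be re-requested later
    if on_failure is not None:
        on_failure(last_error)
    return None
```

Inside download_one_chapter, the request could then be written as retry(lambda: requests.get(url_chapter, headers=headers, timeout=10)), with the chapter skipped or logged when the result is None.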

To download a novel, pass the URL of its catalog page to download_one_book:

download_one_book('http://www.shuquge.com/txt/8659/index.html')
download_one_book('http://www.shuquge.com/txt/122230/index.html')
download_one_book('http://www.shuquge.com/txt/117456/index.html')

The task breaks down into layers:

Download each chapter from its chapter URL.

Download each novel from its catalog page.

Download the whole site: download every category of novels, then every page of novels under each category.
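The article's code covers single chapters and single novels; the category and whole-site layers are only described. A rough sketch of that outer loop follows, with every name in it hypothetical. The article does not give the category page URLs or the selectors for listing pages, so extracting the links is left to a function passed in by the caller.

```python
def crawl_site(category_page_urls, list_book_urls, download_book):
    """Download every book found on the given category pages, once each.

    All names here are hypothetical:
    - category_page_urls: iterable of category listing page URLs
    - list_book_urls(page_url): returns the catalog URLs found on one page
    - download_book(book_url): downloads one novel, e.g. download_one_book
    """
    seen = set()
    downloaded = []
    for page_url in category_page_urls:
        for book_url in list_book_urls(page_url):
            if book_url in seen:
                # The same book can appear on several listing pages
                continue
            seen.add(book_url)
            download_book(book_url)
            downloaded.append(book_url)
    return downloaded
```

In practice list_book_urls would fetch a listing page and pull out the catalog links with parsel, much as download_one_book does for chapter links, and download_book would be download_one_book itself.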

Result of running the code: each novel is written to a .txt file named after its title.

Thank you for reading. That covers how to crawl an entire site's novels with Python. After working through this article you should have a better understanding of the approach, but how well it works in a specific case still needs to be verified in practice.
