How to Crawl an Entire Site of Novels with Python

This article explains how to crawl every chapter of a novel with Python, and how the same approach scales up to an entire site of novels. The explanation is kept simple and clear; work through the steps below in order.
Development environment:
Version: Anaconda 5.2.0 (Python 3.6.5)
Editor: PyCharm Community Edition
Let's start with the code:
1. Import tools
import requests
import parsel
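Both are third-party packages; if your environment does not have them yet, they can be installed from PyPI with pip install requests parsel.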
2. Fake a browser environment
headers = {
    # "Cookie": "bcolor=; font=; size=; fontcolor=; width=; Hm_lvt_3806e321b1f2fd3d61de33e5c1302fa5=1596800365,1596800898; Hm_lpvt_3806e321b1f2fd3d61de33e5c1302fa5=1596802442",
    "Host": "www.shuquge.com",
    "Referer": "http://www.shuquge.com/txt/8659/index.html",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36",
}
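As a quick sanity check (not part of the original script), you can request the catalog page from the Referer header above and confirm the server answers normally:

# Optional check: confirm the spoofed headers are accepted by the server.
response = requests.get('http://www.shuquge.com/txt/8659/index.html', headers=headers)
print(response.status_code)  # 200 means the request went through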
3. Analyze the website and crawl the novel
def download_one_chapter(url_chapter, book):
    """Crawl one chapter of a novel."""
    # The request was analyzed from within the browser
    response = requests.get(url_chapter, headers=headers)
    # response.apparent_encoding detects the encoding adaptively; accurate about 99% of the time
    response.encoding = response.apparent_encoding
    # print(response.text)

    """Extract the data"""
    # Possible tools: bs4, parsel (xpath / css / re)
    # Convert the HTML into a selector object;
    # css can select by tag name, #id, or .class
    sel = parsel.Selector(response.text)
    h2 = sel.css('h2::text')
    title = h2.get()
    print(title)
    content = sel.css('#content ::text').getall()
    # print(content)
    # text = "".join(content)
    # print(text)

    """Write the data"""
    # with open(title + '.txt', mode='w', encoding='utf-8') as f:
    # Note: mode must be 'a' (append), not 'w', otherwise each chapter
    # would overwrite the previous one in the book file.
    with open(book + '.txt', mode='a', encoding='utf-8') as f:
        f.write(title)
        f.write('\n')
        for line in content:
            f.write(line.strip())
            f.write('\n')

"""Crawling a novel means crawling many chapters"""
# download_one_chapter('http://www.shuquge.com/txt/8659/2324752.html')
# download_one_chapter('http://www.shuquge.com/txt/8659/2324753.html')

def download_one_book(book_url):
    response = requests.get(book_url, headers=headers)
    response.encoding = response.apparent_encoding
    html = response.text
    sel = parsel.Selector(html)
    title = sel.css('h3::text').get()
    index_s = sel.css('body > div.listmain > dl > dd > a::attr(href)').getall()
    print(index_s)
    for index in index_s:
        # book_url[:-10] strips the trailing 'index.html' (10 characters)
        print(book_url[:-10] + index)
        one_chapter_url = book_url[:-10] + index
        download_one_chapter(one_chapter_url, title)
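To see what the two css() calls above return, here is a minimal, self-contained demo on a made-up HTML snippet (the snippet is hypothetical; the selectors are the ones used in the script):

import parsel

html = '''
<h2>Chapter 1: The Beginning</h2>
<div id="content">First paragraph.<br>Second paragraph.</div>
'''
sel = parsel.Selector(html)
print(sel.css('h2::text').get())            # 'Chapter 1: The Beginning'
print(sel.css('#content ::text').getall())  # ['First paragraph.', 'Second paragraph.']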
Two weaknesses of this version:
1. Exceptions are not handled with try/except, so a single failed request crashes the whole crawl.
2. There is no retry on error; after a failure you would want to retry, or record the URL and re-request it later. A minimal retry helper is sketched below.
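Here is a minimal sketch of such a helper, assuming the headers dict defined earlier; fetch_with_retry is a hypothetical name, not part of the original script:

import time

def fetch_with_retry(url, retries=3, delay=2):
    """Retry a GET request a few times before giving up."""
    for attempt in range(retries):
        try:
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()  # raise on HTTP 4xx/5xx
            response.encoding = response.apparent_encoding
            return response
        except requests.RequestException as error:
            print(f'Attempt {attempt + 1} of {retries} failed for {url}: {error}')
            time.sleep(delay)
    return None  # caller can record the URL and re-request it later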
What does it take to download a whole novel? Just its catalog page:
download_one_book('http://www.shuquge.com/txt/8659/index.html')
download_one_book('http://www.shuquge.com/txt/122230/index.html')
download_one_book('http://www.shuquge.com/txt/117456/index.html')
The crawl scales up layer by layer: download each chapter by its address -> download each novel from its catalog page -> download one whole novel.
Download the entire site's fiction -> download every category of fiction -> download every page of novels listed under each category. A sketch of this outer loop follows.
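The following is only a sketch of that outer loop, under stated assumptions: the category and book-list selectors (div.nav a, div.bookbox a) are placeholders, since the article does not show the real markup of shuquge.com's category pages; adjust them to the actual pages before use.

def download_one_category(category_page_url):
    """Hypothetical: download every book listed on one category page."""
    response = requests.get(category_page_url, headers=headers)
    response.encoding = response.apparent_encoding
    sel = parsel.Selector(response.text)
    # Placeholder selector: links to each book's catalog (index) page.
    book_urls = sel.css('div.bookbox a::attr(href)').getall()
    for book_url in book_urls:
        download_one_book(book_url)

def download_whole_site(site_url):
    """Hypothetical: walk every category, then every book inside it."""
    response = requests.get(site_url, headers=headers)
    response.encoding = response.apparent_encoding
    sel = parsel.Selector(response.text)
    # Placeholder selector: the category links in the site navigation.
    category_urls = sel.css('div.nav a::attr(href)').getall()
    for category_url in category_urls:
        download_one_category(category_url)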
Effect of running the code: one .txt file is created per novel, named after its title and containing all of its chapters in order.
Thank you for reading. That covers how to crawl an entire site of novels with Python. The details will vary with the target site, so the specifics still need to be verified in practice.