This article mainly introduces how to use Python 3 to crawl the comic site Manhuadao ("Comic Island"). Many people have questions about how to do this, so the editor has gathered material from various sources and put together a simple, easy-to-follow method. I hope it helps answer your doubts about crawling Manhuadao with Python 3. Now, please follow along and study!
First, the result display
The first thing to show is the comic site we want to crawl: http://www.manhuadao.cn/.
Screenshot of the web page:
Next is what the downloaded result looks like:
Each chapter folder ends up looking like this (the images on the site itself are problematic, which is why they appear this way):
Second, the principle
1. Preparation: you need VS Code or any other software that can edit and run Python. Python 3.x is recommended; otherwise you may run into problems running the code.
Download the required modules: press Win+R, open the command line, and install them with pip install. For example:
pip install beautifulsoup4
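The other modules this tutorial relies on can be installed the same way; requests and lxml are third-party packages, while json, time and os already ship with Python:

pip install requests
pip install lxml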
2. Principle: simulate a browser -> open the comic catalog page link -> get the page source -> locate the link to each chapter -> simulate a click -> get the image page source -> locate the image links -> download the images. A condensed sketch of this flow is shown below; the exact URLs and XPath expressions are worked out step by step in the rest of the article.
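Here is a minimal sketch of that flow, assuming the catalog URL and the XPath expressions derived later in this article; folder creation and file naming are omitted here and handled in the full script at the end:

import requests
from lxml import etree

headers = {"User-Agent": "Mozilla/5.0"}   # pretend to be a browser; any common UA string works
base = "http://www.manhuadao.cn/"

# 1. open the catalog page and collect the chapter links
catalog = requests.get(base + "Home/ComicDetail?id=58ddb07827a7c1392c234628", headers=headers)
chapters = etree.HTML(catalog.text).xpath('//*[@class="yesReader"]/@href')

# 2. open each chapter page and collect the image links
for chapter in chapters:
    page = requests.get(base + chapter, headers=headers)
    images = etree.HTML(page.text).xpath('//div[@class="no-pic"]//img/@src')
    # 3. download every image on the page
    for img_url in images:
        img_data = requests.get(img_url).content   # raw bytes of the .jpg, to be written to disk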
3. Actual operation (the complete code is attached at the end)
1. Import the modules (not elaborated on here; they are listed below for reference)
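For reference, these are the modules the full script at the end imports; if you only follow the steps below, requests, os and lxml's etree are the essential ones:

import requests                 # send HTTP requests to the site
import json                     # imported by the final script, not strictly needed for the steps below
import time                     # imported by the final script, not strictly needed for the steps below
import os                       # create the folders the images are saved into
from lxml import etree          # parse HTML and run XPath queries
from bs4 import BeautifulSoup   # imported by the final script; the XPath approach used here does not rely on it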
2. Simulate a browser visit to the web page
(1) Here we open the catalog page of the comic: url = "http://www.manhuadao.cn/Home/ComicDetail?id=58ddb07827a7c1392c234628". This link is the catalog page link.
(2) Press F12 to open the developer tools (in Chrome), select the Network tab at the top, and press Ctrl+R to refresh.
(3) Find the request that loads the web page and click Headers, as shown in the figure below: Status Code is the code returned by the server, and a value of 200 means the visit succeeded.
(4) The headers parameter we need is the User-Agent shown in the red box below.
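Copy that User-Agent value into a Python dictionary so that requests sends it with every request. The string below is the Chrome User-Agent used by the full script at the end of this article; any User-Agent copied from your own browser's DevTools works just as well:

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36"
}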
response = requests.get(url=url, headers=headers)   # simulate a browser visit to the page
print(response)        # should print the response status
print(response.text)   # print the page source
The two print statements output, respectively:
A return of 200 from the output indicates that the access was successful.
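If you would rather check this in code instead of reading the printed output, requests exposes the numeric code directly; a small optional sketch:

if response.status_code == 200:
    print("access successful")
else:
    print("request failed with status code", response.status_code)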
(excerpt)
(5) Save the HTML in data and use XPath to locate the link to each chapter. Click Elements at the top of the developer tools, then click the element picker:
Move the mouse over the catalog:
Links to each chapter appear in the code area on the right:
data = etree.HTML(response.text)
# tp = data.xpath('//ul[@class="read-chapter"]/li/a[@class="active"]/@href')
tp = data.xpath('//*[@class="yesReader"]/@href')
zhang_list = tp   # tp is the list of chapter links
Output zhang_list, and the result is as follows:
(6) Get the image links (obtained the same way as in the previous step)
Click into the first chapter and locate the image links just as we located the chapter links above:
i = 1
for next_zhang in zhang_list:   # loop over the chapter links
    i = i + 1
    j = 0
    hui_url = r_url + next_zhang
    name1 = "第" + str(i) + "回"   # folder name, i.e. "chapter i"
    file = "C:/Users/wangyueke/Desktop/" + keyword + "/{}/".format(name1)   # create a folder per chapter
    if not os.path.exists(file):
        os.makedirs(file)
        print("create folder:", file)
    response = requests.get(url=hui_url, headers=headers)   # simulate visiting each chapter link
    data = etree.HTML(response.text)
    # tp = data.xpath('//div[@class="no-pic"]//img/@src')
    tp = data.xpath('//div[@class="main-content"]//ul//li//div[@class="no-pic"]//img/@src')   # locate the image links
    ye_list = tp
(7) Download the images

for k in ye_list:   # loop over the image links and download each one
    download_url = tp[j]
    print(download_url)
    j = j + 1
    file_name = "第" + str(j) + "页"   # file name, i.e. "page j"
    response = requests.get(url=download_url)   # simulate visiting the image link
    with open(file + file_name + ".jpg", "wb") as f:
        f.write(response.content)

5. The complete code

"""
    Crawl the comic 《非人哉》 ("Fei Ren Zai") from the target URL: http://www.manhuadao.cn/
    Start time:  2019-8-14 20:01:26
    Finish time: 2019-8-15 11:04:56
    Author: kong_gu
"""
import requests
import json
import time
import os
from lxml import etree
from bs4 import BeautifulSoup


def main():
    keyword = "非人哉"   # comic title, used as the name of the download folder
    file = "C:/Users/wangyueke/Desktop/{}".format(keyword)   # root folder; set this to your own path
    if not os.path.exists(file):
        os.mkdir(file)
        print("create folder:", file)
    r_url = "http://www.manhuadao.cn/"
    url = "http://www.manhuadao.cn/Home/ComicDetail?id=58ddb07827a7c1392c234628"
    headers = {   # simulate a browser visit to the page
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36"
    }
    response = requests.get(url=url, headers=headers)
    # print(response.text)   # print the page source
    data = etree.HTML(response.text)
    # tp = data.xpath('//ul[@class="read-chapter"]/li/a[@class="active"]/@href')
    tp = data.xpath('//*[@class="yesReader"]/@href')
    zhang_list = tp
    i = 1
    for next_zhang in zhang_list:
        i = i + 1
        j = 0
        hui_url = r_url + next_zhang
        name1 = "第" + str(i) + "回"
        file = "C:/Users/wangyueke/Desktop/" + keyword + "/{}/".format(name1)   # the path needs to be set here
        if not os.path.exists(file):
            os.makedirs(file)
            print("create folder:", file)
        response = requests.get(url=hui_url, headers=headers)
        data = etree.HTML(response.text)
        # tp = data.xpath('//div[@class="no-pic"]//img/@src')
        tp = data.xpath('//div[@class="main-content"]//ul//li//div[@class="no-pic"]//img/@src')
        ye_list = tp
        for k in ye_list:
            download_url = tp[j]
            print(download_url)
            j = j + 1
            file_name = "第" + str(j) + "页"
            response = requests.get(url=download_url)
            with open(file + file_name + ".jpg", "wb") as f:
                f.write(response.content)


if __name__ == "__main__":
    main()

At this point, the study of "how to use Python 3 to crawl Manhuadao" is over. I hope it has resolved your doubts. Combining theory with practice is the best way to learn, so go and try it yourself! If you want to keep learning more related knowledge, please continue to follow the site; the editor will keep working hard to bring you more practical articles!