2025-10-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to use Python to crawl the plot summaries of every episode of a TV series. It first gives the sample code, then explains the two key points behind it: how to obtain the URL of each episode automatically, and how to extract the plot text from each episode's page.
[sample code]
# coding=utf-8
# @Author: Pengge thief excellent
# @Date: 2019-8-7

from bs4 import BeautifulSoup
import requests
# getheader appears to be the author's own helper module (its source is not
# shown in this article); getheaders() is expected to return a dict of
# HTTP request headers.
import getheader


def get_title():
    """Get the title of each episode and the relative URL (href) of its page."""
    url = "https://www.tvsou.com/storys/0d884ba0dd/"
    headers = getheader.getheaders()
    r = requests.get(url, headers=headers)
    r.encoding = "utf-8"
    soup = BeautifulSoup(r.text, "lxml")
    temps = soup.find("ul", class_="m-l14 clearfix episodes-list teleplay-lists").find_all("li")
    tempurllist = []
    titlelist = []
    for temp in temps:
        tempurl = temp.a.get("href")
        title = temp.a.get("title")
        tempurllist.append(tempurl)
        titlelist.append(title)
    return tempurllist, titlelist


def Changan(episode=1):
    """Download the plots of 12 Hours in Chang'an, starting from the given
    episode (the first episode by default)."""
    tempurllist_b, titlelist_b = get_title()
    tempurllist = tempurllist_b[(episode - 1):]
    titlelist = titlelist_b[(episode - 1):]
    baseurl = "https://www.tvsou.com"
    for i, tempurl in enumerate(tempurllist):
        print("downloading episode {0}".format(i + episode))
        url = baseurl + tempurl
        r = requests.get(url, headers=getheader.getheaders())
        r.encoding = "utf-8"
        soup = BeautifulSoup(r.text, "lxml")
        result = soup.find("pre", class_="font-16 color-3 mt-20 pre-content").find_all("p")
        content = []
        for temp in result:
            # Skip p tags that have no text
            if temp.string:
                content.append(temp.string)
        # Append the episode title and its plot text to the output file
        with open("test.txt", "a", encoding="utf-8") as f:
            f.write(titlelist[i] + "\n")
            f.writelines(content)
            f.write("\n")


if __name__ == "__main__":
    Changan(43)
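The sample code imports `getheader`, whose source is not included in the article. As a minimal sketch only, assuming the module does nothing more than return a headers dict with a browser-like User-Agent, it might look like the following. The `USER_AGENTS` list, its values, and the random selection are all assumptions for illustration, not the author's actual code.

```python
# getheader.py: hypothetical stand-in for the author's helper module
import random

# Assumed example values; any realistic browser User-Agent strings would do.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def getheaders():
    """Return a request-headers dict with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}
```

Saved as getheader.py next to the crawler script, this would let `getheader.getheaders()` be passed directly as the `headers=` argument of `requests.get`.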
[result]
Running the script appends the title and plot text of each downloaded episode to test.txt.
[knowledge points]
1. How to automatically obtain the URL address of each episode?
First inspect the response returned for the first episode's page. It contains a list with one entry per episode, and each entry carries an href.
The first episode's URL, "https://www.tvsou.com/storys/0d884ba0dd/", ends with exactly the path given in that episode's href. Checking the second episode confirms the pattern: its URL is the site's base address plus its href. So the URL of every episode can be built automatically by reading all the href values from the list and prefixing each with "https://www.tvsou.com".
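The idea can be tried offline on a small HTML fragment. The sketch below is only an illustration: `SAMPLE_HTML`, its href values, and the `episode_links` helper are made up for this example, and it uses the built-in "html.parser" backend rather than lxml so that it runs without extra dependencies beyond bs4.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up fragment imitating the episode list; real hrefs on the site differ.
SAMPLE_HTML = """
<ul class="m-l14 clearfix episodes-list teleplay-lists">
  <li><a href="/storys/aaa111/" title="Episode 1">1</a></li>
  <li><a href="/storys/bbb222/" title="Episode 2">2</a></li>
</ul>
"""


def episode_links(html, base_url="https://www.tvsou.com"):
    """Return (title, absolute URL) pairs for every episode in the list."""
    soup = BeautifulSoup(html, "html.parser")
    # Matching on one of the ul's classes is enough to locate the list.
    episode_list = soup.find("ul", class_="episodes-list")
    links = []
    for li in episode_list.find_all("li"):
        # Join the site's base address with the relative href.
        links.append((li.a.get("title"), urljoin(base_url, li.a.get("href"))))
    return links


for title, url in episode_links(SAMPLE_HTML):
    print(title, url)
```

`urljoin` is used here instead of plain string concatenation so that a stray or missing slash between the base address and the href does not produce a malformed URL.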
2. How to crawl the plot content of each episode?
Taking the first episode as an example, the response contains a block that holds the plot text.
The plot content sits inside the tag with class_="font-16 color-3 mt-20 pre-content". That tag contains multiple p tags, and each p tag holds one piece of the content, so the text has to be extracted from each p tag in turn. Because the first p tag yields no text, a non-empty check is required before a p tag's text is added to the result.
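The non-empty check can be illustrated on a small fragment. This is a sketch under assumptions: `SAMPLE_HTML` and `extract_paragraphs` are invented for the example, and the empty first p tag is only one way a p tag can end up without text. In BeautifulSoup, a tag's `.string` is `None` when the tag has no children, which is why the filter is needed.

```python
from bs4 import BeautifulSoup

# Made-up fragment imitating the plot block; the first p tag has no text.
SAMPLE_HTML = """
<pre class="font-16 color-3 mt-20 pre-content">
<p></p>
<p>First paragraph of the plot.</p>
<p>Second paragraph of the plot.</p>
</pre>
"""


def extract_paragraphs(html):
    """Return the text of every p tag in the plot block, skipping empty ones."""
    soup = BeautifulSoup(html, "html.parser")
    block = soup.find("pre", class_="pre-content")
    # p.string is None for a tag with no children, so such tags are dropped.
    return [str(p.string) for p in block.find_all("p") if p.string]


paragraphs = extract_paragraphs(SAMPLE_HTML)
print("\n".join(paragraphs))
```

Note that `writelines` does not add line separators on its own, so joining the paragraphs with "\n" as above is one way to keep them on separate lines when writing to a file.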
That covers how to use Python to crawl all the plots of a TV series: collect the href of each episode to build its URL, then extract the non-empty text of each p tag on the episode page. I hope this is helpful.