2025-09-21 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains how to crawl a movie website with Python and download the movies it lists. The editor found it practical and shares it here for reference.
I. Overview
As a homebody who likes watching movies, I find movie websites tiresome: every visit brings a barrage of pop-up ads, each download link has to be copied and pasted into Xunlei by hand, and choosing what to watch is a chore in itself. It would be far more comfortable to have all of this done in one step. As a Python enthusiast with a little crawler knowledge, I spent part of a weekend writing a crawler in Python for the latest-movies section of a movie website.
Approach:
For a given movie website, the crawler collects each movie's name, download link, rating, and other information, and highlights the movies updated that day. It then calls Xunlei to download movies based on their ratings, first checking whether a movie has already been downloaded before deciding to fetch it. After that, you can simply watch.
This version is based on Python 3.x. Xunlei can only be called on Windows; on Linux the script only retrieves the movie information.
Installing Python and the required modules is not covered here; if anything is unclear, please leave me a message.
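The download decision described above can be sketched roughly as follows. This is only an illustration, not the author's code: the function name `should_download`, the `min_score` threshold of 7.0, and the rule that a file whose name contains the movie title counts as "already downloaded" are all assumptions.

```python
import os


def should_download(name, score, download_dir, min_score=7.0):
    """Return True when a movie is rated highly enough and is not on disk yet.

    All names and the 7.0 threshold are hypothetical; the article does not
    state which cutoff or which "already downloaded" check it uses.
    """
    # Skip movies without a rating or with a rating below the threshold.
    if score is None or score < min_score:
        return False
    # Assumption: a file whose name contains the title means "already downloaded".
    existing = os.listdir(download_dir) if os.path.isdir(download_dir) else []
    return not any(name in filename for filename in existing)
```

Keeping the decision in a small pure function like this makes it easy to change the threshold or the duplicate check without touching the crawling code.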
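A minimal sketch of that platform split, using the standard `platform` module. The function name `start_download`, the `thunder_path` parameter, and above all the idea that the Xunlei executable accepts a download link as its first command-line argument are assumptions for illustration; the article does not show how Xunlei is actually launched.

```python
import platform
import subprocess


def start_download(link, thunder_path, system=None):
    """Launch Xunlei on Windows; on other systems only print the link.

    `thunder_path` is a hypothetical path to the Xunlei executable, and
    passing the link as its first argument is an unverified assumption.
    """
    system = system or platform.system()
    if system != "Windows":
        # On Linux (or anything else) just report the information.
        print("Download link:", link)
        return False
    # Start the download client without waiting for it to exit.
    subprocess.Popen([thunder_path, link])
    return True
```

The optional `system` argument exists only so the branch can be exercised on any machine.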
Run on jupyter as follows:
II. Code
Enough talk; here is the code.
# coding:utf-8
# version 20181027 by san
# NOTE: string literals such as "latest movie download" and "Douban score"
# appear translated in this copy; they must match the text on the real page.
import re, time, os
from urllib import request
from lxml import etree  # lxml provides XPath support
import platform
import ssl

# disable HTTPS certificate verification globally
ssl._create_default_https_context = ssl._create_unverified_context


# movie crawler class
class getMovies:
    def __init__(self, url, Thuder):
        '''instance initialization'''
        self.url = url
        self.Thuder = Thuder

    def getResponse(self, url):
        # request the url passed in (the original always requested self.url)
        url_request = request.Request(url)
        url_response = request.urlopen(url_request)
        return url_response  # return the HTTPResponse object

    def newMovie(self):
        '''get the latest movie names and their page urls'''
        http_response = self.getResponse(self.url)  # HTTPResponse object for the listing page
        data = http_response.read().decode('gbk')
        # print(data)  # show the page content
        html = etree.HTML(data)
        newMovies = dict()
        lists = html.xpath('/html/body/div[1]/div/div[3]/div[2]/div[2]/div[1]/div/div[2]/div[2]/ul/table//a')
        for k in lists:
            if "app.html" in k.items()[0][1] or "latest movie download" in k.text:
                continue
            else:
                movieUrl = self.url + k.items()[0][1]
                # the delimiters are garbled in the source; the Chinese title marks are assumed
                movieName = k.text.split('《')[1].split('》')[0]
                newMovies[movieName] = movieUrl
        return newMovies

    def Movieurl(self, url):
        '''get score and update time'''
        url_request = request.Request(url)
        movie_http_response = request.urlopen(url_request)
        data = movie_http_response.read().decode('gbk')
        # get the score; fall back to null when the page has none
        if len(re.findall(r'Douban score.+?.+users', data)):
            pingf = re.findall(r'Douban score.+?.+users', data)[0].split('/')[0].replace("\u3000", ":")
        else:
            pingf = "Douban score: null"
        # the listing breaks off in the middle of the next statement in the source:
        # desc = re.findall(r'simple\s+introduction.*', data)[0].replace("\u3000", "").replace('', "").split("src")[0].replace('&ldquo', "").replace('&rdquo', "").replace('
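To act on ratings, the score text that the listing extracts with a regular expression has to become a number at some point. The sketch below shows one way to do that with the standard `re` module. The sample text format ("Douban score", a full-width space, then "7.9/10 from ... users") and the name `parse_score` are assumptions; the real page text would be in Chinese and may differ.

```python
import re

# Hypothetical pattern: the label, optional whitespace (including the
# full-width space \u3000), then a number followed by "/10".
SCORE_RE = re.compile(r"Douban score\s*(\d+(?:\.\d+)?)/10")


def parse_score(page_text):
    """Return the rating as a float, or None when no rating is found."""
    match = SCORE_RE.search(page_text)
    return float(match.group(1)) if match else None


# Usage with an assumed sample line:
sample = "Douban score\u30007.9/10 from 1,234 users"
print(parse_score(sample))
```

Returning `None` for a missing rating keeps the "score: null" case from the listing distinct from a genuine low score.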