Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to write the code for python to crawl Douban movie TOP250 data

2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/01 Report--

Today, the editor will share with you the relevant knowledge points about how to write the code for python crawling Douban movie TOP250 data. The content is detailed and the logic is clear. I believe most people still know too much about this knowledge, so share this article for your reference. I hope you can get something after reading this article. Let's take a look at it.

Before executing the program, create a database "pachong" in MySQL.

Import pymysqlimport requestsimport re# acquires resources and downloads def resp (listURL): # Connect database conn = pymysql.connect (host = '127.0.0.1, port = 3306, user =' root', password ='*', # database password enter database = 'pachong' according to your actual password Charset = 'utf8') # create database cursor cursor = conn.cursor () # create list t_movieTOP250 (execute sql statement) cursor.execute (' create table t_movieTOP250 (id INT PRIMARY KEY auto_increment NOT NULL) MovieName VARCHAR (20) NOT NULL Pictrue_address VARCHAR)') try: # crawl data for urlPath in listURL: # get web page source code response = requests.get (urlPath) html = response.text # regular expression namePat = raster = "(. *?)" Src=' imgPat = rascsrc = "https://atts.yisu.com/attachments/(.*?)" Class=' # matches regular (ranking [replace id in database) Automatically generate and sort], movie name, movie poster (picture address) res2 = re.compile (namePat) res3 = re.compile (imgPat) textList2 = res2.findall (html) textList3 = res3.findall (html) # iterate through the elements in the list And store the data in the database for i in range (len (textList3)): cursor.execute ('insert into t_movieTOP250 (movieName,pictrue_address) VALUES ("% s", "% s")'% (textList2 [I]) TextList3 [I]) # get results from cursors cursor.fetchall () # submit results conn.commit () print ("results submitted") except Exception as e: # data rollback conn.rollback () print ("data rolled back") # close database conn.close () # top250 all web pages URL: def page (url): urlList = [] for i in range (10): num = str (25roomi) pagePat = ritual startmakers'+ num +'& filter=' urL = url+pagePat urlList.append (urL) return urlListif _ _ name__ ='_ main__': url = r "https://movie.douban.com/top250" listURL = page (url) resp (listURL) These are all the contents of the article "how to write the code for python to crawl Douban movie TOP250 data". Thank you for reading! I believe you will gain a lot after reading this article. The editor will update different knowledge for you every day. If you want to learn more knowledge, please pay attention to the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report