
How to crawl the 2020 Chinese university rankings with Python

2025-02-25 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article introduces how to crawl the 2020 Chinese university rankings with Python. Many people run into difficulties with tasks like this in practice, so let the editor walk you through handling them. I hope you read it carefully and get something out of it!

The overall steps for crawling the Chinese university rankings: send a request to get the HTML, parse the page with BeautifulSoup, match the content with regular expressions, then create and save an Excel file.

```python
from bs4 import BeautifulSoup        # web page parsing, data extraction
import re                            # regular expressions for text matching
import urllib.request, urllib.error  # fetch the web page for a given url
import xlwt                          # write .xls workbooks


def main():
    baseurl = "http://m.gaosan.com/gaokao/265440.html"
    # 1. crawl and parse the page
    datalist = getData(baseurl)
    # 2. save the data
    savepath = "Chinese university rankings.xls"
    saveData(datalist, savepath)


# Regular expression for one table cell. The five per-column patterns in the
# original post were corrupted during extraction; a single cell pattern
# recovers the same five fields (ranking, school, score, star level, tier).
cell = re.compile(r'<td[^>]*>(.*?)</td>', re.S)


# 2. Parse the page at the given url into a list of school records.
def getData(baseurl):
    datalist = []
    html = askURL(baseurl)                     # fetched page source
    soup = BeautifulSoup(html, "html.parser")  # soup is the parsed tree structure
    for item in soup.find_all('tr'):           # one <tr> row per school
        item = str(item)
        data = cell.findall(item)              # [ranking, school, score, star, tier]
        if len(data) == 5 and data[0].isdigit():
            datalist.append(data)              # one processed school record
    return datalist


# Fetch the page source for a given url.
def askURL(url):
    # Simulated browser header information: the User-Agent tells the server
    # what kind of client we are and what content we can accept, so the
    # request is not rejected as a bot.
    head = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/80.0.3987.116 Safari/537.36"}
    request = urllib.request.Request(url, headers=head)  # url with header info
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode("utf-8")  # decode to prevent garbled text
    except urllib.error.URLError as e:
        if hasattr(e, "code"):
            print(e.code)    # print the error code
        if hasattr(e, "reason"):
            print(e.reason)  # print the error reason
    return html


# 3. Save the data into an .xls workbook.
def saveData(datalist, savepath):
    book = xlwt.Workbook(encoding="utf-8", style_compression=0)
    sheet = book.add_sheet('China University ranking', cell_overwrite_ok=True)
    for i in range(len(datalist)):   # one row per school (the original iterated a fixed 640 rows)
        print("%d" % (i + 1))
        data = datalist[i]
        for j in range(5):           # five columns per row
            sheet.write(i, j, data[j])
    book.save(savepath)              # save the data table


if __name__ == "__main__":  # program entry point
    main()
    print("crawl complete!")
```
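The regex step above can be tried in isolation, without network access or third-party libraries. The sketch below runs the same kind of cell pattern against a hypothetical table row (the school name and numbers are made-up sample data, not taken from the ranking page):

```python
import re

# A hypothetical table row in the same shape as one row of the ranking page.
row = ("<tr><td>1</td><td>Peking University</td>"
       "<td>100</td><td>8</td><td>World-class</td></tr>")

# Non-greedy cell pattern: captures the text of each <td>...</td> in order.
cell = re.compile(r'<td[^>]*>(.*?)</td>', re.S)

fields = cell.findall(row)
print(fields)  # ['1', 'Peking University', '100', '8', 'World-class']
```

`findall` returns the captured groups in document order, which is why one pattern is enough to recover all five columns of a row at once.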

The results of a sample run are as follows.

This concludes "How to crawl the 2020 Chinese university rankings with Python". Thank you for reading. If you want to learn more about the industry, follow this site; the editor will keep publishing high-quality practical articles for you!

