Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use python to capture embarrassing encyclopedic jokes

2025-02-22 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/02 Report--

This article will explain in detail how to use python to capture embarrassing encyclopedic jokes. The editor thinks it is very practical, so I share it with you as a reference. I hope you can get something after reading this article.

Crawl process: pass in the parameters starting url and output file name, use urllib2 to crawl the page, one page at a time, cycle through to the last page. Use regular expressions to extract the crawled page content and save it to a file. The procedure is as follows:

#-*-coding: utf-8import urllib2import urllibimport re,osimport timeclass Joke: # initialization data def _ _ init__ (self,start_url,out_put_file): self.start_url = start_url self.out_put_file = out_put_file self.page = 2 self.user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5) Windows NT) 'self.headers = {' User-Agent': self.user_agent} # method to get page content def get_cotent (self,page): try: url = self.start_url + str (page) +'/? request 4955352' request = urllib2.Request (url) Headers=self.headers) response = urllib2.urlopen (request) act_url = response.geturl () print 'init url=',url,'act url=',act_url if url= = act_url: content = response.read () return content else: return None except urllib2.URLError, e: if hasattr (e "reason"): print u "failed to connect to Encyclopedia of embarrassing stories, error reason", e.reason return None # passed the page code Return joke content def get_joke (self,page): joke_content = self.get_cotent (page) str =''if not joke_content: print "crawled" return None pattern = re.compile ('. *? (. *?). *? +. *? (. *?)', re.S) items = re.findall (pattern) Joke_content) for item in items: str = str + 'publisher:' + item [0] +'\ njoke + 'publish content:' +'\ njoke + item [1] +'\ njoke +'\ n' return str # method to save captured jokes to file def writeStr2File (self,out_put_file,str1,append ='a'): # remove the file Keep the path. For example, after the following code,'a _ will change to'a _ out_put_file [: out_put_file.rfind ('/')] # if the folder does not exist in the given path Then create if not os.path.exists (subPath): os.makedirs (subPath) # Open the file and write the str content to the given file with open (out_put_file, append) as f: f.write (str1.strip () +'\ n') # start crawling the page content, one page at a time All pages def start_crawl (self): while True: joke_str = self.get_joke (self.page) if not joke_str: break time.sleep (1) # print (joke_str) self.writeStr2File (self.out_put_file) until crawled Joke_str) self.page+=1spider = Joke (what are the five characteristics of http://www.qiushibaike.com/hot/page/','d:/python/test/out.txt')spider.start_crawl()python? what are the five characteristics of python: 1. It's easy to learn, and when you develop a program, you focus on solving problems, not understanding the language itself. two。 Object-oriented, compared with other major languages such as C++ and Java, Python implements object-oriented programming in a very powerful and simple way. 3. Portability, Python programs can run on a variety of platforms without modification. 4. Explanation, programs written in Python do not need to be compiled into binary code and can be run directly from the source code. 5. Open source, Python is one of FLOSS (Free / Open Source Software).

This is the end of this article on "how to use python to capture embarrassing encyclopedic jokes". I hope the above content can be of some help to you, so that you can learn more knowledge. if you think the article is good, please share it for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report