In this article the editor shares how to use metaclass attributes in Python 3. Most people are not very familiar with the technique, so the article is shared for your reference; I hope you gain a lot from reading it. Let's get to know it!
Use of metaclass attributes
Code
The code mainly demonstrates the use of a metaclass: while the crawler class is being generated, the metaclass captures some of its attributes. Every method whose name starts with the same prefix, crawl_, is collected into a list attribute, so the crawl functions can be called one after another. The goal is that crawling a new site only requires adding another crawl_ function, without adjusting the rest of the class. A minimal sketch of the pattern follows.
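Before the actual crawler code, here is a minimal, self-contained sketch of that pattern. The names SpiderMeta and DemoSpider are illustrative, not from the original project; the point is that the metaclass scans the class namespace at creation time and records every crawl_-prefixed method:

# Minimal sketch: collect methods by name prefix at class-creation time.
# SpiderMeta / DemoSpider are illustrative names, not from the original code.
class SpiderMeta(type):
    def __new__(cls, name, bases, attrs):
        # attrs is the class namespace; record every 'crawl_' method name
        attrs['__CrawlFunc__'] = [k for k in attrs if k.startswith('crawl_')]
        attrs['__CrawlFuncCount__'] = len(attrs['__CrawlFunc__'])
        return type.__new__(cls, name, bases, attrs)

class DemoSpider(metaclass=SpiderMeta):
    def crawl_site_a(self):
        yield '1.2.3.4:8080'

    def crawl_site_b(self):
        yield '5.6.7.8:3128'

spider = DemoSpider()
for func_name in spider.__CrawlFunc__:          # names gathered by the metaclass
    for proxy in getattr(spider, func_name)():  # call each crawl_ method in turn
        print(func_name, '->', proxy)

Note that __CrawlFunc__ ends with a double underscore, so Python's name mangling does not apply and the attribute can be read directly from instances and other classes.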
Part of the code:
class ProxyMetaclass(type):
    def __new__(cls, name, bases, attrs):
        count = 0
        attrs['__CrawlFunc__'] = []
        for k, v in attrs.items():
            if 'crawl_' in k:
                attrs['__CrawlFunc__'].append(k)
                count += 1
        attrs['__CrawlFuncCount__'] = count
        return type.__new__(cls, name, bases, attrs)


class Crawler(object, metaclass=ProxyMetaclass):
    def get_proxies(self, callback):
        proxies = []
        for proxy in eval("self.{}()".format(callback)):
            print('get proxy successfully', proxy)
            proxies.append(proxy)
        return proxies

    def crawl_daili66(self, page_count=4):
        """
        Get proxies from 66ip
        :param page_count: number of pages
        :return: proxies
        """
        start_url = 'http://www.66ip.cn/{}.html'
        urls = [start_url.format(page) for page in range(1, page_count + 1)]
        for url in urls:
            print('Crawling', url)
            html = get_page(url)
            if html:
                doc = pq(html)
                trs = doc('.containerbox table tr:gt(0)').items()
                for tr in trs:
                    ip = tr.find('td:nth-child(1)').text()
                    port = tr.find('td:nth-child(2)').text()
                    yield ':'.join([ip, port])
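The article does not show how the generated attributes are consumed, so here is a minimal sketch of one way to drive the crawl loop. The Getter name and the run loop are illustrative, following the pattern implied by get_proxies; get_page is assumed to be a requests-based helper that returns the page HTML:

# Illustrative consumer of the metaclass-generated attributes (not from the
# original article): iterate the collected crawl_ names and call each one.
class Getter(object):
    def __init__(self):
        self.crawler = Crawler()

    def run(self):
        # __CrawlFuncCount__ and __CrawlFunc__ were attached by ProxyMetaclass
        for index in range(self.crawler.__CrawlFuncCount__):
            callback = self.crawler.__CrawlFunc__[index]
            for proxy in self.crawler.get_proxies(callback):
                print(proxy)

With this in place, supporting a new site only means writing another crawl_xxx generator on Crawler; the metaclass picks it up automatically and run() calls it without any other change.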
Testing method
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time: 12-19-19 4:10 PM
# @Author: yon
# @Email: @qq.com
# @File: test

import json
import re

import requests
from pyquery import PyQuery as pq


# get_page is not defined in the article; a minimal requests-based stand-in
# is assumed here so that the script can actually run.
def get_page(url, options=None):
    try:
        response = requests.get(url, headers=options, timeout=10)
        if response.status_code == 200:
            return response.text
    except requests.RequestException:
        return None


class ProxyMetaclass(type):
    def __new__(cls, name, bases, attrs):
        count = 0
        attrs['__CrawlFunc__'] = []
        for k, v in attrs.items():
            # print every attribute the metaclass sees while the class is built
            print("print k")
            print(k)
            print("print v")
            print(v)
            if 'crawl_' in k:
                attrs['__CrawlFunc__'].append(k)
                count += 1
        attrs['__CrawlFuncCount__'] = count
        return type.__new__(cls, name, bases, attrs)


class Crawler(object, metaclass=ProxyMetaclass):
    def get_proxies(self, callback):
        proxies = []
        for proxy in eval("self.{}()".format(callback)):
            print('successfully obtained proxy', proxy)
            proxies.append(proxy)
        return proxies

    def crawl_daili66(self, page_count=4):
        """
        Get proxies from 66ip
        :param page_count: number of pages
        :return: proxies
        """
        start_url = 'http://www.66ip.cn/{}.html'
        urls = [start_url.format(page) for page in range(1, page_count + 1)]
        for url in urls:
            print('Crawling', url)
            html = get_page(url)
            if html:
                doc = pq(html)
                trs = doc('.containerbox table tr:gt(0)').items()
                for tr in trs:
                    ip = tr.find('td:nth-child(1)').text()
                    port = tr.find('td:nth-child(2)').text()
                    yield ':'.join([ip, port])

    def crawl_ip3366(self):
        for page in range(1, 4):
            start_url = 'http://www.ip3366.net/free/?stype=1&page={}'.format(page)
            html = get_page(start_url)
            if html:
                # the HTML tags inside the patterns were stripped when the
                # article was published; restored here from context
                ip_address = re.compile(r'<tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>')
                # \s* matches whitespace, so the pattern spans line breaks
                re_ip_address = ip_address.findall(html)
                for address, port in re_ip_address:
                    result = address + ':' + port
                    yield result.replace(' ', '')

    def crawl_kuaidaili(self):
        for i in range(1, 4):
            start_url = 'http://www.kuaidaili.com/free/inha/{}/'.format(i)
            html = get_page(start_url)
            if html:
                ip_address = re.compile(r'<td data-title="IP">(.*?)</td>')
                re_ip_address = ip_address.findall(html)
                port = re.compile(r'<td data-title="PORT">(.*?)</td>')
                re_port = port.findall(html)
                for address, port in zip(re_ip_address, re_port):
                    address_port = address + ':' + port
                    yield address_port.replace(' ', '')

    def crawl_xicidaili(self):
        for i in range(1, 3):
            start_url = 'http://www.xicidaili.com/nn/{}'.format(i)
            headers = {
                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
                'Cookie': '_free_proxy_session=BAh7B0kiD3Nlc3Npb25faWQGOgZFVEkiJWRjYzc5MmM1MTBiMDMzYTUzNTZjNzA4NjBhNWRjZjliBjsAVEkiEF9jc3JmX3Rva2VuBjsARkkiMUp6S2tXT3g5a0FCT01ndzlmWWZqRVJNek1WanRuUDBCbTJUN21GMTBKd3M9BjsARg%3D%3D--2a69429cb2115c6a0cc9a86e0ebe2800c0d471b3',
                'Host': 'www.xicidaili.com',
                'Referer': 'http://www.xicidaili.com/nn/3',
                'Upgrade-Insecure-Requests': '1',
            }
            html = get_page(start_url, options=headers)
            if html:
                find_trs = re.compile(r'<tr class.*?>(.*?)</tr>', re.S)
                trs = find_trs.findall(html)
                for tr in trs:
                    find_ip = re.compile(r'<td>(\d+\.\d+\.\d+\.\d+)</td>')
                    re_ip_address = find_ip.findall(tr)
                    find_port = re.compile(r'<td>(\d+)</td>')
                    re_port = find_port.findall(tr)
                    for address, port in zip(re_ip_address, re_port):
                        address_port = address + ':' + port
                        yield address_port.replace(' ', '')

    def crawl_ip3366(self):
        # note: this redefines crawl_ip3366 above, so only this second
        # version ends up on the class
        for i in range(1, 4):
            start_url = 'http://www.ip3366.net/?stype=1&page={}'.format(i)
            html = get_page(start_url)
            if html:
                find_tr = re.compile(r'<tr>(.*?)</tr>', re.S)
                trs = find_tr.findall(html)
                for s in range(1, len(trs)):
                    find_ip = re.compile(r'<td>(\d+\.\d+\.\d+\.\d+)</td>')
                    re_ip_address = find_ip.findall(trs[s])
                    find_port = re.compile(r'<td>(\d+)</td>')
                    re_port = find_port.findall(trs[s])
                    for address, port in zip(re_ip_address, re_port):
                        address_port = address + ':' + port
                        yield address_port.replace(' ', '')

    def crawl_iphai(self):
        start_url = 'http://www.iphai.com/'
        html = get_page(start_url)
        if html:
            find_tr = re.compile(r'<tr>(.*?)</tr>', re.S)
            trs = find_tr.findall(html)
            for s in range(1, len(trs)):
                find_ip = re.compile(r'<td>\s+(\d+\.\d+\.\d+\.\d+)\s+</td>', re.S)
                re_ip_address = find_ip.findall(trs[s])
                find_port = re.compile(r'<td>\s+(\d+)\s+</td>', re.S)
                re_port = find_port.findall(trs[s])
                for address, port in zip(re_ip_address, re_port):
                    address_port = address + ':' + port
                    yield address_port.replace(' ', '')

    def crawl_data5u(self):
        start_url = 'http://www.data5u.com/free/gngn/index.shtml'
        headers = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
            'Cache-Control': 'max-age=0',
            'Connection': 'keep-alive',
            'Cookie': 'JSESSIONID=47AA0C887112A2D83EE040405F837A86',
            'Host': 'www.data5u.com',
            'Referer': 'http://www.data5u.com/free/index.shtml',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36',
        }
        html = get_page(start_url, options=headers)
        if html:
            # the article breaks off mid-pattern here; the remainder is
            # completed to match the other crawl_ methods above
            ip_address = re.compile(r'<span><li>(\d+\.\d+\.\d+\.\d+)</li>.*?<li class="port.*?>(\d+)</li>', re.S)
            re_ip_address = ip_address.findall(html)
            for address, port in re_ip_address:
                result = address + ':' + port
                yield result.replace(' ', '')
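Running the file is itself the test: defining Crawler triggers ProxyMetaclass.__new__, so every attribute name and value is printed as the class is built, followed by the collected crawl_ names. One detail worth flagging: get_proxies builds each call with eval, which executes arbitrary text. The names come from the metaclass's own list, so it is safe enough here, but getattr does the same job without eval. A drop-in alternative (a suggestion, not from the original article):

    # drop-in replacement for Crawler.get_proxies: look the method up by
    # name with getattr instead of eval-ing a string
    def get_proxies(self, callback):
        proxies = []
        for proxy in getattr(self, callback)():
            print('successfully obtained proxy', proxy)
            proxies.append(proxy)
        return proxies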