2025-03-26 Update — From: SLTechnology News&Howtos (shulou)
Shulou(Shulou.com)06/01 Report--
This article walks through how to crawl a weather site protected by obfuscated JavaScript. This is a common dilemma in practice, so the worked example below shows how to handle it step by step.
While crawling pollutant data today, I ran into the website below, which presents the two classic anti-crawler problems: obfuscated encryption and debugger detection.
Website to be crawled
Data acquisition
Right-click is disabled, and debugger detection kicks in as soon as you try to inspect the page.
No matter: press Ctrl+S and save the page locally.
With a local copy you can open F12 freely. Look through the main page and find the code that builds the AJAX request.
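Once the page is saved locally, one quick way to enumerate candidate script files (instead of clicking through DevTools) is to scan the HTML for `<script>` tags. A minimal sketch — the regex approach and the file names below are illustrative only, not the site's actual layout:

```python
import re

def find_scripts(html: str):
    """Return (src, inline_body) pairs for every <script> tag, so the
    file carrying the ajax/encryption code can be located."""
    scripts = []
    for match in re.finditer(r'<script\b([^>]*)>(.*?)</script>',
                             html, re.S | re.I):
        attrs, body = match.group(1), match.group(2)
        src = re.search(r'src=["\']([^"\']+)["\']', attrs)
        scripts.append((src.group(1) if src else None, body.strip()))
    return scripts

# Toy page standing in for the saved copy of the site.
page = """
<html><head>
<script src="/js/obfuscated_city.js"></script>
<script>var payload = ajaxCall("GETDAYDATA");</script>
</head></html>
"""
for src, body in find_scripts(page):
    print(src, "|", body[:40])
```

External scripts show up with a `src`; the inline body is where you would spot the call that kicks off the AJAX request.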
The request passes in the city and month, then calls a method defined in the JS. Press Ctrl+Shift+F and search for that method globally.
Copy the JS locally and take a look: it is obfuscated, so find a deobfuscation tool and run it through first.
Copy the deobfuscated JS locally and read through it. The logic finally becomes clear: first encrypt the incoming parameters, send them in a POST request, receive the encrypted result, then decrypt that result.
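That encrypt → POST → decrypt round trip can be sketched end to end with stand-ins. Everything here is a placeholder: base64 substitutes for the site's real obfuscated cipher, and `fake_server` stands in for the remote API — only the shape of the flow matches the real site.

```python
import base64
import json

def encrypt_params(method: str, params: dict) -> str:
    """Placeholder for the site's obfuscated encrypt function."""
    payload = json.dumps({"method": method, "param": params})
    return base64.b64encode(payload.encode()).decode()

def fake_server(payload: str) -> str:
    """Stand-in for the POST endpoint: returns an encrypted canned reply."""
    json.loads(base64.b64decode(payload))  # server would read the request
    result = {"result": {"data": {"items": [{"aqi": 60}]}}}
    return base64.b64encode(json.dumps(result).encode()).decode()

def decrypt_response(payload: str) -> dict:
    """Placeholder for the site's obfuscated decrypt function."""
    return json.loads(base64.b64decode(payload))

body = encrypt_params("GETDAYDATA", {"city": "Shanghai", "month": "201501"})
reply = fake_server(body)
print(decrypt_response(reply)["result"]["data"]["items"])
```

In the real spider, the two placeholder functions are replaced by calls into the deobfuscated JS, and the POST goes to the site's API endpoint.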
With the logic clear, we can finally write the code.
Note that the encryption parameters (the obfuscated function names and the POST field name) differ for each visitor, or change every so often, so my code may not run as-is after copying; use the same routine above to extract your own JS.
Python code:

```python
# -*- coding: utf-8 -*-
import datetime
import json

import execjs
import requests


class pollutionSpider:
    """Crawl pollutant data from https://www.aqistudy.cn/historydata/daydata.php"""

    def __init__(self):
        self.js_path = "../data/aqistudy.js"
        self.main_url = 'https://www.aqistudy.cn/historydata/api/historyapi.php'
        self.month_data = {"1": "01", "2": "02", "3": "03", "4": "04",
                           "5": "05", "6": "06", "7": "07", "8": "08",
                           "9": "09", "10": "10", "11": "11", "12": "12"}
        self.save_path = "../data/weather/"
        self.headers = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,'
                      'application/x-www-form-urlencoded',
            'Content-Type': 'application/x-www-form-urlencoded',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                          'AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/71.0.3578.80 Safari/537.36',
        }
        self.data_headers = "time_point aqi pm2_5 pm10 so2 no2 co o3 rank quality"

    def encrypt(self, city, month):
        """Encrypt the request parameters with the site's JS."""
        ctx = execjs.compile(self.get_js())  # load the JS file
        return ctx.call('pLoXOmdsuMMq', "GETDAYDATA",
                        {"city": city, "month": month})

    def decrypt(self, data):
        """Decrypt the response with the site's JS."""
        ctx = execjs.compile(self.get_js())  # load the JS file
        return ctx.call('dSMkMq14l49Opc37Yx', data)

    def get_js(self):
        """Read the deobfuscated JS from disk."""
        with open(self.js_path, 'r', encoding='utf-8') as f:
            return f.read()

    def get_response(self, params):
        """POST the encrypted parameters and return the raw response text."""
        return requests.post(self.main_url,
                             data={'hzbDyDmL0': params},
                             headers=self.headers).text

    def get_single(self, city, month):
        """Get one month of data for one city."""
        encrypt_data = self.get_response(self.encrypt(city, month))
        data = json.loads(self.decrypt(encrypt_data))['result']['data']['items']
        return ['\t'.join(str(value) for value in element.values())
                for element in data]

    def get_all(self, city, start_day):
        """Get all pollution data for a city from start_day until today."""
        print("start getting " + city + " data")
        start_day = datetime.datetime.strptime(start_day, "%Y-%m-%d")
        end_day = datetime.datetime.now()
        months = ((end_day.year - start_day.year) * 12
                  + end_day.month - start_day.month)
        month_range = ['%s%s' % (start_day.year + mon // 12,
                                 self.month_data[str(mon % 12 + 1)])
                       for mon in range(start_day.month - 1,
                                        start_day.month + months)]
        with open(self.save_path + city + ".txt", "w", encoding="utf8") as f:
            f.write(self.data_headers + "\n")
            for element in month_range:
                try:
                    for line in self.get_single(city, element):
                        f.write(line + "\n")
                    print(element + " " + city + " data acquisition - success")
                except Exception as e:
                    print(e)
                    print(element + " " + city + " data acquisition - failed")


if __name__ == '__main__':
    pollutionSpider().get_all("Shanghai", "2015-1-1")
```

Result:
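The trickiest arithmetic in `get_all` is the month enumeration (`mon // 12` rolls the year over, `mon % 12 + 1` picks the month). Isolated as a standalone function with a fixed end date, so the output is deterministic, it can be checked directly:

```python
import datetime

# Same zero-padding table the spider uses.
MONTHS = {str(m): "%02d" % m for m in range(1, 13)}

def month_range(start: datetime.date, end: datetime.date):
    """'YYYYMM' strings from start's month through end's month,
    using the same arithmetic as get_all."""
    total = (end.year - start.year) * 12 + end.month - start.month
    return ['%s%s' % (start.year + mon // 12, MONTHS[str(mon % 12 + 1)])
            for mon in range(start.month - 1, start.month + total)]

# Spans a year boundary: November 2015 through February 2016.
print(month_range(datetime.date(2015, 11, 1), datetime.date(2016, 2, 1)))
# → ['201511', '201512', '201601', '201602']
```

Note the range is inclusive of both the start and end months, which is why `get_all` always re-fetches the current, partial month.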
That concludes this walkthrough of crawling a weather site protected by obfuscated JS. Thank you for reading.