This article covers what to do when a Python crawler runs into dynamically loaded font encryption. Many people hit this problem in real projects, so let's walk through how to handle it step by step. I hope you read it carefully and get something out of it!
A few days ago I gave a solution for the font anti-crawling on a review site's merchant search pages: download the corresponding woff font file, then build a mapping between the encrypted glyphs and their codes to crack it.
But there is a catch: the font files are loaded dynamically on each page. In other words, the mapping you build for one page cannot be reused on another page.
So is there no solution? Actually it is not hard, because the site still leaves a very clear line of attack: although the font on each page is loaded dynamically, only the codes assigned to the glyphs change; the internal order of the glyphs inside the font does not, as shown in the following figure.
Between any two pages, only the font codes change while the glyph order stays the same. So we just need to handle each page individually: first extract the CSS referenced by the page, then locate the font file URL from that CSS, then request and parse the font file for this page to build a matching dictionary. The remaining steps are the same as in the previous article.
The goal is to crawl all the merchant information for a given food in a given city, for example setting the location to Guangzhou, searching for Shaxian snacks, and then crawling every page of the search results.
The first step is to construct all the URLs. Since the URL of each page follows a fixed pattern, this step is simple: read the total number of pages from the first page and build the url_list according to that pattern. This data is not encrypted.
So this part of the code can be written like this.
# imports used throughout this article; ua.random comes from the fake_useragent package
import re
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
from fontTools.ttLib import TTFont

ua = UserAgent()

def get_url(url):
    headers = {
        "Host": "www.dianping.com",
        "Referer": f"{url}",
        "User-Agent": ua.random,
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-User": "?1",
        "Upgrade-Insecure-Requests": "1",
    }
    r = requests.get(url=url, headers=headers, proxies=get_ip())
    soup = BeautifulSoup(r.text, "html.parser")
    # the last element with the PageLink class holds the total page count
    page_num = int(soup.find_all(class_="PageLink")[-1].text)
    # page URLs follow the pattern <search url>/p1, /p2, ...
    url_list = [url + f"/p{i + 1}" for i in range(page_num)]
    return url_list
This part of the code is not hard to follow: build the request headers, request and parse the page, extract the number of pages, and assemble the URLs from the pattern. Note that get_ip() must return a usable proxy, whether you use a free or a paid proxy service; I won't go into the details here.
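The article does not show get_ip() itself. As a minimal sketch, assuming you maintain your own pool of proxy addresses (the addresses below are placeholders, not part of the original article), it could look like this:

import random

# hypothetical proxy pool; replace with your own free or paid proxy source
PROXY_POOL = [
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
]

def get_ip():
    """Return a proxies dict usable by requests, picked at random from the pool."""
    proxy = random.choice(PROXY_POOL)
    return {"http": proxy, "https": proxy}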
With the URLs ready, we come to the most critical step: write a function that, given a page, returns that page's text matching dictionaries. The first sub-step is to grab the fonts, which can be done with the following four lines of code.
Css_url = "http://" + re.search (r's3plus.meituan.net/ (. *?) / svgtextcss/ (. *?) .group (0) # get the css file css_value = requests.get (css_url). Textaddr_font =" http: "+ re.search (r'address (. *?) .woff', css_value) .group (0). Split (' ') [- 1] [5:] price_font = "http:" + re.search (r'shopNum (. *?) .woff', css_value) .group (0) .split (',') [- 1] [5:]
Take a brief look at this code. After passing in the page returned by a request:

The first line uses a regular expression to extract the link of the CSS file where the fonts live.
The second line uses requests to fetch the CSS content.
The last two lines use regular expressions to extract the URLs of the woff font files.
If the page you passed in is normal, we now have the font URLs for the address and average-price fields. You can download the two font files with requests and save them locally as follows.
x = requests.get(addr_font).content
with open('addr.woff', 'wb+') as f:
    f.write(x)
x = requests.get(price_font).content
with open('price.woff', 'wb+') as f:
    f.write(x)
Now there are two font files in the working directory, and you can follow the font-encryption cracking method introduced in the previous article. The complete code for this part is as follows:
def get_font(page):
    """Given the response for a page, return two dicts mapping that page's
    woff font codes (address font and price font) to real characters."""
    # locate the css file
    css_url = "http://" + re.search(r's3plus.meituan.net/(.*?)/svgtextcss/(.*?)\.css', page.text).group(0)
    css_value = requests.get(css_url).text
    addr_font = "http:" + re.search(r'address(.*?)\.woff', css_value).group(0).split(',')[-1][5:]
    price_font = "http:" + re.search(r'shopNum(.*?)\.woff', css_value).group(0).split(',')[-1][5:]
    # download the fonts and save them locally
    x = requests.get(addr_font).content
    with open('addr.woff', 'wb+') as f:
        f.write(x)
    x = requests.get(price_font).content
    with open('price.woff', 'wb+') as f:
        f.write(x)
    # parse the fonts: keep the last 4 hex characters of each glyph name
    font_addr = TTFont('addr.woff')
    font1 = font_addr.getGlyphOrder()[2:]
    font1 = [font1[i][-4:] for i in range(len(font1))]
    font_price = TTFont('price.woff')
    font2 = font_price.getGlyphOrder()[2:]
    font2 = [font2[i][-4:] for i in range(len(font2))]
    # font3 is the fixed character table: the glyph order never changes from
    # page to page, only the codes do. It starts with the digits and continues
    # with the several hundred Chinese characters listed in the previous article.
    font3 = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '0',
             # ... the full ordered character list from the previous article goes here ...
             ]
    font_addr_data = dict(map(lambda x, y: [x, y], font1, font3))
    font_price_data = dict(map(lambda x, y: [x, y], font2, font3))
    return font_addr_data, font_price_data
The only thing to note is that the page passed in here is the response you get from requesting the current page directly, for example:
page = requests.get(url=url, headers=headers, proxies=get_ip())  # keep the Response object; get_font() reads page.text itself
You need to make sure the page actually contains the right content; the function will not work on a 403 page or on a page asking you to enter a CAPTCHA.
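The article does not show how to do that check. A minimal sketch, assuming the ban page can be recognized by its status code or by a 'verify' marker in the URL or HTML (both of these markers are assumptions, not confirmed by the article), might look like this:

def page_is_valid(page):
    """Rough check that the response is a real search page, not a ban/CAPTCHA page."""
    if page.status_code != 200:
        return False
    # 'verify' is an assumed placeholder for the CAPTCHA page marker
    if 'verify' in page.url or 'verify' in page.text[:2000]:
        return False
    return True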
So at this point we have worked out how to crawl all the search-page information even though each page's font file is loaded dynamically. All that is left is to write a loop that crawls every URL in url_list and saves the results with pandas.
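The article does not include that final loop, so here is a rough sketch under some assumptions: parse_shops() is a hypothetical helper standing in for the page-parsing step from the previous article, and decode() assumes each encrypted character's code point, written as four hex digits, matches the keys that get_font() builds from the glyph names.

import pandas as pd

def decode(text, font_dict):
    """Replace encrypted glyphs with real characters using the page's mapping.
    Assumes keys are the last four hex digits of each code point, as built by
    get_font(); adjust the case if the font's glyph names use upper-case hex."""
    return ''.join(font_dict.get(hex(ord(ch))[2:], ch) for ch in text)

records = []
for u in url_list:
    page = requests.get(url=u, headers=headers, proxies=get_ip())
    font_addr_data, font_price_data = get_font(page)
    # parse_shops() is a hypothetical helper that extracts name/address/price
    # fields from the page HTML, as described in the previous article
    for shop in parse_shops(page.text):
        shop['address'] = decode(shop['address'], font_addr_data)
        shop['price'] = decode(shop['price'], font_price_data)
        records.append(shop)

pd.DataFrame(records).to_csv('results.csv', index=False)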
That is all for "python crawler encounters dynamic encryption". Thank you for reading. If you want to learn more about the topic, you can follow this site, where more practical articles will be published.