2025-01-18 Update From: SLTechnology News & Howtos
This article explains how to handle the font-based anti-crawling on the Qidian monthly ticket ranking in Python. The method introduced here is simple, fast, and practical, so let's walk through it together.
1. Analysis process
As usual, let's first open the Qidian monthly ticket ranking, press F12 to debug, find the monthly ticket figures next to each book title, and try to extract them with xpath.
You can see exactly 20 records; next, look for the monthly ticket data:
Here is the odd part: xpath retrieves 20 records, but the monthly ticket data is empty, and in the Elements panel the figures display as unknown symbols, as if there were no data at all. At this point, look at the page source and search for the keyword font-face: you will find the incomprehensible code mentioned in the preface. This is the font encoding at work: the numbers are rendered through a custom, obfuscated font.
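To make the obfuscation concrete, here is a hypothetical sketch of what the relevant markup looks like and how the .woff address can be pulled out of it (the font name, URL, and codepoints are made up for illustration):

```python
import re

# hypothetical excerpt of the page source; real font names and URLs differ per request
html_snippet = """
<style>@font-face { font-family: XbKmQClg;
src: url('https://qidian.gtimg.com/qd_anti_spider/XbKmQClg.eot?') format('eot'),
url('https://qidian.gtimg.com/qd_anti_spider/XbKmQClg.woff') format('woff'); }
</style><span style="font-family: XbKmQClg;">&#100305;&#100308;&#100310;</span>
"""

# the .woff address sits after the .eot entry, so grab it with a lazy match
woff_url = re.findall(r"eot.*?(https:.*?\.woff)", html_snippet, re.S)[0]
print(woff_url)  # https://qidian.gtimg.com/qd_anti_spider/XbKmQClg.woff
```

The span's content is just numeric character references; without the matching font file, a browser (or our crawler) has no way to know which digit each codepoint stands for.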
Next, let's find the font file request in the network panel.
The woff file's request address matches the address seen above. Note, however, that the address differs on every request and the file name changes each time, so we must download the encrypted font file anew on every crawl. The font file can then be parsed with the third-party library fontTools.
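Because the file name changes every time, the local file name has to be derived from whatever URL we just extracted; a minimal sketch (the URL is a made-up example):

```python
# hypothetical font URL; in practice it is re-extracted from the page source on each crawl
font_url = 'https://qidian.gtimg.com/qd_anti_spider/XbKmQClg.woff'

# derive the local file name from the last path segment
font_name = font_url.rsplit('/', 1)[1]
print(font_name)  # XbKmQClg.woff

# the bytes would then be fetched with requests.get(font_url, headers=...).content,
# saved under font_name, and parsed with fontTools.ttLib.TTFont(font_name)
```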
We now have:
1. The book titles
2. The ciphertext of the monthly ticket data
3. The font file that maps the ciphertext to real digits
2. Start writing the code
First, define the function get_book_name that gets the book titles, and test it:
import requests
from lxml import etree


# get the book titles
def get_book_name(xml_obj):
    name_list = xml_obj.xpath("//div[@class='book-mid-info']/h5/a/text()")
    return name_list


if __name__ == '__main__':
    # set a generic request header to avoid being blocked by the anti-crawler
    headers_ = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36',
        'referer': 'https://www.qidian.com/rank/',
        # session cookie copied from the browser; most of the value was garbled
        # in the original post, so only the recoverable fields are kept here
        'cookie': '_gid=GA1.2.501012674.1638335311; '
                  'newstatisticUUID=1638335311_1217304635; '
                  '_csrfToken=adBfL5dzru0KuzVgLJpxtsE8zQcfgZT8MzKf0aMs; '
                  '_ga=GA1.2.2025243050.1638335311'
    }
    url_ = 'https://www.qidian.com/rank/yuepiao/'
    # request the page source
    str_data = requests.get(url_, headers=headers_).text
    # parse the book titles with xpath
    xml_obj = etree.HTML(str_data)
    print(get_book_name(xml_obj))
    # ['Starting from the Red Moon', 'Terran Garrison', 'All-Attribute Martial Arts',
    #  'The Other Side of the Deep Sky', ...] -- 20 titles in total
2. Request the monthly ticket ciphertext and test it:
import re

import requests
from lxml import etree


# get the book titles
def get_book_name(xml_obj):
    name_list = xml_obj.xpath("//div[@class='book-mid-info']/h5/a/text()")
    return name_list


# get the encrypted monthly ticket data
def get_yuepiao(str_data):
    # the data extracted by xpath is empty here, so we match the raw page source
    # with a regular expression to get the encrypted data
    # (the exact pattern was garbled in the original post; this plausible
    # reconstruction matches the ciphertext between the inline <style> block
    # and the closing </span>)
    yuepiao_list = re.findall(r'</style>(.*?)</span>', str_data)
    return yuepiao_list


if __name__ == '__main__':
    # set a generic request header to avoid being blocked by the anti-crawler
    headers_ = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36',
        'referer': 'https://www.qidian.com/rank/',
        # session cookie copied from the browser; most of the value was garbled
        # in the original post, so only the recoverable fields are kept here
        'cookie': '_gid=GA1.2.501012674.1638335311; '
                  'newstatisticUUID=1638335311_1217304635; '
                  '_csrfToken=adBfL5dzru0KuzVgLJpxtsE8zQcfgZT8MzKf0aMs; '
                  '_ga=GA1.2.2025243050.1638335311'
    }
    url_ = 'https://www.qidian.com/rank/yuepiao/'
    # request the page source
    str_data = requests.get(url_, headers=headers_).text
    # parse the book titles with xpath
    xml_obj = etree.HTML(str_data)
    print(get_book_name(xml_obj))   # same 20 titles as above
    print(get_yuepiao(str_data))
    # a list of 20 ciphertext strings of the form '&#100xxx;&#100xxx;...';
    # they render as unreadable glyphs in a browser or console
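In the raw source, each ciphertext cell is a run of numeric character references like &#100305;. Once we have a codepoint-to-digit mapping from the font file, decoding is just a regex plus a dictionary lookup; a minimal sketch with an assumed (made-up) mapping:

```python
import re

# assumed codepoint-to-digit mapping; the real one comes from the font file
# and changes on every request
font_map = {100305: '3', 100306: '7', 100307: '2'}

cipher = '&#100305;&#100306;&#100307;'  # one monthly-ticket cell from the raw HTML
digits = ''.join(font_map[int(c)] for c in re.findall(r'&#(.*?);', cipher))
print(int(digits))  # 372
```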
3. Get the mapping from the encrypted font file:
Install the fontTools library (pip install fonttools).
Since this was my first time using fontTools, I ran into the following error. A Baidu search suggested it may be caused by a wrong font file name; after changing the name to match the url, the key-value pairs were extracted successfully.
(It is also possible that the bad url came from a sloppy regular expression when I extracted font_url.)
The only catch is that these key-value pairs map codepoints to English number words. Why do programmers make life hard for fellow programmers? We additionally have to define a dictionary mapping the English words to Arabic numerals so we can replace them.
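The word-to-digit substitution can be sketched as follows: getBestCmap() yields {codepoint: glyph_name}, where the glyph names are English number words, so one extra dictionary finishes the job (the codepoints here are made up):

```python
# assumed shape of TTFont(...).getBestCmap() output; codepoints are illustrative
cmap = {100305: 'three', 100306: 'seven', 100307: 'two', 100308: 'period'}

# map English number words (and the decimal point) to characters
word_to_digit = {'period': '.', 'zero': '0', 'one': '1', 'two': '2',
                 'three': '3', 'four': '4', 'five': '5', 'six': '6',
                 'seven': '7', 'eight': '8', 'nine': '9'}

font_map = {cp: word_to_digit[name] for cp, name in cmap.items()}
print(font_map)  # {100305: '3', 100306: '7', 100307: '2', 100308: '.'}
```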
def get_font(xml_obj, headers_):
    # use xpath and re to get the address of the encrypted font package
    font_div = xml_obj.xpath("//span/style/text()")[0]
    font_url = re.findall(r"eot.*?(https:.*?\.woff)", font_div)[0]
    font_name = str(font_url).rsplit('/', 1)[1]
    # download the font file and save it locally
    font_bytes = requests.get(font_url, headers=headers_).content
    with open(font_name, 'wb') as f:
        f.write(font_bytes)
    # load the font file
    font_data = TTFont(font_name)
    # font_data.saveXML('font.xml')
    font_dict01 = font_data.getBestCmap()  # {codepoint: english number word}
    font_dict02 = {'period': '.', 'zero': '0', 'one': '1', 'two': '2',
                   'three': '3', 'four': '4', 'five': '5', 'six': '6',
                   'seven': '7', 'eight': '8', 'nine': '9'}
    for i in font_dict01:
        font_dict01[i] = font_dict02[font_dict01[i]]
    return font_dict01
The program runs perfectly:
The complete code is as follows:
import re

import requests
from lxml import etree
from fontTools.ttLib import TTFont


# get the book titles
def get_book_name(xml_obj):
    name_list = xml_obj.xpath("//div[@class='book-mid-info']/h5/a/text()")
    return name_list


# get the encrypted monthly ticket data
def get_yuepiao(str_data):
    # the data extracted by xpath is empty here, so we match the raw page source
    # with a regular expression (the exact pattern was garbled in the original
    # post; this plausible reconstruction matches the ciphertext between the
    # inline <style> block and the closing </span>)
    yuepiao_list = re.findall(r'</style>(.*?)</span>', str_data)
    return yuepiao_list


# parse the font file into a {codepoint: digit} mapping
def get_font(xml_obj, headers_):
    # use xpath and re to get the address of the encrypted font package
    font_div = xml_obj.xpath("//span/style/text()")[0]
    font_url = re.findall(r"eot.*?(https:.*?\.woff)", font_div)[0]
    font_name = str(font_url).rsplit('/', 1)[1]
    # download the font file and save it locally
    font_bytes = requests.get(font_url, headers=headers_).content
    with open(font_name, 'wb') as f:
        f.write(font_bytes)
    # load the font file
    font_data = TTFont(font_name)
    # font_data.saveXML('font.xml')
    font_dict01 = font_data.getBestCmap()  # {codepoint: english number word}
    font_dict02 = {'period': '.', 'zero': '0', 'one': '1', 'two': '2',
                   'three': '3', 'four': '4', 'five': '5', 'six': '6',
                   'seven': '7', 'eight': '8', 'nine': '9'}
    for i in font_dict01:
        font_dict01[i] = font_dict02[font_dict01[i]]
    return font_dict01


# decrypt: turn each ciphertext string into the real monthly ticket number
def jiemi(miwen_list, font_dict):
    yuepiao = []
    for i in miwen_list:
        num = ''
        mw_list = re.findall(r'&#(.*?);', i)
        for j in mw_list:
            num += font_dict[int(j)]
        yuepiao.append(int(num))
    return yuepiao


if __name__ == '__main__':
    # set a generic request header to avoid being blocked by the anti-crawler
    headers_ = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36',
        'referer': 'https://www.qidian.com/rank/',
        # session cookie copied from the browser; most of the value was garbled
        # in the original post, so only the recoverable fields are kept here
        'cookie': '_gid=GA1.2.501012674.1638335311; '
                  'newstatisticUUID=1638335311_1217304635; '
                  '_csrfToken=adBfL5dzru0KuzVgLJpxtsE8zQcfgZT8MzKf0aMs; '
                  '_ga=GA1.2.2025243050.1638335311'
    }
    url_ = 'https://www.qidian.com/rank/yuepiao/'
    # request the page source
    str_data = requests.get(url_, headers=headers_).text
    xml_obj = etree.HTML(str_data)
    # book title list
    book_name_list = get_book_name(xml_obj)
    # decrypted monthly ticket list
    yuepiao_list = jiemi(get_yuepiao(str_data), get_font(xml_obj, headers_))
    for i in range(len(book_name_list)):
        print(f'{book_name_list[i]}: {yuepiao_list[i]}')

At this point, I believe everyone has a deeper understanding of the font anti-crawling method for the Qidian monthly ticket ranking in Python. Why not try it out in practice? For more related content, feel free to browse the relevant channels, follow us, and keep learning!
© 2024 shulou.com SLNews company. All rights reserved.