2025-03-10 Update From: SLTechnology News&Howtos > Development
Shulou (Shulou.com) 06/03 Report--
This article introduces how to use a Python crawler to fetch photo galleries from a URL. Many people run into trouble with this kind of task in practice, so let's walk through a real case step by step. I hope you read it carefully and get something out of it!
Python study-course crawler: scraping street-snap photos from Toutiao
1. Capture the request (packet capture)
2. View parameter information
Look at a few more pages to find the pattern: the only parameters that change are offset and timestamp, where timestamp is a 13-digit (millisecond) timestamp. The search term is set through keyword, and changing the offset value turns the page; the remaining parameters are fixed.
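The parameter rules above can be sketched as a small helper. This is a hypothetical illustration (the function name `build_params` and the dict-based style are my own, not from the article); it shows how offset advances per page and how a 13-digit millisecond timestamp is produced.

```python
import time

# Hypothetical sketch: assemble the changing query parameters
# described above. Only offset and timestamp vary between requests.
def build_params(keyword, page, count=20):
    return {
        'aid': 24,
        'app_name': 'web_search',
        'offset': page * count,                # changing offset turns the page
        'format': 'json',
        'keyword': keyword,                    # the search term
        'count': count,
        'timestamp': int(time.time() * 1000),  # 13-digit millisecond timestamp
    }

params = build_params('街拍', 2)  # 街拍 = "street snap"
```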
3. Data parsing
The returned JSON contains the field we need: in each entry, image_list holds the links to all of that entry's pictures. We parse this field and then download the pictures, grouped by title.
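The parsing step can be sketched in isolation. The sample payload below is hand-made to match the JSON shape described above (real responses come from the API); the helper name `extract_images` is my own.

```python
# Hypothetical sketch of the parsing step, run against a hard-coded
# sample shaped like the API's JSON (not a real response).
sample = {
    'data': [
        {'title': 'street style 01',
         'image_list': [{'url': 'http://example.com/a.jpg'},
                        {'url': 'http://example.com/b.jpg'}]},
        {'title': 'no images here'},  # entries without image_list are skipped
    ]
}

def extract_images(payload):
    """Return (title, [urls]) pairs for entries that carry an image_list."""
    result = []
    for item in payload['data']:
        if 'image_list' in item:
            urls = [pic['url'] for pic in item['image_list']]
            result.append((item['title'], urls))
    return result

print(extract_images(sample))
```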
4. Process analysis
Build the URL and issue the request, changing the value of offset to page through results; parse the returned JSON; create a folder named after title; and, if the entry contains pictures, download them named in title_num format.
```python
import requests
import os
import time

headers = {
    'authority': 'www.toutiao.com',
    'method': 'GET',
    'path': '/api/search/content/?aid=24&app_name=web_search&offset=100&format=json&keyword=%E8%A1%97%E6%8B%8D&autoload=true&count=20&en_qc=1&cur_tab=1&from=search_tab&pd=synthesis&timestamp=1556892118295',
    'scheme': 'https',
    'accept': 'application/json, text/javascript',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'zh-CN,zh;q=0.9',
    'content-type': 'application/x-www-form-urlencoded',
    'referer': 'https://www.toutiao.com/search/?keyword=%E8%A1%97%E6%8B%8D',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}


def get_html(url):
    return requests.get(url, headers=headers).json()


def get_values_in_dict(image_list):
    result = []
    for data in image_list:
        result.append(data['url'])
    return result


def parse_data(url):
    text = get_html(url)
    for data in text['data']:
        if 'image_list' in data.keys():
            title = data['title'].replace('|', '')
            img_list = get_values_in_dict(data['image_list'])
        else:
            continue
        # 街拍 = "street snap": pictures go into a folder per title
        if not os.path.exists('街拍/' + title):
            os.makedirs('街拍/' + title)
        for index, pic in enumerate(img_list):
            with open('街拍/{}/{}.jpg'.format(title, index + 1), 'wb') as f:
                f.write(requests.get(pic).content)
        print("Download {} successful".format(title))


def get_num(num):
    if isinstance(num, int) and num % 20 == 0:
        return num
    return 0


def main(num):
    for i in range(0, get_num(num) + 1, 20):
        url = ('https://www.toutiao.com/api/search/content/?aid={}&app_name={}'
               '&offset={}&format={}&keyword={}&autoload={}&count={}&en_qc={}'
               '&cur_tab={}&from={}&pd={}&timestamp={}').format(
            24, 'web_search', i, 'json', '街拍', 'true', 20, 1, 1,
            'search_tab', 'synthesis',
            str(time.time())[:14].replace('.', ''))
        parse_data(url)


if __name__ == '__main__':
    main(40)
```

That covers "how to use a Python crawler to fetch photo galleries from a URL". Thank you for reading. If you want to know more about the industry, you can follow this website, where more high-quality practical articles will be published for you!
© 2024 shulou.com SLNews company. All rights reserved.