This article will explain in detail how to use Python to find the girl you like. The editor shares it as a reference, and I hope you will come away with a solid understanding of the techniques involved after reading it.
Let's start with a screenshot of the result; no pic, no truth!
I previously wrote an article about crawling girls' data that mainly used selenium to simulate browser operations, waited for the page to load dynamically, and then used XPath to extract the data, but that approach is not very efficient.
So today I will add a more efficient way to get the data. Since there is no simulated browser operation, everything is controlled directly in code, and you don't even need to open a web page to fetch the data!
But first we need to analyze the page. Open http://www.lovewzly.com/jiaoyou.html, press F12, and switch to the Network tab.
In the filtered requests, only the page parameter in the URL changes, increasing page by page. If we open that URL directly in the browser, we get back a batch of JSON, so we can work with the JSON data directly and then store it.
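Before writing any crawler code, you can probe the interface by hand. Here is a minimal sketch using requests (one of the libraries this article mentions); the parameter names come from the code later in the article, while the gender value is just a placeholder:

import requests

# Ask the search API for one page and inspect the JSON it returns.
resp = requests.get(
    'http://www.lovewzly.com/api/user/pc/list/search',
    params={'page': 1, 'gender': '2'},  # '2' is a placeholder condition value
    headers={
        'Referer': 'http://www.lovewzly.com/jiaoyou.html',
        'User-Agent': 'Mozilla/5.0',
    },
)
print(resp.json().get('data', {}).get('list'))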
Code structure diagram:
Operation procedure:
Build the headers first, both to defeat anti-hotlinking and to simulate a real browser; writing this first avoids problems later!
Assemble the query conditions.
Convert the response data to JSON format.
Extract the fields from the JSON.
Write the extracted data to a file or other storage (a skeleton tying these steps together is sketched below).
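Here is a minimal skeleton of how the crawler class could be organized around those steps. The attribute and method names are taken from the code below; the class name Spider and the constructor signature are my own sketch, not the author's code:

import json
import urllib.parse
import urllib.request

class Spider(object):
    """Skeleton of the crawler; the methods are filled in below."""

    def __init__(self, gender, stargage, endgage, startheight, endheight, marry, salary):
        # Search conditions, assembled into the query string by craw_data().
        self.gender = gender
        self.stargage = stargage      # spelling kept from the original code
        self.endgage = endgage
        self.startheight = startheight
        self.endheight = endheight
        self.marry = marry
        self.salary = salary

    def craw_data(self):
        """Request each page of JSON from the search API."""

    def parse_data(self, response):
        """Pull the interesting fields out of one JSON page."""

    def store_info(self, nick, age, height, address, heart, education, img_url):
        """Save photos and monologues to disk."""

    def store_info_execl(self, nick, age, height, address, heart, education, img_url):
        """Store rows in an Excel sheet."""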
The main techniques used are:
requests + urllib
Manipulating Excel files
File operations
String handling
Exception handling
Plus other Python fundamentals.
Request data:
def craw_data(self):
    """Crawl the data page by page."""
    headers = {
        'Referer': 'http://www.lovewzly.com/jiaoyou.html',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 '
                      'Core/1.53.4620.400 QQBrowser/9.7.13014.400',
    }
    page = 1
    while True:
        # Assemble the search conditions; the parameter names follow the site's API.
        query_data = {
            'page': page,
            'gender': self.gender,
            'starage': self.stargage,
            'endage': self.endgage,
            'stratheight': self.startheight,
            'endheight': self.endheight,
            'marry': self.marry,
            'salary': self.salary,
        }
        url = ('http://www.lovewzly.com/api/user/pc/list/search?'
               + urllib.parse.urlencode(query_data))
        print(url)
        req = urllib.request.Request(url, headers=headers)
        response = urllib.request.urlopen(req).read()
        # Stop once a page comes back empty.
        if not self.parse_data(response):
            break
        page += 1
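As a quick sanity check, urllib.parse.urlencode is what turns the condition dict into the query string appended to the search URL. A tiny standalone example with placeholder values:

import urllib.parse

query = urllib.parse.urlencode({'page': 1, 'gender': '2', 'starage': 21, 'endage': 30})
print('http://www.lovewzly.com/api/user/pc/list/search?' + query)
# -> http://www.lovewzly.com/api/user/pc/list/search?page=1&gender=2&starage=21&endage=30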
Field extraction:
def parse_data(self, response):
    """Parse one page of the JSON response; return False when no data is left."""
    persons = json.loads(response).get('data').get('list')
    if not persons:
        print('All data has been requested.')
        return False
    for person in persons:
        nick = person.get('username')
        gender = person.get('gender')
        # Birth year to age; the original code hard-codes 2018.
        age = 2018 - int(person.get('birthdayyear'))
        address = person.get('city')
        heart = person.get('monolog')
        height = person.get('height')
        img_url = person.get('avatar')
        education = person.get('education')
        print(nick, age, height, address, heart, education)
        self.store_info(nick, age, height, address, heart, education, img_url)
        self.store_info_execl(nick, age, height, address, heart, education, img_url)
    return True
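For reference, each element of data.list looks roughly like the dict below. The keys are exactly the ones read above, but every value is an invented placeholder, not real data:

# Shape of one record in data['list'] (placeholder values only).
person = {
    'username': 'xiaoli',
    'gender': '2',
    'birthdayyear': '1995',
    'city': 'Hangzhou',
    'monolog': 'I like hiking and movies.',
    'height': '163',
    'avatar': 'http://example.com/avatar.jpg',
    'education': 'bachelor',
}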
File storage (the original snippet is cut off after the first age check; everything past that point below is a hedged reconstruction that buckets people by age, saves the avatar photo, and appends the monologue):

import os

def store_info(self, nick, age, height, address, heart, education, img_url):
    """Save the photo and inner monologue, grouped into folders by age."""
    # Age buckets: everything after 'under 22' is reconstructed, since the
    # original text is truncated at this point.
    if age < 22:
        tag = 'under 22'
    elif age < 28:
        tag = '22-27'
    elif age < 32:
        tag = '28-31'
    else:
        tag = '32 and above'
    folder = os.path.join('girls', tag)
    os.makedirs(folder, exist_ok=True)
    try:
        # Download the avatar photo into the age folder.
        urllib.request.urlretrieve(img_url, os.path.join(folder, '%s.jpg' % nick))
    except Exception as e:
        print('failed to save photo for', nick, e)
    # Append the basic info and inner monologue to a text file.
    with open(os.path.join(folder, 'monologues.txt'), 'a', encoding='utf-8') as f:
        f.write('%s %s %s %s %s\n%s\n\n' % (nick, age, height, address, education, heart))
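The parsing code also calls store_info_execl(), whose body is not shown in the article. Here is a minimal sketch of what it could look like with xlwt, assuming the workbook, sheet, and row counter are created once in __init__; all of these names are my reconstruction, not the author's code:

import xlwt

def store_info_execl(self, nick, age, height, address, heart, education, img_url):
    """Append one person per row to an .xls sheet (hypothetical reconstruction)."""
    # Assumes __init__ prepared:
    #   self.book = xlwt.Workbook(encoding='utf-8')
    #   self.sheet = self.book.add_sheet('girls')
    #   self.row = 0
    for col, value in enumerate([nick, age, height, address, heart, education, img_url]):
        self.sheet.write(self.row, col, value)
    self.row += 1
    # xlwt rewrites the whole file on every save; fine for a small crawl.
    self.book.save('girls.xls')

With everything in place, a driver like this (placeholder condition values, tuned to your own search) would start the crawl:

if __name__ == '__main__':
    spider = Spider(gender='2', stargage=21, endgage=30,
                    startheight=160, endheight=175, marry='1', salary='5')
    spider.craw_data()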