2025-04-11 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to crawl WeChat Moments with Python. The content is simple and clear, and I hope it answers your questions.
Preface
If you use Charles or mitmproxy to monitor the interface data of WeChat Moments directly, you cannot crawl it, because the data is encrypted. Appium is different: as an automated testing tool, it can simulate operations in the app directly and read whatever is currently displayed. So as long as the app displays the content, we can grab it with Appium.
1. Goal of this section
In this section, we take the Android platform as an example and crawl the posts in WeChat Moments. Each post includes the friend's nickname, the body text, and the release date. The release date also needs converting: if it is shown as a relative time such as "1 hour ago", it is converted to the corresponding calendar date (today, in that case). Finally, the posts are saved to MongoDB.
2. Preparation
Make sure the PC has Appium, the Android development environment, the Python Appium API, and the PyMongo library installed, that the WeChat app is installed on the Android phone, and that MongoDB is installed with its service running. For installation methods, please see Chapter 1.
3. Initialization
First, create a new Moments class and do some initialization configuration, as shown below:
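from appium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from pymongo import MongoClient

PLATFORM = 'Android'
DEVICE_NAME = 'MI_NOTE_Pro'
APP_PACKAGE = 'com.tencent.mm'
APP_ACTIVITY = '.ui.LauncherUI'
DRIVER_SERVER = 'http://localhost:4723/wd/hub'
TIMEOUT = 300
MONGO_URL = 'localhost'
MONGO_DB = 'moments'
MONGO_COLLECTION = 'moments'

class Moments():
    def __init__(self):
        """
        Initialize
        """
        # Driver configuration
        self.desired_caps = {
            'platformName': PLATFORM,
            'deviceName': DEVICE_NAME,
            'appPackage': APP_PACKAGE,
            'appActivity': APP_ACTIVITY
        }
        self.driver = webdriver.Remote(DRIVER_SERVER, self.desired_caps)
        # Explicit wait with a timeout of TIMEOUT seconds
        self.wait = WebDriverWait(self.driver, TIMEOUT)
        # MongoDB connection
        self.client = MongoClient(MONGO_URL)
        self.db = self.client[MONGO_DB]
        self.collection = self.db[MONGO_COLLECTION]
        # Date processor; the Processor class and its date() method are described in section 5
        self.processor = Processor()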
This sets up the driver configuration, the explicit wait, the MongoDB connection, and so on.
4. Simulated login
The next step is to log in to WeChat: click the login button, enter the user name and password, and submit. An example implementation is as follows:
def login(self):
    # Login button
    login = self.wait.until(EC.presence_of_element_located((By.ID, 'com.tencent.mm:id/cjk')))
    login.click()
    # Enter the phone number (USERNAME and PASSWORD are defined in the configuration)
    phone = self.wait.until(EC.presence_of_element_located((By.ID, 'com.tencent.mm:id/h3')))
    phone.set_text(USERNAME)
    # Next step
    next_button = self.wait.until(EC.element_to_be_clickable((By.ID, 'com.tencent.mm:id/adj')))
    next_button.click()
    # Password
    password = self.wait.until(EC.presence_of_element_located((By.XPATH, '//*[@resource-id="com.tencent.mm:id/h3"][1]')))
    password.set_text(PASSWORD)
    # Submit
    submit = self.wait.until(EC.element_to_be_clickable((By.ID, 'com.tencent.mm:id/adj')))
    submit.click()
This simply performs a series of click and input operations in turn. The process may differ across platforms and app versions, so the code is for reference only.
After logging in, go to the Moments page: select the tab that contains Moments, then click the Moments button. The code is as follows:
def enter(self):
    # Tab
    tab = self.wait.until(EC.presence_of_element_located((By.XPATH, '//*[@resource-id="com.tencent.mm:id/bw3"][3]')))
    tab.click()
    # Moments
    moments = self.wait.until(EC.presence_of_element_located((By.ID, 'com.tencent.mm:id/atz')))
    moments.click()
The crawling work now officially begins.
5. Crawling Moments posts
The Moments feed keeps loading more posts as you drag it, so we need to simulate an endless swipe here, as shown below:
# Swipe start point and distance
FLICK_START_X = 300
FLICK_START_Y = 300
FLICK_DISTANCE = 700

def crawl(self):
    while True:
        # Swipe up
        self.driver.swipe(FLICK_START_X, FLICK_START_Y + FLICK_DISTANCE, FLICK_START_X, FLICK_START_Y)
We pass the start and end points to the swipe() method to perform the drag, and wrap the call in an infinite loop so that the dragging never stops.
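As a variation on the infinite loop, a bounded version can be convenient while testing. The following is only a sketch under stated assumptions: the helper name swipe_up, the times parameter, and the pause between swipes are illustrative and not part of the original code; driver is assumed to be any object that exposes the swipe(start_x, start_y, end_x, end_y) method used in this section.

```python
import time

# Same values as the constants used in this section.
FLICK_START_X = 300
FLICK_START_Y = 300
FLICK_DISTANCE = 700


def swipe_up(driver, times, pause=1.0):
    """Swipe up `times` times (hypothetical helper, not in the original code).

    The pause gives the app a moment to load new posts between swipes.
    """
    for _ in range(times):
        # Drag from the lower point up to the start point, as in crawl().
        driver.swipe(FLICK_START_X, FLICK_START_Y + FLICK_DISTANCE,
                     FLICK_START_X, FLICK_START_Y)
        time.sleep(pause)
```

Calling swipe_up(self.driver, 10) inside the class would perform ten swipes and then return, which makes it easier to stop a test run cleanly.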
Next, get the block element for each post currently displayed in Moments, iterate through the block elements, and read the user name, body, and release time shown inside each one. The code is as follows:
# Requires: from selenium.common.exceptions import NoSuchElementException
# All posts displayed on the current page
items = self.wait.until(
    EC.presence_of_all_elements_located(
        (By.XPATH, '//*[@resource-id="com.tencent.mm:id/cve"]//android.widget.FrameLayout')))
# Iterate through each post
for item in items:
    try:
        # Nickname
        nickname = item.find_element_by_id('com.tencent.mm:id/aig').get_attribute('text')
        # Body
        content = item.find_element_by_id('com.tencent.mm:id/cwm').get_attribute('text')
        # Date
        date = item.find_element_by_id('com.tencent.mm:id/crh').get_attribute('text')
        # Process the date
        date = self.processor.date(date)
        print(nickname, content, date)
        data = {
            'nickname': nickname,
            'content': content,
            'date': date,
        }
    except NoSuchElementException:
        # Skip blocks that lack any of the three elements
        pass
Here we iterate through each post, call the find_element_by_id() method to get the elements for the nickname, body, and release date, and then read their text with the get_attribute() method. In this way we get every post shown in Moments.
For date handling, we call the date() method of the Processor class, which is implemented as follows:
import re
import time

def date(self, datetime):
    """
    Process the time
    :param datetime: original time
    :return: processed time
    """
    # The patterns must match the relative-time strings shown in the WeChat UI
    if re.match(r'\d+ minutes ago', datetime):
        minute = re.match(r'(\d+)', datetime).group(1)
        datetime = time.strftime('%Y-%m-%d', time.localtime(time.time() - float(minute) * 60))
    if re.match(r'\d+ hours ago', datetime):
        hour = re.match(r'(\d+)', datetime).group(1)
        datetime = time.strftime('%Y-%m-%d', time.localtime(time.time() - float(hour) * 60 * 60))
    if re.match('yesterday', datetime):
        datetime = time.strftime('%Y-%m-%d', time.localtime(time.time() - 24 * 60 * 60))
    if re.match(r'\d+ days ago', datetime):
        day = re.match(r'(\d+)', datetime).group(1)
        datetime = time.strftime('%Y-%m-%d', time.localtime(time.time() - float(day) * 24 * 60 * 60))
    return datetime
This method uses regular expressions to extract the number from the relative time, then uses the time conversion functions to turn it into a date. For example, if the time is 5 minutes ago, the method first extracts 5, subtracts 300 seconds from the current timestamp to get the timestamp of the release time, and then formats that timestamp as a standard date.
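The same conversion can be sketched as a standalone function that is easy to check in isolation. This is a minimal sketch, not the original implementation: the name to_date, the now parameter, and the single combined pattern are assumptions made for illustration, and the English strings are assumed to match what the app displays.

```python
import re
import time


def to_date(text, now=None):
    """Convert a relative time such as '5 minutes ago' to 'YYYY-MM-DD'.

    `now` is a Unix timestamp. It defaults to the current time and is a
    parameter only so that the function can be checked with a fixed clock.
    """
    now = time.time() if now is None else now
    seconds_per_unit = {'minute': 60, 'hour': 3600, 'day': 86400}
    match = re.match(r'(\d+) (minute|hour|day)s? ago', text)
    if match:
        # e.g. '5 minutes ago' -> 5 * 60 = 300 seconds
        offset = int(match.group(1)) * seconds_per_unit[match.group(2)]
    elif text == 'yesterday':
        offset = 86400
    else:
        # Not a relative time: return the text unchanged.
        return text
    return time.strftime('%Y-%m-%d', time.localtime(now - offset))
```

Unlike the method above, the optional "s" in the pattern also accepts singular forms such as "1 hour ago". Passing a fixed timestamp as now makes the result reproducible, which is useful when testing around midnight.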
Finally, the MongoDB API is called to store the crawled results. To avoid duplicates, the update() method is used here, as follows:
self.collection.update({'nickname': nickname, 'content': content}, {'$set': data}, True)
The first argument is the query: it looks up the record by nickname and body. If no such record exists, the data is inserted; otherwise the existing record is updated. The key is the third argument, True, which enables this update-if-present, insert-if-absent behavior (an upsert).
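In PyMongo, Collection.update() was deprecated in version 3 and removed in version 4, where update_one() with upsert=True expresses the same operation. The following sketch assumes a newer PyMongo; the helper name save_moment is hypothetical, and collection is assumed to be a PyMongo collection such as self.collection.

```python
def save_moment(collection, data):
    """Insert the post, or update it if the same nickname and body exist.

    Hypothetical helper: `collection` is assumed to be a PyMongo collection
    and `data` a dict with 'nickname', 'content' and 'date' keys.
    """
    query = {'nickname': data['nickname'], 'content': data['content']}
    # upsert=True: update the matching document, or insert one if none matches.
    return collection.update_one(query, {'$set': data}, upsert=True)
```

Inside the crawl loop this would be called as save_moment(self.collection, data) right after the data dictionary is built.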
Finally, an entry method calls the methods above in order. Call it to start crawling; the code is as follows:
def main(self):
    # Log in
    self.login()
    # Enter Moments
    self.enter()
    # Crawl
    self.crawl()
This completes the Moments crawler. When the code runs, WeChat starts on the phone, logs in, enters Moments, and keeps swiping. The console prints the crawled results, and the results are saved to the MongoDB database.
That is everything in this article on how to crawl WeChat Moments with Python. Thank you for reading, and I hope it has been helpful.
© 2024 shulou.com SLNews company. All rights reserved.