
How to use Python to count Knowledge Planet sign-ins and homework


This article explains how to use Python to count Knowledge Planet sign-ins and homework submissions. The method is simple, fast, and practical.

The original title was "Correcting Knowledge Planet Assignments with Python", but that felt like an overstatement, so I changed the wording; it may well become possible once AI gets stronger. On our Knowledge Planet group we tally everyone's homework completion and sign-ins every week, but Knowledge Planet gives the group owner no activity statistics, so I had to solve it myself. I especially recommend that product and operations people learn some programming and understand crawlers, because Internet people speak with data.

Our goal is to count the sign-ins and homework submissions in the group over the last week, so we need a way to get the data and then run statistics on it. Since Knowledge Planet offers a PC browser version, we can find the entry point directly in Chrome and crawl the data from there.

Step 1: Analyze the approach

A crawler simulates the browser's web requests programmatically and collects the returned data, so let's first see what those requests look like in the browser. After scanning the WeChat QR code to log in to Knowledge Planet at https://wx.zsxq.com/dweb/, right-click the page, choose "Inspect" to open developer tools, and select the "Network" tab. You can now see every web request the browser sends; select the group you want to analyze and many requests will appear.

These requests all relate to the group. At this stage you need a general picture of the data on the page: it includes the group's basic introduction, the group owner's profile, the list of posts in the middle, and the list of groups on the left. From there, judge what each request does by the response it returns.

The groups request returns the list of groups shown on the left side of the page.

The topics?count=20 request is the interface that returns the post data we are looking for.

Having found the interface that returns the data, let's first look at the structure of the response:

{
    "topic_id": 48551524482128,
    "group": {
        "group_id": 518855855524,
        "name": "Zen and Friends of Python"
    },
    "type": "talk",
    "talk": {
        "owner": {
            "user_id": 15551441848112,
            "name": "Ye Xian",
            "avatar_url": "https://file.zsxq.19.jpg"
        },
        "text": "I gave it a try, and it took about 140s to crack the 8-digit 0-9 MD5 brute force."
    },
    "likes_count": 0,
    "comments_count": 0,
    "rewards_count": 0,
    "digested": false,
    "sticky": false,
    "create_time": "2018-06-05T23:39:38.197+0800",
    "user_specific": {
        "liked": false,
        "subscribed": false
    }
}

From the API response we can conclude that each request returns 20 posts, and the structure of each post is clear. The type field indicates the kind of post: talk is an ordinary post, and another value, solution, indicates a homework submission. The talk field holds the poster's information, the posted content, and the creation time. Since this is a nested JSON structure, storing it in MongoDB is the most convenient option: no schema needs to be built, each post can be saved directly as a document, and filtering and grouping by condition is straightforward.
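To make that payoff concrete, here is a minimal sketch of the kind of grouped statistics this enables, assuming the documents land in a database named zsxq and a collection named topics (both names are my own choice, not fixed by the article):

from datetime import datetime, timedelta
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["zsxq"]  # assumed names

# create_time is an ISO-8601 string ("2018-06-05T23:39:38.197+0800"), so a
# string comparison against a boundary in the same format selects the last week.
week_ago = (datetime.now() - timedelta(days=7)).strftime("%Y-%m-%dT%H:%M:%S")

# Count each member's ordinary posts (sign-ins) over the last week; homework
# ("solution" posts) could be tallied the same way if its poster field
# parallels talk.owner, which is an assumption on my part.
pipeline = [
    {"$match": {"type": "talk", "create_time": {"$gte": week_ago}}},
    {"$group": {"_id": "$talk.owner.name", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]
for row in db.topics.aggregate(pipeline):
    print(row["_id"], row["count"])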

Step 2: Implement the code

Once the approach is clear, writing the code goes quickly. Installing MongoDB is not covered here; tutorials online will get you through it. Only two dependency libraries are needed:

pip install pymongo
pip install requests

With the data interface found and the storage solution settled, we can get to the code. First, let's determine what request data we need to provide when simulating the browser's request for posts.

Looking closer at the request details, we can determine the full URL, the request method (GET), and the all-important request headers. We encapsulate the header information in a dictionary and pass it to the get method.
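The headers are copied straight out of the Network panel as text; the helper that turns them into a dictionary is not shown in the original post, so the str_to_dict below is a sketch of what it presumably does, and the header values are placeholders rather than real credentials:

# Raw headers as copied from Chrome's Network panel (placeholder values;
# the real Authorization token comes from your logged-in session).
headers = """
Authorization: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
"""

def str_to_dict(header_text):
    """Parse 'Key: value' lines copied from the browser into a dict."""
    result = {}
    for line in header_text.strip().splitlines():
        key, _, value = line.partition(": ")
        result[key] = value
    return result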

import requests
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["zsxq"]  # connection details assumed

def crawl():
    url = "https://api.zsxq.com/v1.10/groups/518855855524/topics?count=20"
    res = requests.get(url, headers=str_to_dict(headers))  # GET request
    topics = res.json().get("resp_data").get("topics")
    for i in topics:
        talk = i.get("talk")
        if talk:  # "solution" posts may not carry a talk field
            print(talk.get("text")[:10])  # preview the first 10 characters
        db.topics.insert_one(i)

So far we have only fetched the first 20 posts; a paginated query is needed to get them all. Load more posts in the browser and watch the new request to see what the paging parameter is: the server takes the creation time of the last post in the previous response as the paging parameter end_time. So we change the code to:

from urllib.parse import quote

def crawl(url):
    res = requests.get(url, headers=str_to_dict(headers))
    topics = res.json().get("resp_data").get("topics")
    for i in topics:
        db.topics.insert_one(i)
    if len(topics) == 20:  # a full page means there may be older posts
        # The "+0800" in create_time must be URL-encoded, hence quote().
        end_time = quote(topics[-1].get("create_time"))
        crawl(url.split("&end_time=")[0] + "&end_time=" + end_time)
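With that in place, a run simply starts from the un-paginated URL (the same group ID as above) and recurses until a partial page comes back:

if __name__ == "__main__":
    crawl("https://api.zsxq.com/v1.10/groups/518855855524/topics?count=20")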
