How Python Crawls Detailed Douyin User Data

This article introduces how to crawl detailed Douyin user data with Python. It has some reference value, and interested readers can follow along; I hope you learn a lot from it.
First, analyze the API that requests user data

1. Capture the user data packet
First, capture the user data packet with Fiddler in the packet-capture environment set up beforehand.
2. Analyze the user data packet

2.1. Request information analysis
The request header fields are as follows:

| Field | Field value |
| --- | --- |
| Request method | GET |
| Request API | the API analyzed below |
| Request protocol | HTTP/1.1 |
| Host (domain of the requested destination host) | aweme-eagle.snssdk.com |
| Connection | keep-alive |
| Cookie | your own cookie |
| Accept-Encoding | gzip |
| X-SS-QUERIES | the request query |
| X-Tt-Token | your own token |
| sdk-version | 1 |
| User-Agent | the user agent |
| X-Khronos | unclear what it is, but essentially a timestamp |
| X-Gorgon | part of the encryption verification |
| X-Pods | unclear what it is, but it seems to be unused |
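The X-Khronos header in the table is essentially a Unix timestamp in seconds, and the request also carries a millisecond-level `_rticket`. As a minimal sketch, both values can be produced like this (this matches how the code later in the article generates them):

```python
import time

# seconds-level timestamp, used for X-Khronos and the ts query parameter
ts = str(int(time.time()))

# millisecond-level timestamp, used for _rticket / X-SS-REQ-TICKET
_rticket = str(int(time.time() * 1000))
```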
Analysis of the request API
Now that we know the request API and what information the request header contains, we can crawl a user's data by constructing the corresponding request parameters by hand. In the previous article I collected the uid and sec_user_id of more than 10,000 users, so we can use that data to crawl their detailed profiles.
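The article does not show how those 10,000+ uid/sec_user_id pairs are stored. Assuming a simple two-column CSV (the file name, layout, and the `load_user_ids` helper are all hypothetical), loading them for the crawl might look like this:

```python
import csv
import time


def load_user_ids(path):
    """Load (user_id, sec_user_id) pairs collected in the previous article.

    The two-column CSV layout (uid, sec_uid) is an assumption; adapt it to
    however you actually stored the data.
    """
    pairs = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) >= 2:
                pairs.append((row[0], row[1]))
    return pairs


# Usage sketch, with get_user_detail_info defined in section II below:
# for uid, sec_uid in load_user_ids("douyin_user_ids.csv"):
#     res = get_user_detail_info(cookie, query, token, user_agent, uid, sec_uid)
#     time.sleep(1)  # pause between requests to avoid hammering the endpoint
```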
2.2. Response information analysis

The response body is gzip-compressed JSON carrying the user's profile fields; its useful contents are examined field by field in the parsing section below.
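To analyze a captured response yourself, a minimal sketch (assuming the body is gzip-compressed JSON, which the download code in section II decompresses the same way) is to decompress and pretty-print it:

```python
import gzip
import json


def pretty_print_response(raw_body: bytes) -> None:
    """Decompress a gzip-encoded response body and pretty-print its JSON."""
    text = gzip.decompress(raw_body).decode("utf-8")
    print(json.dumps(json.loads(text), ensure_ascii=False, indent=2))
```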
Second, obtain the user data

1. Construct the request API
In the article "Douyin Crawler tutorial, getting Douyin user data from 0 to 1", we have introduced the api and its construction method for crawling Douyin follow list. In fact, the api for getting user details is basically the same as that for getting user follow list. It mainly requires us to fill in the user's user_id and user's sec_user_id, as well as a lot of timestamp information, other information is unchanged. Let's construct an api to get the user's details
```python
def construct_api(user_id, _rticket, ts, sec_user_id):
    """
    API constructor.
    :param user_id: the user's id
    :param _rticket: timestamp (milliseconds)
    :param ts: timestamp (seconds)
    :param sec_user_id: the user's encrypted id
    :return: the assembled api url
    """
    # the endpoint path is reconstructed from the garbled original text
    api = "https://aweme-eagle.snssdk.com" \
          "/aweme/v1/user/?" \
          "user_id={}" \
          "&retry_type=no_retry" \
          "&iid=1846815477740845" \
          "&device_id=47012747444" \
          "&ac=wifi&channel=wandoujia_aweme1" \
          "&aid=1128&app_name=aweme" \
          "&version_code=630" \
          "&version_name=6.3.0" \
          "&device_platform=android" \
          "&ssmix=a&device_type=HUAWEI+NXT-AL10" \
          "&device_brand=HUAWEI&language=zh" \
          "&os_api=26&os_version=8.0.0" \
          "&openudid=b202a24eb8c1538a" \
          "&manifest_version_code=630" \
          "&resolution=1080*1812" \
          "&dpi=480&update_version_code=6302" \
          "&_rticket={}" \
          "&js_sdk_version=1.16.3.5" \
          "&ts={}" \
          "&sec_user_id={}".format(user_id, _rticket, ts, sec_user_id)
    return api
```
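As a quick check, a usage sketch (the ids below are the sample values used in the complete call at the end of the article):

```python
import time

# sample ids taken from the complete call example at the end of the article
user_id = 103600654544
sec_user_id = "MS4wLjABAAAA2_HUlxau0riJ8UBMwyd_bUtA8yzKdWepfg9nUc5wQy0"

_rticket = str(int(time.time() * 1000))
ts = str(int(time.time()))

print(construct_api(user_id, _rticket, ts, sec_user_id))  # the fully assembled request URL
```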
2. Construct the request header

We analyzed the request header above, and constructing it is straightforward since most of its contents are fixed. We mainly need to fill in a few timestamps and the matching X-Gorgon. Building X-Gorgon is the complicated part: you only get a usable X-Gorgon by supplying a correct Cookie and Token, otherwise the Gorgon you receive will not work. Below is a function that constructs the request header:
```python
import json

import requests


def construct_header(user_id, sec_user_id, cookie, query, token, user_agent, _rticket, ts):
    """
    Construct the request header. The parameters are:
    :param user_id: uid of the user to crawl
    :param sec_user_id: encrypted id of the user to crawl
    :param cookie: your cookie
    :param query: the request query
    :param token: your token
    :param user_agent: the request user_agent
    :param _rticket: timestamp (milliseconds)
    :param ts: timestamp (seconds)
    :return: the constructed request header: headers
    """
    api = construct_api(user_id, _rticket, ts, sec_user_id)
    headers = {
        "Host": "aweme-eagle.snssdk.com",
        "Connection": "keep-alive",
        "Cookie": cookie,
        "Accept-Encoding": "gzip",
        "X-SS-QUERIES": query,
        "X-SS-REQ-TICKET": _rticket,
        "X-Tt-Token": token,
        "sdk-version": "1",
        "User-Agent": user_agent,
    }
    x_gorgon = get_gorgon(api, cookie, token, query)
    headers["X-Khronos"] = ts
    headers["X-Gorgon"] = x_gorgon
    print(headers)
    return headers


def get_gorgon(url, cookies, token, query):
    """
    Get X-Gorgon.
    :param url: the requested api
    :param cookies: your cookie
    :param token: your token
    :param query: your query
    :return: gorgon
    """
    # Ask the Gorgon service for an X-Gorgon value
    headers = {
        "dou-url": url,          # the api of the corresponding request
        "dou-cookies": cookies,  # your cookies
        "dou-token": token,      # your token
        "dou-queries": query,    # your request queries
    }
    gorgon_host = "http://8.131.59.252:8080"
    res = requests.get(gorgon_host, headers=headers)
    gorgon = ""
    if res.status_code == 200:
        print("request successful")
        res_gorgon = json.loads(res.text)
        if res_gorgon.get("status") == 0:
            print("successfully obtained X-Gorgon")
            print(res_gorgon.get("X-gorgon"))  # you can use it to crawl data
            gorgon = res_gorgon.get("X-gorgon")
        else:
            print("failed to get X-Gorgon")
            print(res_gorgon.get("reason"))
            raise ValueError(res_gorgon.get("reason"))
    else:
        print("request error: it may be a network problem on your side, it may be my fault, but most likely it is on your side")
        raise ValueError("request error: it may be a network problem on your side, it may be my fault, but most likely it is on your side")
    return gorgon
```

3. Get the user data

Once the request header is ready, we can fetch the user data:

```python
import gzip
import time
from urllib import request


def get_user_detail_info(cookie, query, token, user_agent, user_id, sec_user_id):
    """
    Crawl a user's data.
    :param cookie: your own cookie
    :param query: your own query
    :param token: your own token
    :param user_agent: your own User-Agent
    :param user_id: the user's uid
    :param sec_user_id: the user's encrypted uid
    :return: the decompressed response body
    """
    _rticket = str(time.time() * 1000).split(".")[0]
    ts = str(time.time()).split(".")[0]
    api = construct_api(user_id, _rticket, ts, sec_user_id)
    headers = construct_header(user_id, sec_user_id, cookie, query, token, user_agent, _rticket, ts)
    print(api)
    req = request.Request(api)
    for key in headers:
        req.add_header(key, headers[key])
    with request.urlopen(req) as f:
        data = f.read()
    return gzip.decompress(data).decode()
```
4. Parse the user data

According to the response analysis above, the response body is in JSON format and carries a lot of data. After going through it, I found a number of fields useful to me; they are listed below, after a minimal parsing sketch.
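The sketch below extracts a few of those fields from the string returned by get_user_detail_info. The assumption that the profile object sits under a top-level "user" key is mine, not from the article; verify it against a captured response.

```python
import json


def parse_user_detail(raw_json: str) -> dict:
    """Pick a few useful fields out of the response JSON string.

    Assumption (mine, not from the article): the profile object sits
    under a top-level "user" key; verify against a captured response.
    """
    user = json.loads(raw_json).get("user", {})
    return {
        "unique_id": user.get("unique_id"),
        "nickname": user.get("nickname"),
        "signature": user.get("signature"),
        "follower_count": user.get("follower_count"),
        "following_count": user.get("following_count"),
        "aweme_count": user.get("aweme_count"),
        "total_favorited": user.get("total_favorited"),
    }
```

The field names used above match the ones listed next.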
# user's Douyin account unique_id=345345345O# user's sec_user_idsec_uid=MS4wLjABAAAA2_HUlxau0riJ8UBMwyd_bUtA8yzKdWepfg9nUc5wQy0# profile address avatar_uri=26e880003aefb8cddd496# user's nickname nickname= Chengdu hipster list # user's signature signature= Thank you for following ❤ # user's date of birth birthday=1995-01-01 user's country country= China # user's province province= Sichuan # user's city city= Chengdu # user's location Regional district= Wuhou # users' fans, follower_count=929219# users' followers, following_count=15# 's Douyin, aweme_count=453# 's dynamic number of dongtai_count=480# users' likes, favoriting_count=322# 's total number of likes, total_favorited=149007005. If _ _ name__ = ='_ _ main__': cookie = "" # your own cookie token = "" # your own token query = "#" # your own query user_agent = "" # your own user-agent user_id = 103600654544 sec_user_id = "MS4wLjABAAAA2_HUlxau0riJ8UBMwyd_bUtA8yzKdWepfg9nUc5wQy0" res = get_user_detail_info (cookie,query, token, user_agent, user_id Sec_user_id) print (res) Thank you for reading this article carefully. I hope the article "how python crawls the detailed data of Douyin users" shared by the editor is helpful to you. At the same time, I also hope you can support us and follow the industry information channel. More related knowledge is waiting for you to learn!