"Docker Practical Stories": Python on Docker, Douyin video crawling, summary (part two) (26)

Updated 2025-02-24. From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com), 06/02 report

Sections 19 to 24 of this series covered crawling Douyin data: user information from the web side, follower data from the app side, and video data from the web side.

(1) The three main crawling tasks

1. Crawling user information on the web side

Technical difficulties:

The personal profile interface obfuscates its numbers with a custom TTF font.

Solution:

Enumerate the glyphs in the font and work out which digit each one renders.

Note:

Decoding depends on this glyph-to-digit correspondence, so if Douyin changes its TTF font file, the crawler must be updated to match.
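The enumeration approach can be sketched roughly as follows. This is a minimal illustration, assuming the obfuscated digits arrive as Unicode private-use characters and that a glyph-to-digit table has already been built by hand. The code points in the table and both function names are hypothetical, not Douyin's real values.

```python
# Hypothetical table built by enumerating the font's glyphs by hand.
# These code points are illustrative only, not Douyin's real values.
GLYPH_TO_DIGIT = {
    "\ue603": "0",
    "\ue60d": "1",
    "\ue616": "2",
    "\ue60e": "3",
    "\ue618": "4",
}


def decode_obfuscated(text, table=GLYPH_TO_DIGIT):
    """Replace every mapped glyph with its digit; leave other characters alone."""
    return "".join(table.get(ch, ch) for ch in text)


def unmapped_glyphs(text, table=GLYPH_TO_DIGIT):
    """Return private-use characters missing from the table.

    A non-empty result suggests the font changed and the table needs updating.
    """
    return sorted(
        {ch for ch in text if "\ue000" <= ch <= "\uf8ff" and ch not in table}
    )
```

Running `unmapped_glyphs` on each scraped page gives an early warning when the font library changes, before wrong numbers end up in the stored data.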

2. Crawling follower data on the app side

Technical difficulties:

Appium simulates the swipe gestures, and mitmdump parses the intercepted data.

Crawling Douyin data through a single device is relatively slow, so the crawl is spread across multiple devices, each driven by its own process.
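The mitmdump half of this setup can be sketched as a small addon script. This is only an illustration under stated assumptions: the URL fragment `/follower/list` and the JSON keys `followers`, `uid` and `nickname` are hypothetical placeholders, and the real endpoint and field names must be read from the captured traffic.

```python
import json

# Assumption: a URL fragment that identifies the follower-list API.
FOLLOWER_PATH_HINT = "/follower/list"


def extract_followers(body_text):
    """Pull (uid, nickname) pairs out of a JSON response body.

    The "followers", "uid" and "nickname" keys are hypothetical.
    """
    payload = json.loads(body_text)
    return [(u.get("uid"), u.get("nickname")) for u in payload.get("followers", [])]


def response(flow):
    """mitmproxy event hook, called once for every completed HTTP response."""
    if FOLLOWER_PATH_HINT not in flow.request.url:
        return
    for uid, nickname in extract_followers(flow.response.text):
        # A real crawler would write to a database or queue instead of printing.
        print(uid, nickname)
```

Saved under a name such as `follower_addon.py` (a hypothetical file name), the script would be loaded with `mitmdump -s follower_addon.py -p 8080`, while Appium keeps swiping the follower list on the device.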

Note:

1. When Appium swipes through a Douyin follower list, only about 5,000 followers can be retrieved per celebrity account on average.

2. If, after a proxy has been configured on the mobile device for packet capture, the device cannot connect to the Internet or the HTTPS traffic cannot be decrypted, install the Xposed framework with the JustTrustMe module to bypass certificate verification. If you use a real phone, it is advisable to flash a system image that already includes the Xposed framework and root access, to avoid bricking the device.

3. When crawling with multiple devices and multiple processes, each Appium server needs its own bootstrap port, and each client needs the udid field set to its own device.
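The port and udid assignment in note 3 might be planned as in the sketch below. It assumes an Appium 1.x server, where `-p`, `-bp` and `-U` set the server port, bootstrap port and device udid. The function name `plan_devices` and the port numbering scheme are illustrative choices, not part of the original setup.

```python
def plan_devices(udids, base_port=4723):
    """Give each device its own Appium server port and bootstrap port.

    Server ports are base_port, base_port + 2, ... and each bootstrap port
    is the server port plus one, so no two values ever collide.
    """
    plans = []
    for index, udid in enumerate(udids):
        port = base_port + 2 * index
        bootstrap_port = port + 1
        plans.append({
            # Command line for one Appium 1.x server instance.
            "server_cmd": ["appium", "-p", str(port),
                           "-bp", str(bootstrap_port), "-U", udid],
            # URL the client process for this device connects to.
            "server_url": f"http://127.0.0.1:{port}/wd/hub",
            # Desired capabilities; udid pins the session to one device.
            "capabilities": {
                "platformName": "Android",
                "deviceName": udid,
                "udid": udid,
            },
        })
    return plans
```

Each plan can then be handed to its own worker process, which starts the server from `server_cmd` and opens a client session against `server_url` with the matching capabilities.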

3. Crawling video data on the web side

Technical difficulties:

Reverse-engineer the JavaScript that generates the signature, and obtain the signature by running that JavaScript in a browser.

Note:

To crawl videos, you need to crack the signature field: stitch the signing JavaScript into an HTML page and let a browser parse and execute it.
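The "stitching HTML" step might look like the sketch below. It only builds the page text. Loading the page in a browser and reading the result back is left out. The function name `build_sign_page` and the JavaScript entry point `generateSignature` are hypothetical placeholders, and the real signing function has to be identified in Douyin's own script.

```python
import json


def build_sign_page(sign_js_source, user_id, sign_call="generateSignature"):
    """Return an HTML page that runs the signing script for one user id.

    The page stores the signature in document.title so that a browser
    automation tool can read it back after loading the page.
    Assumes sign_js_source does not itself contain a closing script tag.
    """
    uid_literal = json.dumps(str(user_id))  # safely quoted JavaScript string
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>\n'
        f"<script>\n{sign_js_source}\n</script>\n"
        f"<script>document.title = {sign_call}({uid_literal});</script>\n"
        "</body></html>\n"
    )
```

Writing the returned string to a local file and opening it in a real or headless browser lets the browser's own JavaScript engine compute the signature, which avoids reimplementing the algorithm in Python.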

Technical reference:

https://douyin.wlansq.cn/

Of the two requests, getjs carries a tac value. At first the data could not be retrieved, and it later turned out that this was because tac had not been obtained.

PS:

1. When fetching data, add a proxy to disguise the crawler.

2. If conditions permit, use real mobile devices, preferably Xiaomi phones, because Huawei's security restrictions are too strict. Huawei phones sold in China cannot enable USB debugging mode without a SIM card inserted, and unlocking the system costs money.

3. Xiaomi phones are generally flashed with Shuaji Dashi (刷机大师) or Shuaji Jingling (刷机精灵). Xianshuabao (线刷宝) installs some unwanted bundled software, but it really is easy to use, so put up with it.
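For note 1, routing requests through a proxy could be sketched with the standard library alone. This assumes "agent" in the original means an HTTP proxy. The browser-like User-Agent header is an extra assumption added here as a common companion measure, and the function name and the example proxy address are hypothetical.

```python
import urllib.request


def make_proxied_opener(proxy_url, user_agent):
    """Build a urllib opener that sends HTTP and HTTPS traffic through a proxy.

    The User-Agent header is replaced so requests look like a normal browser.
    """
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener
```

A call such as `make_proxied_opener("http://127.0.0.1:8888", "Mozilla/5.0")` returns an opener whose `open(url)` method replaces `urllib.request.urlopen(url)` in the crawler.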
