2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Sections 19 to 24 cover Douyin data crawling: crawling user information on the web side, crawling follower data on the app side, and crawling video data.
(1) The three main crawling tasks
1. Crawling user information on the web side
Technical difficulties:
Personal data interface: the numbers are obfuscated with a TTF font
Solution:
Work out the digits by enumerating which glyph corresponds to which digit
Note:
The crawler relies on the correspondence taken from the TTF font data. If Douyin's TTF font library changes, the crawler needs to be changed accordingly.
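The enumeration approach above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the code points in GLYPH_TO_DIGIT are placeholders rather than Douyin's real font mapping, and the function names are made up for the example. The real table would have to be enumerated by hand from the current TTF file.

```python
# Placeholder table: obfuscated code point -> real digit.
# These values are hypothetical and must be enumerated from the actual TTF font.
GLYPH_TO_DIGIT = {
    "\ue600": "0",
    "\ue601": "1",
    "\ue602": "2",
    "\ue603": "3",
    "\ue604": "4",
}


def decode_obfuscated(text, table=GLYPH_TO_DIGIT):
    """Replace every known obfuscated glyph with its digit, keep other characters."""
    return "".join(table.get(ch, ch) for ch in text)


def find_unmapped(text, table=GLYPH_TO_DIGIT):
    """Return private-use characters (U+E000..U+F8FF) missing from the table.

    A non-empty result suggests the font library changed and the table
    needs to be enumerated again.
    """
    return [ch for ch in text if "\ue000" <= ch <= "\uf8ff" and ch not in table]
```

Checking find_unmapped on every page is one way to notice a font change early instead of silently storing wrong numbers.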
2. Crawling follower data on the app side
Technical difficulties:
Appium simulated swiping + mitmdump to parse the data
Crawling Douyin data with a single device is relatively slow, so the data is crawled with multiple devices and multiple processes.
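The mitmdump half of this setup might look something like the script below. It is only a sketch under assumptions: FOLLOWER_URL_HINT and FOLLOWER_KEY are hypothetical names, since the article does not give the real request URL or the JSON layout of the follower list, and both have to be read off the captured traffic. The response function is the standard mitmproxy event hook, so the script can be loaded with mitmdump -s while Appium swipes the list.

```python
import json

# Hypothetical values: replace with what the captured traffic actually shows.
FOLLOWER_URL_HINT = "follower/list"  # fragment of the follower-list request URL
FOLLOWER_KEY = "followers"           # JSON key that holds the follower entries


def extract_followers(body_text):
    """Parse a response body and return the follower entries as a list of dicts."""
    try:
        payload = json.loads(body_text)
    except ValueError:
        return []
    if not isinstance(payload, dict):
        return []
    items = payload.get(FOLLOWER_KEY)
    if not isinstance(items, list):
        return []
    return [item for item in items if isinstance(item, dict)]


def response(flow):
    """mitmproxy hook: called once for every completed HTTP response."""
    if FOLLOWER_URL_HINT not in flow.request.url:
        return
    for follower in extract_followers(flow.response.text or ""):
        # One JSON object per line; swap this for a database write if needed.
        print(json.dumps(follower, ensure_ascii=False))
```

Keeping the parsing in extract_followers, separate from the hook, makes it possible to test the parser against saved response bodies without running the proxy.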
Note:
1. Appium simulates swiping through the Douyin follower list. On average, only about 5000 follower records can be obtained per celebrity.
2. After a proxy is set on the mobile device for packet capture, if the device cannot connect to the Internet or the https data cannot be parsed, install the Xposed framework + JustTrustMe component to bypass certificate checking. If you use a real phone, it is recommended to flash a system image that already includes the Xposed framework and root permission, to avoid bricking the phone.
3. When setting up multi-device, multi-process crawling, you need to set the bootstrap port of the Appium server and the udid field of the client.
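Note 3 can be illustrated with a small helper that gives every device its own server port, bootstrap port and udid. This is a sketch that assumes an Appium 1.x style server (the -p and -bp options and the /wd/hub path); the port numbers and the device_configs name are arbitrary choices for the example, and the udid values would come from adb devices.

```python
def device_configs(udids, base_port=4723, step=2):
    """Build one Appium server command and capability set per device.

    Each device gets a distinct server port and a distinct bootstrap port,
    so several servers can run side by side on the same machine.
    """
    configs = []
    for index, udid in enumerate(udids):
        port = base_port + index * step
        bootstrap_port = port + 1
        configs.append({
            # Command line for starting this device's Appium server.
            "server_cmd": ["appium", "-p", str(port), "-bp", str(bootstrap_port)],
            # URL the client process for this device connects to.
            "server_url": f"http://127.0.0.1:{port}/wd/hub",
            # Client capabilities: udid pins the session to one device.
            "capabilities": {
                "platformName": "Android",
                "deviceName": udid,
                "udid": udid,
            },
        })
    return configs
```

Each entry can then be handed to its own process, which starts the server with server_cmd and opens a session against server_url with the matching capabilities.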
3. Crawling video data on the web side
Technical difficulties:
Crack the js to obtain the signature, or obtain the signature through a browser
Note:
To crawl videos, you need to crack the signature field. This is done by stitching together an html page and parsing the js.
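The "stitching html" step could be sketched as below. The article does not show the page it builds, so everything here is an assumption made for illustration: the signing js is treated as an opaque string saved from the site, func_name stands for whatever function that js exposes, and writing the result into document.title is just one convenient way for a browser driver to read it back.

```python
import json
from string import Template

# Minimal page: load the signing js, call it, expose the result in the title.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"></head>
<body>
<script>$sign_js</script>
<script>document.title = String($call_expr);</script>
</body>
</html>
""")


def build_signature_page(sign_js, func_name, user_id):
    """Stitch the signing js and one call to it into a standalone html page.

    sign_js   -- source text of the signing script (saved from the site)
    func_name -- hypothetical name of the function that script exposes
    user_id   -- value to sign, passed to the function as a js string
    """
    call_expr = f"{func_name}({json.dumps(str(user_id))})"
    return PAGE_TEMPLATE.substitute(sign_js=sign_js, call_expr=call_expr)
```

The resulting page can be written to disk and opened in a browser or a browser driver, which then reads the signature from the page title.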
Technical reference:
https://douyin.wlansq.cn/
Of the two requests involved, the getjs one contains a tac value. At first the data could not be obtained, and it later turned out that tac was not what retrieved the data.
PS:
1. When fetching data, add a proxy to disguise the crawler.
2. If conditions permit, it is best to use real mobile devices, preferably Xiaomi, because Huawei's security is too strict. Domestic Huawei phones have very high security requirements: USB debugging mode cannot be turned on without a SIM card inserted, and cracking the system costs money.
3. To flash Xiaomi phones, people generally use [brush machine master] or [brush machine wizard]. ([line brush treasure] installs some rogue software, but it really is easy to use, so put up with it.)
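For item 1 of the PS, routing the crawler's requests through a proxy can be done with the Python standard library alone. This is only a sketch: the proxy address in the usage line is a placeholder, and build_proxy_opener is a name invented for the example.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends http and https requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Usage (placeholder address; no request is sent until opener.open is called):
opener = build_proxy_opener("http://127.0.0.1:8888")
```

Calling opener.open(url) then fetches the page through the proxy instead of connecting directly.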