Updated: 2025-01-19 | Source: SLTechnology News&Howtos, Shulou (Shulou.com), 05/31 report
This article introduces a method for collecting Douyin data with Python. It covers setting up the environment, capturing the app's traffic, and parsing the captured responses.
Preparatory work
Data collection starts with setting up the environment. This walkthrough uses Python 3.6.6 on Windows, with mitmproxy as the packet capture and proxy tool (Fiddler also works for capturing packets) and the Nox (Night God) emulator to provide the Android runtime (a real device works too). Here the data is captured while swiping through the app by hand; a later article will cover the Appium automation tool, which automates the collection fully and frees up your hands.
1. Install Python 3.6.6. Installation guides are easy to find online (for example on Baidu). Note that CentOS 7 ships with Python 2.7 and must be upgraded to Python 3.6.6. Install the SSL development libraries before upgrading; otherwise the upgraded Python lacks the ssl module and cannot make HTTPS requests.
2. Install mitmproxy. With Python installed, run pip install mitmproxy on the command line. Note: on Windows only mitmdump and mitmweb are available. After installation, start the proxy by typing mitmdump on the command line. The default proxy port is 8080.
3. Install the Nox emulator. Download the installer from the official website; installation tutorials are available on Baidu, and it mostly amounts to clicking Next. After installation, configure the emulator: set its network to use a manual proxy, with the Windows machine's IP address as the proxy host and mitmproxy's proxy port as the port.
4. Install the certificate. Open the browser inside the emulator, go to mitm.it, choose the certificate for your platform, and install it. After that, packets can be captured.
5. Install the app. Download the installation package from the official website and drag it into the emulator, or install it from the emulator's app market.
At this point, the data collection environment is fully set up.
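Before moving on, it can help to confirm that the interpreter meets the requirement from step 1. The following is a minimal sketch and not part of the original setup; the helper name check_environment is hypothetical. It reports the Python version and whether the ssl module can be imported, which is the condition that HTTPS requests depend on.

```python
import importlib
import sys


def check_environment(min_version=(3, 6)):
    """Report whether this interpreter can make HTTPS requests.

    check_environment is a hypothetical helper used for illustration only.
    """
    try:
        ssl = importlib.import_module("ssl")
        ssl_ok, ssl_version = True, ssl.OPENSSL_VERSION
    except ImportError:
        # Typically means Python was built without the OpenSSL headers.
        ssl_ok, ssl_version = False, None
    return {
        "python_version": tuple(sys.version_info[:3]),
        "python_ok": tuple(sys.version_info[:2]) >= tuple(min_version),
        "ssl_ok": ssl_ok,
        "ssl_version": ssl_version,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If ssl_ok comes back False on CentOS 7, the usual cause is that Python was compiled before the SSL development libraries were installed.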
Packet capture and API analysis
With the environment ready, we capture the Douyin app's traffic and work out which API each feature uses. This article takes the API that returns video data as the example.
Close the mitmdump instance started earlier and start mitmweb instead. mitmweb is the graphical version, so there is no need to stare at a console window.
Once it is running, open the Douyin app in the emulator; some packets will already have been parsed. Then go to a user's home page and scroll down through the videos. The API that requests video data, https://aweme.snssdk.com/aweme/v1/aweme/post/, shows up in the packet list.
The request and response data for the API appear in the panel on the right. Copy the response data and move on to parsing.
Data parsing
By combining mitmproxy with Python code, the packets passing through mitmproxy become available in code and can be processed as needed. Create a new file test.py and define two functions in it:
def request(flow):
    pass


def response(flow):
    pass
As the names suggest, one function runs when a request is made and the other when a response arrives; the packet is carried in flow. flow.request.url gives the request URL, flow.request.headers gives the request headers, and flow.response.text holds the response data.
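To make those three attributes concrete, here is a small sketch that reads them inside the response hook. The helper describe_flow and the stand-in fake_flow object are hypothetical and exist only so the snippet runs without mitmproxy installed; under mitmdump or mitmweb, the real flow object is passed in by mitmproxy.

```python
from types import SimpleNamespace


def describe_flow(flow):
    """Collect the URL, request headers and response body from a flow."""
    return {
        "url": flow.request.url,
        "headers": dict(flow.request.headers),
        "body": flow.response.text if flow.response else None,
    }


def response(flow):
    # mitmproxy calls this hook once the full response has arrived.
    info = describe_flow(flow)
    print(info["url"], len(info["body"] or ""))


# Stand-in object (an assumption for demonstration); a real flow is
# supplied by mitmdump / mitmweb when the script is loaded with -s.
fake_flow = SimpleNamespace(
    request=SimpleNamespace(
        url="https://example.com/api",
        headers={"User-Agent": "demo"},
    ),
    response=SimpleNamespace(text='{"ok": true}'),
)
response(fake_flow)
```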
import json


def response(flow):
    # Only handle responses from the user video list API
    if flow.request.url.startswith("https://aweme.snssdk.com/aweme/v1/aweme/post/"):
        index_response_dict = json.loads(flow.response.text)
        aweme_list = index_response_dict.get("aweme_list")
        if aweme_list:
            # Each entry is the full data for one video
            for aweme in aweme_list:
                print(aweme)
Each aweme is the complete data for one video. Extract whatever fields you need; a few of them are described here.
"statistics": {"aweme_id": "6765058962225204493", "comment_count": 24, "digg_count": 1465, "download_count": 1, "play_count": 0, "share_count": 3, "forward_count": 0, "lose_count": 0, "lose_comment_count": 0}
The statistics field holds the video's like, comment, download, and share counts.
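As an illustration, the counters can be pulled out of one aweme dict as follows. This is a sketch: the function name extract_statistics and the output key names are hypothetical, and treating digg_count as the like count is an assumption based on the description above. The sample values are the ones shown in the statistics excerpt.

```python
def extract_statistics(aweme):
    """Return the main counters of one aweme, defaulting missing ones to 0."""
    stats = aweme.get("statistics") or {}
    return {
        "aweme_id": stats.get("aweme_id"),
        "likes": stats.get("digg_count", 0),  # assumed: digg_count = likes
        "comments": stats.get("comment_count", 0),
        "downloads": stats.get("download_count", 0),
        "shares": stats.get("share_count", 0),
        "forwards": stats.get("forward_count", 0),
    }


sample_aweme = {
    "statistics": {
        "aweme_id": "6765058962225204493",
        "comment_count": 24,
        "digg_count": 1465,
        "download_count": 1,
        "play_count": 0,
        "share_count": 3,
        "forward_count": 0,
        "lose_count": 0,
        "lose_comment_count": 0,
    }
}
print(extract_statistics(sample_aweme))
```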
share_url is the video's share link. It can be used to watch the shared Douyin video on a PC, or to resolve the watermark-free video.
play_addr holds the video's playback information, and its url_list contains the watermark-free addresses. These official addresses cannot be played directly, though, and they are time-limited: once they expire, the links stop working.
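A sketch of reading these two fields is shown below. The text does not say where play_addr sits inside the aweme, so the code assumes it is either nested under a "video" key or at the top level, and checks both. The function name extract_links and the example.com URLs are placeholders, not real Douyin data.

```python
def extract_links(aweme):
    """Return the share link and the candidate playback URLs of one aweme."""
    video = aweme.get("video") or {}
    # Assumption: play_addr is nested under "video"; fall back to top level.
    play_addr = video.get("play_addr") or aweme.get("play_addr") or {}
    return {
        "share_url": aweme.get("share_url"),
        "play_urls": list(play_addr.get("url_list") or []),
    }


# Placeholder data shaped like the fields described above.
sample_aweme = {
    "share_url": "https://example.com/share/video/1",
    "video": {
        "play_addr": {
            "url_list": [
                "https://example.com/play/1?line=0",
                "https://example.com/play/1?line=1",
            ]
        }
    },
}
print(extract_links(sample_aweme))
```

Because the addresses expire, it makes sense to use the URLs soon after capture rather than storing them for later.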
With the aweme in hand, you can parse its fields and save them to your own database, or download the watermark-free videos to your computer.
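The database option could look like the following sketch, which uses SQLite from the Python standard library. The article does not name a database, so SQLite, the table layout, and the function names open_store and save_aweme are all assumptions chosen for illustration.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with one table for videos."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS aweme ("
        "aweme_id TEXT PRIMARY KEY, digg_count INTEGER, "
        "comment_count INTEGER, share_url TEXT, raw_json TEXT)"
    )
    return conn


def save_aweme(conn, aweme):
    """Insert one aweme, replacing any earlier row with the same id."""
    stats = aweme.get("statistics") or {}
    aweme_id = stats.get("aweme_id") or aweme.get("aweme_id")
    if aweme_id is None:
        raise ValueError("aweme has no aweme_id")
    conn.execute(
        "INSERT OR REPLACE INTO aweme VALUES (?, ?, ?, ?, ?)",
        (
            str(aweme_id),
            stats.get("digg_count", 0),
            stats.get("comment_count", 0),
            aweme.get("share_url"),
            json.dumps(aweme, ensure_ascii=False),  # keep the full record
        ),
    )
    conn.commit()


conn = open_store()
record = {
    "share_url": "https://example.com/share/video/1",  # placeholder URL
    "statistics": {
        "aweme_id": "6765058962225204493",
        "digg_count": 1465,
        "comment_count": 24,
    },
}
save_aweme(conn, record)
save_aweme(conn, record)  # the same video seen twice is stored once
print(conn.execute("SELECT aweme_id, digg_count FROM aweme").fetchall())
```

Keying the table on aweme_id means that scrolling past the same video more than once does not create duplicate rows.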
After writing the code, save test.py. Open cmd, change to the directory containing test.py, and run mitmdump -s test.py to start mitmdump. Then open the app in the emulator, go to a user's home page, and start scrolling.
As you keep scrolling down, test.py parses all of the captured video data. The original article showed screenshots of some of the captured data at this point: video information, video statistics, video comment data, and downloaded watermark-free videos.
This concludes the article on the Python Douyin data collection method. Thank you for reading!