Updated 2025-01-31. From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
How do you use Python to crawl the reasons why you are single? Many people without experience don't know where to start, so this article walks through the problem and how to solve it. I hope it helps you solve it too.
Two days ago I happened to see a tongue-in-cheek "legal education" post on Weibo about 520, Qixi, Valentine's Day and similar occasions...! I bet many "little fairies" received love from their other halves a few days ago!
However, plenty of folks still never get the chance to send a red packet...
I'm one of those people who neither receives love nor gets the chance to give it! I don't know why I'm still single! Hahaha, I want to find out: with so many single people out there, why are you all single? Crawl! Crawl! Crawl!
Tell me, everyone: so many people are single, but have you ever analyzed why you are so outstanding and yet still single?
I. Background of demand
I found an interesting topic when I visited Weibo today.#90 Reasons for Being Single TOP3 #
On the occasion of Qixi Festival, a sample survey on the concept of marriage and love of post-90s youth nationwide announced the results. The results show that the proportion of singles in first-tier cities continues to lead. Post-90s single reason TOP 3: small circle, busy work, too perfect for love fantasy.
Three reasons for being single: small circle, busy work, too perfect for love fantasy!
I think these three reasons do not seem reasonable, is not the reason for single because poor? Crying...
II. Functional description
Curious how this survey was conducted? Its authenticity remains to be verified. Since we have just learned how to crawl Weibo topics these past few days, today let's analyze why so many of you are so great and yet still single!
III. Technical approach
Log in to Weibo
Crawl the topic
Save the data to a file
Clean the data
Analyze the data
IV. Simulated login
I already covered the simulated login when crawling the Jay Chou super topic, so I won't repeat it here and will post the code directly!
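The login code from the Jay Chou article is not reproduced here. As a simpler stand-in (a swapped-in technique, not the author's simulated login), the sketch below assumes you have already logged in to m.weibo.cn in a browser and copied the `Cookie` request header from its developer tools. It turns that string into headers that later requests can carry. The cookie names in the test and the User-Agent string are placeholders.

```python
# Sketch only: reuses a logged-in browser session's cookie instead of
# performing the simulated login described in the Jay Chou article.

# Placeholder mobile User-Agent (m.weibo.cn is the mobile site).
MOBILE_UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 13_2 like Mac OS X) "
             "AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148")


def parse_cookie_string(cookie_string):
    """Turn 'a=1; b=2' (as copied from the browser) into {'a': '1', 'b': '2'}."""
    cookies = {}
    for item in cookie_string.split(";"):
        # Split on the first '=' only, because cookie values may contain '='.
        name, sep, value = item.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def build_headers(cookie_string):
    """Build request headers that carry the logged-in session cookie."""
    cookies = parse_cookie_string(cookie_string)
    return {
        "User-Agent": MOBILE_UA,
        "Cookie": "; ".join(f"{name}={value}" for name, value in cookies.items()),
    }
```

With requests, the headers would be passed as `requests.get(url, headers=build_headers(cookie_string), timeout=10)`.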
V. Crawling the topic
1. Find the URL that loads the topic data
https://m.weibo.cn/api/container/getIndex?containerid=100103type%3D61%26q%3D%2390%E5%90%8E%E5%8D%95%E8%BA%AB%E5%8E%9F%E5%9B%A0TOP3%23%26t%3D0&isnewpage=1&extparam=pos%3D41%26c_type%3D31%26realpos%3D40%26flag%3D0%26filter_type%3Drealtimehot%26cate%3D0%26display_time%3D1565179797&luicode=10000011&lfid=106003type%3D25%26t%3D3%26disable_hot%3D1%26filter_type%3Drealtimehot&page_type=searchall
2. Request the data in code
We still use the requests library to fetch the data. This time Pig Brother (the author) added a timeout parameter to each request, so that a request that never gets a response cannot block the other requests!
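So that the examples run without third-party packages, this sketch uses the standard library. With the author's library, the same request would be `requests.get(url, headers=headers, timeout=10)`. It fetches one page of the topic API with a timeout and returns the parsed JSON. If the request fails or times out it returns `None`, so a single bad page cannot stall the crawl. The 10-second timeout and the User-Agent value are assumptions, not values from the original article.

```python
import json
import urllib.request

# Assumed mobile User-Agent; m.weibo.cn is the mobile site.
HEADERS = {"User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_2 like Mac OS X)"}


def fetch_json(url, headers=None, timeout=10):
    """Fetch url and parse the body as JSON.

    Returns None if the request fails, or if connecting or reading blocks
    for more than `timeout` seconds, so one unanswered request cannot
    hold up the rest of the crawl.
    """
    request = urllib.request.Request(url, headers=headers or HEADERS)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError, refused connections and timeouts;
        # ValueError covers bodies that are not valid UTF-8 JSON.
        return None
```

A caller can skip any page for which `fetch_json` returns `None` and move on to the next one.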
3. Extract Weibo content
To extract the Weibo posts, you first need to understand the format of the data the request returns.
Once we understand the format, we can write code to extract the post content we want!
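The response screenshot is not reproduced here, so this sketch relies on an assumed shape for the JSON. Posts sit under `data` → `cards`. A card that is a post carries an `mblog` object whose `text` field holds the post's HTML. Some cards nest further cards in a `card_group` list. Check these keys against a real response before relying on them.

```python
def extract_texts(payload):
    """Collect the raw HTML text of every post in one getIndex response.

    Assumed shape (not confirmed by the original article):
        {"data": {"cards": [{"mblog": {"text": ...}},
                            {"card_group": [{"mblog": {"text": ...}}]}]}}
    """
    texts = []
    cards = ((payload or {}).get("data") or {}).get("cards") or []
    for card in cards:
        # A card may be a post itself or a group wrapping several posts.
        for item in [card] + (card.get("card_group") or []):
            mblog = item.get("mblog")
            if mblog and mblog.get("text"):
                texts.append(mblog["text"])
    return texts
```

Passing `None` (a failed request) or a response without cards simply yields an empty list.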
As the screenshot above shows, we now have the Weibo content, but it still contains a lot of HTML tags. Let's use regular expressions to strip the HTML tags and the topic hashtag at the beginning of each post!
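This is a minimal sketch of that cleanup with Python's `re` and `html` modules. It removes every tag, decodes HTML entities such as `&amp;`, and then drops a `#topic#` hashtag only when it appears at the very start of the post. The exact patterns are assumptions, not the author's original regexes.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")                   # any HTML tag, e.g. <a href=...> or <br />
LEADING_TOPIC_RE = re.compile(r"^\s*#[^#]+#\s*")  # a #topic# at the very start


def clean_text(raw_html):
    """Strip HTML tags, decode entities, and drop the leading topic hashtag."""
    text = TAG_RE.sub("", raw_html)
    text = html.unescape(text)  # e.g. '&amp;' becomes '&'
    text = LEADING_TOPIC_RE.sub("", text)
    return text.strip()
```

A hashtag in the middle of a post is left alone, because it may be part of what the user actually wrote.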
4. Save the file
Once the post content has been extracted, we save it!
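A minimal way to save the cleaned posts is one post per line in a UTF-8 text file. By default it appends, so successive pages accumulate in the same file. The file name `single_reasons.txt` is a hypothetical choice.

```python
def save_posts(posts, path="single_reasons.txt", mode="a"):
    """Write non-empty posts one per line (UTF-8); return how many were written.

    path is a hypothetical default; mode="a" appends so repeated calls for
    successive pages end up in one file, mode="w" starts a fresh file.
    """
    count = 0
    with open(path, mode, encoding="utf-8") as f:
        for post in posts:
            post = post.strip()
            if post:  # skip posts that were empty after cleaning
                f.write(post + "\n")
                count += 1
    return count
```

One post per line keeps the analysis step simple, because the file can be read back with a plain line-by-line loop.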
VI. Batch crawling
Batch crawling involves pagination. When we crawled the Jay Chou super topic last time, its pagination mechanism was:
Super topic pagination is time-based. Every post has a since_id, and the newer the post, the larger its since_id. A request that passes a since_id loads the topic's posts whose since_id is smaller than that value. You then take the smallest since_id from that batch, pass it into the next request, and repeat; that is how paging works.
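For comparison, the since_id mechanism described above is cursor-based pagination. This sketch assumes a hypothetical `fetch_batch(since_id)` callable. It returns a list of posts, each a dict with a `since_id` key, restricted to posts older than the cursor. A fake in-memory fetcher stands in for the real super topic request.

```python
def paginate_by_since_id(fetch_batch, max_pages=100):
    """Yield posts batch by batch, using the smallest since_id seen as the next cursor."""
    since_id = None  # first request: no cursor, i.e. the newest posts
    for _ in range(max_pages):
        batch = fetch_batch(since_id)
        if not batch:
            break  # no older posts left
        yield from batch
        next_id = min(post["since_id"] for post in batch)
        if since_id is not None and next_id >= since_id:
            break  # cursor did not move; stop instead of looping forever
        since_id = next_id


# Hypothetical in-memory data: a larger since_id means a newer post.
DEMO_POSTS = [{"since_id": i, "text": f"post {i}"} for i in range(10, 0, -1)]


def demo_fetch(since_id, page_size=3):
    """Fake fetcher: up to page_size posts whose since_id is below the cursor."""
    older = [p for p in DEMO_POSTS if since_id is None or p["since_id"] < since_id]
    return older[:page_size]
```

`list(paginate_by_since_id(demo_fetch))` walks all ten demo posts from newest to oldest, three per request.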
Does this topic paginate the same way? Let's compare the URLs of the first and second requests.
The comparison shows that ordinary topics are actually paginated by page number. It seems Weibo uses different pagination mechanisms for topics of different levels!
We've covered page-number pagination in many earlier cases: just loop over i in a for loop and pass i as the page number!
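Here is a sketch of page-number pagination. It makes two assumptions: the page number travels in a query parameter named `page` (the URLs compared above are not reproduced here, so confirm the name against them), and an empty page marks the end. `fetch_texts` is a hypothetical callable that requests one URL and returns the post texts found on it.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page):
    """Return url with its (assumed) `page` query parameter set to page."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "page"]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def crawl_pages(url, fetch_texts, max_pages=50, delay=1.0):
    """Request pages 1..max_pages, stopping at the first page with no posts."""
    all_texts = []
    for i in range(1, max_pages + 1):
        texts = fetch_texts(with_page(url, i))
        if not texts:
            break
        all_texts.extend(texts)
        time.sleep(delay)  # pause between requests to go easy on the server
    return all_texts
```

The `max_pages` cap and the delay are defensive choices. The first guards against an endpoint that never returns an empty page; the second keeps the crawl from hammering the server.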
VII. Data analysis
For data analysis we use the pyecharts library, an excellent data visualization library!
First we read the data, then use the jieba library for word segmentation and data cleaning, and finally use pyecharts to display the results!
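The segmentation and counting step can be sketched as follows. To keep the sketch runnable without extra packages, the tokenizer is passed in as a function. In the article's setup that would be `jieba.lcut`, which returns a list of words for a string. The stop-word set and the rule of dropping one-character tokens are assumptions typical of Chinese word-frequency cleaning, not the author's exact rules.

```python
from collections import Counter

# Hypothetical stop words; a real list would be much longer.
STOPWORDS = {"我们", "就是", "因为", "没有"}


def top_words(texts, cut, stopwords=STOPWORDS, min_len=2, n=50):
    """Segment each text with `cut` and return the n most common (word, count) pairs.

    Tokens shorter than min_len (e.g. single characters and punctuation)
    and stop words are discarded before counting.
    """
    counter = Counter()
    for text in texts:
        for word in cut(text):
            word = word.strip()
            if len(word) >= min_len and word not in stopwords:
                counter[word] += 1
    return counter.most_common(n)
```

With jieba installed, `top_words(texts, jieba.lcut)` yields (word, count) pairs. That is the shape pyecharts 1.x's `WordCloud().add(series_name, data_pair)` expects for `data_pair`.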
The survey cited earlier found that the top three reasons for being single are a small social circle, a busy job, and an overly perfect fantasy of love. The results of our own data analysis seem to bear this out!
After reading the above, do you know how to use Python to crawl the reasons why you are single? Thanks for reading!
© 2024 shulou.com SLNews company. All rights reserved.