Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to build a crawler program in Python

2025-01-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/03 Report--

In this issue, the editor will bring you about how to build the crawler program in Python. The article is rich in content and analyzes and narrates it from a professional point of view. I hope you can get something after reading this article.

Development tools

Python version: 3.6.4

Related modules:

Scrapy module

Pyecharts==1.5.1 module

Wordcloud module

Jieba module

And some modules that come with python.

Environment building

Install Python and add it to the environment variable, and pip installs the relevant modules you need.

Data crawling

Let's start with a wave of self-open source libraries that use requests to simulate login:

Https://github.com/CharlesPikachu/DecryptLogin

Currently, the websites that support simulated login in the library include:

1. micro-blog

The functions of the library and some small applications related to the library will be continuously added and improved in the future. Of course, I can't use it today, because I found that his Miao Zhihu fan data has always been a naked API, and even after the revision, he doesn't need to verify the login cookies or anything like that.

To get to the point, let's briefly talk about how to grab this data. in fact, it is very simple. F12 opens the developer tool and refreshes the follower page to find:

Request this API to return the fan data of the target user directly. The composition of the API is as follows:

Https://www.zhihu.com/api/v4/members/{ user domain name} / followers?

There is nothing special to pay attention to, and there is no doubt that it is as simple as that. Scrapy will finish climbing a new project:

Scrapy startproject zhihuFansSpider

Define items:

Class ZhihufansspiderItem (scrapy.Item):

Then create a new crawler and write a main program called OK:

'' Zhihu fan crawler''

Run the following command to start crawling fan data for the target user:

Scrapy crawl zhihuFansSpider-o followers_info.json-t json

Data visualization

According to the old rule, you can visually climb to the data (here, take the follower data of my own Zhihu account as an example, Tweet).

First draw a fan home page title of the word cloud shock?

! [https://upload-images.jianshu.io/upload_images/2539976-ada286149ecb2285?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)] the above is how to build a crawler program in Python shared by the editor. If you happen to have similar doubts, please refer to the above analysis to understand. If you want to know more about it, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report