2025-03-28 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 Report
This article explains how to use Python to crawl simple forums, posts, and web pages. The explanation is kept simple and clear so it is easy to learn and follow.
Introduction
The goal is to write the simplest possible crawler in the shortest possible time, one that can fetch simple forums, posts, and web pages.
Getting started
1. Preparations
Install Python
Install the Scrapy framework
An IDE, or any editor you prefer
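Assuming pip is available, installing Scrapy and creating a project skeleton might look like this (the project name miao is an assumption that matches the folder names used below; the exact commands can vary with your Python setup):

```shell
# Install Scrapy
pip install scrapy

# Create a project skeleton named "miao"; this generates the
# miao/spiders folder referred to in the next step.
scrapy startproject miao
cd miao
```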
2. Start writing the crawler
Create a Python file, for example miao.py, in the spiders folder; this will be the crawler script.
The code is as follows:
3. Run the crawler
If you use the command line, it looks like this:
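Assuming the file above sits in a project's spiders folder, either of these (illustrative) invocations works:

```shell
# Run by spider name from inside the project directory ...
scrapy crawl miao

# ... or run the single file directly, without a project:
scrapy runspider miao.py
```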
Parsing
1. Try the magic of XPath
2. See how XPath works
Add the import at the top:
from scrapy import Selector
Change the parse function to:
Run it again, and you will see the titles and URLs of all the posts on the *** page of the "Star Zone" section printed out.
Recursion
The complete code is as follows:
Pipelines
Now it is time to process the content that has been fetched and parsed; through pipelines, we can write it to local files or to a database.
1. Define an Item
Create an items.py file in the miao folder.
Here we define two simple classes to describe the results of our crawling.
2. Write a processing method
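One possible pipelines.py: a pipeline that appends every item to a local JSON-lines file. The class name and file name are made up for the sketch:

```python
import json

class FilePipeline:
    """Writes each item as one JSON line to a local file."""

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.file = open("items.jl", "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Called for every item the spider yields.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item  # pass the item on to any later pipelines

    def close_spider(self, spider):
        self.file.close()
```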
3. Call this processing method in the crawler.
4. Specify this pipeline in the configuration file
Multiple pipelines can be configured like this:
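In the project's settings.py, registering pipelines might look like this; the dotted paths are examples, and the numbers are priorities (lower runs first), so each yielded item flows through every listed pipeline in order:

```python
# miao/settings.py -- example paths; priorities range from 0 to 1000.
ITEM_PIPELINES = {
    "miao.pipelines.FilePipeline": 400,
}

# Multiple pipelines can be enabled side by side:
# ITEM_PIPELINES = {
#     "miao.pipelines.TopicPipeline": 300,
#     "miao.pipelines.ContentPipeline": 400,
# }
```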
Middleware
1. Configuration of Middleware
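Downloader middlewares are enabled in settings.py the same way as pipelines; the dotted paths below are examples matching the sketches in the next two steps:

```python
# miao/settings.py -- example paths and priorities.
DOWNLOADER_MIDDLEWARES = {
    "miao.middlewares.RandomUserAgentMiddleware": 401,
    "miao.middlewares.RandomProxyMiddleware": 402,
}
```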
2. The pesky site checks the User-Agent, so I want to change the UA
Here is a simple middleware that randomly switches the UA; you can extend the list of agents yourself.
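A sketch of such a middleware. The agents list is deliberately short and the UA strings are just examples:

```python
import random

class RandomUserAgentMiddleware:
    """Downloader middleware that picks a random User-Agent per request."""

    # A few example UA strings; extend this list as needed.
    agents = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    ]

    def process_request(self, request, spider):
        # Scrapy calls this for every outgoing request.
        request.headers["User-Agent"] = random.choice(self.agents)
```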
3. The pesky site bans IPs, so I want to use proxies
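A matching sketch for proxies: Scrapy honors the "proxy" key in request.meta, so a middleware only has to set it. The proxy addresses below are placeholders; substitute proxies you actually control:

```python
import random

class RandomProxyMiddleware:
    """Downloader middleware that routes each request through a random proxy."""

    # Placeholder proxy addresses; replace with real ones.
    proxies = [
        "http://127.0.0.1:8080",
        "http://127.0.0.1:8081",
    ]

    def process_request(self, request, spider):
        # Scrapy reads the "proxy" key from request.meta when downloading.
        request.meta["proxy"] = random.choice(self.proxies)
```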
Thank you for reading. That covers how Python can crawl simple forums, posts, and web pages. After working through this article you should have a deeper understanding of the problem; how it behaves in specific situations still needs to be verified in practice.