The Basic Principles and Process of a Python Crawler
This article explains the basic principles and process of a Python crawler. The explanation is simple and clear, and easy to learn and understand.
1. Basic principles
A crawler is a program that simulates a user's operations in a browser or app and automates that process. The work breaks down into four basic steps.
(1) Initiate a request
Send a request to the target site through an HTTP library, that is, send a Request. The request can carry additional header information; then wait for the server to respond.
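A minimal sketch of this step, assuming the third-party requests library; the URL and header values below are placeholders, not from the original article:

```python
import requests

# Placeholder target and header information; replace with the real site.
url = "https://example.com"
headers = {"User-Agent": "Mozilla/5.0"}

# Send the Request and wait for the server to respond.
response = requests.get(url, headers=headers, timeout=10)
print(response.status_code)  # 200 means the server responded normally
```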
(2) Obtain response content
If the server responds normally, a Response is returned. The body of the Response is the page content to be obtained; its type may be HTML, a JSON string, binary data (an image or video), and so on.
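Continuing the sketch above, one way to branch on the response type; the Content-Type checks are illustrative, not exhaustive:

```python
content_type = response.headers.get("Content-Type", "")

if "json" in content_type:
    data = response.json()     # JSON string decoded into Python objects
elif "text" in content_type:
    data = response.text       # HTML or other textual page content
else:
    data = response.content    # raw bytes, e.g. an image or video
```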
(3) Parse the content
The content obtained may be HTML, which can be parsed with regular expressions or a page-parsing library; it may be JSON, which can be converted directly to a JSON object for parsing; or it may be binary data, which can be saved or processed further.
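A small parsing sketch covering all three approaches; BeautifulSoup stands in here for a page-parsing library (installed separately as beautifulsoup4), and the sample HTML and tag names are made up:

```python
import json
import re
from bs4 import BeautifulSoup

html = "<html><body><h1>Demo</h1><a href='/next'>next</a></body></html>"

# Option 1: regular expressions
titles = re.findall(r"<h1>(.*?)</h1>", html)

# Option 2: a page-parsing library
soup = BeautifulSoup(html, "html.parser")
links = [a["href"] for a in soup.find_all("a")]

# JSON content: convert it directly to a Python object
payload = json.loads('{"status": "ok"}')
```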
(4) Save the data
The data can be saved in a variety of forms: as plain text, to a database, or as a file in a specific format.
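A brief sketch of two of these options, saving to a text file and to a SQLite database; the file names and table schema are illustrative assumptions:

```python
import sqlite3

# Save as plain text
with open("page.txt", "w", encoding="utf-8") as f:
    f.write("parsed page content")

# Save to a database
conn = sqlite3.connect("crawl.db")
conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT, title TEXT)")
conn.execute("INSERT INTO pages VALUES (?, ?)", ("https://example.com", "Demo"))
conn.commit()
conn.close()
```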
2. Process
What happens in the background when we type a URL into the browser and press Enter?
In short, this process takes place in four steps:
(1) Find the IP address corresponding to the domain name.
DNS (Domain Name System) is the first thing the browser consults. DNS's main job is to convert a domain name into the corresponding IP address (see the sketch after this list).
(2) Send a request to the server at that IP address.
(3) The server responds to the request and sends back the web page content.
(4) The browser displays the content of the web page.
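The DNS lookup in step (1) can be reproduced with Python's standard socket module; a minimal illustration:

```python
import socket

# Convert a domain name into the corresponding IP address,
# the same lookup the browser performs first.
ip = socket.gethostbyname("example.com")
print(ip)
```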
Simply put, what a web crawler does is reproduce these browser functions: given a URL, it returns the data the user needs directly, without operating a browser manually step by step.
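A minimal end-to-end sketch of that idea, tying the four basic steps together; every name here is an illustrative assumption, not a fixed recipe:

```python
import re
import requests

def crawl(url):
    # (1) initiate the request and (2) obtain the response
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    # (3) parse the content: pull the page title out of the HTML
    match = re.search(r"<title>(.*?)</title>", response.text, re.S)
    title = match.group(1) if match else ""
    # (4) save the data
    with open("titles.txt", "a", encoding="utf-8") as f:
        f.write(url + "\t" + title + "\n")
    return title

print(crawl("https://example.com"))
```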
Thank you for reading. The above covers the basic principles and process of a Python crawler; how it behaves in a specific situation still needs to be verified in practice.