2025-04-03 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces the basics of writing a Python crawler. When working through real cases, many people run into difficulties, so this guide walks through how to handle those situations. I hope you read it carefully and get something useful out of it!
What is a crawler?
If we compare the Internet to a large spider web, then data is stored at the web's nodes, and a crawler is a little spider crawling along the web to fetch its prey (the data). In other words, a crawler is a program that sends requests to a website, obtains resources, and analyzes and extracts useful data from them.
At the technical level, a crawler is a program that simulates browser behavior to request a site, fetches the HTML code, JSON data, or binary data (images, videos) the site returns, then extracts the data it needs and stores it for later use.
The basic process of a crawler
How users obtain network data:
Method 1: the browser submits a request -> downloads the page code -> renders it as a page.
Method 2: simulate a browser sending a request (to get the page code) -> extract the useful data -> store it in a database or file.
What a crawler does is Method 2.
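The Method 2 pipeline above (request -> extract -> store) can be sketched end to end. To keep the example runnable offline, the "downloaded page" below is a hard-coded string standing in for the body a real HTTP request would return; all names are illustrative:

```python
import re

def fetch(url):
    # In a real crawler this would be an HTTP request, e.g.
    # urllib.request.urlopen(url).read(); here we return canned HTML.
    return "<html><body><h1>Hello</h1><p>First post</p></body></html>"

def extract(html):
    # Pull the tag contents out with a regular expression.
    return re.findall(r"<(?:h1|p)>(.*?)</(?:h1|p)>", html)

def store(items, path):
    # Persist the extracted data to a file.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(items))

items = extract(fetch("https://example.com/"))
store(items, "output.txt")
print(items)  # ['Hello', 'First post']
```

Each of the three functions corresponds to one arrow in Method 2, which is also how the four steps below are organized.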
1. Initiate a request
Use an HTTP library to send a request to the target site, that is, send a Request.
A Request includes the request headers, request body, etc.
Note: an HTTP library such as requests cannot execute JavaScript or CSS, so content rendered by scripts in the browser will not appear in the raw response.
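As a minimal sketch of composing a Request with custom headers before sending it (the URL and User-Agent string here are placeholders, not from the original article):

```python
import urllib.request

# Build a Request carrying a browser-like User-Agent header.
req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": "Mozilla/5.0 (demo crawler)"},
)

# urllib normalizes header names; urlopen(req) would actually send it.
print(req.full_url)
print(req.get_header("User-agent"))
```

Setting a User-Agent matters in practice because many sites reject requests whose headers look like a script rather than a browser.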
2. Get the response content
If the server responds normally, you get a Response.
A Response may contain HTML, JSON, images, videos, etc.
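Because a Response can hold any of these content types, a crawler typically branches on the Content-Type header before deciding how to parse the body. A minimal sketch (the helper name is ours, not a library API):

```python
def classify(content_type):
    # Map a Content-Type header value to a parsing strategy.
    if "json" in content_type:
        return "json"      # parse with the json module
    if "html" in content_type:
        return "html"      # parse with re / BeautifulSoup
    return "binary"        # write raw bytes to a file

print(classify("application/json; charset=utf-8"))  # json
print(classify("text/html"))                        # html
print(classify("image/jpeg"))                       # binary
```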
3. Parse the content
Parsing HTML data: regular expressions (the re module), or third-party parsing libraries such as BeautifulSoup and pyquery.
Parsing JSON data: the json module.
Parsing binary data: write it to a file in "wb" mode.
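The three parsing cases above can be sketched with the standard library alone (the HTML snippet and JSON payload are made-up sample data):

```python
import json
import re

# HTML: extract link targets with a regular expression.
html = "<a href='/post/1'>First</a><a href='/post/2'>Second</a>"
links = re.findall(r"href='([^']+)'", html)
print(links)  # ['/post/1', '/post/2']

# JSON: decode into Python objects with the json module.
payload = '{"title": "First", "views": 10}'
data = json.loads(payload)
print(data["views"])  # 10

# Binary data (an image, say) would be written in "wb" mode:
# with open("pic.jpg", "wb") as f:
#     f.write(response_bytes)
```

For anything beyond trivial HTML, a real parser like BeautifulSoup is more robust than regular expressions, which break easily on attribute order and whitespace changes.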
4. Save the data
Database (MySQL, MongoDB, Redis)
File
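As a sketch of the database option, the standard library's sqlite3 works without any server (an in-memory database here; a real crawler would point this at a file, or at MySQL/MongoDB/Redis instead, and the table schema is invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE pages (url TEXT, title TEXT)")

# Store one scraped record with a parameterized insert.
conn.execute(
    "INSERT INTO pages VALUES (?, ?)",
    ("https://example.com/", "Example"),
)
conn.commit()

row = conn.execute(
    "SELECT title FROM pages WHERE url = ?",
    ("https://example.com/",),
).fetchone()
print(row[0])  # Example
```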
This is the end of "what are the three ways to write a Python crawler". Thank you for reading. If you want to learn more about the industry, follow this site for more practical articles!