2025-03-28 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 Report
This article introduces the basic concepts a beginner needs before writing a Python crawler: what a crawler is, what happens when a browser loads a web page, and what a URL means. The content is detailed but easy to follow, so let's take a look.
1. What is a crawler?
A crawler is a program that automatically collects information from the Internet. It can be pictured as a spider moving across the web and grabbing whatever resources it encounters. While fetching one web page, for example, it finds paths leading elsewhere, which are simply the hyperlinks on that page. By following those hyperlinks it reaches other pages and collects their data as well.
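The link-following idea above can be sketched in a few lines of Python. This is only a minimal illustration, not a production crawler: the names LinkCollector, crawl and FAKE_WEB are hypothetical, and the "web" here is an in-memory dictionary standing in for real pages so the example runs without network access. A real crawler would pass in a function that downloads each page.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl: fetch a page, then queue the hyperlinks found on it.

    `fetch` is any callable that returns the HTML for a URL, or None on failure.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Assumption: a tiny fake "web" used in place of real HTTP requests.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": "<p>no links here</p>",
}

visited_pages = crawl("http://example.test/", FAKE_WEB.get)
```

The seen set keeps the crawler from revisiting a page when two pages link to each other, and max_pages bounds how far it wanders.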
2. The process of browsing the web
While browsing the web we come across many pictures. At http://image.baidu.com, for example, we see several pictures and the Baidu search box. Behind the scenes, after the user enters the URL, the browser resolves the host name through a DNS server, locates the server host, and sends it a request. The server processes the request and returns HTML, JS, CSS and other files to the browser, which parses them and renders the page, pictures included.
The web page a user sees is therefore built from HTML code, and that code is what a crawler fetches. By analyzing and filtering the HTML, the crawler can extract pictures, text, and other resources. In effect, a crawler obtains data by simulating what a browser does.
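The request steps described above (resolve the host, connect, send a request, read the response) might look roughly like this using only the Python standard library. The function names resolve_host and fetch_page and the User-Agent string are assumptions made for illustration. Unlike a browser, this sketch does not follow redirects, run JS, or render anything.

```python
import http.client
import socket
from urllib.parse import urlsplit


def resolve_host(hostname):
    """Return the sorted, unique IP addresses the system resolver reports."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def fetch_page(url, timeout=10):
    """Send one GET request and return (status, content_type, body_bytes)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # Hypothetical User-Agent; real crawlers should identify themselves honestly.
        conn.request("GET", path, headers={"User-Agent": "example-crawler/0.1"})
        response = conn.getresponse()
        body = response.read()
        return response.status, response.getheader("Content-Type"), body
    finally:
        conn.close()


# Example usage (needs network access, so it is left commented out):
# status, content_type, body = fetch_page("http://image.baidu.com")
```

The body returned here is the raw HTML that the browser would go on to parse and display.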
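As a rough illustration of "analyzing and filtering" HTML, the sketch below pulls picture addresses and visible text out of a page with the standard library's html.parser. The names PageExtractor, extract and SAMPLE_HTML are hypothetical, and the sample page is made up. Real projects often use a third-party parser instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collects <img> addresses and visible text from a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []
        self.text_chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Turn relative picture paths into absolute URLs.
                self.image_urls.append(urljoin(self.base_url, src))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.text_chunks.append(text)


def extract(html, base_url):
    """Return (image_urls, text_chunks) found in the given HTML string."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls, parser.text_chunks


# Assumption: a made-up page used only to demonstrate the extractor.
SAMPLE_HTML = """
<html><head><title>Gallery</title><style>p { color: red; }</style></head>
<body><p>Two pictures:</p>
<img src="/img/cat.jpg"><img src="http://cdn.example.test/dog.png">
<script>var x = 1;</script></body></html>
"""

images, texts = extract(SAMPLE_HTML, "http://example.test/page")
```

Script and style contents are skipped because they are code for the browser rather than text the user reads.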
3. The meaning of URL
A URL (uniform resource locator) is what we usually call a web address. It is a concise representation of the location of a resource available on the Internet and of the method used to access it, and it serves as the standard address for resources on the Internet. A URL identifies where a resource is located and tells the browser how to handle it.
A URL has three basic parts:
① The first part is the protocol (also called the scheme or service type), such as http.
② The second part is the host name or IP address of the host that stores the resource, sometimes followed by a port number.
③ The third part is the specific address of the resource on the host, such as the directory and file name.
The first and second parts are separated by "://".
The second and third parts are separated by "/".
The first and second parts are required; the third part can sometimes be omitted.
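The three parts can be inspected with urllib.parse.urlsplit from the Python standard library. The helper name describe_url and the second example URL are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the three parts described above (plus the optional port)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # first part, before "://"
        "host": parts.hostname,     # second part: host name or IP address
        "port": parts.port,         # None when the URL gives no port
        "path": parts.path,         # third part; empty string when omitted
    }


# The third part is omitted here, so the path is empty and the port is None.
baidu = describe_url("http://image.baidu.com")

# Hypothetical URL showing a port number and a directory plus file name.
docs = describe_url("https://example.test:8443/docs/index.html")
```

Real URLs may also carry a query string or a fragment, which urlsplit exposes as separate fields.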
This concludes the introduction to the basic knowledge points of Python crawlers. Thank you for reading! You should now have a basic understanding of what a crawler is, how a web page reaches the browser, and what a URL consists of.
© 2024 shulou.com SLNews company. All rights reserved.