2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces the commonly used functions of Python's urllib library. The content is detailed and easy to understand, and the operations are simple and quick to try, so it should make a useful reference. Let's take a look.
I. What is the urllib library?
Python 3 merged the urllib and urllib2 libraries of Python 2 into a single urllib library, so when we talk about urllib today we generally mean the one in Python 3. So what is the urllib library, how do you use it, and what are its common functions?
urllib is divided into the following four functional modules:
urllib.request (request module)
urllib.parse (parsing module)
urllib.error (exception handling module)
urllib.robotparser (robots.txt parsing module)
urllib is Python's built-in HTTP request library; it can be used directly without installation and is a staple of crawler developers. This article summarizes the basic usage of its most common functions.
II. Explanation of urllib usage
1. The urllib.request.urlopen() function
Creates a file-like object representing the remote url; you can then operate on this object much like a local file to fetch the remote data. The syntax is as follows:
urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
url: the URL to request
data: the request body. If this value is set, the request becomes a POST request.
timeout: the timeout, in seconds, for accessing the site
cafile and capath: used in HTTPS requests to set the CA certificate and its path
Example
from urllib import request
response = request.urlopen('http://www.baidu.com')  # GET request
print(response.read().decode('utf-8'))  # read the response body and decode it
The object returned by urlopen() provides the following methods and attributes:
read(), readline(), readlines(), fileno(), close(): operate on the HTTPResponse data
info(): returns an HTTPMessage object holding the headers returned by the remote server
getcode(): returns the HTTP status code
geturl(): returns the requested URL
getheaders(): the response headers
getheader('Server'): returns the value of the named response header, here Server
status: the status code, as an attribute
reason: the status description
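The response-object methods above can be tried without depending on an external site by standing up a throwaway local HTTP server. Everything below (the handler, the port choice, the 'hello' body) is invented for the demonstration; urlopen() itself is used exactly as described:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request

class HelloHandler(BaseHTTPRequestHandler):
    """Minimal handler: every GET returns a 5-byte plain-text body."""

    def do_GET(self):
        body = b'hello'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging to keep the output clean

server = HTTPServer(('127.0.0.1', 0), HelloHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = 'http://127.0.0.1:%d/' % server.server_port
response = request.urlopen(url)
status = response.getcode()                        # 200
content_type = response.getheader('Content-Type')  # 'text/plain'
body = response.read().decode('utf-8')             # 'hello'
print(status, content_type, body)
server.shutdown()
```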
2. The urllib.request.urlretrieve() function
This function conveniently saves a file from the web to the local disk. The syntax is as follows:
urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)
url: the address of the remote data
filename: the path to save the file to. If empty, the data is downloaded to a temporary file.
reporthook: a hook function called once when the server connection is established and once for each data block downloaded. It receives three arguments: the number of blocks downloaded so far, the block size, and the total file size, and can be used to display download progress.
data: data to POST to the server
Example
from urllib import request
request.urlretrieve('http://www.baidu.com/', 'baidu.html')  # download Baidu's home page to a local file
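One way to watch the reporthook fire without touching the network is to "download" a local temporary file through the file: scheme. The temporary file and the progress function below are made up for the demonstration; a real http:// URL works the same way:

```python
import os
import tempfile
from urllib import request

# Create a 10 000-byte local file to stand in for a remote resource.
src = tempfile.NamedTemporaryFile(delete=False, suffix='.txt')
src.write(b'x' * 10000)
src.close()

def progress(block_num, block_size, total_size):
    # Called once when the "connection" opens (block_num == 0)
    # and once after each block is read.
    downloaded = min(block_num * block_size, total_size)
    print('%d / %d bytes' % (downloaded, total_size))

dest = src.name + '.copy'
url = 'file:' + request.pathname2url(src.name)  # file URL for the local file
request.urlretrieve(url, dest, reporthook=progress)
print(os.path.getsize(dest))  # 10000
```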
3. The urllib.parse.urlencode() function
urlencode() converts dictionary data into URL-encoded data. The syntax is as follows:
urllib.parse.urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=quote_plus)
query: the query parameters
doseq: whether sequence values are converted element by element
safe: characters that should not be quoted; defaults to ''
encoding: the character encoding
errors: the encoding error handler
quote_via: the function that str components, together with safe, encoding, and errors, are passed to for quoting. The default is quote_plus(); the stricter quote() can be used instead.
Example
from urllib import parse
data = {'姓名': 'W3CSchool', '问好': 'Hello W3CSchool', '年龄': 100}
qs = parse.urlencode(data)
print(qs)
# %E5%A7%93%E5%90%8D=W3CSchool&%E9%97%AE%E5%A5%BD=Hello+W3CSchool&%E5%B9%B4%E9%BE%84=100
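The less obvious parameters, doseq and quote_via, can be illustrated with some made-up query data:

```python
from urllib import parse

params = {'q': 'python urllib', 'tags': ['web', 'http']}

# Default: a list value is encoded as its str() representation.
print(parse.urlencode(params))
# q=python+urllib&tags=%5B%27web%27%2C+%27http%27%5D

# doseq=True: one key=value pair per sequence element.
print(parse.urlencode(params, doseq=True))
# q=python+urllib&tags=web&tags=http

# quote_via=parse.quote: spaces become %20 instead of +.
print(parse.urlencode({'q': 'python urllib'}, quote_via=parse.quote))
# q=python%20urllib
```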
4. The urllib.parse.parse_qs() function
Decodes URL-encoded parameters. The syntax is as follows:
urllib.parse.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')
keep_blank_values: whether to keep keys whose value is blank. Defaults to False.
strict_parsing: a flag controlling how parsing errors are handled. If False (the default), errors are silently ignored; otherwise a ValueError is raised.
Example
from urllib import parse
data = {'姓名': 'W3CSchool', '问好': 'hello W3CSchool', '年龄': 100}
qs = parse.urlencode(data)
print(qs)
# %E5%A7%93%E5%90%8D=W3CSchool&%E9%97%AE%E5%A5%BD=hello+W3CSchool&%E5%B9%B4%E9%BE%84=100
print(parse.parse_qs(qs))
# {'姓名': ['W3CSchool'], '问好': ['hello W3CSchool'], '年龄': ['100']}
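The effect of keep_blank_values can be seen with a query string that contains an empty value (the string below is invented for the demonstration):

```python
from urllib import parse

qs = 'name=W3CSchool&age=100&nickname='

# By default, keys with blank values are dropped.
print(parse.parse_qs(qs))
# {'name': ['W3CSchool'], 'age': ['100']}

# keep_blank_values=True keeps them as empty strings.
print(parse.parse_qs(qs, keep_blank_values=True))
# {'name': ['W3CSchool'], 'age': ['100'], 'nickname': ['']}
```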
5. The urllib.parse.parse_qsl() function
The usage is the same as the parse_qs() function, except that urllib.parse.parse_qs() returns a dictionary while urllib.parse.parse_qsl() returns a list. The syntax is as follows:
urllib.parse.parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')
Example
from urllib import parse
data = {'姓名': 'W3CSchool', '问好': 'hello W3CSchool', '年龄': 100}
qs = parse.urlencode(data)
print(parse.parse_qsl(qs))
# [('姓名', 'W3CSchool'), ('问好', 'hello W3CSchool'), ('年龄', '100')]
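One practical consequence of the list form: duplicate keys survive, so a parse_qsl() result round-trips cleanly through urlencode(), while parse_qs() folds duplicates into a single list-valued key. A small sketch with an invented query string:

```python
from urllib import parse

qs = 'tag=python&tag=urllib&page=1'

pairs = parse.parse_qsl(qs)
print(pairs)  # [('tag', 'python'), ('tag', 'urllib'), ('page', '1')]

# The list of pairs keeps duplicate keys, so encoding it reproduces the input.
print(parse.urlencode(pairs))  # tag=python&tag=urllib&page=1

# parse_qs() collapses duplicates into one key with a list of values.
print(parse.parse_qs(qs))  # {'tag': ['python', 'urllib'], 'page': ['1']}
```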
6. The urllib.parse.urlparse() and urllib.parse.urlsplit() functions
When you have a url and want to split it into its components, you can use urlparse() or urlsplit(). Their syntax is as follows:
urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)
urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)
urlparse() and urlsplit() are almost identical.
The only difference is that the result of urlparse() has a params attribute, while the result of urlsplit() does not.
Example
from urllib import parse
url = 'http://www.baidu.com/index.html;user?id=S#comment'
result = parse.urlparse(url)
# result = parse.urlsplit(url)
print(result)
print(result.scheme)
print(result.netloc)
print(result.path)
print(result.params)  # the params attribute exists on urlparse() results but not on urlsplit() results
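To see the difference side by side, the same URL can be run through both functions:

```python
from urllib import parse

url = 'http://www.baidu.com/index.html;user?id=S#comment'

p = parse.urlparse(url)
s = parse.urlsplit(url)

print(p)
# ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=S', fragment='comment')
print(s)
# SplitResult(scheme='http', netloc='www.baidu.com', path='/index.html;user', query='id=S', fragment='comment')

# urlparse() splits the params off the last path segment;
# urlsplit() leaves them attached to the path.
print(p.path, '|', p.params)  # /index.html | user
print(s.path)                 # /index.html;user
```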
7. The urllib.error module
The error module of urllib defines the exceptions raised by urllib.request. If a request fails, urllib.request raises one of the error module's exceptions.
URLError
The URLError class comes from urllib's error module, inherits from OSError, and is the base class of the module's exceptions. It has a reason attribute that gives the cause of the error.
Example
from urllib import request, error
try:
    resp = request.urlopen('https://w3cschool.c/index.html')  # unresolvable host
except error.URLError as e:
    print(e.reason)
# [Errno 11001] getaddrinfo failed
HTTPError
HTTPError is a subclass of URLError, used specifically to handle HTTP request errors, and has the following three attributes:
code: the HTTP status code
reason: the cause of the exception
headers: the response headers
Example
from urllib import request, error
try:
    response = request.urlopen('http://www.baidu.com/no-such-page.html')  # a page that does not exist
except error.HTTPError as e:
    print(e.code)
# 404
Of course, most of the time URLError and HTTPError are combined for exception handling: HTTPError is caught first to get the status code, exception reason, response headers, and so on; if the error is not of that type, URLError is caught and the cause is printed; finally, an else clause handles the normal flow.
Example
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

req = Request('http://www.baidu.cnc/')
try:
    response = urlopen(req)
except HTTPError as e:
    print('(www.baidu.cnc) server could not complete the request.')
    print('error code:', e.code)
except URLError as e:
    print('We cannot connect to the server.')
    print('cause:', e.reason)
else:
    print('Link succeeded!')
    print(response.read().decode('utf-8'))
These are the functions commonly used in the urllib library. Combining theory with practice is the best way to learn, so do try these examples yourself. Recommended reading: Python static crawlers, Python Scrapy web crawlers.
Finally, let's summarize the common meanings of various status codes:
200: the request succeeded and the server returned data normally
301: permanent redirect. For example, accessing www.jingdong.com redirects to www.jd.com.
302: temporary redirect. For example, visiting a page that requires login while not logged in redirects you to the login page.
404: the requested url cannot be found on the server; in other words, the request url is wrong
403: the server refused access; insufficient permissions
500: internal server error; there may be a bug on the server
This concludes the article on the common functions of the urllib library. Thank you for reading!