2025-01-18 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 Report
This article explains which libraries are useful for Python crawlers. The explanations are simple and clear and easy to follow; read on to learn which libraries a Python crawler can draw on.
1. Request libraries
1. Requests
GitHub: https://github.com/psf/requests
The Requests library is probably the most popular and practical library for crawlers today, and it is very user-friendly. I have also written an article about using it, which you can take a look at.
For the most detailed use of requests, you can refer to the official document: https://requests.readthedocs.io/en/master/
A small usage example:

```
>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
u'{"type": "User"...'
>>> r.json()
{u'disk_usage': 368627, ...}
```
2. Urllib3
GitHub: https://github.com/urllib3/urllib3
Urllib3 is a very powerful HTTP client library that provides a series of functions for working with URLs and requests.
For more information on how to use it, please refer to: https://urllib3.readthedocs.io/en/latest/
A small usage example:

```
>>> import urllib3
>>> http = urllib3.PoolManager()
>>> r = http.request('GET', 'http://httpbin.org/robots.txt')
>>> r.status
200
>>> r.data
b'User-agent: *\nDisallow: /deny\n'
```
3. Selenium
GitHub: https://github.com/SeleniumHQ/selenium
Selenium is an automated testing tool: a driver for real browsers. Through this library you can control a browser directly to perform certain operations, such as entering a CAPTCHA.
This library is not limited to Python; Selenium also has bindings for Java, C#, and other languages.
For information about how the Python language uses this library, you can visit https://seleniumhq.github.io/selenium/docs/api/py/ to see the official documentation.
A small usage example:

```python
from selenium import webdriver

browser = webdriver.Firefox()
browser.get('http://seleniumhq.org/')
```
4. aiohttp
GitHub: https://github.com/aio-libs/aiohttp
An HTTP framework built on asyncio. Fetching data asynchronously with the async/await keywords can greatly improve crawling efficiency.
This is an asynchronous library that advanced crawler developers should master. For more details about aiohttp, see the official documentation: https://aiohttp.readthedocs.io/en/stable/
A small usage example:

```python
import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, 'http://python.org')
        print(html)

if __name__ == '__main__':
    asyncio.run(main())
```

2. Parsing libraries
1. BeautifulSoup
Official document: https://www.crummy.com/software/BeautifulSoup/
BeautifulSoup parses HTML and XML and extracts information from web pages, with a powerful API and support for multiple parsers. It is a parsing library I use often and is very handy for HTML; it is also a must-know library for anyone writing crawlers.
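Since the source gives no example here, a minimal sketch of BeautifulSoup's API, using a made-up HTML snippet and the built-in `html.parser`:

```python
from bs4 import BeautifulSoup

# A small, self-contained HTML snippet to parse (hypothetical sample data)
html = """
<html><body>
  <h1>Example</h1>
  <ul>
    <li class="item">first</li>
    <li class="item">second</li>
  </ul>
</body></html>
"""

# html.parser is the stdlib parser; lxml can also be passed here if installed
soup = BeautifulSoup(html, "html.parser")

# Access a tag directly by name
print(soup.h1.text)

# find_all with a CSS class filter returns every matching tag
items = [li.text for li in soup.find_all("li", class_="item")]
print(items)
```

The same extraction can also be written with `soup.select("li.item")` using CSS selectors.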
2. lxml
GitHub: https://github.com/lxml/lxml
lxml supports both HTML and XML parsing as well as XPath expressions, and its parsing efficiency is very high.
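A short sketch of XPath extraction with lxml, again on a made-up HTML fragment (the `div`/`a` structure is hypothetical):

```python
from lxml import etree

# Hypothetical sample markup; etree.HTML tolerates imperfect HTML
html = """
<html><body>
  <div id="post">
    <a href="/page/1">Page 1</a>
    <a href="/page/2">Page 2</a>
  </div>
</body></html>
"""

tree = etree.HTML(html)

# XPath pulls out attributes and text nodes directly
links = tree.xpath('//div[@id="post"]/a/@href')
texts = tree.xpath('//div[@id="post"]/a/text()')
print(links)
print(texts)
```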
3. pyquery
GitHub: https://github.com/gawel/pyquery
A Python implementation of jQuery: it lets you query and manipulate HTML documents with jQuery-style syntax. It is easy to use and parses quickly.
3. Storage libraries
1. PyMySQL
GitHub: https://github.com/PyMySQL/PyMySQL
Official document: https://pymysql.readthedocs.io/en/latest/
A MySQL client library implemented in pure Python. Very practical and very simple to use.
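A minimal sketch of storing crawled data with PyMySQL. The server address, credentials, database `scrape_db`, and table `pages` are all hypothetical; substitute your own before running:

```python
import pymysql

# Hypothetical connection parameters -- replace with your own server/credentials
conn = pymysql.connect(
    host='localhost',
    user='crawler',
    password='secret',
    database='scrape_db',
    charset='utf8mb4',
    cursorclass=pymysql.cursors.DictCursor,  # rows come back as dicts
)

try:
    with conn.cursor() as cursor:
        cursor.execute(
            "INSERT INTO pages (url, title) VALUES (%s, %s)",
            ('http://python.org', 'Welcome to Python.org'),
        )
    conn.commit()  # PyMySQL does not autocommit by default

    with conn.cursor() as cursor:
        cursor.execute("SELECT url, title FROM pages")
        for row in cursor.fetchall():
            print(row)
finally:
    conn.close()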
2. PyMongo
GitHub: https://github.com/mongodb/mongo-python-driver
Official document: https://api.mongodb.com/python/
As the name implies, a library for connecting directly to a MongoDB database and performing queries.
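A minimal sketch of basic PyMongo usage. It assumes a MongoDB instance on localhost; the database `scrape_db` and collection `pages` are hypothetical names:

```python
from pymongo import MongoClient

# Hypothetical local MongoDB instance; adjust the URI for your deployment
client = MongoClient('mongodb://localhost:27017/')
db = client['scrape_db']   # database
pages = db['pages']        # collection

# insert_one / find_one are the basic CRUD calls
pages.insert_one({'url': 'http://python.org', 'title': 'Python'})
doc = pages.find_one({'url': 'http://python.org'})
print(doc['title'])

# Query with a filter and a sort order
for d in pages.find({'title': {'$regex': 'Py'}}).sort('url'):
    print(d['url'])

client.close()
```

Unlike MySQL, no schema needs to be declared first: the collection is created on first insert.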
3. redis-dump
redis-dump is a tool for converting Redis data to and from JSON. It is written in Ruby and therefore requires a Ruby environment. Newer versions of redis-dump require Ruby 2.2.2 or above, while yum on CentOS only provides Ruby 2.0, so you need to install the Ruby version manager rvm and use it to install a newer Ruby first.
Thank you for reading. That concludes this overview of the libraries available to Python crawlers. After studying this article, you should have a deeper understanding of which libraries Python crawlers use; the details of each are best verified in practice.