2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to use regular expressions in Python to extract text data from a web page. Many readers may not be familiar with the technique, so the steps are written out here for reference. Let's work through it together.
Environment:
Python 3.6
PyCharm
requests
re
json
General approach to a crawler
1. Determine the URL to crawl and the headers parameter.
2. Send a request: requests imitates a browser, sends the request, and obtains the response data.
3. Parse the data: the re module provides all of the regular expression functions.
4. Save the data in JSON format.
Complete steps:
1. Install the library and import the modules
If the library is not installed, press Win+R, type cmd, and then run the following command to install it:
pip install requests
After the installation is complete, you can list all installed libraries with:
pip list
Import the modules:
import requests
import re
import json
2. Determine the URL to crawl and the headers parameter
base_url = 'https://www.guokr.com/ask/highlight/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
3. Send a request: requests imitates a browser, sends the request, and obtains the response data
response = requests.get(base_url, headers=headers)
data = response.text
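The request step can be made a little more defensive. The following is a minimal sketch, not the article's own code: the helper name fetch_html, the 10-second timeout, and the optional get parameter are assumptions added for illustration. It fails loudly on an HTTP error status instead of silently parsing an error page.

```python
def fetch_html(url, headers, get=None, timeout=10):
    """Return the response body as text, or raise on an HTTP error status."""
    if get is None:
        # Imported lazily so the helper can also be exercised with a stand-in `get`.
        import requests
        get = requests.get
    response = get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.text
```

With the URL and headers from step 2, the call would be data = fetch_html(base_url, headers).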
4. Parse the data: the re module provides all of the regular expression functions
A fragment of the HTML to match:
href="https://www.guokr.com/question/669761/">Indians call a man's genitals Linga, and a woman's genitals Yoni. The combination of Linga and Yoni is yoga. Is this true or false?
Compiling the regular expression once with re.compile produces a pattern object that can be reused for every page, so the pattern string does not have to be looked up or compiled again on each call.
pattern = re.compile('
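The pattern itself is cut off in the text above, so the following is only a sketch of how such an extraction could look. The sample markup, the pattern, and the dictionary keys url and title are all assumptions modelled on the fragment shown earlier; the real page may be structured differently.

```python
import re

# Hypothetical markup modelled on the fragment shown above; the real page may differ.
sample_html = '''
<h2><a href="https://www.guokr.com/question/000001/">First sample question?</a></h2>
<h2><a href="https://www.guokr.com/question/000002/">Second sample question?</a></h2>
'''

# Assumed pattern: capture the link target and the link text of each anchor.
# re.S lets "." match newlines in case a title spans several lines.
pattern = re.compile(
    r'<a href="(https://www\.guokr\.com/question/\d+/)">(.*?)</a>', re.S
)

# With two capture groups, findall returns a list of (url, title) tuples.
data_list = [
    {"url": url, "title": title.strip()}
    for url, title in pattern.findall(sample_html)
]
print(data_list)
```

The non-greedy (.*?) stops at the first closing tag, which keeps each match limited to a single link.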
5. Save the file in JSON format
with open("guoke01.json", 'w', encoding='utf-8') as f:
    f.write(json_data_list)
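The text does not show how json_data_list is produced. One plausible way, sketched here as an assumption, is to serialize the list of extracted records with json.dumps before writing it. The helper name save_as_json and the record layout are hypothetical.

```python
import json

def save_as_json(data_list, path):
    """Serialize data_list to a JSON string and write it to path as UTF-8."""
    # ensure_ascii=False keeps Chinese titles readable instead of \uXXXX escapes.
    json_data_list = json.dumps(data_list, ensure_ascii=False, indent=2)
    with open(path, "w", encoding="utf-8") as f:
        f.write(json_data_list)
    return json_data_list
```

Calling save_as_json(data_list, "guoke01.json") would then write the same file name used in this step.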
6. Build a crawl loop
for page in range(1, 101):
    print("==== crawling page {} ====\n".format(page))
Update the code so that the page number is passed into the URL:
base_url = 'https://www.guokr.com/ask/highlight/?page={}'.format(page)
Add the list that collects the results of the for loop:
data_list = []
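Putting the loop, the page URL, and the result list together might look like the sketch below. It rests on several assumptions: the regular expression is hypothetical (the original pattern is not shown in full), the function name crawl and the fetch parameter are illustrative, and the list is created once before the loop so that results from all pages accumulate.

```python
import re

PAGE_URL = "https://www.guokr.com/ask/highlight/?page={}"

# Assumed pattern; the article's own expression is truncated in the text.
pattern = re.compile(
    r'<a href="(https://www\.guokr\.com/question/\d+/)">(.*?)</a>', re.S
)

def crawl(pages, fetch):
    """Fetch each page with fetch(url) and collect every match in one list."""
    data_list = []  # created once, before the loop, so results accumulate
    for page in pages:
        print("==== crawling page {} ====".format(page))
        html = fetch(PAGE_URL.format(page))
        for url, title in pattern.findall(html):
            data_list.append({"url": url, "title": title.strip()})
    return data_list
```

Here fetch is any function that takes a URL and returns the page HTML, for example lambda url: requests.get(url, headers=headers).text, and crawl(range(1, 101), fetch) covers the same 100 pages as the loop above.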
Finally, run the code, as shown in the following figure:
That is all for the article "How to use regular expressions to extract text data in Python". Thank you for reading! I hope the content helps; to learn more, you are welcome to follow the industry information channel.
© 2024 shulou.com SLNews company. All rights reserved.