How to use regular expressions in Python to extract text data

Updated 2025-01-18. From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 report

This article shows how to use regular expressions in Python to extract text data from a web page. Many people are not very familiar with the topic, so the article is shared here for reference. Let's go through it together.

Environment:

Python 3.6

PyCharm

requests

re

json

General approach of the crawler

1. Determine the URL path to crawl and the headers parameters.

2. Send a request: requests simulates a browser sending a request and obtains the response data.

3. Parse the data: the re module provides all of the regular expression functions.

4. Save the data: save the data in JSON format.

Complete steps:

1. Install the library and import the modules

If you do not have the library installed, press WIN+R, type cmd, and then enter the following command to install it:

pip install requests

After the installation is complete, you can enter the following command to list all the libraries you have installed:

pip list

Import the modules:

import requests
import re
import json

2. Determine the URL path to crawl and the headers parameters.

base_url = 'https://www.guokr.com/ask/highlight/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}

3. Send a request: requests simulates a browser sending a request and obtains the response data.

response = requests.get(base_url, headers=headers)
data = response.text
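The request step above can be made a little more defensive. The following is a minimal sketch, not the article's own code: the helper name fetch_html and the 10-second timeout are assumptions added for illustration. It also builds the request without sending it, so you can inspect the method, URL, and User-Agent header that would go out.

```python
import requests

base_url = 'https://www.guokr.com/ask/highlight/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}


def fetch_html(url, headers, timeout=10):
    """Hypothetical helper: return the page text, or raise if the request fails."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text


# Build the request without sending it, to inspect what would be sent.
prepared = requests.Request('GET', base_url, headers=headers).prepare()
print(prepared.method, prepared.url)
print(prepared.headers['User-Agent'])
```

Calling fetch_html(base_url, headers) performs the real request; the timeout keeps the script from hanging indefinitely if the server does not respond.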

4. Parse the data: the re module provides all of the regular expression functions

A fragment of the page source that the pattern needs to match:

href="https://www.guokr.com/question/669761/"> Indians call a man's genitals Linga, and a woman's genitals Yoni. The combination of Linga and Yoni is yoga. Is this true or false?

Precompiling a regular expression with re.compile() is faster than passing the pattern string directly when the pattern is reused, because a pattern string has to be compiled into a pattern object before it can be used for matching.

pattern = re.compile('
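The pattern in the source is cut off, so the following is only a sketch of how the parsing step might look. The sample markup (an h2 element wrapping a link with target="_blank") and the pattern itself are assumptions modeled on the fragment above, not the real page structure; the second URL is a placeholder.

```python
import re

# Sample HTML: the tag structure is an assumption; the second entry is a placeholder.
sample_html = '''
<h2><a target="_blank" href="https://www.guokr.com/question/669761/">First question title</a></h2>
<h2><a target="_blank" href="https://www.guokr.com/question/000000/">Second question title</a></h2>
'''

# Two capture groups: the link address and the link text.
# re.S lets "." also match newlines in case a title spans several lines.
pattern = re.compile(r'<h2><a target="_blank" href="(.*?)">(.*?)</a></h2>', re.S)

# With two groups, findall returns a list of (href, title) tuples.
matches = pattern.findall(sample_html)

data_list = [{'href': href, 'title': title} for href, title in matches]
for item in data_list:
    print(item['href'], item['title'])
```

The non-greedy (.*?) groups stop at the first closing delimiter, which keeps each match confined to a single link.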

5. Save the file in JSON format

with open("guoke01.json", 'w', encoding='utf-8') as f:
    f.write(json_data_list)
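The source does not show how json_data_list is produced. A plausible reading, assumed here, is that it is the string returned by json.dumps for the list of extracted items. The sketch below uses a sample item and writes to the system temporary directory (both assumptions) so that it can be run without side effects in the working directory.

```python
import json
import os
import tempfile

# Sample data standing in for the extracted results (the title is a placeholder).
data_list = [
    {'href': 'https://www.guokr.com/question/669761/', 'title': '示例问题'},
]

# ensure_ascii=False keeps non-ASCII text such as Chinese titles readable
# in the output instead of escaping it as \uXXXX sequences.
json_data_list = json.dumps(data_list, ensure_ascii=False, indent=2)

# Written to the temp directory here; the article writes guoke01.json to the current directory.
out_path = os.path.join(tempfile.gettempdir(), 'guoke01.json')
with open(out_path, 'w', encoding='utf-8') as f:
    f.write(json_data_list)

# Read the file back to confirm that it contains valid JSON.
with open(out_path, encoding='utf-8') as f:
    loaded = json.load(f)
print(loaded)
```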

6. Build a loop to crawl multiple pages

for page in range(1, 101):
    print("= crawling page {} data =\n".format(page))

Optimize the code to pass the page number in:

base_url = 'https://www.guokr.com/ask/highlight/?page={}'.format(str(page))

Create the list that the for loop appends the results to:

data_list = []
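Putting the loop, the page URL, and the result list together might look like the sketch below. It is an illustration under assumptions rather than the article's final code: the function name crawl, the injected fetch callable, and the link pattern are all hypothetical. Passing fetch in as a parameter lets the loop be tried with a stand-in function before pointing it at the real site.

```python
import re

PAGE_URL = 'https://www.guokr.com/ask/highlight/?page={}'

# Assumed pattern: the real page markup may differ.
LINK_PATTERN = re.compile(r'<h2><a target="_blank" href="(.*?)">(.*?)</a></h2>', re.S)


def crawl(pages, fetch):
    """Fetch each page with fetch(url) and collect the links found on it."""
    data_list = []  # created once, before the loop, so results accumulate across pages
    for page in pages:
        print("= crawling page {} data =\n".format(page))
        html = fetch(PAGE_URL.format(page))
        for href, title in LINK_PATTERN.findall(html):
            data_list.append({'href': href, 'title': title.strip()})
    return data_list


def fake_fetch(url):
    """Stand-in for a real HTTP request: returns one link per page."""
    page = url.rsplit('=', 1)[-1]
    return ('<h2><a target="_blank" href="https://www.guokr.com/question/{0}/">'
            'Question from page {0}</a></h2>').format(page)


results = crawl(range(1, 4), fake_fetch)
print(len(results))
```

For a real run, fetch could be a function that calls requests.get(url, headers=headers).text, and range(1, 4) would become range(1, 101) as in the article.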

Finally, run the code, as shown in the following figure:

That is everything in the article "How to use regular expressions in Python to extract text data". Thank you for reading! We hope the content has been helpful.
