Source: Shulou (Shulou.com), SLTechnology News&Howtos, Internet Technology. Report dated 06/02; updated 2025-01-18.
Many beginners are unsure how to use Python regular expressions to extract information from a web page. This article walks through the process step by step, so if you need to do this, read on.
Confirm the data source
When practicing scraping data from a web page, open the page in a browser and save it as an HTML file, so that repeated requests do not trigger the website's anti-scraping mechanism. I saved a page of real estate listings from a website and tried to extract information from it.
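Once the page is saved, every later step can work on the local copy instead of the live site. The sketch below shows one way to load it in Python; the function name `load_saved_page` and the file name `listing.html` are hypothetical, and UTF-8 is an assumed default.

```python
from pathlib import Path


def load_saved_page(path: str, encoding: str = "utf-8") -> str:
    """Read a web page previously saved from the browser as an HTML file.

    Working from this local copy means the website is requested only once,
    not every time the regular expressions are adjusted.
    """
    # errors="replace" keeps reading even if a few bytes do not decode
    # cleanly in the chosen encoding, substituting a placeholder character.
    return Path(path).read_text(encoding=encoding, errors="replace")
```

Usage: `html = load_saved_page("listing.html")`. If the saved page declares a different charset, such as GBK, pass it as the `encoding` argument instead.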
Then open the file in Notepad++ and look at its contents.
Write regular expressions
Not sure where to start? Don't panic; take it one step at a time. By comparing the rendered page with its HTML source, we can identify the characters that surround each piece of information we want.
Real estate name:
A sharp drop of 600000 to sell all-money customers to a good location in Baoshan second Village.
Copy this text, search for it in the HTML file with Ctrl+F, and carefully examine the characters immediately before and after the real estate name:
Characters before:
; ">
Characters after:
Now write a regular expression based on the characters before and after the real estate name, and capture the name in a named group, (?P<name>.*?):
; ">(?P<name>.*?)
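Here is a minimal sketch of the named-group step with Python's `re` module. Only the leading `; ">` comes from the article; the surrounding `<a>` tag and the closing `</a>` are assumptions standing in for the page's real markup, and `extract_name` is a hypothetical helper.

```python
import re
from typing import Optional

# Hypothetical markup: the title text is the example from the article,
# the tags around it are assumed.
SAMPLE = ('<a href="/house/1.html" style="color: #333; ">'
          'A sharp drop of 600000 to sell all-money customers to a good '
          'location in Baoshan second Village.</a>')

# (?P<name>...) stores whatever the group matches under the name "name".
NAME_PATTERN = re.compile(r'; ">(?P<name>.*?)</a>')


def extract_name(html: str) -> Optional[str]:
    """Return the listing title, or None if the pattern does not match."""
    match = NAME_PATTERN.search(html)
    # Named groups can be read back by name instead of by position.
    return match.group("name") if match else None


print(extract_name(SAMPLE))
```

Reading groups by name keeps the code readable once a pattern captures several fields.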
Note: ".*?" is commonly used when scraping web pages. It matches any number of characters, but as few as possible (a non-greedy match), so it stops at the first place where the characters that follow it appear.
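The difference between greedy ".*" and non-greedy ".*?" is easiest to see on a line containing two matches. The markup below is a made-up example.

```python
import re

# Two hypothetical listing links on the same line.
html = '; ">First listing</a> ... ; ">Second listing</a>'

# Greedy: ".*" runs to the LAST "</a>", swallowing both links into one match.
greedy = re.findall(r'; ">(.*)</a>', html)

# Non-greedy: ".*?" stops at the FIRST "</a>" after each "; \">".
lazy = re.findall(r'; ">(.*?)</a>', html)

print(greedy)  # ['First listing</a> ... ; ">Second listing']
print(lazy)    # ['First listing', 'Second listing']
```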
Room type:
Next, look at the characters before and after the "room type" information:
Characters before:
span>
Characters after:
In the same way, extract the "room type" information into a named group:
span>(?P<…>.*?)
Note: there is a large block of HTML between the real estate name and the room type; we can skip it by writing .*? between the two groups.
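Putting the two groups together with ".*?" in between might look like the sketch below. The listing markup, the second listing, and the group name `room_type` are all hypothetical; only the `; ">` and `span>` anchors follow the article. Because the skipped HTML usually spans several lines, the pattern needs `re.DOTALL`: by default "." does not match a newline.

```python
import re
from typing import Dict, List

# Made-up listing markup for illustration only.
SAMPLE = '''
<li>
  <a href="/house/1.html" style="color: #333; ">Good location in Baoshan Second Village</a>
  <div class="tags">... a large block of unrelated markup ...</div>
  <p><span>2 rooms 1 hall</span></p>
</li>
<li>
  <a href="/house/2.html" style="color: #333; ">Quiet flat near the metro</a>
  <div class="tags">...</div>
  <p><span>3 rooms 2 halls</span></p>
</li>
'''

LISTING = re.compile(
    r'; ">(?P<name>.*?)</a>'             # listing title
    r'.*?'                               # lazily skip the markup in between
    r'<span>(?P<room_type>.*?)</span>',  # room type (group name is assumed)
    re.DOTALL,  # let "." match newlines so ".*?" can cross line breaks
)


def extract_listings(html: str) -> List[Dict[str, str]]:
    """Return one {"name": ..., "room_type": ...} dict per listing found."""
    return [m.groupdict() for m in LISTING.finditer(html)]


for listing in extract_listings(SAMPLE):
    print(listing["name"], "|", listing["room_type"])
```

Without `re.DOTALL`, the same pattern finds nothing in this sample, because the gap between `</a>` and `<span>` contains line breaks.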
Area:
Next, look at the characters before and after the "area" information:
Characters before:
Characters after: