2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains in detail how to use get_matches, a regular-expression helper function for processing data fetched with curl. The editor found it practical and is sharing it here as a reference; hopefully you will get something out of it.
Earlier we fetched the HTML file we need; now we have to process that file to extract the data we want to collect.
For parsing HTML documents there is no strict parsing class of the kind available for XML, because HTML documents contain many unpaired tags and are loosely structured. You need some other helper class instead. Simplehtmldom is a parsing class that manipulates HTML documents in a way similar to jQuery. It makes getting the desired data very convenient, but it is slow, and it is not the focus here. I mainly use regular expressions to match the data I need to collect, which gets the information very quickly.
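As a rough illustration of the regular-expression approach, the sketch below pulls a page title and a list of links out of an HTML string using PHP's built-in preg_match and preg_match_all. The sample HTML, the patterns, and the variable names are made up for this example and are not taken from the original article.

```php
<?php
// Sample HTML standing in for a page fetched with curl (made up for this sketch).
$html = <<<HTML
<html>
<head><title>Example Page</title></head>
<body>
<a href="/news/1.html">First story</a>
<a href="/news/2.html">Second story</a>
</body>
</html>
HTML;

// Single match: capture the text between <title> and </title>.
// "!" is used as the pattern delimiter so "/" in closing tags needs no escaping.
$title = null;
if (preg_match('!<title>(.*?)</title>!is', $html, $m)) {
    $title = trim($m[1]);
}

// Multiple matches: PREG_SET_ORDER returns one array per link,
// holding the full match, the href, and the link text.
$links = [];
if (preg_match_all('!<a\s+href="([^"]*)"[^>]*>(.*?)</a>!is', $html, $sets, PREG_SET_ORDER)) {
    foreach ($sets as $set) {
        $links[] = ['href' => $set[1], 'text' => trim($set[2])];
    }
}

echo $title, "\n";
foreach ($links as $link) {
    echo $link['href'], ' => ', $link['text'], "\n";
}
```

Patterns like these are fast but tied to the exact markup of the page, so they need adjusting whenever the page layout changes.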
get_html can check the data it returns, but get_htmls has no way to do so, so the following two functions were written to make matching and calling easier:
// Run a regular expression against $html and return the matches.
// $multi = false uses preg_match(); $multi = true uses preg_match_all().
// Prints $err_msg plus the PCRE status and returns false when nothing
// matched or the call failed.
function get_matches($pattern, $html, $err_msg, $multi = false, $flags = 0, $offset = 0) {
    if (!$multi) {
        if (!preg_match($pattern, $html, $matches, $flags, $offset)) {
            echo $err_msg . "! error message:" . get_preg_err_msg() . "\n";
            return false;
        }
    } else {
        if (!preg_match_all($pattern, $html, $matches, $flags, $offset)) {
            echo $err_msg . "! error message:" . get_preg_err_msg() . "\n";
            return false;
        }
    }
    return $matches;
}

// Translate the code from preg_last_error() into its constant name.
function get_preg_err_msg() {
    $error_code = preg_last_error();
    switch ($error_code) {
        case PREG_NO_ERROR:
            $err_msg = 'PREG_NO_ERROR';
            break;
        case PREG_INTERNAL_ERROR:
            $err_msg = 'PREG_INTERNAL_ERROR';
            break;
        case PREG_BACKTRACK_LIMIT_ERROR:
            $err_msg = 'PREG_BACKTRACK_LIMIT_ERROR';
            break;
        case PREG_RECURSION_LIMIT_ERROR:
            $err_msg = 'PREG_RECURSION_LIMIT_ERROR';
            break;
        case PREG_BAD_UTF8_ERROR:
            $err_msg = 'PREG_BAD_UTF8_ERROR';
            break;
        case PREG_BAD_UTF8_OFFSET_ERROR:
            $err_msg = 'PREG_BAD_UTF8_OFFSET_ERROR';
            break;
        default:
            return 'unknown error!';
    }
    return $err_msg . ':' . $error_code;
}
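Note that get_matches prints a message both when the pattern simply does not match and when the regex engine itself fails; in the first case the reported status is PREG_NO_ERROR. The sketch below, which uses only built-in functions and made-up inputs, shows how the two situations differ at the preg_match level.

```php
<?php
// 1) Valid subject, but the pattern does not occur: the result is 0 and
//    preg_last_error() reports PREG_NO_ERROR.
$noMatch = preg_match('!<h1>(.*?)</h1>!', '<p>no heading here</p>', $m);
$noMatchCode = preg_last_error();

// 2) The /u modifier requires valid UTF-8; "\xff" is not valid UTF-8,
//    so the call fails: the result is false and the error code is
//    PREG_BAD_UTF8_ERROR.
$failed = preg_match('!.!u', "\xff", $m);
$failedCode = preg_last_error();

var_dump($noMatch === 0, $noMatchCode === PREG_NO_ERROR);
var_dump($failed === false, $failedCode === PREG_BAD_UTF8_ERROR);
```

If a caller needs to tell "no data on this page" apart from "the pattern or input is broken", comparing the return value with === false before checking for 0 is one way to do it.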
You can call it like this:
$url = 'http://www.baidu.com';
$html = get_html($url);
$matches = get_matches ('!