How to use the regular processing function get_matches based on curl data acquisition 07/01 Update SLTechnology News&Howtos


2025-07-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article explains in detail how to use the regex helper function get_matches to process data collected with curl. I find it very practical, so I am sharing it here for reference.

In the previous step we were able to fetch the HTML pages we need. Now we need to process those pages to extract the data we want to collect.

HTML documents cannot be handled with a strict parser the way XML documents can, because they often contain unpaired tags and do not follow strict syntax. Other helper classes are needed instead. Simple HTML DOM (simplehtmldom) is a parser that lets you manipulate an HTML document in a jQuery-like way. It makes extracting the data you want very convenient, but it is slow. It is not the focus here: I mainly use regular expressions to match the data I need to collect, which gets me the information very quickly.
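
As an illustration of the regex approach, here is a minimal sketch that pulls every link (href and text) out of a small HTML fragment with a single preg_match_all call. The sample HTML and the pattern are assumptions made up for this example; a real page usually needs a pattern tailored to its markup.

```php
<?php
// Minimal sketch: extract href => link text pairs with one regular expression.
// The HTML fragment and the pattern are illustrative assumptions.

$html = <<<HTML
<ul>
  <li><a href="/news/1.html">First item</a></li>
  <li><a class="hot" href="/news/2.html">Second item</a></li>
</ul>
HTML;

// [^>]*? skips any attributes before href; (.*?) is non-greedy so each
// match stops at the nearest </a>; s lets . cross newlines; i ignores case.
$pattern = '!<a\s[^>]*?href="([^"]*)"[^>]*>(.*?)</a>!is';

$links = [];
if (preg_match_all($pattern, $html, $m, PREG_SET_ORDER)) {
    foreach ($m as $match) {
        // $match[1] is the captured href, $match[2] the inner link text.
        $links[$match[1]] = trim(strip_tags($match[2]));
    }
}

print_r($links); // two entries: /news/1.html and /news/2.html
```

The non-greedy `(.*?)` is what keeps one match from running on to the last `</a>` in the document.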

Since get_html can check the data it returns but get_htmls cannot, I wrote the following two functions to make regex matching (with error reporting) easy to call:


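```php
<?php
/**
 * Run $pattern against $html and return the matches.
 * $multi = false uses preg_match (first match only);
 * $multi = true uses preg_match_all (every match).
 * On no match or on a PCRE error, print $err_msg with the PCRE error and return false.
 */
function get_matches($pattern, $html, $err_msg, $multi = false, $flags = 0, $offset = 0)
{
    if (!$multi) {
        // preg_match returns 1 on a match, 0 on no match, false on error.
        if (!preg_match($pattern, $html, $matches, $flags, $offset)) {
            echo $err_msg . "! error message: " . get_preg_err_msg() . "\n";
            return false;
        }
    } else {
        // preg_match_all returns the number of matches, or false on error.
        if (!preg_match_all($pattern, $html, $matches, $flags, $offset)) {
            echo $err_msg . "! error message: " . get_preg_err_msg() . "\n";
            return false;
        }
    }
    return $matches;
}

// Translate preg_last_error() into a readable "NAME:code" string.
// (PHP 8.0+ also provides preg_last_error_msg().)
function get_preg_err_msg()
{
    $error_code = preg_last_error();
    switch ($error_code) {
        case PREG_NO_ERROR:
            $err_msg = 'PREG_NO_ERROR';
            break;
        case PREG_INTERNAL_ERROR:
            $err_msg = 'PREG_INTERNAL_ERROR';
            break;
        case PREG_BACKTRACK_LIMIT_ERROR:
            $err_msg = 'PREG_BACKTRACK_LIMIT_ERROR';
            break;
        case PREG_RECURSION_LIMIT_ERROR:
            $err_msg = 'PREG_RECURSION_LIMIT_ERROR';
            break;
        case PREG_BAD_UTF8_ERROR:
            $err_msg = 'PREG_BAD_UTF8_ERROR';
            break;
        case PREG_BAD_UTF8_OFFSET_ERROR:
            $err_msg = 'PREG_BAD_UTF8_OFFSET_ERROR';
            break;
        default:
            return 'unknown error!';
    }
    return $err_msg . ':' . $error_code;
}
```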
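
get_matches treats any falsy return from preg_match as a failure and then asks get_preg_err_msg for the reason. The standalone sketch below uses a hypothetical helper, match_status, written only for illustration, to show why that second step matters: a plain "no match" returns 0 and leaves preg_last_error() at PREG_NO_ERROR, while a genuine error, such as a subject that is not valid UTF-8 matched with the u modifier (possible when a page is served in another encoding), makes preg_match return false.

```php
<?php
// Sketch (match_status is a hypothetical helper, not part of the article):
// distinguish "no match" (0) from a PCRE error (false) using preg_last_error().

function match_status(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // A real error: preg_last_error() holds the PREG_* code.
        return preg_last_error() === PREG_BAD_UTF8_ERROR
            ? 'error: PREG_BAD_UTF8_ERROR'
            : 'error: code ' . preg_last_error();
    }
    // No error occurred; preg_last_error() is PREG_NO_ERROR here.
    return $result === 1 ? 'match' : 'no match';
}

$page = '<title>Example</title>';

echo match_status('!<title>(.*?)</title>!u', $page), "\n"; // match
echo match_status('!<h1>(.*?)</h1>!u', $page), "\n";       // no match
// The byte "\xff" is never valid UTF-8, so the u modifier makes matching fail.
echo match_status('!<title>(.*?)</title>!u', "<title>\xff</title>"), "\n"; // error: PREG_BAD_UTF8_ERROR
```

In get_matches, this means an error message ending in `PREG_NO_ERROR:0` simply indicates that the pattern did not match.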

You can call it like this, with get_html from the previous step fetching the page:


```php
$url = 'http://www.baidu.com';
$html = get_html($url);
$matches = get_matches('!
```
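
When get_matches is called with $multi = true, it passes $flags straight to preg_match_all, and that flag decides how $matches is laid out. The sketch below uses a made-up two-item list to compare the default PREG_PATTERN_ORDER with PREG_SET_ORDER; by the same logic, get_matches($pattern, $html, $msg, true, PREG_SET_ORDER) would be expected to return the second layout.

```php
<?php
// Sketch: how preg_match_all's flags shape $matches.
// The HTML string and pattern are illustrative assumptions.

$html = '<li>apple</li><li>pear</li>';
$pattern = '!<li>(.*?)</li>!';

// Default (PREG_PATTERN_ORDER): $byPattern[0] holds all full matches,
// $byPattern[1] holds every capture of group 1.
preg_match_all($pattern, $html, $byPattern);
print_r($byPattern[1]); // apple, pear

// PREG_SET_ORDER: one array per match, each as [full match, group 1].
preg_match_all($pattern, $html, $bySet, PREG_SET_ORDER);
foreach ($bySet as $set) {
    echo $set[1], "\n"; // apple, then pear
}
```

PREG_SET_ORDER only makes sense in multi mode; preg_match does not accept it.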
