This article mainly introduces how to use get_htmls, a curl-based function for collecting multiple single pages in parallel. It has some reference value, and interested readers may find it useful; I hope you learn something from reading it.
Previously we used get_html() for simple data collection. Because it executes requests one at a time, the total transfer time is the sum of every page's download time: assuming one page takes 1 second, 10 pages take 10 seconds. Fortunately, curl also provides parallel processing.
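For context, the earlier single-page get_html() is not reproduced in this article; a minimal sketch of what it might look like, assuming it simply wraps one curl handle and accepts an options array, is shown below:

function get_html($url, $options = array()) {
    $ch = curl_init($url);
    $options[CURLOPT_RETURNTRANSFER] = true; // return the page instead of printing it
    $options[CURLOPT_TIMEOUT] = 5;
    curl_setopt_array($ch, $options);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}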
To write a parallel collection function, we must first know what kinds of pages will be collected and what kinds of requests they need; only then can we write a reasonably general-purpose function.
Functional requirements analysis:
What should it return?
The HTML of every page, of course, combined into one array.
What parameters should it take?
When we wrote get_html(), we learned that extra curl parameters can be passed through an options array, and a function that collects many pages at once has to keep that ability.
What types should the parameters have?
Whether you are requesting a web page's HTML or calling a web API, get and post requests hit the same page or interface with different parameters, so the types would be:
get_htmls($url, $options)
$url is a string.
$options is a two-dimensional array; the parameters of each page form one sub-array.
That seems to solve the problem. But after going through the whole curl manual I could not find an option for passing get parameters per handle, so the only choice is to pass $url as an array and add a method parameter.
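To make the two calling conventions concrete, here is a small illustration of the argument shapes implied by the analysis above (the URLs and POST fields are made up for the example):

// for get: $urls holds one URL per page, and a single shared $options array is reused
$urls = array(
    'http://www.example.com/list?page=1',
    'http://www.example.com/list?page=2',
);

// for post: $urls is a single URL string, and $options holds one option set
// per request, each carrying its own CURLOPT_POSTFIELDS
$urls = 'http://www.example.com/api';
$options = array(
    array(CURLOPT_POSTFIELDS => 'id=1'),
    array(CURLOPT_POSTFIELDS => 'id=2'),
);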
The prototype of the function is get_htmls($urls, $options = array(), $method = 'get'). The code is as follows:
function get_htmls($urls, $options = array(), $method = 'get') {
    $curls = array();
    $htmls = array();
    $mh = curl_multi_init();
    if ($method == 'get') { // get is the most commonly used way to pass values
        foreach ($urls as $key => $url) {
            $ch = curl_init($url);
            $options[CURLOPT_RETURNTRANSFER] = true;
            $options[CURLOPT_TIMEOUT] = 5;
            curl_setopt_array($ch, $options);
            $curls[$key] = $ch;
            curl_multi_add_handle($mh, $curls[$key]);
        }
    } elseif ($method == 'post') { // post: the same url, one option set per request
        foreach ($options as $key => $option) {
            $ch = curl_init($urls);
            $option[CURLOPT_RETURNTRANSFER] = true;
            $option[CURLOPT_TIMEOUT] = 5;
            $option[CURLOPT_POST] = true;
            curl_setopt_array($ch, $option);
            $curls[$key] = $ch;
            curl_multi_add_handle($mh, $curls[$key]);
        }
    } else {
        exit("Parameter error!\n");
    }
    do {
        $mrc = curl_multi_exec($mh, $active);
        curl_multi_select($mh); // reduces CPU load; commenting this out makes the loop spin
    } while ($active);
    foreach ($curls as $key => $ch) {
        $html = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
        $htmls[$key] = $html;
    }
    curl_multi_close($mh);
    return $htmls;
}
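The do/while loop above is the simplest way to drive the parallel transfer. A more defensive variant that checks the curl_multi return codes instead of looping unconditionally might look like the sketch below; this is an optional refinement under the assumption that you want to avoid busy-waiting when curl_multi_select() fails, not part of the original function:

$active = null;
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    // wait for activity on any handle instead of spinning the CPU
    if (curl_multi_select($mh, 1.0) == -1) {
        usleep(100000); // select failed or is unavailable; back off briefly
    }
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
}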
Common get requests work by changing URL parameters, and since our function is meant for data collection it usually collects page after page within one category, so the URLs look something like this:
http://www.baidu.com/s?wd=shili&pn=0&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=10&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=20&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=30&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=50&ie=utf-8
The above five pages are very regular, changing only the value of pn.
The code to build them is as follows:
$urls = array();
for ($i = 1; $i <= 5; $i++) {
    $urls[] = 'http://www.baidu.com/s?wd=shili&pn=' . (($i - 1) * 10) . '&ie=utf-8';
}
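With the URL list prepared, all five pages can be fetched in one parallel batch. A minimal usage sketch (the echo is just a placeholder for whatever parsing you actually do):

$htmls = get_htmls($urls);
foreach ($htmls as $key => $html) {
    // parse or store the collected page here
    echo 'page ' . $key . ': ' . strlen($html) . " bytes\n";
}

A post collection would be called the other way round, with a single interface URL and a two-dimensional $options array in the shape sketched earlier, for example get_htmls($url, $options, 'post').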