2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article looks at the differences between classic PHP cURL concurrency and Rolling cURL concurrency. The approach presented is simple, fast and practical.
In real projects, or when writing small tools (news aggregation, commodity price monitoring, price comparison), you often need to fetch data from third-party websites or APIs. When you have a queue of URLs to process, the curl_multi_* family of functions provided by PHP's cURL extension offers a simple way to issue the requests concurrently and improve performance.
This article discusses two concrete implementations and gives a simple performance comparison of the two.
1. The classic cURL concurrency mechanism and its problems
Classic implementations are easy to find online, for example the following one from the PHP online manual:
function classic_curl($urls, $delay)
{
    $queue = curl_multi_init();
    $map = array();

    foreach ($urls as $url) {
        // Create cURL resources
        $ch = curl_init();

        // Set URL and other appropriate options
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);

        // Add handle
        curl_multi_add_handle($queue, $ch);
        $map[$url] = $ch;
    }

    $active = null;

    // Execute the handles
    do {
        $mrc = curl_multi_exec($queue, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);

    while ($active > 0 && $mrc == CURLM_OK) {
        if (curl_multi_select($queue, 0.5) != -1) {
            do {
                $mrc = curl_multi_exec($queue, $active);
            } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        }
    }

    $responses = array();
    foreach ($map as $url => $ch) {
        $responses[$url] = callback(curl_multi_getcontent($ch), $delay);
        curl_multi_remove_handle($queue, $ch);
        curl_close($ch);
    }

    curl_multi_close($queue);
    return $responses;
}
First, all URLs are pushed into the concurrent queue; then the requests are executed concurrently, and only after all of them have completed does subsequent processing, such as parsing the data, begin. In practice, network conditions mean that some URLs return well before others, yet the classic approach must wait for the slowest URL before it starts processing anything, so the CPU sits idle in the meantime. If the URL queue is short, this idle time is acceptable; if the queue is long, the waiting and waste become unacceptable.
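The cost of waiting can be made concrete with a small simulation that involves no network at all. The sketch below is only a model: it assumes made-up arrival times (seconds until each response has been fully received) and a fixed processing cost per response. The function names simulate_classic and simulate_rolling are hypothetical and are not part of cURL.

```php
<?php
// Hypothetical model: each entry of $arrivals is the time (in seconds) at which
// one response has been fully received; $cost is the processing time per response.

// Classic: processing starts only after the slowest response has arrived.
function simulate_classic(array $arrivals, float $cost): float
{
    if (!$arrivals) {
        return 0.0;
    }
    return max($arrivals) + $cost * count($arrivals);
}

// Rolling: each response is processed as soon as it has arrived and the
// single-threaded processor is free.
function simulate_rolling(array $arrivals, float $cost): float
{
    sort($arrivals);
    $clock = 0.0;
    foreach ($arrivals as $t) {
        $clock = max($clock, $t) + $cost;
    }
    return $clock;
}

$arrivals = [0.2, 0.4, 0.6, 1.0];   // made-up values, for illustration only
printf("classic: %.2f s\n", simulate_classic($arrivals, 0.1));
printf("rolling: %.2f s\n", simulate_rolling($arrivals, 0.1));
```

In this model the two totals are equal when the processing cost is zero, and the gap grows with the processing cost and the number of responses that arrive early.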
2. The improved Rolling cURL concurrency approach
On closer analysis, the classic approach leaves room for optimization. The improvement is to process each URL's response as soon as that request completes, while the remaining requests are still in flight, rather than waiting for the slowest one to return before starting any processing. This avoids leaving the CPU idle. The implementation is as follows:
function rolling_curl($urls, $delay)
{
    $queue = curl_multi_init();
    $map = array();

    foreach ($urls as $url) {
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);

        curl_multi_add_handle($queue, $ch);

        // Key the map by handle. The original (string) $ch cast only works
        // while handles are resources; on PHP 8+ they are CurlHandle objects.
        $key = is_object($ch) ? spl_object_id($ch) : (int) $ch;
        $map[$key] = $url;
    }

    $responses = array();
    $active = null;
    do {
        while (($code = curl_multi_exec($queue, $active)) == CURLM_CALL_MULTI_PERFORM);

        if ($code != CURLM_OK) {
            break;
        }

        // A request was just completed -- find out which one
        while ($done = curl_multi_info_read($queue)) {
            $ch = $done['handle'];
            $key = is_object($ch) ? spl_object_id($ch) : (int) $ch;

            // Get the info and content returned on the request
            $info = curl_getinfo($ch);
            $error = curl_error($ch);
            $results = callback(curl_multi_getcontent($ch), $delay);
            $responses[$map[$key]] = compact('info', 'error', 'results');

            // Remove the curl handle that just completed
            curl_multi_remove_handle($queue, $ch);
            curl_close($ch);
        }

        // Block for data in/output; error handling is done by curl_multi_exec
        if ($active > 0) {
            curl_multi_select($queue, 0.5);
        }
    } while ($active);

    curl_multi_close($queue);
    return $responses;
}
3. Performance comparison of the two concurrency implementations
The before-and-after performance comparison was run on a Linux host. The URL queue used in the tests was:
http://a.com/item.htm?id=14392877692
http://a.com/item.htm?id=16231676302
http://a.com/item.htm?id=5522416710
http://a.com/item.htm?id=16551116403
The experimental design and the format of the results are as follows. To make the results reliable, each group of experiments was repeated 20 times. In a single run, given the same set of URLs, the elapsed time in seconds was measured for Classic (the classic concurrency mechanism) and for Rolling (the improved mechanism); the faster one is the winner (Winner), and the time saved (Excellence, in seconds) and the percentage improvement (Excel. %) are calculated. To stay close to real requests while keeping the experiment simple, the returned results are processed with only a simple regular-expression match and no other complex operations. In addition, to determine how the result-processing callback affects the comparison, usleep can be used to simulate heavier data-processing logic (such as extraction, word segmentation, or writing to files or a database).
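The per-run metrics described here reduce to simple arithmetic. The sketch below shows one way to compute them. The function name compare_run and the sample timings are hypothetical, and the percentage is assumed to be measured relative to the slower run, which the text does not specify.

```php
<?php
// Compare one run of the two implementations.
// $classic and $rolling are elapsed times in seconds, e.g. taken with microtime(true).
function compare_run(float $classic, float $rolling): array
{
    // A tie is reported as 'Classic' with zero time saved.
    $winner = $rolling < $classic ? 'Rolling' : 'Classic';
    $saved = abs($classic - $rolling);       // time saved by the winner, in seconds
    $slower = max($classic, $rolling);
    // Assumption: the improvement is expressed relative to the slower run.
    $percent = $slower > 0 ? $saved / $slower * 100 : 0.0;
    return compact('winner', 'saved', 'percent');
}

// Made-up timings, for illustration only.
$run = compare_run(1.5, 1.2);
printf("%s wins by %.2f s (%.1f%%)\n", $run['winner'], $run['saved'], $run['percent']);
```

Repeating this over the 20 runs of a group and averaging the saved time gives a single figure per group.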
The callback function used in the performance tests is:
function callback($data, $delay)
{
    preg_match_all('/(.+)/iU', $data, $matches);
    usleep($delay);
    return compact('data', 'matches');
}
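To see how the delay parameter feeds into a measurement, the callback can be timed on its own. The sketch below repeats the callback, with stray whitespace removed from the pattern, so that it is self-contained. It wraps the call in a hypothetical helper, time_call, built on microtime(true). The sample string and the 20 ms delay are made up.

```php
<?php
function callback($data, $delay)
{
    preg_match_all('/(.+)/iU', $data, $matches);
    usleep($delay);               // simulated processing time, in microseconds
    return compact('data', 'matches');
}

// Hypothetical helper: run $fn with $args and return [result, elapsed seconds].
function time_call(callable $fn, ...$args): array
{
    $start = microtime(true);
    $result = $fn(...$args);
    return [$result, microtime(true) - $start];
}

[$result, $elapsed] = time_call('callback', "sample response body", 20000);
printf("callback took %.3f s\n", $elapsed);
```

Because usleep takes microseconds, a delay of 20000 adds roughly 0.02 s to every processed response; a delay of 0 corresponds to the no-delay case.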
When the data-processing callback has no delay, Rolling cURL is slightly faster, but the improvement is not significant.