
How to solve the efficiency and stability problems of fetching web page titles with PHP cURL or file_get_contents


This article explains how to solve the efficiency and stability problems of fetching web page titles in PHP with cURL or file_get_contents. The explanation is simple, clear, and easy to follow; read on to see how the two approaches compare.

Recommended method: fetch with cURL

If you use file_get_contents instead, first look at how it performs:

1) Every time fopen/file_get_contents requests data from a remote URL, it performs a fresh DNS query; the DNS result is not cached. cURL, however, caches DNS information automatically, so requests for pages or images under the same domain name need only one DNS lookup. This greatly reduces the number of DNS queries, which is why cURL's performance is much better than fopen/file_get_contents.

2) When requesting over HTTP, fopen/file_get_contents goes through the http_fopen_wrapper, which does not use keep-alive, while curl does. This makes curl much more efficient when the same host is requested many times. (Setting the Connection header manually may help, but the wrapper still tends to open a new connection per request.)

3) The fopen/file_get_contents functions are governed by the allow_url_fopen option in php.ini; if it is disabled, they cannot open URLs at all. curl is not affected by this setting.

4) curl can simulate all kinds of requests, such as POSTing data or submitting forms, and users can tailor the request to their needs. fopen/file_get_contents, by contrast, is in practice limited to plain GET fetches (a stream context can change the method, but with far less control).

5) fopen/file_get_contents cannot download binaries correctly

6) fopen/file_get_contents does not handle SSL requests correctly

7) curl can take advantage of concurrency; in PHP this is exposed through the curl_multi_* functions rather than true multithreading.

8) If something goes wrong with the network while using file_get_contents, blocked requests easily pile up as waiting processes.

9) If you want a persistent connection and need to request many pages repeatedly, file_get_contents runs into trouble, and the content it returns may even be wrong. So for collection and scraping work of this kind there will always be problems; use curl instead (see the handle-reuse sketch after this list). And if you still don't believe it, let's run another test.
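Points 1), 2) and 9) all come down to reusing a single cURL handle: libcurl keeps the resolved DNS entry and the open connection on the handle, so later requests to the same host skip the lookup and the handshake. The following is a minimal sketch under that assumption; the URLs are placeholders. (For the concurrency mentioned in point 7, PHP's curl_multi_* functions are the usual route.)

// Reuse one handle across requests: libcurl caches the DNS result
// and keeps the connection alive on the handle.
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 5);        // give up after 5 seconds

$pages = array();
foreach (array('http://www.example.com/a', 'http://www.example.com/b') as $url) {
    curl_setopt($ch, CURLOPT_URL, $url);
    $pages[$url] = curl_exec($ch); // the second request reuses DNS and the connection
}
curl_close($ch);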

Performance comparison between curl and file_get_contents. The PHP source used for the test:

1829.php
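The script itself did not survive in this copy of the article, so the following is only a hypothetical reconstruction of such a timing test; the URL and the ten-request loop are assumptions, not the original 1829.php:

// Hypothetical benchmark: time N fetches with each method.
$url = 'http://www.example.com/'; // placeholder test URL

$start = microtime(true);
for ($i = 0; $i < 10; $i++) {
    file_get_contents($url); // fresh connection and DNS lookup every time
}
echo 'file_get_contents speed: ' . (microtime(true) - $start) . " seconds\n";

$start = microtime(true);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
for ($i = 0; $i < 10; $i++) {
    curl_exec($ch); // reused handle benefits from keep-alive and cached DNS
}
curl_close($ch);
echo 'curl speed: ' . (microtime(true) - $start) . " seconds\n";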

Test results from accessing the page:

file_get_contents speed: 4.2404510975 seconds

curl speed: 2.8205530643 seconds

curl is about 30% faster than file_get_contents here (2.82s vs. 4.24s), and, most importantly, it puts a lower load on the server.

PS: On the efficiency and stability of the PHP functions file_get_contents and curl

I am used to using the quick and convenient file_get_contents to fetch content from other sites, but I kept running into failed requests. I set the timeout following the example in the manual, but most of the time it doesn't work well:

$config['context'] = stream_context_create(array('http' => array('method' => 'GET', 'timeout' => 5)));
// the context is then passed as the third argument, e.g.:
// $html = file_get_contents($url, false, $config['context']);

The 'timeout' => 5 here is unstable and often has no effect. At that point, if you look at the server's connection pool, you will find a pile of errors like the following, which is a real headache:

file_get_contents(http://***): failed to open stream...

As a last resort, I installed the curl extension and wrote a replacement function:

function curl_get_contents($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);              // URL to fetch
    // curl_setopt($ch, CURLOPT_HEADER, 1);           // whether to include response headers
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);             // timeout in seconds
    curl_setopt($ch, CURLOPT_USERAGENT, _USERAGENT_); // User-Agent (constant assumed defined elsewhere)
    curl_setopt($ch, CURLOPT_REFERER, _REFERER_);     // Referer (constant assumed defined elsewhere)
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);      // follow 301 redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);      // return the result instead of printing it
    $r = curl_exec($ch);
    curl_close($ch);
    return $r;
}

With this in place, there are no more problems apart from genuine network failures.
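For the page-title use case in this article's title, a minimal usage sketch follows; the URL and the regex-based title extraction are illustrative additions, not part of the original (and it assumes the _USERAGENT_ and _REFERER_ constants used above have been defined):

$html = curl_get_contents('http://www.example.com/'); // placeholder URL
if ($html !== false && preg_match('#<title[^>]*>(.*?)</title>#is', $html, $m)) {
    echo trim($m[1]); // the page title
}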

Here is a test someone else ran comparing curl and file_get_contents:

The number of seconds it takes for file_get_contents to grab google.com:

2.31319094

2.30374217

2.21512604

3.30553889

2.30124092

Time used by curl:

0.68719101

0.64675593

0.64326

0.81983113

0.63956594

Thank you for reading. That covers how PHP cURL and file_get_contents compare when fetching web page titles, and why curl is the more efficient and stable choice. I hope this article has given you a deeper understanding of both approaches.
