This article mainly introduces how to handle proxy IP exceptions and timeouts in a Python crawler. It has some reference value, and interested readers can refer to it; I hope you gain a lot after reading it, and now let the editor walk you through it.
First, the anti-crawler mechanism: it will not be introduced in detail here. Most crawler developers already know it well, so there is no need to repeat it.
Second, timeout settings: timeout settings for selenium+chrome.
Explicit wait: wait for a certain condition to occur before continuing with the code.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(driver, 10).until(  # adjust the wait time here
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()
Implicit wait: tells WebDriver to poll the DOM for a certain amount of time when trying to find one or more elements that are not immediately available. The default setting is 0. Once set, the implicit wait lasts for the entire lifetime of the WebDriver instance.
driver = webdriver.Firefox()
driver.implicitly_wait(10)  # seconds
driver.get("http://somedomain/url_that_delays_loading")
my_dynamic_element = driver.find_element_by_id("myDynamicElement")
Third, exception handling: Python usually uses try/except statements to handle exceptions.
Exceptions happen often in crawler programs, and the purpose of a try/except statement is to catch them; of course, an equally important use is to deliberately ignore them. Since most exceptions in a crawler cannot simply be retried on the spot, putting the failed task back into the task queue when an exception is caught is actually the most labor-saving approach, as sketched below.
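As an illustration only, here is a minimal sketch of that idea using the requests library; the proxy address, the URLs, and the retry limit are all hypothetical values, and the exact exceptions you catch will depend on your own crawler.

import requests

# Hypothetical values for illustration; replace with your own proxy and URLs.
PROXIES = {"http": "http://127.0.0.1:8888", "https": "http://127.0.0.1:8888"}
MAX_RETRIES = 3

task_queue = [("http://somedomain/page1", 0), ("http://somedomain/page2", 0)]

while task_queue:
    url, retries = task_queue.pop(0)
    try:
        resp = requests.get(url, proxies=PROXIES, timeout=10)
        resp.raise_for_status()
        # ... parse resp.text here ...
    except requests.exceptions.RequestException as e:
        # Do not retry on the spot; put the task back in the queue so it is
        # attempted again later, up to MAX_RETRIES times.
        if retries + 1 < MAX_RETRIES:
            print(f"{url} failed ({e}), re-queuing")
            task_queue.append((url, retries + 1))
        else:
            print(f"{url} failed {MAX_RETRIES} times, giving up")

requests.exceptions.RequestException is the base class that covers proxy errors, timeouts, and HTTP errors alike, so a single except clause is enough for this kind of re-queuing logic.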
Fourth, self-restart settings.
If a program fails repeatedly under certain conditions, or has run long enough that its performance degrades (much like a computer that gets slower the longer it runs without a reboot), restarting it is a good option. Of course, this treats the symptom rather than the cause, but it is undoubtedly one of the most labor-saving approaches: simply restart the program whenever the preset restart conditions are met, as in the sketch below.
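The article gives no code for this, so the following is only a minimal sketch of one common way to do it in Python: when a failure counter or an elapsed-time limit (both made-up thresholds here) is exceeded, the script re-executes itself with os.execv.

import os
import sys
import time

MAX_FAILURES = 10          # hypothetical threshold
MAX_RUNTIME = 6 * 60 * 60  # hypothetical: restart after 6 hours

def restart_program():
    # Replace the current process with a fresh copy of this script.
    os.execv(sys.executable, [sys.executable] + sys.argv)

def crawl_one_task():
    # Stand-in for one unit of crawling work in a real program.
    time.sleep(1)

start_time = time.time()
failures = 0
while True:
    try:
        crawl_one_task()
        failures = 0
    except Exception:
        failures += 1
    if failures >= MAX_FAILURES or time.time() - start_time > MAX_RUNTIME:
        restart_program()

Because os.execv replaces the running process rather than spawning a child, the restarted crawler starts with a clean slate of memory and counters.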
How do you solve proxy IP exceptions and timeouts in a Python crawler? When programmers write code, errors are inevitable, especially for a program like a Python crawler, which cannot guarantee that every request will reliably return the same result: the target's anti-crawler mechanism may be strengthened, the proxy IP may time out, and so on. Handling these situations promptly is what keeps the crawler returning good results.
Thank you for reading this article carefully. I hope "what to do about proxy IP exceptions and timeouts in a Python crawler" shared by the editor has been helpful to everyone, and I also hope you will support us and follow the industry information channel, where more related knowledge is waiting for you.