This article mainly introduces "how to use a Python crawler to collect Douban movie data". Many people run into questions about this in daily practice, so the editor has consulted various materials and put together a simple, easy-to-follow method. I hope it helps resolve your doubts; please follow along and study!
Next, we obtain proxy IP data from the domestic high-anonymity proxy site (xicidaili.com):

import os
import time

import requests
from bs4 import BeautifulSoup

# Fetch the proxy entries from the first num pages of the xicidaili
# high-anonymity proxy list and append them to host.txt.
def fetch_proxy(num):
    # switch to the current working folder
    os.chdir(r'/Users/apple888/PycharmProjects/proxy IP')
    api = 'http://www.xicidaili.com/nn/{}'
    header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) '
                            'AppleWebKit/537.36 (KHTML, like Gecko) '
                            'Chrome/56.0.2924.87 Safari/537.36'}
    fp = open('host.txt', 'a+', encoding='utf-8')
    for i in range(1, num + 1):
        response = requests.get(url=api.format(i), headers=header)
        soup = BeautifulSoup(response.text, 'lxml')
        container = soup.find_all(name='tr', attrs={'class': 'odd'})
        for tag in container:
            try:
                con_soup = BeautifulSoup(str(tag), 'lxml')
                td_list = con_soup.find_all('td')
                ip = td_list[1].text.strip()
                port = td_list[2].text.strip()
                fp.write(ip + '\t' + port + '\n')
            except Exception:
                print('No IP address')
        time.sleep(1)  # pause between pages to avoid hammering the site
    fp.close()
With this function we are going to crawl the proxies from the first ten pages of the domestic high-anonymity proxy site, as in the sketch below.
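As a minimal sketch of the driver call (the ten-page count comes from the article; the __main__ guard is our own addition):

if __name__ == '__main__':
    # crawl the first 10 pages of proxies into host.txt
    fetch_proxy(10)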
However, simply having a list of proxy IPs is not enough, because we do not know whether each proxy is usable or still working. Below we test them against Baidu (a big company's site will not mind a brief burst of high-frequency access from us). The code:

import os

import requests

# Read the proxies collected in host.txt and test each one against Baidu.
def test_proxy():
    n = 1
    os.chdir(r'/Users/apple888/PycharmProjects/proxy IP')
    url = 'https://www.baidu.com'
    fp = open('host.txt', 'r')
    ips = fp.readlines()
    fp.close()
    proxys = list()
    for p in ips:
        ip = p.strip('\n').split('\t')
        proxy = 'http://' + ip[0] + ':' + ip[1]
        # register the proxy for both http and https requests
        proxies = {'http': proxy, 'https': proxy}
        proxys.append(proxies)
    for pro in proxys:
        try:
            # a short timeout keeps dead proxies from hanging the test
            s = requests.get(url, proxies=pro, timeout=5)
            print('No. {} ip: {} status {}'.format(n, pro, s.status_code))
        except Exception as e:
            print(e)
        n += 1
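To connect this back to the topic of the article, once host.txt contains a verified proxy, it can be passed straight to requests when fetching a Douban page. This is a minimal sketch rather than code from the original article: the proxy address is a placeholder, and the Douban Top 250 URL is just one example entry point.

import requests

# placeholder proxy picked from host.txt after testing (hypothetical values)
proxies = {'http': 'http://123.57.190.51:8080',
           'https': 'http://123.57.190.51:8080'}
header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) '
                        'AppleWebKit/537.36 (KHTML, like Gecko) '
                        'Chrome/56.0.2924.87 Safari/537.36'}
# fetch the first page of the Douban Top 250 movie list through the proxy
response = requests.get('https://movie.douban.com/top250',
                        headers=header, proxies=proxies, timeout=5)
print(response.status_code)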
At this point, the study of "how to use a Python crawler to collect Douban movie data" comes to an end. I hope it has resolved your doubts. Pairing theory with practice is the best way to learn, so go and try it out! If you want to continue learning more related knowledge, please keep following the website; the editor will keep working hard to bring you more practical articles!