
How can a Python crawler collect data that is visible only after logging in?

2025-01-19 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

This article explains how a Python crawler can collect data that is visible only after logging in. Many people run into this question in day-to-day work, so the editor has consulted various materials and put together some simple, practical methods. We hope they help answer it. Read on!

When collecting data from websites, some sites with high-value data restrict what visitors can access. In that case, it is recommended to log in, obtain the target website's cookie, and then use that cookie together with a proxy IP for data collection and analysis.

1 Log in using a form

This is a POST request: the client first sends the form data to the server, and the cookie that the server returns is then stored locally.

# -*- coding: utf-8 -*-
import requests

# the target pages to visit
targetUrlList = [
    "https://httpbin.org/ip",
    "https://httpbin.org/headers",
    "https://httpbin.org/user-agent",
]

# proxy server (product website www.16yun.cn)
proxyHost = "t.16yun.cn"
proxyPort = "31111"

# proxy tunnel verification information
proxyUser = "username"
proxyPass = "password"

proxyMeta = "http://%(user)s:%(pass)s@%(host)s:%(port)s" % {
    "host": proxyHost,
    "port": proxyPort,
    "user": proxyUser,
    "pass": proxyPass,
}

# use the HTTP proxy for both http and https access
proxies = {
    "http": proxyMeta,
    "https": proxyMeta,
}

# visit the sites three times; reusing one Session (keep-alive) keeps the same public IP
s = requests.Session()

# set the cookie
cookie_dict = {"JSESSION": "123456789"}
cookies = requests.utils.cookiejar_from_dict(cookie_dict, cookiejar=None, overwrite=True)
s.cookies = cookies

for i in range(3):
    for url in targetUrlList:
        r = s.get(url, proxies=proxies)
        print(r.text)
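The listing above attaches a hand-written cookie to the session. The form POST that this section describes could be sketched roughly as follows. This is a minimal illustration under assumptions, not a definitive implementation: the helper names, the login URL and the form field names ("username", "password") are hypothetical, and real sites use their own field names and often add hidden fields such as CSRF tokens.

```python
import requests


def build_login_request(login_url, username, password):
    """Build (but do not send) the POST request that submits the login form.

    The field names "username" and "password" are assumptions; inspect the
    target site's login form to find the real ones.
    """
    form_data = {"username": username, "password": password}
    return requests.Request("POST", login_url, data=form_data)


def login_with_form(session, login_url, username, password, proxies=None):
    """Send the login form through the session and return the response.

    Cookies that the server sets in its response are stored in
    session.cookies, so later session.get(...) calls send them automatically.
    """
    prepared = session.prepare_request(
        build_login_request(login_url, username, password)
    )
    # Merge explicit proxies with the session and environment settings.
    settings = session.merge_environment_settings(
        prepared.url, proxies or {}, None, None, None
    )
    response = session.send(prepared, timeout=10, **settings)
    response.raise_for_status()
    return response


# Usage sketch (hypothetical URLs, not executed here):
#   s = requests.Session()
#   login_with_form(s, "https://example.com/login", "user", "pass", proxies=proxies)
#   page = s.get("https://example.com/members-only", proxies=proxies)
```

Keeping the request construction separate from sending makes it easy to inspect the exact form body before any traffic goes through the proxy.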

2 Log in using a cookie

When you send a valid login cookie, the server treats you as a logged-in user and returns the logged-in content. A site that requires a CAPTCHA can therefore be handled by using the cookie from a login that has already passed the CAPTCHA.

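import requests

# url_login, url_results and cookies (taken from an already logged-in visit)
# are assumed to be defined elsewhere
requests_session = requests.Session()
response_captcha = requests_session.get(url=url_login, cookies=cookies)
response1 = requests.get(url_login)  # not logged in: a bare request carries no cookie
response2 = requests_session.get(url_login)  # logged in, because the session kept the response cookie
response3 = requests_session.get(url_results)  # still logged in, for the same reason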
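One way to obtain such a cookie when the login needs a CAPTCHA is to log in once by hand in a browser and copy the resulting Cookie header into the crawler. A minimal sketch of that step follows. The helper name session_from_cookie_header and the sample cookie string are hypothetical, and the cookie only works for as long as the server still considers it valid.

```python
import requests


def session_from_cookie_header(cookie_header):
    """Return a requests.Session preloaded with cookies from a Cookie header.

    cookie_header looks like "name1=value1; name2=value2", as copied from the
    browser's developer tools after a manual login (an assumed workflow).
    """
    cookie_dict = {}
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookie_dict[name] = value

    session = requests.Session()
    session.cookies = requests.utils.cookiejar_from_dict(cookie_dict)
    return session


# Hypothetical cookie string; replace it with the one from your own login.
session = session_from_cookie_header("JSESSION=123456789; theme=dark")
print(session.cookies.get_dict())
```

Every later session.get(...) call then sends these cookies automatically, and it can still take the proxies argument used earlier.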

If there is a CAPTCHA, calling response = requests_session.post(url=url_login, data=data) does not work at this point. Instead, proceed as follows:

import requests

# url_login, url_results and cookies (taken from an already logged-in visit)
# are assumed to be defined elsewhere
requests_session = requests.Session()
response_captcha = requests_session.get(url=url_login, cookies=cookies)
response1 = requests.get(url_login)  # not logged in: a bare request carries no cookie
response2 = requests_session.get(url_login)  # logged in, because the session kept the response cookie
response3 = requests_session.get(url_results)  # still logged in, for the same reason

That concludes this look at how a Python crawler can collect data that is visible only after logging in. Theory is best paired with practice, so go and try it yourself.
