2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
How do you implement form interaction in Python? This article walks through the analysis and a working solution, in the hope of giving readers who face this problem a simpler, easier path to the answer.
1. Form interaction
import requests

url = 'https://httpbin.org/post'  # replace with the form's action URL
params = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': 'value3'
}
html = requests.post(url, data=params)
print(html.text)
Most websites now offer several login methods, such as logging in via SMS or WeChat, so interacting with the login form directly is relatively troublesome and is not covered in detail here. The main use of form interaction appears in the login-based crawling described below.
2. Reverse engineering: how to build a form
For pages whose content changes after login, you can reverse engineer the form to discover the hidden request details. Let's first look at how the form is built.
(1) Log in, open Chrome DevTools, and select the Network tab.
(2) Search for the keyword python to capture the POST form information; see Figure 1 and Figure 2 below.
Figure 1
Figure 2
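Once the form fields are visible in DevTools, the same POST can be rebuilt with requests. The sketch below only prepares the request and prints the body it would send, without touching the network; the endpoint URL and the field names (first, pn, kd) are taken from the captured request and should be treated as assumptions that may change.

```python
import requests

# Endpoint and form fields as captured in DevTools (assumptions; they may change)
url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'
form = {'first': 'true', 'pn': '1', 'kd': 'python'}

# Prepare the request without sending it, to inspect the urlencoded body
req = requests.Request('POST', url, data=form).prepare()
print(req.body)  # first=true&pn=1&kd=python
```

Inspecting the prepared body before sending is a convenient way to confirm that your code reproduces exactly what the browser sent.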
3. Cookie simulation login
Sometimes form fields are encrypted or otherwise wrapped before submission, which makes rebuilding the form difficult and cumbersome. In that case it is better to simulate login by submitting cookie information instead.
What are cookies?
A cookie is data that a website stores on the user's local machine in order to identify the user and track the session. Websites, including e-commerce sites, generally use tracking cookies as unique identifiers for users.
Since cookies carry user information, they can be used to simulate logging in to a website.
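The cookie string copied from DevTools is a single `name=value; name=value` header. As a small aside, with made-up names and values for illustration, Python's standard library can split such a string into the dict that requests accepts via its cookies= parameter:

```python
from http.cookies import SimpleCookie

# Raw Cookie header as copied from DevTools (hypothetical values)
raw = 'user_trace_token=abc123; JSESSIONID=XYZ'

cookie = SimpleCookie()
cookie.load(raw)  # parse the 'name=value; name=value' string
cookies = {name: morsel.value for name, morsel in cookie.items()}
print(cookies)  # {'user_trace_token': 'abc123', 'JSESSIONID': 'XYZ'}
```

Passing a dict like this to requests.get(url, cookies=cookies) is equivalent to sending the raw header, but easier to edit field by field.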
Continuing with the Lagou site (used in the case study below), the cookies appear in the Network tab as follows:
Code:
import requests

url = 'https://www.lagou.com'
headers = {'cookie': 'xxx'}  # paste the cookie string copied from DevTools
html = requests.get(url, headers=headers)
print(html.text)
This retrieves the source of the logged-in page:
4. Case practice: Crawling for recruitment information
Lagou combines asynchronous loading with form submission. Let's analyze this site.
Analysis approach:
(1) Log in and open the Lagou site. The page is shown below. Search keyword: big data.
(2) Observation shows that the page elements are not present in the page source, indicating that asynchronous loading is used.
(3) Refresh the page, open the Network tab, and select XHR. You can see the asynchronous (AJAX) requests and the JSON returned in the corresponding Response, which proves the data can be fetched from here.
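The returned JSON can be parsed with the standard json module. The miniature document below imitates the structure visible in the Response tab (the totalCount value of 45 is made up); the page-count formula, 15 postings per page capped at 30 pages, is the same one used in the full code later in the article.

```python
import json

# Hypothetical miniature of the JSON returned by the XHR (structure only)
sample = '{"content": {"positionResult": {"totalCount": 45, "result": []}}}'

data = json.loads(sample)
total = data['content']['positionResult']['totalCount']
# 15 postings per page, capped at 30 pages
pages = int(total / 15) if int(total / 15) < 30 else 30
print(total, pages)  # 45 3
```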
(4) The form data can be found by paging through the results; pn changes continuously as you turn pages. Form interaction was introduced above, so it is not repeated here.
(5) In Network -> Preview you can see both the number of postings per page and the total number of postings.
(6) Since a single fixed cookie will be rate-limited, a dynamically fetched cookie is used to grab the data. This has been tested and works; you can use it as a reference.
def get_cookie():
    # URL of the start page (url_start)
    url = 'https://www.lagou.com/jobs/list_%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90?labelWords=&fromSearch=true&suginput='
    s = requests.Session()
    s.get(url, headers=headers, timeout=3)  # request the start page to obtain cookies
    cookie = s.cookies  # the cookies obtained from this session
    return cookie
The detailed codes are as follows:
import requests
import json
import time
import pandas as pd
#import csv
headers = {
    'origin': 'https://www.lagou.com',
    'accept': 'xxxx',
    'user-agent': 'xxxx',
    'referer': 'xxxx'
}
# Get the cookie value
def get_cookie():
    # URL of the start page (url_start)
    url = 'https://www.lagou.com/jobs/list_%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90?labelWords=&fromSearch=true&suginput='
    s = requests.Session()
    s.get(url, headers=headers, timeout=3)  # request the start page to obtain cookies
    cookie = s.cookies  # the cookies obtained from this session
    return cookie
# Define the function that gets the number of pages
def get_page(url, params):
    html = requests.post(url, data=params, headers=headers, cookies=get_cookie(), timeout=5)
    # Load the response body as JSON
    json_data = json.loads(html.text)
    # Parse the JSON; the bracketed keys form the path to the value
    total_Count = json_data['content']['positionResult']['totalCount']
    page_number = int(total_Count/15) if int(total_Count/15) < 30 else 30
    # Call get_info, passing in the url and the number of pages
    get_info(url, page_number)
# Define the function that collects the job information
def get_info(url, page):
    for pn in range(1, page+1):
        # POST request parameters
        params = {
            "first": "true",
            "pn": str(pn),
            "kd": "Big Data"
        }
        # Fetch the information and catch exceptions
        try:
            html = requests.post(url, data=params, headers=headers, cookies=get_cookie(), timeout=5)
            #print(url, html.status_code)
            # Load the response body as JSON
            json_data = json.loads(html.text)
            # Parse the JSON; the bracketed keys form the path to the value
            results = json_data['content']['positionResult']['result']
            df = pd.DataFrame(results)
            #print(df.iloc[:,0:6])
            if pn == 1:
                total_df = df
            else:
                total_df = pd.concat([total_df, df], axis=0)
            # Sleep for 2 seconds
            time.sleep(2)
        except requests.exceptions.ConnectionError:
            print("requests.exceptions.ConnectionError")
            pass
    #total_df.to_csv('Recruitment Info.csv', sep=',', header=True, index=False)
    total_df.to_excel('Big Data.xls', header=True, index=False)
# URL of the original page; URL of the requested JSON data
url = "https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false"
params = {
    "first": "true",
    "pn": 1,
    "kd": "Big Data"
}
get_page(url, params)

That covers how to implement form interaction in Python. I hope the content above helps to some extent. If you still have questions, you can follow the industry information channel for more related knowledge.
© 2024 shulou.com SLNews company. All rights reserved.