2025-02-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains how to use Charles and requests to simulate a Weibo login. The content is simple, clear and easy to follow; let's work through it step by step.
1. Record the whole login process with Charles
First, start Charles and begin recording. Then open Chrome, set it to use Charles as its proxy, and open the Weibo home page so that the login page appears (if you are already logged in to Weibo, log out first). Enter your username and password and log in; once the login succeeds, stop recording in Charles. Charles has now captured the complete Weibo login process, as shown in the figure.
We wrap the entire login process in a Python class, defined as follows:
```python
import base64
import binascii
import json
import pickle
import re
import time
import urllib.parse

import requests
import rsa


class WeiboLogin:
    user_agent = ('Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.11 (KHTML, like Gecko) '
                  'Chrome/20.0.1132.57 Safari/536.11')

    def __init__(self, username, password, cookies_tosave='weibo.cookies'):
        self.weibo_user = username
        self.weibo_password = password
        self.cookies_tosave = cookies_tosave  # file the session cookies are pickled to
        self.session = requests.Session()
        self.session.headers['User-Agent'] = self.user_agent
```
Next we analyze the login process and implement the methods of this class one by one.
2. Analyze the login process
Switch Charles's main window to the "Sequence" tab, where the recorded Weibo login requests are listed in chronological order. The first suspicious request has the Host login.sina.com.cn.
Click this record and the full request appears in the lower pane. Its path is:
GET /sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=&rsakt=mod&client=ssologin.js(v1.4.19)&_=1542456042531 HTTP/1.1
The GET parameter _=1542456042531 looks like a millisecond timestamp. In ssologin.js it is defined as preloginTimeStart (how we find this is shown later), and we can generate it with int(time.time() * 1000).
Judging from the name prelogin.php, this is a pre-login step: before the username and password are submitted, the client first fetches some data from the server.
Implement this prelogin with Python:
```python
    def prelogin(self):
        # millisecond timestamp; ssologin.js calls this preloginTimeStart
        preloginTimeStart = int(time.time() * 1000)
        url = ('https://login.sina.com.cn/sso/prelogin.php?'
               'entry=weibo&callback=sinaSSOController.preloginCallBack&'
               'su=&rsakt=mod&client=ssologin.js(v1.4.19)&'
               '_=%s') % preloginTimeStart
        resp = self.session.get(url)
        # the response is JSONP: sinaSSOController.preloginCallBack({...}); pull out the JSON
        pre_login_str = re.match(r'[^{]+(\{.+?\})', resp.text).group(1)
        pre_login = json.loads(pre_login_str)
        pre_login['preloginTimeStart'] = preloginTimeStart
        print('pre_login 1:', pre_login_str)
        return pre_login
```
What are these pre-fetched values for? We don't know yet, so let's move on.
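To see what the regular expression in prelogin extracts, here is a small standalone sketch run against a made-up response. The field names (servertime, nonce, pubkey, rsakv, exectime) are the ones the later steps read from pre_login; the values are invented placeholders, not real server output, and parse_prelogin is a hypothetical helper name.

```python
import json
import re


def parse_prelogin(text):
    """Extract the JSON object from a JSONP response like callback({...})."""
    # Skip everything before the first '{', then take the shortest {...} span.
    # This works only because the prelogin JSON contains no nested objects.
    match = re.match(r'[^{]+(\{.+?\})', text)
    if match is None:
        raise ValueError('no JSON object found in prelogin response')
    return json.loads(match.group(1))


# Invented sample in the shape of a prelogin response (values are placeholders).
sample = ('sinaSSOController.preloginCallBack({"retcode":0,"servertime":1542456042,'
          '"nonce":"ABC123","pubkey":"EB2A","rsakv":"1330428213","exectime":12})')
info = parse_prelogin(sample)
print(info['servertime'], info['nonce'], info['exectime'])  # 1542456042 ABC123 12
```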
Supplement: about the CAPTCHA
When I first wrote this tutorial yesterday, no CAPTCHA appeared. Today the CAPTCHA did pop up, which is welcome, because it means I can cover this part as well.
Comparing today's prelogin URL with yesterday's, it is easy to see two extra parameters:
su=xxxxx is the "encrypted" (actually base64-encoded) username
checkpin=1 tells the server to check whether a CAPTCHA is required (a crawler you write yourself would never send this)
When the request carries these two parameters, the server returns a showpin value that tells the client whether to display a CAPTCHA (pin).
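A minimal sketch of the CAPTCHA-aware prelogin request, assuming the only changes from the earlier URL are the filled-in su and the extra checkpin=1. build_prelogin_url and needs_pin are hypothetical helper names, and percent-encoding su with urllib.parse.quote is an assumption (base64 output can contain '+', '/' and '=').

```python
import base64
import time
import urllib.parse

PRELOGIN_URL = 'https://login.sina.com.cn/sso/prelogin.php'


def build_prelogin_url(username, timestamp_ms=None):
    """Hypothetical helper: prelogin URL carrying su and checkpin=1."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # preloginTimeStart
    # su: URL-quoted username, then base64 (same steps as encrypt_user)
    su = base64.b64encode(urllib.parse.quote(username).encode()).decode()
    # Assumption: su is percent-encoded because base64 may contain '+', '/' or '='
    query = ('entry=weibo&callback=sinaSSOController.preloginCallBack'
             '&su=%s&rsakt=mod&checkpin=1&client=ssologin.js(v1.4.19)&_=%d'
             % (urllib.parse.quote(su, safe=''), timestamp_ms))
    return PRELOGIN_URL + '?' + query


def needs_pin(pre_login):
    """Hypothetical helper: True when the parsed prelogin JSON asks for a CAPTCHA."""
    return int(pre_login.get('showpin', 0)) == 1
```

The parameter order in the query string is a guess; servers normally do not care about it.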
Since a pin (CAPTCHA) has to be shown, we download the CAPTCHA image, which is located at:
https://login.sina.com.cn/cgi/pin.php?r=2855501&s=0&p=aliyun-a34a347956ab8e98d6eb1a99dfddd83bc708
Where does this URL come from? Press Ctrl+F to open the "Text to Find" window and search for "pin.php".
This Find window is very useful: it searches for text across all recorded requests and responses, and it supports regular expressions, case-sensitive matching and whole-word matching. Whole-word matching is especially helpful for short names like su, because it filters out the many longer words that contain them, such as super.
I should explain why I search only in "Response Body".
We want to know how the URL above is generated, and we expect that to happen in some piece of code in a JS file. That code must arrive in a Response Body, so restricting the search there also filters out a lot of irrelevant matches.
With this filter you can locate the relevant file directly; double-click it, search a little within it, and you will find the corresponding code:
```javascript
var pincodeUrl = "https://login.sina.com.cn/cgi/pin.php";
...
return pincodeUrl + "?r=" + Math.floor(Math.random() * 100000000) + "&s=" + size + (pcid.length > 0 ? "&p=" + pcid : "");
```
With this JS in hand, it is easy to reimplement the logic in Python; readers can try it themselves.
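One possible Python port of that JS, as a sketch. It assumes pcid comes from the prelogin response (the p=aliyun-... value in the sample URL looks like one) and that size is 0, as in that URL; get_pin_url is a hypothetical name.

```python
import math
import random

PINCODE_URL = 'https://login.sina.com.cn/cgi/pin.php'


def get_pin_url(pcid, size=0):
    """Port of the pincode URL expression from ssologin.js."""
    # r: random integer, as in Math.floor(Math.random() * 100000000)
    r = math.floor(random.random() * 100000000)
    url = PINCODE_URL + '?r=' + str(r) + '&s=' + str(size)
    # p is appended only when pcid is non-empty (pcid.length > 0 in the JS)
    if len(pcid) > 0:
        url += '&p=' + pcid
    return url
```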
Once we have the CAPTCHA URL, we download the image with self.session and save it to a file. Before POSTing the login data, we read the code with pin = input('> please input pin:'), add it to the POST data, and send everything in the POST request.
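A minimal sketch of that step. It reuses the login session (so any cookies set by the CAPTCHA request stay with it) and takes the prompt function as a parameter so it can be exercised without a keyboard. fetch_pin and the file name pin.png are hypothetical; the article does not name the POST field that carries the pin, so adding the result to the login data is left to the caller.

```python
def fetch_pin(session, pin_url, path='pin.png', ask=input):
    """Download the CAPTCHA with the login session and ask the user to read it."""
    resp = session.get(pin_url)
    # Save the raw image bytes so the user can open the file and read the code.
    with open(path, 'wb') as f:
        f.write(resp.content)
    return ask('> please input pin:')
```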
The Host of the second suspicious request is the same as the first, and the path is:
POST /sso/login.php?client=ssologin.js(v1.4.19) HTTP/1.1
This is a POST request. To see its POST data, select this record, click the "Contents" tab, and then the "Form" tab.
Now we can work out how each POST parameter relates to the data returned by prelogin.
Parameter: su
This looks like the "encrypted" username. How is it encrypted? The browser runs JavaScript, so we guess the encryption happens in JS, but in which file? The login.php path above carries the parameter client=ssologin.js(v1.4.19), so let's look in ssologin.js: select the request that loaded this JS file, and its code is shown under the "Contents" tab. Press Ctrl+F and search for username.
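Sure enough, the username is only URL-quoted and base64-encoded, not encrypted, so we can compute su like this:
```python
    def encrypt_user(self, username):
        # ssologin.js URL-quotes the username, then base64-encodes it (encoding, not encryption)
        user = urllib.parse.quote(username)
        su = base64.b64encode(user.encode())
        return su
```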
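Because this is encoding rather than encryption, anyone who sees su can recover the username. A quick round-trip sketch with a made-up address; encode_su and decode_su are hypothetical names, and encode_su returns a str where encrypt_user returns bytes.

```python
import base64
import urllib.parse


def encode_su(username):
    # Same steps as encrypt_user: URL-quote, then base64 (returned as str here)
    return base64.b64encode(urllib.parse.quote(username).encode()).decode()


def decode_su(su):
    # Reverse the steps: base64-decode, then URL-unquote
    return urllib.parse.unquote(base64.b64decode(su).decode())


su = encode_su('a@b.com')
print(su)             # YSU0MGIuY29t
print(decode_su(su))  # a@b.com
```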
Parameter: sp
As with su, we search ssologin.js for password and find the algorithm used to encrypt it. From it we can compute sp:
```python
    def encrypt_passwd(self, passwd, pubkey, servertime, nonce):
        # pubkey is the hex RSA modulus from prelogin; hex 10001 (65537) is the public exponent
        key = rsa.PublicKey(int(pubkey, 16), int('10001', 16))
        # plaintext layout used by ssologin.js: servertime TAB nonce NEWLINE password
        message = str(servertime) + '\t' + str(nonce) + '\n' + str(passwd)
        passwd = rsa.encrypt(message.encode('utf-8'), key)
        return binascii.b2a_hex(passwd)
```
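Before RSA is applied, the password is combined with the two values prelogin returned. This stdlib-only sketch shows just that plaintext layout and the exponent conversion; the servertime, nonce and password are made-up values, sp_plaintext is a hypothetical name, and the actual encryption stays in encrypt_passwd (python-rsa's rsa.encrypt uses randomized PKCS#1 v1.5 padding, so sp differs on every call even for the same input).

```python
def sp_plaintext(servertime, nonce, passwd):
    # Layout used by encrypt_passwd: servertime, TAB, nonce, NEWLINE, password
    return str(servertime) + '\t' + str(nonce) + '\n' + str(passwd)


RSA_EXPONENT = int('10001', 16)  # hex 10001 == 65537, the usual RSA public exponent

msg = sp_plaintext(1542456042, 'ABC123', 'secret')
print(repr(msg))     # '1542456042\tABC123\nsecret'
print(RSA_EXPONENT)  # 65537
```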
Parameter: prelt
Since ssologin.js is in charge of the login, let's search it for prelt with Ctrl+F:
request.prelt = preloginTime
So prelt is short for preloginTime. Let's search for preloginTime:
preloginTime = (new Date()).getTime() - preloginTimeStart - (parseInt(result.exectime, 10) || 0)
Here preloginTimeStart is the timestamp sent with the prelogin.php request, and result.exectime is the exectime field in the prelogin response.
Ha, that is easy to port to Python:
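```python
    def get_prelt(self, pre_login):
        # now - preloginTimeStart - exectime, all in milliseconds;
        # "or 0" mimics the JS fallback (parseInt(...) || 0) when exectime is missing
        prelt = (int(time.time() * 1000) - pre_login['preloginTimeStart']
                 - int(pre_login.get('exectime') or 0))
        return prelt
```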
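A worked example with made-up timestamps makes the arithmetic concrete. Here the current time is passed in as a parameter (an assumption for testability; get_prelt reads the clock itself), and compute_prelt is a hypothetical name. The fallback only approximates JS parseInt, which would also accept strings like "12abc".

```python
def compute_prelt(now_ms, prelogin_time_start, exectime):
    """preloginTime = now - preloginTimeStart - (parseInt(exectime, 10) || 0)."""
    try:
        exec_ms = int(exectime)
    except (TypeError, ValueError):
        exec_ms = 0  # approximates the JS "|| 0" fallback for missing/non-numeric exectime
    return now_ms - prelogin_time_start - exec_ms


print(compute_prelt(1542456043000, 1542456042531, 12))    # 457
print(compute_prelt(1542456043000, 1542456042531, None))  # 469
```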
Now that we have the important login parameters, let's look at the sequence of login requests. Type login into the "Filter" box of the "Sequence" tab to see the matching requests; the first three of them make up the login sequence.
The detailed process is as follows:
1. prelogin fetches some parameters from the server.
2. POST the encrypted username, password and other parameters to https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.19).
3. Step 2 returns HTML code that redirects to another URL, so our code must perform this redirect as well.
4. Step 3 returns HTML code that first sets up several cross-domain URLs through JS and finally redirects to another URL; we need to implement this part too.
5. The HTTP headers returned in step 4 redirect to yet another URL. requests follows this HTTP redirect automatically, so we don't have to implement it.
To follow a JS redirect embedded in HTML with Python, we extract the redirect URL from the JS code with a regular expression and then request it with a requests GET.
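The two extraction steps can be sketched as standalone helpers. extract_redirect and extract_cross_domain_urls are hypothetical names; they assume, like login() does, that the redirect is written as location.replace(...) (with either quote style) and that the cross-domain list appears as a JSON array under "arrURL".

```python
import json
import re


def extract_redirect(html):
    """Return the URL passed to location.replace(...) in an HTML/JS page, or None."""
    # Accept both quote styles: login() expects "..." in step 3 and '...' in step 4
    match = re.search(r'location\.replace\(["\'](.*?)["\']\)', html)
    return match.group(1) if match else None


def extract_cross_domain_urls(html):
    """Return the list stored under "arrURL" in the step-3 page, or []."""
    match = re.search(r'"arrURL":(\[.*?\])', html)
    # json.loads also turns escaped slashes (\/) back into plain '/'
    return json.loads(match.group(1)) if match else []
```

Each URL found this way would then be fetched with the same session, as login() does, so the cookies accumulate in one place.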
The code for the complete login process is:
```python
    def login(self):
        # step-1. prelogin
        pre_login = self.prelogin()
        su = self.encrypt_user(self.weibo_user)
        sp = self.encrypt_passwd(self.weibo_password, pre_login['pubkey'],
                                 pre_login['servertime'], pre_login['nonce'])
        prelt = self.get_prelt(pre_login)

        data = {
            'entry': 'weibo', 'gateway': 1, 'from': '', 'savestate': 7,
            'qrcode_flag': 'false', 'userticket': 1, 'pagerefer': '', 'vsnf': 1,
            'su': su, 'service': 'miniblog',
            'servertime': pre_login['servertime'], 'nonce': pre_login['nonce'],
            'pwencode': 'rsa2', 'sp': sp, 'rsakv': pre_login['rsakv'],
            'encoding': 'UTF-8', 'prelt': prelt, 'sr': '1280*800',
            'url': 'http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.'
                   'sinaSSOController.feedBackUrlCallBack',
            'returntype': 'META',
        }

        # step-2. POST the login form
        login_url = 'https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.19)'
        resp = self.session.post(login_url, data=data)
        print(resp.headers)
        print(resp.content)
        print('Step-2 response:', resp.text)

        # step-3. follow the JS redirect in the returned HTML
        redirect_url = re.findall(r'location\.replace\("(.*?)"', resp.text)[0]
        print('Step-3 to redirect:', redirect_url)
        resp = self.session.get(redirect_url)
        print('Step-3 response:', resp.text)

        # step-4. visit each cross-domain URL, then follow the final JS redirect
        arrURL = re.findall(r'"arrURL":(.*?)\}', resp.text)[0]
        arrURL = json.loads(arrURL)
        print('CrossDomainUrl:', arrURL)
        for url in arrURL:
            print('set CrossDomainUrl:', url)
            resp_cross = self.session.get(url)
            print(resp_cross.text)
        redirect_url = re.findall(r'location\.replace\(\'(.*?)\'', resp.text)[0]
        print('Step-4 redirect_url:', redirect_url)
        resp = self.session.get(redirect_url)
        print(resp.text)

        # save the logged-in cookies for later reuse
        with open(self.cookies_tosave, 'wb') as f:
            pickle.dump(self.session.cookies, f)
        return True
```
The code prints a lot of information so that you can follow the whole login process.
It's easy to test our implementation:
```python
if __name__ == '__main__':
    weibo_user = 'your-weibo-username'
    weibo_password = 'your-weibo-password'
    wb = WeiboLogin(weibo_user, weibo_password)
    wb.login()
```
Replace these with your own Weibo username and password to try it out.
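Since login() pickles the session's cookie jar to cookies_tosave, a later run could reuse it instead of logging in again. A minimal sketch of the reverse step, assuming the file was written by login() with the same requests version installed; load_cookies is a hypothetical helper. Only unpickle files you created yourself, because unpickling can execute code.

```python
import pickle


def load_cookies(path='weibo.cookies'):
    """Load the cookie jar that login() pickled to its cookies_tosave file."""
    with open(path, 'rb') as f:
        return pickle.load(f)

# Usage (requires requests and a previous successful login):
#   session = requests.Session()
#   session.cookies.update(load_cookies('weibo.cookies'))
```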