How a Python crawler can bypass the login page

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains how a Python crawler can bypass the login page. The approach is simple and easy to follow, so let's work through it step by step.

Preface

We often need the selenium library when writing Python crawlers or automated tests. Login is where we most often get stuck, and the login CAPTCHA is the biggest headache, especially today's text and graphic CAPTCHAs, which also add interference lines. This article covers how to bypass the login page.

Login pages are often protected by a verification step, such as a graphic CAPTCHA.

Another example is the 12306 graphic CAPTCHA, which most of us have seen.

Methods for bypassing login

There are basically two ways to bypass login. The first is to inspect the website's cookies after logging in and send those cookies along when requesting the URL. The second is to start the browser with its full user profile, including saved bookmarks and the cookies of visited pages.

With the cookie method, we need to analyze the cookies of someone else's website, find the relevant values, and add them to our requests. Unfamiliar websites may encrypt their cookies or generate them dynamically, so this is not always easy. If we are testing our own company's website, we can ask the responsible developer which cookie value is used for login, then copy it into the request.
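As a minimal sketch of "taking the cookie with you", the standard-library example below attaches cookies to a request through the Cookie header. The function name, the URL, and the cookie name SESSION_ID are placeholders for illustration, not values from any real site, and a site that encrypts or rotates its cookies may still reject the request.

```python
import urllib.request


def build_request_with_cookies(url, cookies):
    """Build a request that carries the given cookies in its Cookie header."""
    # Cookies are sent as "name=value" pairs separated by "; ".
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(url, headers={"Cookie": cookie_header})


# Placeholder URL and cookie: copy the real name and value from the browser
# after logging in by hand.
request = build_request_with_cookies(
    "https://example.com/account",
    {"SESSION_ID": "value-copied-from-browser"},
)

# Nothing is sent until the request is opened, for example:
# with urllib.request.urlopen(request) as response:
#     html = response.read().decode("utf-8", errors="replace")
```

Building the request separately from sending it makes it easy to inspect the header before any traffic reaches the site.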

Add a cookie to bypass login

For example, logging in to a Baidu account is fairly involved, and doing it on every run is tedious. After logging in, we press F12 to open the browser's debugging tools and find the www.baidu.com request. Its cookie contains many values; the one we are looking for is circled in the picture.

We add this cookie value when we visit the Baidu link, and the page opens directly in the logged-in state.
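With selenium, this step could look roughly like the sketch below, assuming Selenium 4 and a working Chrome driver. The helper names are hypothetical, and the cookie name and value are left as parameters to be filled with whatever the debugging tools show for your own login. Selenium only accepts a cookie for the domain of the page that is currently open, so the sketch visits the site first, adds the cookie, and then reloads.

```python
def make_cookie(name, value, domain):
    """Build a cookie dict in the shape that Selenium's add_cookie() accepts."""
    return {"name": name, "value": value, "domain": domain, "path": "/"}


def open_with_cookie(url, cookie):
    """Open url in Chrome, attach the login cookie, and reload the page."""
    # Imported here so make_cookie() can be used without selenium installed.
    from selenium import webdriver  # third-party: pip install selenium

    driver = webdriver.Chrome()
    driver.get(url)            # be on the cookie's domain before adding it
    driver.add_cookie(cookie)
    driver.refresh()           # the reload is sent with the cookie attached
    return driver
```

A call such as open_with_cookie("https://www.baidu.com", make_cookie(name, value, ".baidu.com")) would then open the page with the copied login cookie attached, where name and value are the pair taken from the browser.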

Download the browser driver

For selenium to start a browser, we need to download the matching driver file and put it in the root directory of the Python installation. In this article I use Google Chrome and Firefox.
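As a quick check before launching anything, the sketch below looks for the driver executables in the Python installation root and on PATH. The helper name find_driver is hypothetical, and the sketch assumes the drivers keep their default names, chromedriver and geckodriver. Recent Selenium releases (4.6 and later) ship Selenium Manager, which can usually fetch a matching driver on its own, so a missing file is not necessarily a problem there.

```python
import os
import shutil
import sys


def find_driver(executable_name):
    """Return the full path of a driver executable, or None if it is not found."""
    # Search the Python installation root first, then every directory on PATH.
    parts = [sys.base_prefix] + os.environ.get("PATH", "").split(os.pathsep)
    search_path = os.pathsep.join(part for part in parts if part)
    return shutil.which(executable_name, path=search_path)


for name in ("chromedriver", "geckodriver"):
    location = find_driver(name)
    print(f"{name}: {location or 'not found'}")
```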

Download address of the Chrome driver:

http://chromedriver.storage.googleapis.com/index.html

Download address of the Firefox driver:

https://github.com/mozilla/geckodriver/releases/

Launch Chrome to bypass login

Every time we use the browser, the resulting cache and cookies are saved to the browser's default profile path. Let's first check that path. Taking Chrome as an example, we enter chrome://version/ in the address bar.

The Profile Path shown on that page is what we need. We remove the trailing \Default and add "--user-data-dir=" in front of the path to build the argument we want.

profile_directory = r'--user-data-dir=C:\Users\xxx\AppData\Local\Google\Chrome\User Data'

Next, we pass this value as a startup option when launching the browser. Note that all running Chrome programs must be closed before running the code, otherwise an error will be reported.
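A minimal sketch of this step, assuming Selenium 4 and a Windows-style profile path like the one above. The two function names are hypothetical. The first helper turns the Profile Path reported by chrome://version/ into the --user-data-dir argument, and the second passes that argument to Chrome through ChromeOptions.

```python
from pathlib import PureWindowsPath


def user_data_dir_argument(profile_path):
    """Turn the Profile Path from chrome://version/ into a --user-data-dir argument."""
    path = PureWindowsPath(profile_path)
    # chrome://version/ reports ...\User Data\Default; the option needs the parent folder.
    if path.name == "Default":
        path = path.parent
    return f"--user-data-dir={path}"


def launch_chrome_with_profile(profile_path, url):
    """Start Chrome on the existing profile and open url. Close Chrome first."""
    # Imported here so the path helper can be used without selenium installed.
    from selenium import webdriver  # third-party: pip install selenium

    options = webdriver.ChromeOptions()
    options.add_argument(user_data_dir_argument(profile_path))
    driver = webdriver.Chrome(options=options)
    driver.get(url)
    return driver
```

Calling launch_chrome_with_profile with the Profile Path and https://www.baidu.com would open Baidu in the existing profile.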

After selenium launches the browser, we find that the bookmarks saved earlier all appear at the top of the browser, and the Baidu account is already logged in.

Launch Firefox to bypass login

Firefox can be started in the same way, with slightly different settings.

First, check where the profile is stored: open Help > Troubleshooting Information > Profile Folder, and copy the path shown there.

Again, we put the path in a variable.

profile_path = r'C:\Users\guixianyang\AppData\Roaming\Mozilla\Firefox\Profiles\dvm6wqam.default'

We log in to the Baidu account in Firefox as well, then use selenium to start Firefox with this profile. The browser comes up with its installed plug-ins loaded and the Baidu account already logged in.
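One way to do this with Selenium 4 is to pass Firefox its own -profile command-line argument through FirefoxOptions, as sketched below. This is an assumption about the setup rather than the article's original listing, and older Selenium versions used a FirefoxProfile object instead. The function names are hypothetical.

```python
def firefox_profile_arguments(profile_path):
    """Return the command-line arguments that point Firefox at an existing profile."""
    return ["-profile", profile_path]


def launch_firefox_with_profile(profile_path, url):
    """Start Firefox on the existing profile folder and open url."""
    # Imported here so the helper above can be used without selenium installed.
    from selenium import webdriver  # third-party: pip install selenium

    options = webdriver.FirefoxOptions()
    for argument in firefox_profile_arguments(profile_path):
        options.add_argument(argument)
    driver = webdriver.Firefox(options=options)
    driver.get(url)
    return driver
```

With the variable above, launch_firefox_with_profile(profile_path, "https://www.baidu.com") would open Baidu using the saved profile.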

Bypassing a website's graphic CAPTCHA

The graphic CAPTCHA mentioned at the start of the article is the one shown when logging in to Jianshu. After logging in to Jianshu once by hand (the cookie is only valid for a limited time, apparently around ten days to half a month), we replace the link in the code above with the Jianshu one. The same method then bypasses the graphic CAPTCHA on the login page.

For example, I open my Jianshu home page directly.

https://www.jianshu.com/u/52353ffa8b86

The login state is also retained after the automated browser starts.
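Because the saved login only lasts as long as its cookies, it can help to check how much time is left. The sketch below reads the optional expiry field (a Unix timestamp in seconds) from the cookie dicts returned by selenium's get_cookies(). The function names are hypothetical, and which cookie actually carries the login state is site-specific and not identified here.

```python
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(cookie, now=None):
    """Return the days left before a cookie expires, or None for a session cookie."""
    expiry = cookie.get("expiry")  # absent on cookies that end with the session
    if expiry is None:
        return None
    if now is None:
        now = time.time()
    return (expiry - now) / SECONDS_PER_DAY


def report_cookie_lifetimes(driver):
    """Print the remaining lifetime of every cookie visible on the current page."""
    for cookie in driver.get_cookies():
        days = days_until_expiry(cookie)
        remaining = "session only" if days is None else f"{days:.1f} days left"
        print(f"{cookie['name']}: {remaining}")
```

When the remaining time on the login cookie gets short, logging in again by hand in the same profile refreshes it.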

With the login step out of the way, you can go on to do whatever you need, such as crawling or automated test verification.

Thank you for reading. That is how a Python crawler can bypass the login page. The specifics still need to be verified in practice on the site you are working with.