2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article mainly discusses common questions about Python crawlers. Interested readers may want to take a look; the answers below are short, direct, and practical. Let's go through the most frequently asked questions about Python crawlers.
1. Is it easy to find a crawler job now?
If you had asked a year ago, I might have said crawler jobs were still easy to find, but that is no longer the case: the market is saturated, and the requirements keep rising. Crawler positions now typically ask for more than a year of hands-on experience, plus some ability to deal with anti-crawling measures.
2. What is the typical salary of a crawler engineer?
In first-tier cities, a crawler engineer with about a year of experience earns upwards of 10K RMB per month; if you are strong, 15K to 18K is realistic. For recent graduates, it is between 7K and 9K.
3. How do crawlers usually deal with encryption?
On the web side, the encryption algorithm is usually written in JavaScript, so first you should know at least the basics of the JS language. Then locate the relevant JS encryption code, identify the key functions, and debug that code in a Node.js environment. Finally, use the execjs library to execute the debugged code from Python.
The second approach is to simulate a browser environment and grab the rendered data directly, most commonly with the Selenium framework. This is very convenient, but correspondingly inefficient. A newer alternative to Selenium is Puppeteer, which you can think of as an asynchronous counterpart to Selenium.
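The execjs route described above can be sketched roughly like this. It assumes the PyExecJS package (`pip install PyExecJS`) plus a JS runtime such as Node.js, and `encryptParam` is a hypothetical stand-in for a site's real obfuscated function:

```python
# Sketch: executing extracted JS encryption code from Python with PyExecJS.
# The JS below is a hypothetical stand-in for the target site's real code.

JS_CODE = """
function encryptParam(value, salt) {
    var out = "";
    for (var i = 0; i < value.length; i++) {
        out += String.fromCharCode(value.charCodeAt(i) ^ salt.charCodeAt(i % salt.length));
    }
    return out;
}
"""

def run_js(func_name, *args):
    """Compile the extracted JS once, then call the named function from Python."""
    import execjs  # imported lazily so the sketch can be read without the dependency
    ctx = execjs.compile(JS_CODE)
    return ctx.call(func_name, *args)

# Usage (requires a JS runtime such as Node.js):
#     token = run_js("encryptParam", "user=alice", "k3y")
```

The key idea is that you never need to re-implement the site's algorithm yourself; you lift the debugged JS verbatim and let execjs run it.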
4. What knowledge do you need to learn crawlers?
Three parts: 1 Python basics; 2 crawler fundamentals; 3 anti-crawling techniques
These three parts cover the knowledge you need to write crawlers. The mainstream language for crawlers is Python, because Python has a very rich set of crawler libraries that can be used directly and conveniently.
I personally summed up a universal formula for crawlers:
Crawler = network request + data parsing + data storage
These three parts are the fundamentals; every crawler program contains all three, and complex crawlers just add extra layers on top of this base.
A crawler engineer is only as strong as their anti-crawling skills. Anti-crawling is the hardest part of learning crawlers, and it is best learned through hands-on practice. When I get the chance, I will write a dedicated article about it.
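The formula above can be illustrated with nothing but the standard library: urllib for the network request, html.parser for the data parsing, and sqlite3 for the data storage. The URL and the `<h2 class="title">` selector here are hypothetical placeholders for your own target site:

```python
# Minimal illustration of: crawler = network request + data parsing + data storage.
import sqlite3
import urllib.request
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text of every <h2 class="title"> element."""
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True
    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False
    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())

def fetch(url):
    """Network request step."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def parse(html):
    """Data parsing step."""
    parser = TitleParser()
    parser.feed(html)
    return parser.titles

def store(titles, conn):
    """Data storage step."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (title TEXT)")
    conn.executemany("INSERT INTO items VALUES (?)", [(t,) for t in titles])
    conn.commit()

# Usage against a real site would be:
#     store(parse(fetch("https://example.com/list")), sqlite3.connect("items.db"))
```

Real projects usually swap in requests/aiohttp, lxml/BeautifulSoup, and MySQL/MongoDB, but the three-part shape stays the same.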
5. How do you solve CAPTCHA problems?
In general, there are two lines of attack, plus a paid fallback:
Forward cracking
For a common image CAPTCHA, you can save the picture and then run an image/text recognition tool over it to extract the content. For a slider CAPTCHA, you can use the Selenium framework to compute the gap offset and then simulate dragging the slider with the mouse.
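For the slider case, once the gap offset is known you still need a human-looking drag. A common trick is to split the distance into small moves that accelerate and then decelerate; Selenium's ActionChains would replay the offsets. The easing parameters below are illustrative, not from any specific site:

```python
# Sketch: generate a human-looking drag track whose steps sum to the gap distance.
def drag_track(distance, steps=30):
    """Split `distance` pixels into `steps` moves with ease-in-out pacing."""
    track, moved = [], 0
    for i in range(1, steps + 1):
        t = i / steps
        # smoothstep easing: slow start, fast middle, slow finish
        target = round(distance * t * t * (3 - 2 * t))
        track.append(target - moved)
        moved = target
    return track

# Each element is one small mouse move, e.g. with Selenium:
#     for dx in drag_track(137):
#         actions.move_by_offset(dx, 0)
```

Sliding at a constant speed is exactly what anti-bot checks look for, which is why the pacing matters as much as the distance.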
Reverse cracking
This requires understanding how the other side implements the CAPTCHA: work out which parameters the CAPTCHA request carries and how those parameters are generated, then simulate the request yourself. Once done, reverse cracking saves a lot of effort, but the difficulty is correspondingly high.
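In practice, "how the parameters are generated" often boils down to a signature the site's JS attaches to each request. The exact scheme below (md5 of path + sorted params + timestamp + salt) is hypothetical; the point is that once you read it out of the JS, you reproduce it in Python and issue the request directly:

```python
# Sketch: rebuilding the kind of `sign` parameter a site's JS might compute.
# The signing scheme and the salt are hypothetical examples.
import hashlib
import time

def sign_request(path, params, secret="hypothetical-salt", ts=None):
    """Attach `ts` and `sign` fields the way the target's JS would."""
    ts = int(time.time()) if ts is None else ts
    canonical = path + "&".join(f"{k}={params[k]}" for k in sorted(params)) + str(ts)
    sign = hashlib.md5((canonical + secret).encode("utf-8")).hexdigest()
    return {**params, "ts": ts, "sign": sign}
```

With the signed parameter dict in hand, a plain HTTP client can request the CAPTCHA endpoint without ever loading the page in a browser.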
Use a CAPTCHA-solving platform directly
Both methods above are time-consuming and labor-intensive, and once the target website updates its anti-crawling strategy, your code stops working. So if money can solve the problem, just use a commercial CAPTCHA-solving platform directly.
6. Can doing crawler work land you in jail?
Where sensitive personal information is involved, scraping more than about 1,000 records can constitute a criminal offense. Many crawlers operate in a gray area; as long as you are not too high-profile and do not overdo it, the other party usually will not pursue anything. Generally speaking, if you follow the rules and keep a low profile, you will stay out of trouble.
7. Where can I find freelance crawler gigs? I want to earn some pocket money.
I do not recommend taking on freelance crawler work: the income is low, and it drains a lot of your energy. The pay is not proportional to the effort.
8. How do you get your first job without crawler experience?
You cannot find a job with no crawler experience at all, but crawler experience does not have to come from a crawler job. As long as you have crawled some websites yourself, you have crawler experience. So if you want a crawler job, go find real websites to crawl: imitate other people's projects, write some crawler code yourself, and write up the pitfalls you hit. After crawling a few more sites you will have your own experience, and with some interview-question practice on top of that, finding a job becomes easy.
9. What are the prospects of the various Python fields right now?
The most promising Python field is AI/artificial intelligence, followed by Python backend development, web frontend, and data analysis, with crawlers last.
10. How to build a high-star project with Python
Here are two ideas:
Resource curation
If your skills are not yet strong, you can curate all the practical resources in the Python field, such as classic Python books, Python algorithm collections, and classic Python articles, and build the most comprehensive resource-collection project you can.
Building practical projects
If your technical skills are strong, pay more attention to the pain points you encounter in real life and develop a practical project that addresses one of them.
11. How far do I need to self-study before I can find a job?
When I first studied crawlers, I made a mind map of everything a crawler engineer needs to learn. If you cover all of its contents, you can find a job.
12. Crawler interview materials
I will keep updating the latest Python crawler interview questions and Python crawler tutorials here.
13. How to use Python to create passive income
This topic alone could fill a whole live session. In all the time I have been working, I have not needed to spend a penny of my salary. I will not expand on it here; I will share it in my next live event.
14. During crawling, does the data need simple deduplication to check whether it meets requirements?
Deduplication and format normalization depend on your specific business needs. Generally speaking, the data scraped by the crawler is deduplicated and then converted into a data format agreed with the other teams, so that others can use it.
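The dedup step described above is usually just a fingerprint over whichever fields define "the same item". A minimal sketch, where the field names (`url`, `title`) are placeholders for your own schema:

```python
# Sketch: drop records whose key fields hash to an already-seen fingerprint,
# keeping the first copy of each. Field names are hypothetical examples.
import hashlib

def dedupe(records, key_fields=("url",)):
    """Return records with duplicates (by `key_fields`) removed, order preserved."""
    seen, out = set(), []
    for rec in records:
        fp = hashlib.md5(
            "|".join(str(rec.get(f, "")) for f in key_fields).encode("utf-8")
        ).hexdigest()
        if fp not in seen:
            seen.add(fp)
            out.append(rec)
    return out
```

For large or distributed crawls, the in-memory set is typically replaced by a Redis set or a Bloom filter, but the fingerprint-then-check shape is the same.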
15. The main tasks of a crawler engineer at work
The daily work of a crawler engineer is to crawl data and to maintain the existing crawler code so that it keeps running normally.
16. Do you plan to switch to machine learning, data analysis, or backend development later in your crawler career?
Crawling is a good skill, but not a long-term career path by itself. If you want to learn crawlers and make a living from them, you must learn reverse engineering, JS deobfuscation, distributed systems, and asynchronous programming. Later on, if you no longer want to focus on crawlers, think about where you want to go next, and choose between data analysis, backend development, and machine learning based on your interests.
At this point, I believe you have a deeper understanding of the common problems of Python crawlers; you might as well try things out in practice. Follow us and keep learning!