Updated 2025-04-05. From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article walks through some frequently asked Python interview questions. The explanations are short and straightforward, so they should be easy to follow.
1. How can you improve crawling efficiency?
A crawler downloads slowly mainly because each request blocks while it waits for the website to respond. Common remedies:
1. Use asynchronous I/O or multithreading so the program keeps working while other requests are waiting, which makes fuller use of the CPU.
2. Use a message queue to distribute the work.
3. Increase the bandwidth.
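As a rough illustration of the multithreading option, here is a minimal sketch using the standard library's ThreadPoolExecutor. The fetch function is a hypothetical stand-in that sleeps instead of performing a real HTTP request, and the URLs and worker count are assumptions chosen for the demo.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Hypothetical stand-in for a blocking HTTP request."""
    time.sleep(0.1)  # simulated network latency, not a real download
    return f"content of {url}"


def crawl(urls, max_workers=8):
    """Fetch all URLs concurrently; results are returned in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(8)]
    start = time.perf_counter()
    pages = crawl(urls)
    elapsed = time.perf_counter() - start
    print(len(pages), "pages fetched in", round(elapsed, 2), "seconds")
```

Because the threads spend their time waiting rather than computing, the eight simulated requests overlap instead of running one after another.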
2. What is the crawler protocol?
The Robots protocol (also known as the crawler protocol, crawler rules, or robot protocol) is the robots.txt file. A website uses it to tell search engines which pages may be crawled and which may not.
The Robots protocol is a widely accepted code of conduct in the Internet community. Its purpose is to protect website data and sensitive information and to ensure that users' personal information and privacy are not violated. Because it is a convention rather than an enforced command, search engines have to comply with it voluntarily.
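A crawler can check these rules itself before requesting a page. The sketch below uses urllib.robotparser from the standard library; the robots.txt content, the user agent name "MyCrawler", and the example.com URLs are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it
# from the target site instead of hard-coding it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
# A page outside /private/ may be crawled; a page inside it may not.
print(parser.can_fetch("MyCrawler", "https://example.com/index.html"))
print(parser.can_fetch("MyCrawler", "https://example.com/private/data.html"))
```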
3. What if the target website uses anti-crawling measures and blocks your IP?
Slow down the crawl rate to reduce the load on the target website, although this also reduces the amount of data fetched per unit of time.
Use proxy IPs (free ones may be unstable, and paid ones may not be cost-effective).
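Both ideas can be sketched with the standard library alone. The Throttle class, the next_proxies helper, and the proxy addresses below are hypothetical names and values invented for this example; the returned dictionary follows the scheme-to-proxy mapping that HTTP client libraries commonly accept.

```python
import itertools
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to respect the minimum interval."""
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Hypothetical proxy addresses; replace them with proxies you actually control.
PROXY_POOL = itertools.cycle(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])


def next_proxies():
    """Return the next proxy in round-robin order as a scheme-to-proxy mapping."""
    proxy = next(PROXY_POOL)
    return {"http": proxy, "https": proxy}
```

Before each request, a crawler would call wait() on a shared Throttle and pass the result of next_proxies() to its HTTP client.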
4. There is a file in JSON Lines format

    def get_lines():
        with open('file.txt', 'rb') as f:
            return f.readlines()

    if __name__ == '__main__':
        for e in get_lines():
            process(e)  # process each line of data
Now you need to process a 10 GB file, but only 4 GB of memory is available. How would you do it if you may only modify the get_lines function and all other code stays the same? What issues need to be considered?
Turning get_lines into a generator keeps only one line in memory at a time:

    def get_lines():
        with open('file.txt', 'rb') as f:
            for i in f:
                yield i
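A method provided by Pandaaaa906:

    from mmap import mmap

    def get_lines(fp):
        with open(fp, "r+b") as f:
            m = mmap(f.fileno(), 0)  # map the whole file without loading it
            tmp = 0  # start offset of the current line
            for i, char in enumerate(m):
                if char == b"\n":
                    yield m[tmp:i + 1].decode()
                    tmp = i + 1
            if tmp < len(m):  # last line without a trailing newline
                yield m[tmp:].decode()
            m.close()

    if __name__ == "__main__":
        for i in get_lines("fp_some_huge_file"):
            print(i)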
The issues to consider are: with only 4 GB of memory, a 10 GB file cannot be read in all at once, so the data has to be read in batches, keeping track of the position reached by each read. If each batch is too small, too much time is spent on read operations.
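The batch-size trade-off can be made explicit by reading fixed-size chunks and splitting them into lines by hand. This is a minimal sketch, not the original author's solution; the chunk_size parameter and its 1 MB default are assumptions, and the default path keeps the call get_lines() from the question working unchanged.

```python
def get_lines(path="file.txt", chunk_size=1024 * 1024):
    """Yield lines from path, reading at most chunk_size bytes per read."""
    with open(path, "rb") as f:
        leftover = b""  # incomplete line carried over from the previous chunk
        while True:
            chunk = f.read(chunk_size)  # the file object tracks the read position
            if not chunk:
                break
            lines = (leftover + chunk).split(b"\n")
            leftover = lines.pop()  # the last piece may be an incomplete line
            for line in lines:
                yield line + b"\n"
        if leftover:
            yield leftover  # final line without a trailing newline
```

A larger chunk_size means fewer read calls but more memory per batch; a smaller one means the opposite.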
5. Fill in the missing code

    import os

    def print_directory_contents(s_path):
        """
        Take the name of a folder as the input parameter and print the paths
        of the files in that folder and in all of the folders it contains.
        """
        for s_child in os.listdir(s_path):
            s_child_path = os.path.join(s_path, s_child)
            if os.path.isdir(s_child_path):
                # Recurse into subfolders
                print_directory_contents(s_child_path)
            else:
                print(s_child_path)
6. Given a date, determine which day of the year it is

    import datetime

    def dayofyear():
        year = input("Please enter year: ")
        month = input("Please enter month: ")
        day = input("Please enter day: ")
        date1 = datetime.date(year=int(year), month=int(month), day=int(day))
        date2 = datetime.date(year=int(year), month=1, day=1)
        return (date1 - date2).days + 1
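The standard library can also compute this directly: the tm_yday field of date.timetuple() is the day of the year. The day_of_year function below is a hypothetical non-interactive variant that takes the date as arguments instead of calling input().

```python
import datetime


def day_of_year(year, month, day):
    """Return the 1-based day of the year for the given date."""
    return datetime.date(year, month, day).timetuple().tm_yday


print(day_of_year(2024, 3, 1))  # 2024 is a leap year: 31 + 29 + 1 = 61
print(day_of_year(2023, 3, 1))  # 2023 is not: 31 + 28 + 1 = 60
```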
7. Shuffle a sorted list object, alist

    import random

    alist = [1, 2, 3, 4, 5]
    random.shuffle(alist)  # shuffles the list in place
    print(alist)
8. Given the dictionary d = {'a': 24, 'g': 52, 'i': 12, 'k': 33}, sort it by value

    sorted(d.items(), key=lambda x: x[1])
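Note that sorted returns a list of (key, value) tuples rather than a dictionary. The short sketch below shows the result on a small sample dictionary (its contents are an assumption for illustration) and how to sort in descending order or rebuild a dictionary from the result.

```python
# Sample data for illustration only
d = {"a": 24, "g": 52, "i": 12, "k": 33}

# Ascending by value: a list of (key, value) tuples
ascending = sorted(d.items(), key=lambda item: item[1])
print(ascending)  # [('i', 12), ('a', 24), ('k', 33), ('g', 52)]

# Descending by value
descending = sorted(d.items(), key=lambda item: item[1], reverse=True)

# Dictionaries preserve insertion order (Python 3.7+), so this one is sorted by value
as_dict = dict(ascending)
print(list(as_dict))  # ['i', 'a', 'k', 'g']
```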
9. Dictionary comprehension

    d = {key: value for (key, value) in iterable}
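To make the pattern concrete, here are a few variations on a small list of pairs. The variable names and sample data are assumptions chosen for the example.

```python
pairs = [("a", 1), ("b", 2), ("c", 3)]

# Transform the values while building the dictionary
squares = {key: value ** 2 for key, value in pairs}

# Swap keys and values
inverted = {value: key for key, value in pairs}

# Keep only the entries that satisfy a condition
evens = {key: value for key, value in pairs if value % 2 == 0}

print(squares)   # {'a': 1, 'b': 4, 'c': 9}
print(inverted)  # {1: 'a', 2: 'b', 3: 'c'}
print(evens)     # {'b': 2}
```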
10. Reverse the string "aStr"

    print("aStr"[::-1])

Thank you for reading. That covers the high-frequency Python interview questions in this article; how to apply them still needs to be verified in practice.