Many beginners are not sure how to use Python to clean up dead links in their browser bookmarks. This article walks through the process in detail; I hope you get something out of it.
Invalid bookmarks
When we browse the web, we come across something interesting from time to time (you know the kind), so we silently click Bookmark or Add to favorites. But once we are faced with hundreds of bookmarks and favorites, managing them becomes a headache.
In particular, the programming blog that was still updating yesterday is dead today and will never update again. Or the movie site I watched yesterday returns a 404 today. There are so many invalid pages that every time I open one I already know it is dead and has to be deleted by hand. Is that something a programmer should be doing manually?
However, both Chrome and domestic browsers only offer a backup service for bookmarks, not dead-link cleanup, so we have to turn to Python.
Bookmark file formats Python can work with
Python has no direct hook into a browser's bookmarks, because they are stored internally by the browser, so the practical route is to export them manually as an html file and manage that file.
The exported file is fairly simple. Even without much front-end knowledge, you can clearly see its tree structure and internal logic.
Each bookmark entry follows a fixed pattern: fixed markup, then the URL, then more fixed markup, then the page name, then fixed markup again.
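For illustration, a single bookmark entry in such an exported file typically looks like the hypothetical Python string below (the URL, date and name are made up):

# a hypothetical bookmark line as exported by the browser (URL, date and name are made up)
sample_line = '<DT><A HREF="https://example.com/" ADD_DATE="1600000000">Example Site</A>'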
The natural idea is a regular expression: each bookmark line contains two substrings we care about, the URL and the page name. Extract them, visit each URL one by one to see which are dead, delete those lines, and what remains is the cleaned bookmark file.
Reading the bookmark file
import os

path = "C:\\Users\\XU\\Desktop"
fname = "bookmarks.html"
os.chdir(path)                        # work in the folder that holds the exported file
bookmarks_f = open(fname, "r+", encoding='UTF-8')
booklists = bookmarks_f.readlines()   # read the file line by line
bookmarks_f.close()
Without going deep into the front end, the exported file can be abstractly divided into two kinds of lines:
Structural markup
Key lines that store the actual web page bookmarks
The structural markup must not be touched; we want to keep it intact. The key bookmark lines, on the other hand, need their content extracted so we can decide whether to keep or delete each one.
That is why the code above uses readlines: it reads the file line by line so each line can be judged separately.
Regular matching
import re

# capture the URL and the page name; the trailing '<' stops the second group at the closing tag
pattern = r'HREF="(.*?)".*?>(.*?)<'

while len(booklists) > 0:
    bookmark = booklists.pop(0)
    detail = re.search(pattern, bookmark)
If the line is a bookmark line, the match succeeds and the two extracted substrings are in detail.group(1) (the URL) and detail.group(2) (the page name).
If the line is structural markup, there is nothing to match and detail == None.
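A minimal sketch of the two cases, run against the hypothetical sample_line from above and a typical structural line:

import re

pattern = r'HREF="(.*?)".*?>(.*?)<'
sample_line = '<DT><A HREF="https://example.com/" ADD_DATE="1600000000">Example Site</A>'

# bookmark line: the match succeeds and both groups are filled
detail = re.search(pattern, sample_line)
print(detail.group(1))   # https://example.com/
print(detail.group(2))   # Example Site

# structural line: there is no HREF, so the match fails
detail = re.search(pattern, '<DL><p>')
print(detail)            # None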
Visit the page
import requests

r = requests.get(detail.group(1), timeout=500)   # timeout is in seconds
After actually trying this out, four situations emerge:
r.status_code == requests.codes.ok: the page is reachable, so keep the bookmark
r.status_code == 404: the page is gone, so delete the bookmark
some other status code, and the page cannot be accessed normally (it may just be blocking crawlers, so it is recommended to keep it)
requests.exceptions.ConnectionError is raised: the site cannot be reached at all
Sites like Zhihu and Jianshu have anti-crawler measures, so a plain GET cannot reach them; it is not worth the trouble to work around that, so those bookmarks are simply kept. For connection errors, wrap the request in try/except to catch the exception, otherwise the program stops running.
After adding this logic:
new_lists = []
while len(booklists) > 0:
    bookmark = booklists.pop(0)
    detail = re.search(pattern, bookmark)
    if detail:
        # print(detail.group(1) + "---" + detail.group(2))
        try:
            # visit the page
            r = requests.get(detail.group(1), timeout=500)
            # judge by status code
            if r.status_code == requests.codes.ok:
                new_lists.append(bookmark)
                print("ok, keep: " + detail.group(1) + " " + detail.group(2))
            else:
                if r.status_code == 404:
                    print("inaccessible, delete: " + detail.group(1) + " " + detail.group(2) + " error code " + str(r.status_code))
                else:
                    print("keep for other reasons: " + detail.group(1) + " " + detail.group(2) + " error code " + str(r.status_code))
                    new_lists.append(bookmark)
        except requests.exceptions.RequestException:
            print("inaccessible, delete: " + detail.group(1) + " " + detail.group(2))
            # new_lists.append(bookmark)   # uncomment to keep these bookmarks instead
    else:
        # structural line: keep it unchanged
        new_lists.append(bookmark)
Program execution
Export the new htm file
bookmarks_f = open('new_' + fname, "w+", encoding='UTF-8')
bookmarks_f.writelines(new_lists)
bookmarks_f.close()
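As an optional sanity check (not part of the original script), you can compare how many lines survived the cleanup:

# optional sanity check: how many lines were kept out of the original file
with open(fname, "r", encoding='UTF-8') as f:
    original_count = len(f.readlines())
print("kept", len(new_lists), "of", original_count, "lines")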
Import back into the browser
I actually applied it to my own browser.
Quite a few movie sites had indeed gone dead, and with Python the inaccessible bookmarks can be cleaned up in one click. Life is short; Python really can make it more efficient.