How to clean dead websites out of your browser favorites with Python

Updated 2025-01-19. From: SLTechnology News&Howtos (shulou)

Many beginners are unsure how to use Python to clean dead websites out of their browser favorites. This article walks through one way to do it, step by step.

Invalid bookmarks

When we browse the web, we come across something interesting from time to time, so we quietly add it to our favorites or bookmarks. But once there are hundreds of bookmarks, managing them becomes a headache.

A programming blog that was still being updated yesterday may be dead today and never updated again. A movie site that worked yesterday may return a 404 today. With so many dead pages, the only way to find out that one is gone is to open it, and then it has to be deleted by hand. Is that really how a programmer should spend time?

However, Google Chrome and the Chinese domestic browsers only offer a backup service for favorites, not a way to check them, so we turn to Python.

Favorites file formats Python can work with

There is little direct support for favorites, mainly because they are stored inside the browser, so we have to export them manually as an htm file and work on that.

The content of the file is fairly simple. Even without much front-end knowledge, the tree structure and the internal logic are easy to see.

Each bookmark line has the shape: fixed text, URL, fixed text, page name, fixed text.

This naturally suggests a regular expression with two capture groups, one for the URL and one for the page name. Extract them, visit each URL to see which ones are dead, delete those, and what remains is the cleaned favorites file.
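To make the line shape concrete, here is a minimal sketch. The sample fragment is an assumption modeled on the Netscape-style bookmark file that browsers commonly export; the URLs, titles, and `ADD_DATE` values are made up for illustration and are not taken from the article.

```python
import re

# Hypothetical fragment of an exported favorites file (illustrative only).
SAMPLE = '''<DL><p>
    <DT><H3 ADD_DATE="1600000000">Blogs</H3>
    <DL><p>
        <DT><A HREF="https://example.com/blog" ADD_DATE="1600000001">Example Blog</A>
        <DT><A HREF="https://example.org/movies" ADD_DATE="1600000002">Movie Site</A>
    </DL><p>
</DL><p>
'''

# Fixed text, URL, fixed text, page name, fixed text.
PATTERN = r'HREF="(.*?)".*?>(.*?)</A>'

# findall returns one (url, title) tuple per bookmark line.
pairs = re.findall(PATTERN, SAMPLE)
for url, title in pairs:
    print(url, "->", title)
```

Only the two `<A HREF=...>` lines yield a pair; the folder heading and the `<DL>` lines do not match.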

Read favorites files

    import os

    path = "C:\\Users\\XU\\Desktop"
    fname = "bookmarks.html"
    os.chdir(path)
    # read the exported favorites file line by line
    bookmarks_f = open(fname, "r", encoding='UTF-8')
    booklists = bookmarks_f.readlines()
    bookmarks_f.close()

Without going deep into the front-end details, the lines of the exported favorites file can be divided into two kinds:

Structure code

Key code that stores a web page bookmark

The structure code must not be touched; we keep it exactly as it is. For the key code that stores a bookmark, we extract the content and then decide whether to keep or delete the line.

That is why we read the file with readlines: each line can be judged on its own.
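The read step and the two kinds of lines can be sketched as follows. The function names are hypothetical, and treating any line containing `HREF=` as a bookmark line is a simplifying assumption about the exported file.

```python
def read_bookmark_lines(path):
    """Return the exported favorites file as a list of lines (newlines kept)."""
    # A with-block closes the file even if reading fails part-way.
    with open(path, "r", encoding="utf-8") as f:
        return f.readlines()


def count_bookmark_lines(lines):
    """Count lines that hold a bookmark; all other lines are structure."""
    return sum(1 for line in lines if "HREF=" in line)
```

Because `readlines` keeps the trailing newline on every line, the lines can later be written back unchanged with `writelines`.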

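Regular expression matching

    import re

    # the closing </A> is assumed here; without it the lazy title group matches nothing
    pattern = r'HREF="(.*?)".*?>(.*?)</A>'

    while len(booklists) > 0:
        bookmark = booklists.pop(0)
        detail = re.search(pattern, bookmark)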

If the line is key code, the extracted substrings are in detail.group(1) (the URL) and detail.group(2) (the page name).

If the line is structure code, there is no match and detail is None.
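A small sketch of the two outcomes, one bookmark line and one structure line. `parse_bookmark` is a hypothetical helper, the sample lines are made up, and the closing `</A>` in the pattern is an assumption about the exported file.

```python
import re

PATTERN = re.compile(r'HREF="(.*?)".*?>(.*?)</A>')


def parse_bookmark(line):
    """Return (url, title) for a bookmark line, or None for a structure line."""
    detail = PATTERN.search(line)
    if detail is None:
        return None
    return detail.group(1), detail.group(2)


key_line = '<DT><A HREF="https://example.com/" ADD_DATE="0">Example</A>\n'
structure_line = '<DT><H3>Folder</H3>\n'
print(parse_bookmark(key_line))
print(parse_bookmark(structure_line))
```

Checking for `None` before calling `group` matters: calling `detail.group(1)` on a structure line would raise an `AttributeError`.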

Visit the page

    import requests

    r = requests.get(detail.group(1), timeout=500)

After trying this out, we find four situations:

r.status_code == requests.codes.ok

r.status_code == 404

r.status_code != 404 but the page still cannot be accessed (the site may be blocking crawlers; it is recommended to keep the bookmark)

requests.exceptions.ConnectionError

Sites such as Zhihu and Jianshu generally have anti-crawler measures, so a plain GET request cannot access them properly. Handling them in detail is not worth the trouble, so we simply keep those bookmarks. For the ConnectionError, catch the exception with try/except; otherwise the program stops running.
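The four situations reduce to a small decision rule, sketched below. `decide` is a hypothetical helper; it takes the status code as a plain integer, with `None` standing in for a request that raised a connection error, so the sketch runs without network access. The value 200 is what `requests.codes.ok` equals.

```python
def decide(status_code):
    """Map a fetch result to 'keep' or 'delete'.

    status_code is the HTTP status as an int, or None when the request
    raised a connection error and no response was received.
    """
    if status_code is None:
        return "delete"  # connection error: the site could not be reached
    if status_code == 200:
        return "keep"    # same value as requests.codes.ok
    if status_code == 404:
        return "delete"  # the page is gone
    return "keep"        # other codes, possibly anti-crawler blocking


for code in (200, 404, 403, None):
    print(code, decide(code))
```

Keeping the rule separate from the network call makes it easy to change the policy later, for example to also delete on 410.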

After adding logic:

    new_lists = []  # lines kept for the cleaned favorites file

    while len(booklists) > 0:
        bookmark = booklists.pop(0)
        detail = re.search(pattern, bookmark)
        if detail:
            # print(detail.group(1) + "--" + detail.group(2))
            try:
                # visit the page
                r = requests.get(detail.group(1), timeout=500)
                if r.status_code == requests.codes.ok:
                    new_lists.append(bookmark)
                    print("ok, kept: " + detail.group(1) + " " + detail.group(2))
                else:
                    if r.status_code == 404:
                        print("inaccessible, deleted: " + detail.group(1) + " " + detail.group(2) + ' error code ' + str(r.status_code))
                    else:
                        print("kept for other reasons: " + detail.group(1) + " " + detail.group(2) + ' error code ' + str(r.status_code))
                        new_lists.append(bookmark)
            except requests.exceptions.RequestException:
                # connection failures and invalid URLs end up here
                print("inaccessible, deleted: " + detail.group(1) + " " + detail.group(2))
                # new_lists.append(bookmark)
        else:
            # no match: a structure line, keep it as-is
            new_lists.append(bookmark)
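The same filtering logic can be sketched with the network check passed in as a function, which lets the loop be tried without touching the network. `clean_bookmarks` and `should_keep` are hypothetical names, the sample lines are made up, and the pattern's closing `</A>` is an assumption about the exported file.

```python
import re

PATTERN = re.compile(r'HREF="(.*?)".*?>(.*?)</A>')


def clean_bookmarks(lines, should_keep):
    """Return the lines to write back, dropping bookmarks that fail the check.

    should_keep is a callback that takes a URL and returns True or False.
    """
    kept = []
    for line in lines:
        detail = PATTERN.search(line)
        # Structure lines (no match) always survive.
        if detail is None or should_keep(detail.group(1)):
            kept.append(line)
    return kept


lines = [
    '<DL><p>\n',
    '<DT><A HREF="https://alive.example/">Alive</A>\n',
    '<DT><A HREF="https://dead.example/">Dead</A>\n',
    '</DL><p>\n',
]


def fake_check(url):
    """Stand-in check for the demo: a string test, no network access."""
    return "dead" not in url


cleaned = clean_bookmarks(lines, fake_check)
print("".join(cleaned), end="")
```

In real use, `should_keep` would wrap the `requests.get` call and the status code handling from the article's loop.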

Program execution

Export the htm file

    # write the kept lines to a new file next to the original
    bookmarks_f = open('new_' + fname, "w+", encoding='UTF-8')
    bookmarks_f.writelines(new_lists)
    bookmarks_f.close()

Import into the browser

I then applied this to my own browser.

Many of the movie sites had indeed gone dead, and Python cleaned out the inaccessible bookmarks in one pass. Life is short, and Python really can make it more efficient.
