2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article covers "how to solve the picture saved as a TXT file in Python". Many people run into this kind of problem in real projects, so the walkthrough below shows one way to handle it. The example fetches a CSDN blog article with requests, extracts the article body with parsel, and converts it to a PDF file with pdfkit.
Third-party libraries:
requests
parsel
pdfkit
Development environment:
Version: Anaconda 5.2.0 (Python 3.6.5)
Editor: PyCharm
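Before running the script, it may help to confirm that the three libraries are importable in the active environment. The sketch below uses only the standard library; the helper name `missing_packages` is hypothetical and not part of the original article.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_packages(["requests", "parsel", "pdfkit"])
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

Note that pdfkit is only a wrapper: the separate wkhtmltopdf program must also be installed for the PDF step to work.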
The code is as follows:
1. Import the libraries
import pdfkit
import requests
import parsel
2. Set the request headers
headers = {
    "Host": "blog.csdn.net",
    "Referer": "https://blog.csdn.net/qq_41359265/article/details/102570971",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36",
}
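3. Define the HTML template string
# The HTML tags of this template were stripped from this copy of the article.
# Only the title "Document" and the {article} placeholder survive, so the
# minimal page wrapper below is an assumed reconstruction.
html_str = """
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
{article}
</body>
</html>
"""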
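`str.format` treats every `{` and `}` in the template as a placeholder delimiter, so a template that later gains inline CSS or JavaScript would need its literal braces doubled. One alternative, shown here as a sketch and not as the article's method, is `string.Template` from the standard library, which uses `$article` as the placeholder instead. The names `PAGE_TEMPLATE` and `build_page` and the sample `<style>` rule are hypothetical.

```python
from string import Template

# Literal braces in the CSS are fine because Template only reacts to "$".
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
    <style>body { font-family: sans-serif; }</style>
</head>
<body>
$article
</body>
</html>
""")


def build_page(article_html):
    # safe_substitute fills in $article and leaves any unknown
    # $placeholder in the template untouched instead of raising.
    return PAGE_TEMPLATE.safe_substitute(article=article_html)


print(build_page("<article><p>Hello</p></article>"))
```

The inserted article text is not scanned again, so braces or dollar signs inside the scraped HTML pass through unchanged.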
4. Set the user cookie
# Cookie string copied from a logged-in browser session. Duplicate entries and
# fragments that were garbled beyond recovery in this copy have been dropped.
# Replace the value with the cookie from your own session.
cookie = {
    'Cookie': (
        'uuid_tt_dd=10_6143182820-1560085972444-562851; '
        'Hm_ct_6bcd52f51e9b3dce32bec4a3997715ac=6525*1*10_6143182820-1560085972444-562851; '
        'Hm_ct_e5ef47b9f471504959267fd614d579cd=6525*1*10_6143182820-1560085972444-562851; '
        'Hm_lvt_62052699443da77047734994abbaed1b=1568382389,1568384316; '
        'Hm_lvt_26c6581897cb7113caba3941e5aa57b0=1567222806,1569331239; '
        'Hm_lvt_e5ef47b9f471504959267fd614d579cd=1569495260,1570722031; '
        'UserName=weixin_40327641; '
        'UserInfo=5efb72806ec7429fb885f8cf12233b54; '
        'UserToken=5efb72806ec7429fb885f8cf12233b54; '
        'UserNick=%E5%A1%AB%E5%9D%91%E5%B0%8F%E6%87%B5%E9%80%BC; '
        'AU=DA1; '
        'BT=1570886763298; '
        'notice=1; '
        'Hm_lvt_85a6e71063e38ed893de1d8b6a71f5fe=1570889956; '
        'acw_tc=2760823a15710394714692918e17ecbdca6dba528441074c2c8e1ad8ebea5e; '
        'Announcement=%257B%2522announcementUrl%2522%253A%2522https%253A%252F%252Fblogdev.blog.csdn.net%252Farticle%252Fdetails%252F102605809%2522%252C%2522announcementCount%2522%253A1%252C%2522announcementExpire%2522%253A535744931%257D; '
        'firstDie=1; '
        'Hm_lvt_6bcd52f51e9b3dce32bec4a3997715ac=1571375632,1571376263,1571474096,1571481979; '
        'Hm_lvt_3fc28b5205f6aa5f3b16547ffddad367=1571481982; '
        'remove=true; '
        'Hm_lpvt_3fc28b5205f6aa5f3b16547ffddad367=1571481988; '
        'acw_sc__v2=5dab061ff4d5b7f68cb6b4fdff578b2c8e4b0add; '
        'dc_tos=pzmgx6; '
        'Hm_lpvt_6bcd52f51e9b3dce32bec4a3997715ac=1571489323'
    )
}
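The `cookies` argument of `requests.get` is documented as a dictionary (or cookie jar) that maps each cookie name to its value, whereas the dictionary above stores the whole browser cookie string under a single `'Cookie'` key. A minimal sketch of splitting such a string into name/value pairs follows. `parse_cookie_string` is a hypothetical helper, not part of the original article, and the sample string is shortened from the values above.

```python
def parse_cookie_string(raw):
    """Split a browser 'Cookie' header string into a {name: value} dict."""
    cookies = {}
    for part in raw.split(";"):
        part = part.strip()
        if "=" not in part:
            continue  # skip empty or malformed fragments
        name, value = part.split("=", 1)  # values may themselves contain "="
        cookies[name.strip()] = value.strip()
    return cookies


raw_cookie = (
    "uuid_tt_dd=10_6143182820-1560085972444-562851; "
    "UserName=weixin_40327641; AU=DA1"
)
print(parse_cookie_string(raw_cookie))
```

The resulting dictionary could then be passed as `cookies=parse_cookie_string(cookie['Cookie'])`. Another option is to put the raw string into the request headers under the `Cookie` key.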
5. Fetch the article and convert it to PDF
def get_html(url):
    # Send the request to the URL and receive the response
    response = requests.get(url, headers=headers, cookies=cookie)
    # response.text is the page source as a string; print it to check
    # whether the request ran into anti-scraping measures
    # print(response.text)

    # How to convert HTML to PDF
    # Extract the article part of the page
    sel = parsel.Selector(response.text)
    # CSS selectors
    article = sel.css('article').get()
    title = sel.css('h2::text').get()
    print(title)
    print(article)

    # Fill the template and save it as an HTML file
    html = html_str.format(article=article)
    with open(f'{title}.html', mode='w', encoding='utf-8') as f:
        f.write(html)

    # Path to the wkhtmltopdf executable
    config = pdfkit.configuration(wkhtmltopdf='C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe')
    # Convert the HTML file to a PDF file with pdfkit
    pdfkit.from_file(f'{title}.html', f'{title}.pdf', configuration=config)


get_html('https://blog.csdn.net/nosprings/article/details/102609296')
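The script uses the scraped title directly as a file name. That can fail or misbehave on Windows when the title contains characters such as `:` or `?`, and it produces a file called `None.html` when the selector matches nothing. A small sketch of a guard, assuming a hypothetical helper `safe_filename` and a hypothetical fallback name `untitled`:

```python
import re


def safe_filename(title, fallback="untitled"):
    """Turn a scraped title into a string that is safe to use as a file name."""
    if not title:
        return fallback
    # Replace characters that Windows does not allow in file names
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title)
    # Collapse runs of whitespace (including newlines), then trim
    # spaces and dots from both ends
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return cleaned or fallback


print(safe_filename("  Python: requests/parsel?  "))
print(safe_filename(None))
```

Inside `get_html`, the cleaned value would replace `title` in the two f-strings, for example `name = safe_filename(title)` followed by `f'{name}.html'` and `f'{name}.pdf'`.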
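Run the code: the script prints the title and the article HTML, then writes an HTML file and a PDF file named after the article title to the working directory.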
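pdfkit relies on the external wkhtmltopdf program, so the run fails if the hard-coded path from step 5 does not exist on the machine. A hedged sketch of locating the executable first, using only the standard library. `find_wkhtmltopdf` is a hypothetical helper, and the default path is the one used in the code above.

```python
import os
import shutil

DEFAULT_PATH = r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"


def find_wkhtmltopdf(default=DEFAULT_PATH):
    """Return a path to the wkhtmltopdf executable, or None if not found."""
    on_path = shutil.which("wkhtmltopdf")  # search the PATH first
    if on_path:
        return on_path
    # Fall back to the default install location if that file exists
    return default if os.path.isfile(default) else None


path = find_wkhtmltopdf()
print(path or "wkhtmltopdf not found; install it or adjust the path")
```

When a path is found, it can be passed on as `pdfkit.configuration(wkhtmltopdf=path)`.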
This concludes "how to solve the picture saved as a TXT file in Python". Thank you for reading. For more practical articles on the industry, follow the website.
© 2024 shulou.com SLNews company. All rights reserved.