
Fixing garbled text when crawling a website with the requests library in Python

Updated 2025-01-20. From: SLTechnology News&Howtos


Shulou (Shulou.com), 06/02

Many newcomers are unsure how to deal with garbled text when crawling a site with Python's requests library. This article walks through one solution in detail, in the hope that it helps anyone facing the same problem.

[Preface]

When crawling my CSDN personal blog (https://blog.csdn.net/yuzipeng) with the requests library, the output came back garbled, showing escaped bytes (\xe4\xb8\xb0\xe5\xaf\x8c\xe7\x9) instead of readable text.

[Figure: screenshot of the garbled output]

I searched online and tried several methods, assuming at first that the site was applying some kind of encryption. I then noticed that the page elements were still fully readable in the browser's developer tools (F12). So how can the garbled output be avoided? The answer I settled on: Selenium.
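As a side note on the error above: the escaped sequence looks like UTF-8 encoded bytes rather than encrypted data. The following is a minimal sketch, separate from the Selenium approach this article uses, showing that the first two complete characters of that sequence decode to readable Chinese text. The variable names are illustrative, and whether the original script printed raw bytes or decoded them with the wrong codec is an assumption.

```python
# The first six bytes of the garbled output quoted above
# (the trailing "\xe7\x9" is cut off in the article, so it is left out).
raw = b"\xe4\xb8\xb0\xe5\xaf\x8c"

# Decoding with the right codec yields readable text.
text = raw.decode("utf-8")
print(text)  # 丰富

# Decoding with a wrong single-byte codec yields one junk character per byte.
mojibake = raw.decode("latin-1")
print(len(text), len(mojibake))  # 2 6
```

With requests, the same idea applies to the response body: response.content holds the raw bytes, and setting response.encoding (for example to response.apparent_encoding) before reading response.text controls which codec is used to decode them.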

[Result]

Running the script prints each article title followed by its URL.

[Sample Code]

# coding=utf-8
# @Author: "Peng Ge Thief Excellent"
# @Date: 2019/10/16
# @Software: PyCharm

from selenium import webdriver

url = 'https://blog.csdn.net/yuzipeng'
# Written for Selenium 3; Selenium 4 removed find_elements_by_xpath
# in favour of driver.find_elements(By.XPATH, ...).
driver = webdriver.Chrome("F:\\Python\\chromedriver.exe")
driver.get(url)

# Each article on the list page sits in a div that carries its article ID.
urls = driver.find_elements_by_xpath('//div[@class="article-item-box csdn-tracking-statistics"]')
blogurl = ['https://blog.csdn.net/yuzipeng/article/details/' + url.get_attribute('data-articleid') for url in urls]

# The article title is the link text inside the div's h5 heading.
titles = driver.find_elements_by_xpath('//div[@class="article-item-box csdn-tracking-statistics"]/h5/a')
blogtitle = [title.text for title in titles]

# Pair each title with its URL and print the result.
myblog = {k: v for k, v in zip(blogtitle, blogurl)}
for k, v in myblog.items():
    print(k, v)

driver.close()
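The sample above uses the Selenium 3 call style. A rough sketch of the same steps in the Selenium 4 style follows. The helper names article_url and scrape_blog and the constant names are hypothetical, and the XPath is taken from the article, so the sketch assumes the CSDN page still uses that markup.

```python
BASE = "https://blog.csdn.net/yuzipeng/article/details/"
BOX_XPATH = '//div[@class="article-item-box csdn-tracking-statistics"]'


def article_url(article_id):
    """Build an article URL from the data-articleid attribute value."""
    return BASE + article_id


def scrape_blog(driver_path):
    """Return {title: url} for the articles listed on the blog page."""
    # Imported here so article_url() works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome(service=Service(driver_path))
    try:
        driver.get("https://blog.csdn.net/yuzipeng")
        boxes = driver.find_elements(By.XPATH, BOX_XPATH)
        links = driver.find_elements(By.XPATH, BOX_XPATH + "/h5/a")
        urls = [article_url(box.get_attribute("data-articleid")) for box in boxes]
        titles = [link.text for link in links]
        return dict(zip(titles, urls))
    finally:
        driver.quit()  # always release the browser, even on errors


print(article_url("100179696"))
```

Calling scrape_blog with the path to a local chromedriver executable would open Chrome and return the title-to-URL mapping; the final print only exercises the URL helper and needs no browser.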

[Key Points]

1. Using Selenium

For basic Selenium installation and usage, see:

(https://blog.csdn.net/yuzipeng/article/details/100179696)

2. Using comprehensions

(1) List comprehension: [expression for variable in list] or [expression for variable in list if condition]

A comprehension condenses a multi-line for loop into a single line of code, for example:

blogtitle = [title.text for title in titles]

Written as an ordinary for loop, the same logic takes three lines:

blogtitle = []
for title in titles:
    blogtitle.append(title.text)
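The second form, with an if condition, is not illustrated above. Here is a small sketch using made-up titles (plain strings standing in for the Selenium elements), next to its for-loop equivalent:

```python
titles = ["Selenium basics", "", "Using zip", "  "]

# Comprehension with a condition: keep only the non-blank titles.
non_blank = [t for t in titles if t.strip()]

# Equivalent for loop.
non_blank_loop = []
for t in titles:
    if t.strip():
        non_blank_loop.append(t)

print(non_blank)                    # ['Selenium basics', 'Using zip']
print(non_blank == non_blank_loop)  # True
```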

(2) Dictionary comprehension: {key expression: value expression for variable in collection if condition}

This form works well when the key and the value can both be derived from the same item. If the keys and values live in two separate lists, use zip to pair them up first:

myblog = {k:v for k,v in zip(blogtitle,blogurl)}
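For the first case mentioned above, where the value is derived from the key (or the two are swapped), no zip is needed. A short sketch with illustrative data:

```python
words = ["selenium", "zip", "requests"]

# Value derived from the key, with a condition filtering out short words.
lengths = {w: len(w) for w in words if len(w) > 3}
print(lengths)  # {'selenium': 8, 'requests': 8}

# Swapping keys and values of an existing dictionary.
original = {"a": 1, "b": 2, "c": 3}
inverted = {v: k for k, v in original.items()}
print(inverted)  # {1: 'a', 2: 'b', 3: 'c'}
```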

If you are not familiar with the zip function, the following example shows how it works.

a = ['a', 'b', 'c']
b = [1, 2, 3]
c = {k: v for k, v in zip(a, b)}
print(c)

The result is:

{'a': 1, 'b': 2, 'c': 3}
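To see what zip produces on its own, and one behaviour worth knowing when pairing scraped titles with URLs, here is a brief sketch. The list contents are illustrative.

```python
a = ['a', 'b', 'c']
b = [1, 2, 3]

# zip yields one tuple per position.
pairs = list(zip(a, b))
print(pairs)  # [('a', 1), ('b', 2), ('c', 3)]

# dict() accepts those pairs directly, matching the comprehension form.
same = dict(zip(a, b)) == {k: v for k, v in zip(a, b)}
print(same)  # True

# zip stops at the shortest input, so the extra key 'd' is silently dropped.
short = dict(zip(['a', 'b', 'c', 'd'], b))
print(short)  # {'a': 1, 'b': 2, 'c': 3}
```

Because of that last point, if the title list and the URL list ever differ in length, the unmatched entries disappear without an error, so comparing the two lengths before zipping can be a useful check.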
