2025-01-20 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article covers how to deal with garbled text when crawling a site with the Python requests library. Many beginners are unclear about this problem, so the walkthrough below explains one solution in detail for anyone who needs it.
[Foreword]
When crawling my CSDN personal blog (https://blog.csdn.net/yuzipeng) with the requests library, the response came back as garbled text (raw byte escapes such as \xe4\xb8\xb0\xe5\xaf\x8c\xe7\x9), as shown in the following figure:
I searched online and tried several methods, and at first assumed the site was applying some kind of encryption. Later I found that the page elements were still visible in the browser's developer tools (F12). So what can be done to avoid the garbled output? The answer is: Selenium.
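As an aside on what the garbled output represents: escape sequences of the form \xe4\xb8\xb0 are typically what UTF-8 encoded Chinese text looks like when it is shown as raw bytes instead of being decoded. The minimal sketch below decodes the two complete three-byte groups from the sample above, assuming they are UTF-8; the trailing \xe7\x9 is cut off in the article, so it is left out. This only illustrates the relationship between the bytes and the text and is not the method this article goes on to use.

```python
# The first six bytes quoted in the garbled output above.
# Assumption: they are UTF-8 encoded text (the truncated tail is omitted).
raw = b"\xe4\xb8\xb0\xe5\xaf\x8c"

# Decoding turns the bytes back into readable characters.
text = raw.decode("utf-8")
print(text)

# Printing the bytes object itself shows the escaped form seen in the garbled output.
print(raw)
```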
[The effect is as follows]
[Sample Code]
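# coding=utf-8
# @Author: "Peng Ge Thief Excellent"
# @Date: 2019/10/16
# @Software: PyCharm
# Note: written against the Selenium 3 API. In Selenium 4 the find_elements_by_* helpers were removed (use driver.find_elements(By.XPATH, ...)) and the driver path is passed through a Service object.
from selenium import webdriver

url = 'https://blog.csdn.net/yuzipeng'
# Path to the local ChromeDriver executable
driver = webdriver.Chrome("F:\\Python\\chromedriver.exe")
driver.get(url)

# Each article box carries the article ID in its data-articleid attribute
items = driver.find_elements_by_xpath('//div[@class="article-item-box csdn-tracking-statistics"]')
blogurl = ['https://blog.csdn.net/yuzipeng/article/details/' + item.get_attribute('data-articleid') for item in items]

# The article title is the link text inside each box's h5 element
titles = driver.find_elements_by_xpath('//div[@class="article-item-box csdn-tracking-statistics"]/h5/a')
blogtitle = [title.text for title in titles]

# Map each title to its URL and print the pairs
myblog = {k: v for k, v in zip(blogtitle, blogurl)}
for k, v in myblog.items():
    print(k, v)

driver.close()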
[Knowledge Points]
1. Using Selenium
For basic Selenium installation and usage, see:
(https://blog.csdn.net/yuzipeng/article/details/100179696)
2. Using comprehensions
(1) List comprehension: [expression for variable in list] or [expression for variable in list if condition]
A comprehension condenses a multi-line for loop into a single line of code, for example:
blogtitle = [title.text for title in titles]
Written as a for loop, the same thing looks like this:
blogtitle = []
for title in titles:
    blogtitle.append(title.text)
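To check the equivalence without opening a browser, here is a minimal sketch that uses stand-in objects in place of Selenium elements. `SimpleNamespace` and the sample titles are assumptions for illustration only; the elements Selenium returns expose a `.text` attribute in the same way. The last comprehension shows the optional `if` condition from the second form.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for Selenium elements; only the .text attribute matters here.
titles = [
    SimpleNamespace(text="Selenium basics"),
    SimpleNamespace(text=""),
    SimpleNamespace(text="Using zip"),
]

# List comprehension: one line.
blogtitle = [title.text for title in titles]

# Equivalent for loop.
blogtitle_loop = []
for title in titles:
    blogtitle_loop.append(title.text)

# Comprehension with a condition: skip elements whose text is empty.
non_empty = [title.text for title in titles if title.text]

print(blogtitle == blogtitle_loop)
print(non_empty)
```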
(2) Dictionary comprehension: {key expression: value expression for variable in collection if condition}
This form works directly when the key and the value can both be derived from the same item. If the keys and values come from two separate lists, use zip to pair them up first.
myblog = {k:v for k,v in zip(blogtitle,blogurl)}
If you are not familiar with the zip function, the following example illustrates it.
a = ['a', 'b', 'c']
b = [1, 2, 3]
c = {k: v for k, v in zip(a, b)}
print(c)
The result is: {'a': 1, 'b': 2, 'c': 3}
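Two properties of the zip pattern are worth checking before relying on it, sketched here with made-up sample data (the titles and IDs are hypothetical). zip stops at the shorter input, and a dictionary keeps only the last value for a repeated key, so two posts with the same title would collapse into one entry. The final comprehension shows the case where key and value are both derived from the same item, so no zip is needed.

```python
# Hypothetical sample data, not taken from the blog.
titles = ["Post A", "Post B", "Post A"]
ids = ["100", "200", "300", "400"]

# zip pairs items by position and stops at the shorter list,
# so the extra id "400" is silently dropped.
pairs = list(zip(titles, ids))

# Dictionary comprehension over the pairs; the repeated key "Post A"
# keeps only its last value.
by_title = {k: v for k, v in pairs}

# Key and value derived from the same item, with a condition.
lengths = {t: len(t) for t in titles if t.endswith("B")}

print(pairs)
print(by_title)
print(lengths)
```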