How to use Python to crawl JD product comments

2025-01-18 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 report

This article shows how to use Python to crawl product comments from JD (jd.com). Many readers are not yet familiar with this topic, so the complete script is shared here for reference. It fetches comment pages, saves the comment text to a file, segments the text with jieba, and renders the result as a word cloud. Let's take a look.

    import requests
    import json
    import os
    import time
    import random
    import jieba
    from wordcloud import WordCloud
    from imageio import imread

    # File that collects the crawled comment text
    comment_file_path = 'jd_comments.txt'

    def get_spider_comments(page=0):
        # Crawl one page of JD comments
        url = 'https://sclub.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98vv7990&productId=1070129528&score=0&sortType=5&page=%s&pageSize=10&isShadowSku=0&rid=0&fold=1' % page
        headers = {
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
            'referer': 'https://item.jd.com/1070129528.html'
        }
        try:
            response = requests.get(url, headers=headers)
        except requests.RequestException:
            print("something wrong!")
            return  # no response to parse, so skip this page
        # Strip the JSONP wrapper "fetchJSON_comment98vv7990(" ... ");" to get the JSON text
        comments_json = response.text[26:-2]
        # Parse the JSON text into a Python object
        comments_json_obj = json.loads(comments_json)
        # Get the list of comment records
        comments_all = comments_json_obj['comments']
        # Append the content of each comment to the file
        for comment in comments_all:
            with open(comment_file_path, 'a', encoding='utf-8') as fin:
                fin.write(comment['content'] + '\n')
            print(comment['content'])
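The slice response.text[26:-2] works only while the callback name in the URL stays exactly 25 characters long. As a minimal sketch of an alternative, the JSONP wrapper can be removed with a regular expression that does not depend on the callback length. The helper name strip_jsonp and the sample string are assumptions made for illustration, not part of the original script.

```python
import json
import re

# Matches "callbackName( ... )" with an optional trailing semicolon.
_JSONP_RE = re.compile(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', re.DOTALL)


def strip_jsonp(text):
    """Return the parsed JSON payload of a JSONP response body (hypothetical helper)."""
    match = _JSONP_RE.match(text)
    if match is None:
        raise ValueError('response is not in callback(...) form')
    return json.loads(match.group(1))


# Made-up sample in the same shape as the response the script expects.
sample = 'fetchJSON_comment98vv7990({"comments": [{"content": "good"}]});'
payload = strip_jsonp(sample)
print(payload['comments'][0]['content'])  # prints: good
```

Raising ValueError on an unexpected body makes a blocked or changed response visible immediately instead of failing later inside json.loads.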

    def batch_spider_comments():
        # Clear the file before each run
        if os.path.exists(comment_file_path):
            os.remove(comment_file_path)
        for i in range(100):
            print('crawling page ' + str(i + 1) + ' ...')
            get_spider_comments(i)
            # Wait a random 0 to 5 seconds between requests
            time.sleep(random.random() * 5)
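The loop above always requests 100 pages, even if the product has fewer. A minimal sketch of a variant that stops at the first empty page is shown here. It assumes a page fetcher that returns a list of comment strings, which differs from get_spider_comments (that function writes to the file and returns nothing). The names crawl_pages, fetch_page, and max_delay are hypothetical.

```python
import random
import time


def crawl_pages(fetch_page, max_pages=100, max_delay=5.0):
    """Call fetch_page(0), fetch_page(1), ... and stop at the first empty page.

    fetch_page is a hypothetical callable that takes a page number and
    returns a list of comment strings (empty when there is nothing left).
    """
    collected = []
    for page in range(max_pages):
        comments = fetch_page(page)
        if not comments:
            break  # no more data, so stop requesting further pages
        collected.extend(comments)
        # Random pause between requests, as in the original loop
        time.sleep(random.random() * max_delay)
    return collected


# Usage with a fake fetcher, so the example runs without network access.
fake_pages = [['a', 'b'], ['c'], []]


def fake_fetch(page):
    return fake_pages[page] if page < len(fake_pages) else []


result = crawl_pages(fake_fetch, max_pages=10, max_delay=0)
print(result)  # prints: ['a', 'b', 'c']
```

Passing the fetcher in as an argument keeps the paging logic separate from the HTTP request, so it can be tried out with fake data first.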

    def cut_word():
        with open(comment_file_path, encoding='utf-8') as file:
            comment_text = file.read()
            # Segment the text into words (search-engine mode)
            wordlist = jieba.lcut_for_search(comment_text)
            # WordCloud expects words separated by spaces
            new_wordlist = ' '.join(wordlist)
            return new_wordlist
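cut_word() joins every token that jieba returns, including punctuation and single-character particles, which can crowd the word cloud. As a rough sketch, and not something the original script does, the token list could be filtered before joining. The function filter_tokens, its parameters, and the sample tokens are assumptions for illustration.

```python
def filter_tokens(tokens, stop_words=frozenset(), min_length=2):
    """Join tokens with spaces, dropping short tokens and stop words (hypothetical helper)."""
    kept = []
    for token in tokens:
        word = token.strip()
        if len(word) < min_length or word in stop_words:
            continue
        kept.append(word)
    return ' '.join(kept)


# Hand-written tokens standing in for the output of jieba.lcut_for_search.
tokens = ['手机', '的', '质量', '很', '好', ',', '物流', '非常', '快']
print(filter_tokens(tokens, stop_words={'非常'}))  # prints: 手机 质量 物流
```

The length cutoff is a blunt rule: it also removes meaningful single-character words, so min_length=1 with an explicit stop-word set may suit some data better.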

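    def create_word_cloud():
        # Image whose shape is used as the word cloud mask
        mask = imread('ball.jpg')
        # msyh.ttc is a font with Chinese glyphs (Microsoft YaHei)
        wordcloud = WordCloud(font_path='msyh.ttc', mask=mask).generate(cut_word())
        wordcloud.to_file('picture.png')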

    if __name__ == '__main__':
        # batch_spider_comments()  # run this first so that jd_comments.txt exists
        create_word_cloud()

That is all for "How to use Python to crawl JD product comments". Thank you for reading! I hope you found the article useful.