
How to use Python to crawl the data of each episode of Today's Statement


Shulou (Shulou.com) 06/01 Report

Today, the editor will share with you how to use Python to crawl the data of each episode of Today's Statement. The content is detailed and the logic is clear. Most readers probably know little about this topic, so this article is shared for your reference. I hope you will get something out of it. Let's take a look.

The code (reconstructed from the garbled original; comments mark the places where details had to be assumed):

import re

import requests
import xlwt

# Programme listing page: https://tv.cctv.com/lm/jrsf/index.shtml

def get_data(page):
    # Fetch one page of the episode list from the CCTV video API.
    # Parts of this query string were garbled in the source; the paging
    # parameters (n, sort, p) are reconstructed from the usual cntv.cn pattern.
    url = ('https://api.cntv.cn/NewVideo/getVideoListByColumn'
           '?id=TOPC1451464665008914'
           '&n=20&sort=desc&p={pageNo}'
           '&mode=0&serviceId=tvcctv&cb=Callback').format(pageNo=page)
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                             'AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/92.0.4515.131 Safari/537.36'}
    response = requests.get(url=url, headers=headers)
    return response.text


if __name__ == "__main__":
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                             'AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/92.0.4515.131 Safari/537.36'}
    book = xlwt.Workbook(encoding='utf-8', style_compression=0)
    # The sheet name was lost in the source; 'data' is a placeholder.
    sheet = book.add_sheet('data', cell_overwrite_ok=True)
    count = 0
    for page in range(1, 5):
        page_content = get_data(page)
        # Pull every episode page URL (ending in .shtml) out of the response,
        # after removing the escaping backslashes.
        obj = re.compile(r'url":"(.*?\.shtml)', re.S)
        imgUrl = re.findall(obj, page_content.replace('\\', ''))
        for i in range(len(imgUrl)):
            resp = requests.get(url=imgUrl[i], headers=headers)
            resp.encoding = 'utf-8'
            # The closing delimiter of both patterns was lost in the source;
            # '<' (the start of the next HTML tag) is assumed here. The labels
            # are the Chinese for "update time" and "video introduction".
            obj2 = re.compile(r'更新时间：(.*?)<', re.S)
            time = re.findall(obj2, resp.text)
            obj3 = re.compile(r'视频简介：(.*?)<', re.S)
            jianjie = re.findall(obj3, resp.text)
            content = [time, jianjie]
            for j in range(2):
                # findall returns a list; write the first match (or '' if
                # nothing matched), since xlwt cannot write a list to a cell.
                sheet.write(count, j, content[j][0] if content[j] else '')
            count += 1
    book.save("./data_5.xls")

Experimental results: the script writes the update time and introduction of each episode into ./data_5.xls.
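A small aside on the parsing step: because the request carries cb=Callback, the API answers with JSONP, i.e. a JSON payload wrapped in Callback(...), which is why the script above falls back on regular expressions. A cleaner variant is to strip the wrapper and parse the payload with the json module. The sketch below only illustrates that idea; the field names data, list, url, time and brief are assumptions about the payload shape, not something confirmed by this article.

import json

import requests

def get_episodes(page):
    # Same endpoint as in the script above; strip the Callback(...) JSONP
    # wrapper, then parse what remains as ordinary JSON.
    url = ('https://api.cntv.cn/NewVideo/getVideoListByColumn'
           '?id=TOPC1451464665008914&n=20&sort=desc&p={p}'
           '&mode=0&serviceId=tvcctv&cb=Callback').format(p=page)
    text = requests.get(url, headers={'user-agent': 'Mozilla/5.0'}).text
    payload = json.loads(text[text.index('(') + 1:text.rindex(')')])
    # 'data', 'list', 'url', 'time' and 'brief' are assumed field names.
    for item in payload.get('data', {}).get('list', []):
        yield item.get('url'), item.get('time'), item.get('brief')

if __name__ == "__main__":
    for url, time_, brief in get_episodes(1):
        print(url, time_, brief)

Parsing the payload as JSON avoids the brittle regular expressions and removes the need for the backslash-stripping step entirely.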

These are all the contents of the article "How to use Python to crawl the data of each episode of Today's Statement". Thank you for reading! I believe you will gain a lot from this article. The editor updates different knowledge for you every day; if you want to learn more, please follow the industry information channel.
