
How to use the R language to scrape images from a web page

2025-02-25 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article shows, step by step, how to use the R language to scrape the images from a web page.

Today's target is an image-heavy Zhihu page: a question thread about outdoor photography that collects a large number of practical shooting tips.

library(rvest)
library(downloader)
library(stringr)
library(dplyr)

url <- "https://www.zhihu.com/question/19647535"

After opening the page, find any image in the post content, right-click it and choose Inspect (or press Ctrl+Shift+I). The developer tools panel that opens on the right jumps straight to that image's element, where the HTML shows the image tag, img, and its address attribute, src.
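The same tag/attribute relationship can be checked from R. This minimal sketch parses a hand-written snippet (the markup and the example.com address are made up for illustration, not taken from Zhihu), finds the img tag with html_node() and reads its src attribute with html_attr().

```r
library(rvest)

# Hypothetical markup standing in for one picture in the post
snippet <- read_html('<div><img src="https://example.com/photo.jpg" alt="demo"></div>')

img_tag <- html_node(snippet, "img")  # the image tag (img)
img_src <- html_attr(img_tag, "src")  # its address attribute (src)
print(img_src)
```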

What we want is the address of each image. Once you have one address, you can try downloading a single image with the download() function from the downloader package.
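A minimal sketch of that single-image download, assuming the downloader package's download() function, which forwards extra arguments such as destfile, mode and quiet to base R's download.file(). The helper name download_one() and the choice to name the saved file after the last part of the URL are hypothetical. Setting mode = "wb" matters because images are binary files, and on Windows the default text mode would corrupt them.

```r
library(downloader)

# Hypothetical helper: save one image into dest_dir and return the file path.
# The file name is taken from the last component of the URL.
download_one <- function(img_url, dest_dir = tempdir()) {
  dest <- file.path(dest_dir, basename(img_url))
  # mode = "wb" writes the file in binary mode, which images require
  download(img_url, destfile = dest, mode = "wb", quiet = TRUE)
  dest
}

# Usage (placeholder address; copy a real one from the inspector):
# download_one("https://example.com/photo.jpg")
```

If an address ends in a query string, basename() keeps it in the file name, so you may want to strip it first.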

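link <- read_html(url) %>%
  html_nodes("...") %>%   # the div holding the post's images; take its selector from the inspector
  html_nodes("img") %>%
  html_attr("src")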

What we need is the src attribute (the image address) of each img tag inside the div that holds the post's pictures. To avoid capturing lots of irrelevant images, you first have to know where the target images sit in the page. The code above therefore goes from url (the address of the post page) to the div containing the target images, and then reads the src attribute (the image URL) of every img tag inside that div.
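The effect of narrowing the search to one div can be seen on a small made-up page. In this sketch the markup, the class name post-body and the example.com addresses are hypothetical stand-ins, not Zhihu's real structure: selecting every img on the page also returns an avatar, while going through the div first returns only the post's pictures.

```r
library(rvest)

# Hypothetical page: an avatar outside the post, two photos inside it
page <- read_html('
<html><body>
  <img src="https://example.com/avatar.png">
  <div class="post-body">
    <p>Tip 1</p><img src="https://example.com/shot1.jpg">
    <p>Tip 2</p><img src="https://example.com/shot2.jpg">
  </div>
</body></html>')

# Every img on the page, avatar included
all_src <- page %>% html_nodes("img") %>% html_attr("src")

# Only the img tags inside the post's div
post_src <- page %>%
  html_nodes("div.post-body") %>%
  html_nodes("img") %>%
  html_attr("src")

print(all_src)
print(post_src)
```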

Run the code above, then preview the first few elements of the link vector with head() to check whether the image addresses you got look correct.

Unfortunately, in the character vector of image addresses we obtained, every other element is an invalid URL. Unless we remove these invalid entries (or keep only the complete URLs), the download loop stops with an error as soon as it reaches one, and the batch download fails.
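Filtering, as described next, is one way to keep a bad address from stopping the whole batch. A different technique, sketched here as an assumption rather than the author's method, is to wrap each call to download() in tryCatch() so that a failing address is reported and skipped. The helper name download_all() and the img_1.jpg, img_2.jpg file naming are hypothetical.

```r
library(downloader)

# Hypothetical helper: download every URL in `urls`, skipping any that fail.
# Returns a logical vector that is TRUE where the download succeeded.
download_all <- function(urls, dest_dir = tempdir()) {
  ok <- logical(length(urls))
  for (i in seq_along(urls)) {
    # Name files by position, since scraped URLs may not end in a clean file name
    dest <- file.path(dest_dir, paste0("img_", i, ".jpg"))
    ok[i] <- tryCatch({
      download(urls[i], destfile = dest, mode = "wb", quiet = TRUE)
      TRUE
    }, error = function(e) {
      # Report the failure and move on instead of aborting the loop
      message("Skipping ", urls[i], ": ", conditionMessage(e))
      FALSE
    })
  }
  ok
}

# Usage, once link holds the scraped addresses:
# ok <- download_all(link)
# sum(ok)  # number of images saved
```

Filtering first is still cleaner, since it avoids pointless requests; tryCatch() is a safety net for addresses that look valid but fail anyway.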

Here you need to use the stringr package for conditional filtering.

pat <- "https"
link <- link[str_detect(link, pat)]   # keep only the entries that contain "https"
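The filtering step can be tried on a made-up vector that mimics the problem, with every other element unusable (the addresses and the placeholder.svg entries are hypothetical). str_detect() returns TRUE for elements that contain the pattern, so indexing with it keeps only the https addresses; str_subset() is stringr's shorthand for the same thing. Anchoring the pattern as "^https://" is a stricter variant, our own assumption rather than the article's pattern, that only accepts strings that actually start with the scheme.

```r
library(stringr)

# Hypothetical scraped result: every other element is not a usable image URL
link <- c("https://example.com/a.jpg", "placeholder.svg",
          "https://example.com/b.jpg", "placeholder.svg")

pat <- "https"
kept <- link[str_detect(link, pat)]      # keep elements containing "https"
same <- str_subset(link, pat)            # stringr shorthand for the line above
strict <- str_subset(link, "^https://")  # stricter: must start with https://

print(kept)
```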
