
How to Use the RCurl Package in R

Updated 2025-04-02, from SLTechnology News&Howtos (shulou)


This article works through a practical example of scraping a JavaScript-rendered table in R, comparing an RCurl request, rvest, a Selenium-driven browser, and the rdom package.

Lately I have been studying R's web-scraping packages, and every so often I stumble on a pleasant surprise.

Today, for example, I found a parsing package that ships with its own requester and embeds the PhantomJS headless browser. With it you no longer have to install a Selenium driver, and you do not need a separate request package (RCurl or httr), to parse HTML documents whose content is generated by JavaScript.

Seeing is believing. Remember the earlier section on scraping table data? We ran into a weather data table whose contents could not be retrieved with an ordinary request, which was awkward. I had to use RSelenium to call PhantomJS to solve it, but!

This package reduces the whole task to a single line of code!

library("rvest")
URL <- "<URL of the weather data page>" %>% xml2::url_escape(reserved = "][!$&'()*+,;=:/?@#")

First, try fetching the table with an ordinary request (if that works, I lose the bet!).

Requesting with the RCurl package:

header <- ...
mytable <- ... %>% readHTMLTable(header=TRUE)
$`NULL`
NULL

Nothing at all. What did I do to deserve this? ~_~
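A minimal sketch of how an RCurl request of this kind is commonly put together, assuming the XML package does the parsing. The helper name `fetch_tables` and the default User-Agent value are illustrative assumptions, not the author's original code.

```r
library(RCurl)
library(XML)

# Hypothetical helper: fetch a page with RCurl and parse its static tables.
# The User-Agent default is a placeholder, not a value from the article.
fetch_tables <- function(url, user_agent = "Mozilla/5.0") {
  header <- c("User-Agent" = user_agent)
  # getURL() returns the raw page source as text; no JavaScript is executed.
  html <- getURL(url, httpheader = header, .encoding = "UTF-8")
  doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")
  # One list element per <table> found in the source.
  readHTMLTable(doc, header = TRUE)
}
```

Calling `fetch_tables(URL)` on a page whose rows are filled in by a script can only return what is present in the raw source, which is why the result above contains no data.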

Try using rvest:

mytable <- URL %>% read_html(encoding = "UTF-8") %>% html_table(header = TRUE) %>% `[`(1)
[1] Month AQI range Quality class PM2.5 PM10 SO2 CO NO2 O3

Well, that is progress of a sort: this time I got the header. But what use is a header with no data?!
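Why only the header comes back can be reproduced offline. The sketch below uses a made-up page source (the table id, column names, and script comment are all hypothetical) in which the header row is static HTML and the data rows would be appended by a script at runtime. rvest parses the source as delivered and never runs the script.

```r
library(rvest)

# Hypothetical page source: static header, rows normally injected by JavaScript.
html_src <- '<html><body>
<table id="aqi">
  <thead><tr><th>Month</th><th>AQI</th></tr></thead>
  <tbody></tbody>
</table>
<script>/* a real page would append the data rows here */</script>
</body></html>'

page <- read_html(html_src)
tbl <- html_table(html_element(page, "table"), header = TRUE)

# The column names are recovered, but there are no data rows.
names(tbl)
nrow(tbl)
```

This is the same situation as with the weather table: the requester is working correctly, the data simply is not in the source it receives.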

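Using Selenium to drive a browser:

# Start the Selenium server first, either from a shell:
# java -jar D:/selenium-server-standalone-3.3.1.jar
# or from R:
# system("java -jar D:/selenium-server-standalone-3.3.1.jar", intern=FALSE)

library("Rwebdriver")   # assumed source of start_session(), post.url(), page_source(), quit_session()
library("stringi")      # provides stri_conv()
start_session(root="http://localhost:4444/wd/hub/", browser="phantomjs")
post.url(URL)
mytable <- page_source() %>% stri_conv(from="utf-8") %>% read_html() %>% html_table()
quit_session()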

This time I finally got the data. But count how much code it took!

Is there a faster way? Of course!

Now open your eyes wide and look at this gem!

Using the rdom package:

# Check that the phantomjs browser is on the system path.
# If you have not downloaded phantomjs, or downloaded it without adding it to the path,
# do that first, otherwise rdom will not work!
stopifnot(Sys.which("phantomjs") != "")

# Install the rdom package (if it keeps timing out, try loading the curl package first)
devtools::install_github("cpsievert/rdom")

library("rdom")
library("XML")   # provides readHTMLTable()

tbl <- rdom(URL) %>% readHTMLTable(header=TRUE) %>% `[[`(1)

Did you catch what happened? Blink and it is already done. It really is just one line of code!

In the background, rdom calls the PhantomJS browser to render the entire target HTML document, including everything generated by the JavaScript in its script tags, so readHTMLTable gets a chance to extract the table. Ordinary requesters such as RCurl or httr cannot do this, because they fetch the page source without executing its scripts.

The following lines are just a small cleanup of the result, converting the encoding of the column names and displaying the table:

names(tbl) <- names(tbl) %>% stri_conv(from="utf-8")
DT::datatable(tbl)

rdom is a very small package with a bold design: the whole package has only one function, rdom, named after the package itself, and it has only one job, which is to render an HTML document the way a real browser would. PhantomJS handles the rendering in the background, after which you are free to use any of R's efficient extraction functions on the result.
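The workflow above can be wrapped in a small guard function. This is only a sketch under stated assumptions: `render_table` and its `index` argument are hypothetical names, and it assumes that `rdom::rdom(url)` returns a parsed document that `XML::readHTMLTable()` accepts, as in the pipeline shown earlier.

```r
# Hypothetical wrapper: fail early with a clear message when a prerequisite
# is missing, then render the page and pull out one table.
render_table <- function(url, index = 1) {
  if (!nzchar(Sys.which("phantomjs"))) {
    stop("phantomjs was not found on the system path")
  }
  if (!requireNamespace("rdom", quietly = TRUE) ||
      !requireNamespace("XML", quietly = TRUE)) {
    stop("the 'rdom' and 'XML' packages must be installed")
  }
  doc <- rdom::rdom(url)                        # render the page, scripts included
  tables <- XML::readHTMLTable(doc, header = TRUE)
  tables[[index]]                               # pick the requested table
}
```

With the prerequisites in place, `render_table(URL)` would play the role of the one-line rdom call, with the checks moved inside the function.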
