2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to use jsoup. The approach described here is simple, quick, and practical, so if you are interested, read on.
jsoup is an HTML parser for Java that extracts the data you want from HTML. It is an essential tool for writing web crawlers in Java.
Switch freely between daily increments and cumulative totals
Pick any metric: views, followers, comments, likes, overall ranking, weekly ranking
Choose any time range
Since building this tool, I have had a complete view of all the data on my blog. Watching the numbers change every day gives me a sense of achievement and makes me happy.
How to do it?
Now that I have shown it off, I should explain how it is built. First, you need a host that can run scheduled tasks, either a cloud server or a machine in your bedroom, and you need a database. Functionally it is plain CRUD (create, read, update, delete). Actually no, only create and read, with no delete. For the charts I use AntV G2, the open source visualization library from Ant Financial. I use version 3.8, which is buggy, so I do not recommend it; I recommend Highcharts instead.
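Since the service only ever inserts and reads rows, one way to offer both a "total" view and a "daily increment" view is to store only the daily totals and derive the increments when reading. The sketch below assumes one snapshot per day; the class name `DailyIncrement` and the method `increments` are hypothetical and are not part of the original project.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: derives day-over-day increments from stored totals.
class DailyIncrement {

    // totals maps each snapshot date to the cumulative value on that date.
    // The result has one entry per date except the first, in date order.
    static Map<LocalDate, Integer> increments(SortedMap<LocalDate, Integer> totals) {
        Map<LocalDate, Integer> result = new LinkedHashMap<>();
        Integer previous = null;
        for (Map.Entry<LocalDate, Integer> entry : totals.entrySet()) {
            if (previous != null) {
                result.put(entry.getKey(), entry.getValue() - previous);
            }
            previous = entry.getValue();
        }
        return result;
    }

    public static void main(String[] args) {
        SortedMap<LocalDate, Integer> views = new TreeMap<>();
        views.put(LocalDate.of(2020, 9, 1), 100);
        views.put(LocalDate.of(2020, 9, 2), 130);
        views.put(LocalDate.of(2020, 9, 3), 175);
        System.out.println(increments(views));
    }
}
```

Deriving increments on read keeps the nightly job to a single insert, at the cost of a little work per query.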
I think the trickiest part is parsing the HTML, which I will show directly below. Next comes storing and querying the data. I built the web service with spring-boot, used spring-boot-starter-quartz to run the scheduled task at 11:55 every night, and used mybatis-spring-boot-starter to read and write the database.
To write the HTML parsing code, you need to understand the layout of the CSDN blog page and then debug step by step until the data comes out. Of course, whenever CSDN redesigns the page, the code stops working. Fortunately such breaking redesigns are not very frequent; I have run into 2-3 of them in a little over half a year. The code is below. You can use it directly after replacing the url with that of your own blog.
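In Quartz, a job that fires at 23:55 every night is typically expressed with the cron string `0 55 23 * * ?`. For readers who want to see the timing logic without Spring or Quartz, here is a minimal sketch using only the JDK; it is a substitute technique, not the author's setup, and the names `NightlySchedule` and `delayUntilNextRun` are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper: computes how long to wait until the next 23:55.
class NightlySchedule {

    static final LocalTime RUN_AT = LocalTime.of(23, 55);

    // Returns the delay from 'now' to the next occurrence of RUN_AT.
    // If 'now' is exactly at or past today's RUN_AT, the next run is tomorrow.
    static Duration delayUntilNextRun(LocalDateTime now) {
        LocalDateTime next = now.toLocalDate().atTime(RUN_AT);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    public static void main(String[] args) {
        Duration delay = delayUntilNextRun(LocalDateTime.now());
        System.out.println("Next crawl in " + delay.toMinutes() + " minutes");
    }
}
```

The resulting delay could be passed as the initial delay to `ScheduledExecutorService.scheduleAtFixedRate` with a 24-hour period, although a cron-based scheduler handles restarts and clock changes more gracefully.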
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CommonUtils {
    private static Logger log = LoggerFactory.getLogger(CommonUtils.class);
    private static Map<String, String> headers;

    static {
        // Browser-like headers so the request is not rejected outright
        headers = new HashMap<>();
        headers.put("referer", "https://www.google.com/");
        headers.put("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0"
                + ".4183.83 Safari/537.36");
    }

    public static BlogInfoDao getBlogInfo() {
        int retry = 3;
        // Note: the pre-decrement means the body runs at most twice
        while (--retry > 0) {
            try {
                BlogInfoDao blogInfoDao = new BlogInfoDao();
                blogInfoDao.setDate(new Date());
                Document doc = Jsoup.connect("https://blog.csdn.net/xindoo").headers(headers).get();
                // First statistics block on the page
                Element blogElement = doc.getElementsByClass("data-info d-flex item-tiling").get(0);
                // number of articles
                int articleCnt = Integer.parseInt(blogElement.getElementsByTag("dl").get(0).attr("title"));
                blogInfoDao.setArticleCnt(articleCnt);
                // Weekly ranking
                int wranking = Integer.parseInt(blogElement.getElementsByTag("dl").get(1).attr("title"));
                blogInfoDao.setWranking(wranking);
                // Total ranking
                int ranking = Integer.parseInt(blogElement.getElementsByTag("dl").get(2).attr("title"));
                blogInfoDao.setRanking(ranking);
                // Total readings
                int viewCnt = Integer.parseInt(blogElement.getElementsByTag("dl").get(3).attr("title"));
                blogInfoDao.setViewCnt(viewCnt);
                // Second statistics block on the page
                blogElement = doc.getElementsByClass("data-info d-flex item-tiling").get(1);
                // Total score
                int scoreCnt = Integer.parseInt(blogElement.getElementsByTag("dl").get(0).attr("title"));
                blogInfoDao.setScore(scoreCnt);
                // number of fans
                int fansCnt = Integer.parseInt(blogElement.getElementsByTag("dl").get(1).attr("title"));
                blogInfoDao.setFansCnt(fansCnt);
                // likes
                int likeCnt = Integer.parseInt(blogElement.getElementsByTag("dl").get(2).attr("title"));
                blogInfoDao.setLikeCnt(likeCnt);
                // comments
                int commentCnt = Integer.parseInt(blogElement.getElementsByTag("dl").get(3).attr("title"));
                blogInfoDao.setCommentCnt(commentCnt);
                // Collection
                int collectCnt = Integer.parseInt(blogElement.getElementsByTag("dl").get(4).attr("title"));
                blogInfoDao.setCollectCnt(collectCnt);
                return blogInfoDao;
            } catch (Exception e) {
                // Pass the exception as the last argument so SLF4J logs the stack trace
                log.error("get bloginfo error", e);
            }
        }
        return null;
    }
}
BlogInfoDao is a class I wrote to wrap the interaction with the database. There is nothing interesting in it, so I will not post it here.
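The parsing code calls Integer.parseInt directly on each title attribute, so an empty or reformatted value (for example one with thousands separators) after a page redesign surfaces as an exception. One possible hardening, sketched here under the assumption that counts are plain integers optionally containing commas, is a small tolerant parser. The class `CountParser` and method `parseCount` are hypothetical and do not appear in the original code.

```java
import java.util.OptionalInt;

// Hypothetical helper: parses a count from an attribute value without throwing.
class CountParser {

    // Returns the parsed integer, or an empty OptionalInt when the input is
    // null, blank, not a number, or outside the int range.
    static OptionalInt parseCount(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        String digits = raw.trim().replace(",", "");
        if (digits.isEmpty()) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(digits));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseCount("1,234"));
        System.out.println(parseCount(""));
    }
}
```

A caller could then skip or log a missing metric instead of discarding the whole snapshot for that day.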
By now you should have a better understanding of how to use jsoup. Try it out in practice.
© 2024 shulou.com SLNews company. All rights reserved.